Compare commits

...

8 Commits

Author SHA1 Message Date
teknium
d999d9876d Enhance async tool execution and error handling in Hermes agent for Atropos integration
- Updated `.gitignore` to exclude `testlogs` directory.
- Refactored `handle_web_function_call` in `model_tools.py` to support running async functions in existing event loops, improving compatibility with Atropos.
- Introduced a thread pool executor in `agent_loop.py` for running synchronous tool calls that internally use `asyncio.run()`, preventing deadlocks.
- Added `ToolError` class to track tool execution errors, enhancing error reporting during agent loops.
- Updated `wandb_log` method in `hermes_base_env.py` to log tool error statistics for better monitoring.
- Implemented patches in `patches.py` to ensure async-safe operation of tools within Atropos's event loop.
- Enhanced `ToolContext` and `terminal_tool.py` to utilize the new async handling, improving overall tool execution reliability.
2026-02-08 05:00:47 +00:00
teknium
a8809bbd3e Transition installation to uv for py version and speed to be easier to streamline
- Integrated `uv` as a fast Python package manager for automatic Python provisioning and dependency management.
- Updated installation scripts (`setup-hermes.sh`, `install.sh`, `install.ps1`) to utilize `uv` for installing Python and packages, streamlining the setup process.
- Revised `README.md` to reflect changes in installation steps, including symlinking `hermes` for global access and clarifying Python version requirements.
- Adjusted commands in `doctor.py` and other scripts to recommend `uv` for package installations, ensuring consistency across the project.
2026-02-07 23:54:53 +00:00
teknium
a478e44585 Increase max_token_length in TerminalTestEnv to 16000 for enhanced processing capacity 2026-02-07 21:11:07 +00:00
teknium
c0494b3558 Update pyproject.toml to refine dependency management
- Reorganized the 'all' dependencies to include specific optional groups for better modularity.
- Added support for 'hermes-agent' with distinct categories: modal, messaging, cron, cli, and dev.
2026-02-07 21:11:01 +00:00
teknium
07b615e96e Add support for Atropos Agentic RL environments (requires branch tool_call_support in Atropos atm)
- Added new environments for reinforcement learning, including `HermesSweEnv` for software engineering tasks and `TerminalTestEnv` for inline testing.
- Introduced `ToolContext` for unrestricted access to tools during reward computation.
- Updated `.gitignore` to exclude `wandb/` directory.
- Enhanced `README.md` with detailed architecture and usage instructions for Atropos environments.
- Added configuration files for SWE and terminal test environments to streamline setup.
- Removed unnecessary compiled Python files from `__pycache__`.
2026-02-07 09:17:16 +00:00
teknium
ac79725923 Update dependencies and enhance installation scripts
- Added `prompt_toolkit` as a direct dependency for interactive CLI support.
- Updated `modal` optional dependency to require `swe-rex[modal]>=1.4.0` for improved cloud execution capabilities.
- Enhanced `messaging` optional dependencies to include `aiohttp>=3.9.0` for WhatsApp bridge communication.
- Refined installation scripts to check for Python version requirements, emphasizing the need for Python 3.11+ for RL training tools.
- Improved setup scripts to ensure proper installation of submodules and dependencies, enhancing user experience during setup.
2026-02-07 00:05:04 +00:00
Teknium
8dd38318fc Merge pull request #15 from NousResearch/rl-capabilities
Rl capabilities && File Operator Tools
2026-02-05 03:50:42 -08:00
Teknium
8380895ae3 Update README.md 2026-02-04 00:35:45 -08:00
44 changed files with 4270 additions and 1344 deletions

4
.gitignore vendored
View File

@@ -39,6 +39,10 @@ agent-browser/
*.pem
privvy*
images/
__pycache__/
hermes_agent.egg-info/
wandb/
testlogs
# CLI config (may contain sensitive SSH paths)
cli-config.yaml

443
README.md
View File

@@ -15,11 +15,13 @@ irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/ins
```
The installer will:
- Clone to `~/.hermes-agent` (with submodules: mini-swe-agent, tinker-atropos)
- Create a virtual environment
- Install all dependencies
- Install [uv](https://docs.astral.sh/uv/) (fast Python package manager) if not present
- Install Python 3.11 via uv if not already available (no sudo needed)
- Clone to `~/.hermes/hermes-agent` (with submodules: mini-swe-agent, tinker-atropos)
- Create a virtual environment with Python 3.11
- Install all dependencies and submodule packages
- Symlink `hermes` into `~/.local/bin` so it works globally (no venv activation needed)
- Run the interactive setup wizard
- Add `hermes` to your PATH
After installation, reload your shell and run:
```bash
@@ -64,13 +66,13 @@ You need at least one LLM provider:
| Provider | Get Key | Env Variable |
|----------|---------|--------------|
| **OpenRouter** (recommended) | [openrouter.ai/keys](https://openrouter.ai/keys) | `OPENROUTER_API_KEY` |
| Anthropic | [console.anthropic.com](https://console.anthropic.com/) | `ANTHROPIC_API_KEY` |
| OpenAI | [platform.openai.com](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` |
### Optional API Keys
| Feature | Provider | Env Variable |
|---------|----------|--------------|
| Custom OpenAI Endpoint (OAI or VLLM/SGLANG) | [platform.openai.com](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` |
| Web scraping | [Firecrawl](https://firecrawl.dev/) | `FIRECRAWL_API_KEY` |
| Browser automation | [Browserbase](https://browserbase.com/) | `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID` |
| Image generation | [FAL](https://fal.ai/) | `FAL_KEY` |
@@ -179,8 +181,8 @@ hermes config set terminal.singularity_image ~/python.sif
**Modal** (serverless cloud):
```bash
pip install modal boto3
modal setup # Authenticate
uv pip install "swe-rex[modal]" # Installs swe-rex + modal + boto3
modal setup # Authenticate with Modal
hermes config set terminal.backend modal
```
@@ -275,16 +277,19 @@ See [docs/messaging.md](docs/messaging.md) for WhatsApp and advanced setup.
Train language models with reinforcement learning using the Tinker API and Atropos framework.
> **Note:** RL training tools require **Python 3.11+** (the upstream `tinker` package has this requirement). On Python 3.10, the RL toolset will be automatically disabled — all other features work fine.
#### Requirements
1. **API Keys:** Add to `~/.hermes/.env`:
1. **Python 3.11+** (check with `python3 --version`)
2. **API Keys:** Add to `~/.hermes/.env`:
```bash
TINKER_API_KEY=your-tinker-key # Get from https://tinker-console.thinkingmachines.ai/keys
WANDB_API_KEY=your-wandb-key # Get from https://wandb.ai/authorize
OPENROUTER_API_KEY=your-key # Optional: for rl_test_inference
```
2. **That's it!** tinker-atropos is included as a submodule - no separate installation needed.
3. **That's it!** tinker-atropos is included as a submodule — the installer handles it automatically.
#### Using RL Tools
@@ -320,6 +325,94 @@ For extended RL workflows with longer timeouts:
python rl_cli.py --model "anthropic/claude-sonnet-4-20250514"
```
### 🧪 Atropos RL Environments
Hermes-Agent integrates with the [Atropos](https://github.com/NousResearch/atropos) RL framework through a layered environment system. This allows training models with reinforcement learning on agentic tasks using hermes-agent's tools.
#### Architecture
The integration has three layers:
| Layer | File | Purpose |
|-------|------|---------|
| **Agent Loop** | `environments/agent_loop.py` | Reusable multi-turn tool-calling engine (standard OpenAI spec) |
| **Base Environment** | `environments/hermes_base_env.py` | Abstract Atropos `BaseEnv` subclass with toolset resolution, ToolContext, scoring |
| **Concrete Envs** | `environments/terminal_test_env.py`, `environments/hermes_swe_env.py` | Task-specific environments |
#### Two-Phase Operation
- **Phase 1 (OpenAI server type)**: Works with any OpenAI-compatible endpoint (VLLM, SGLang, OpenRouter, OpenAI API). The server handles tool call parsing natively. Good for **SFT data generation**, **verifier testing**, and **evaluation**.
- **Phase 2 (VLLM server type)**: Uses ManagedServer for exact token IDs + logprobs via `/generate`. Client-side tool call parser registry reconstructs structured `tool_calls` from raw output. Required for **full RL training**.
#### Quick Start
```bash
# 1. Launch VLLM with tool parser
vllm serve YourModel --tool-parser hermes
# 2. Start the Atropos API server
run-api
# 3. Run an environment
python environments/terminal_test_env.py serve \
--openai.base_url http://localhost:8000/v1 \
--openai.model_name YourModel \
--openai.server_type openai
```
#### ToolContext (Reward Functions)
Reward functions receive a `ToolContext` with unrestricted access to all hermes-agent tools, scoped to the rollout's sandbox:
```python
async def compute_reward(self, item, result, ctx: ToolContext) -> float:
# Run tests in the model's terminal sandbox
test = ctx.terminal("pytest -v")
if test["exit_code"] == 0:
return 1.0
# Or check a file, search the web, navigate a browser...
return 0.0
```
#### Creating Custom Environments
Subclass `HermesAgentBaseEnv` and implement 5 methods:
```python
from environments.hermes_base_env import HermesAgentBaseEnv
class MyEnv(HermesAgentBaseEnv):
name = "my-env"
async def setup(self): ... # Load data
async def get_next_item(self): ... # Return next item
def format_prompt(self, item): ... # Item -> prompt string
async def compute_reward(self, item, result, ctx): ... # Score with ToolContext
async def evaluate(self, *args, **kwargs): ... # Periodic eval
if __name__ == "__main__":
MyEnv.cli()
```
#### Toolset Distributions
Configure which tools are available per group, either explicitly or probabilistically:
```bash
# Explicit toolsets
--env.enabled_toolsets '["terminal","file","web"]'
# Probabilistic distribution (sampled per group)
--env.distribution development
```
#### Tool Call Parsers (Phase 2)
For VLLM server type, a parser registry extracts structured `tool_calls` from raw model output. Supported parsers: `hermes`, `mistral`, `llama3_json`, `qwen`, `deepseek_v3`, `deepseek_v3_1`, `kimi_k2`, `longcat`, `glm45`, `glm47`, `qwen3_coder`.
```bash
--env.tool_call_parser hermes # Match your VLLM --tool-parser flag
```
### ⏰ Scheduled Tasks (Cron)
Schedule tasks to run automatically:
@@ -425,26 +518,332 @@ skills/
## Manual Installation
If you prefer not to use the installer:
If you prefer full control over the installation process (or the quick-install script doesn't suit your environment), follow these steps to set everything up by hand.
### Prerequisites
| Requirement | Minimum Version | Check Command | Notes |
|-------------|----------------|---------------|-------|
| **Git** | Any recent | `git --version` | Required |
| **Node.js** | 18+ | `node --version` | Optional — needed for browser automation tools |
| **ripgrep** | Any | `rg --version` | Optional — faster file search in terminal tool (falls back to grep) |
> **Note:** Python and pip are **not** prerequisites. The installer uses [uv](https://docs.astral.sh/uv/) to provision Python 3.11 automatically (no sudo needed). If you already have Python 3.11+ installed, uv will use it.
<details>
<summary><strong>Installing prerequisites by platform</strong></summary>
**Ubuntu / Debian:**
```bash
sudo apt update && sudo apt install git
# Optional:
sudo apt install ripgrep nodejs npm
```
**macOS (Homebrew):**
```bash
brew install git
# Optional:
brew install ripgrep node
```
**Windows (WSL recommended):**
Use the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/install) and follow the Ubuntu instructions above. Alternatively, use the PowerShell quick-install script at the top of this README.
</details>
---
### Step 1: Clone the Repository
Clone with `--recurse-submodules` to pull the required submodules ([mini-swe-agent](https://github.com/SWE-agent/mini-swe-agent) for the terminal tool backend and [tinker-atropos](https://github.com/nousresearch/tinker-atropos) for RL training):
```bash
# Clone the repository (with submodules)
git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
```
If you already cloned without `--recurse-submodules`, initialize them manually:
```bash
git submodule update --init --recursive
```
---
### Step 2: Install uv & Create Virtual Environment
[uv](https://docs.astral.sh/uv/) is a fast Python package manager that can also provision Python itself. Install it and create the venv in one go:
```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create venv with Python 3.11 (uv downloads it if not present — no sudo needed)
uv venv venv --python 3.11
```
> **Tip:** You do **not** need to activate the venv to use `hermes`. The entry point has a hardcoded shebang pointing to the venv Python, so it works globally once symlinked (see Step 8). For installing packages, uv can target the venv directly via `VIRTUAL_ENV`.
---
### Step 3: Install Python Dependencies
Install the main package in editable mode with all optional extras (messaging, cron, CLI menus, modal):
```bash
# Tell uv which venv to install into
export VIRTUAL_ENV="$(pwd)/venv"
# Install with all extras
uv pip install -e ".[all]"
```
If you only want the core agent (no Telegram/Discord/cron support):
```bash
uv pip install -e "."
```
<details>
<summary><strong>Optional extras breakdown</strong></summary>
| Extra | What it adds | Install command |
|-------|-------------|-----------------|
| `all` | Everything below | `uv pip install -e ".[all]"` |
| `messaging` | Telegram & Discord gateway | `uv pip install -e ".[messaging]"` |
| `cron` | Cron expression parsing for scheduled tasks | `uv pip install -e ".[cron]"` |
| `cli` | Terminal menu UI for setup wizard | `uv pip install -e ".[cli]"` |
| `modal` | Modal cloud execution backend (swe-rex + modal + boto3) | `uv pip install -e ".[modal]"` |
| `dev` | pytest & test utilities | `uv pip install -e ".[dev]"` |
You can combine extras: `uv pip install -e ".[messaging,cron]"`
</details>
---
### Step 4: Install Submodule Packages
These are local packages checked out as Git submodules. Install them in editable mode:
```bash
# Terminal tool backend (required for the terminal/command-execution tool)
uv pip install -e "./mini-swe-agent"
# RL training backend
uv pip install -e "./tinker-atropos"
```
Both are optional — if you skip them, the corresponding toolsets simply won't be available.
---
### Step 5: Install Node.js Dependencies (Optional)
Only needed if you plan to use the **browser automation** toolset (Browserbase-powered):
```bash
npm install
```
This installs the `agent-browser` package defined in `package.json`. Skip this step if you don't need browser tools.
---
### Step 6: Create the Configuration Directory
Hermes stores all user configuration in `~/.hermes/`:
```bash
# Create the directory structure
mkdir -p ~/.hermes/{cron,sessions,logs}
# Copy the example config file
cp cli-config.yaml.example ~/.hermes/config.yaml
# Create an empty .env file for API keys
touch ~/.hermes/.env
```
Your `~/.hermes/` directory should now look like:
```
~/.hermes/
├── config.yaml # Agent settings (model, terminal, toolsets, compression, etc.)
├── .env # API keys and secrets (one per line: KEY=value)
├── cron/ # Scheduled job data
├── sessions/ # Messaging gateway sessions
└── logs/ # Conversation logs
```
---
### Step 7: Add Your API Keys
Open `~/.hermes/.env` in your editor and add at minimum an LLM provider key:
```bash
# Required — at least one LLM provider:
OPENROUTER_API_KEY=sk-or-v1-your-key-here
# Optional — enable additional tools:
FIRECRAWL_API_KEY=fc-your-key # Web search & scraping
BROWSERBASE_API_KEY=bb-your-key # Browser automation
BROWSERBASE_PROJECT_ID=your-project-id # Browser automation
FAL_KEY=your-fal-key # Image generation (FLUX)
TINKER_API_KEY=your-tinker-key # RL training
WANDB_API_KEY=your-wandb-key # RL training metrics
# Optional — messaging gateway:
TELEGRAM_BOT_TOKEN=123456:ABC-DEF # From @BotFather
TELEGRAM_ALLOWED_USERS=your-user-id # Comma-separated
DISCORD_BOT_TOKEN=MTIz... # From Developer Portal
DISCORD_ALLOWED_USERS=your-user-id # Comma-separated
```
Or set them one at a time via the CLI:
```bash
hermes config set OPENROUTER_API_KEY sk-or-v1-your-key-here
```
---
### Step 8: Add `hermes` to Your PATH
The `hermes` entry point at `venv/bin/hermes` has a hardcoded shebang pointing to the venv's Python, so it works **without activating the venv**. The recommended approach is a symlink into `~/.local/bin` (most distributions already have this on PATH):
```bash
mkdir -p ~/.local/bin
ln -sf "$(pwd)/venv/bin/hermes" ~/.local/bin/hermes
```
If `~/.local/bin` isn't on your PATH yet, add it:
**Bash** (`~/.bashrc`):
```bash
echo '' >> ~/.bashrc
echo '# Hermes Agent' >> ~/.bashrc
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
```
**Zsh** (`~/.zshrc`):
```bash
echo '' >> ~/.zshrc
echo '# Hermes Agent' >> ~/.zshrc
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
```
**Fish** (`~/.config/fish/config.fish`):
```fish
fish_add_path $HOME/.local/bin
```
---
### Step 9: Run the Setup Wizard (Optional)
The interactive setup wizard walks you through configuring your API keys and preferences:
```bash
hermes setup
```
This is optional if you already configured `~/.hermes/.env` and `~/.hermes/config.yaml` manually in the steps above.
---
### Step 10: Verify the Installation
```bash
# Check that the command is available
hermes version
# Run diagnostics to verify everything is working
hermes doctor
# Check your configuration
hermes status
# Test with a quick query
hermes chat -q "Hello! What tools do you have available?"
```
If `hermes doctor` reports issues, it will tell you exactly what's missing and how to fix it.
---
### Quick-Reference: Manual Install (Condensed)
For those who just want the commands without the explanations:
```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone & enter
git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
# Run setup script
./setup-hermes.sh
# Create venv with Python 3.11 (uv downloads it if needed)
uv venv venv --python 3.11
export VIRTUAL_ENV="$(pwd)/venv"
# Or manually:
python3 -m venv venv
source venv/bin/activate
pip install -e ".[all]"
# Install everything
uv pip install -e ".[all]"
uv pip install -e "./mini-swe-agent"
uv pip install -e "./tinker-atropos"
npm install # optional, for browser tools
# Install submodules (required for terminal and RL tools)
pip install -e "./mini-swe-agent" # Terminal tool backend
pip install -e "./tinker-atropos" # RL training backend
# Configure
mkdir -p ~/.hermes/{cron,sessions,logs}
cp cli-config.yaml.example ~/.hermes/config.yaml
touch ~/.hermes/.env
echo 'OPENROUTER_API_KEY=sk-or-v1-your-key' >> ~/.hermes/.env
hermes setup
# Make hermes available globally (no venv activation needed)
mkdir -p ~/.local/bin
ln -sf "$(pwd)/venv/bin/hermes" ~/.local/bin/hermes
# Verify
hermes doctor
hermes
```
---
### Updating a Manual Installation
To update an existing manual install to the latest version:
```bash
cd /path/to/hermes-agent
export VIRTUAL_ENV="$(pwd)/venv"
# Pull latest code and submodules
git pull origin main
git submodule update --init --recursive
# Reinstall (picks up new dependencies)
uv pip install -e ".[all]"
uv pip install -e "./mini-swe-agent"
uv pip install -e "./tinker-atropos"
# Check for new config options added since your last update
hermes config check
hermes config migrate # Interactively add any missing options
```
### Uninstalling a Manual Installation
```bash
# Remove the hermes symlink
rm -f ~/.local/bin/hermes
# Remove the cloned repository
rm -rf /path/to/hermes-agent
# Remove user configuration (optional — keep if you plan to reinstall)
rm -rf ~/.hermes
```
---

28
environments/__init__.py Normal file
View File

@@ -0,0 +1,28 @@
"""
Hermes-Agent Atropos Environments
Provides a layered integration between hermes-agent's tool-calling capabilities
and the Atropos RL training framework.
Layers:
- agent_loop: Reusable multi-turn agent loop with standard OpenAI-spec tool calling
- tool_context: Per-rollout tool access handle for reward/verification functions
- hermes_base_env: Abstract base environment (BaseEnv subclass) for Atropos
- tool_call_parsers: Client-side tool call parser registry for Phase 2 (VLLM /generate)
Concrete environments:
- terminal_test_env: Simple file-creation tasks for testing the stack
- hermes_swe_env: SWE-bench style tasks with Modal sandboxes
"""
from environments.agent_loop import AgentResult, HermesAgentLoop
from environments.tool_context import ToolContext
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
__all__ = [
"AgentResult",
"HermesAgentLoop",
"ToolContext",
"HermesAgentBaseEnv",
"HermesAgentEnvConfig",
]

372
environments/agent_loop.py Normal file
View File

@@ -0,0 +1,372 @@
"""
HermesAgentLoop -- Reusable Multi-Turn Agent Engine
Runs the hermes-agent tool-calling loop using standard OpenAI-spec tool calling.
Works with any server that returns ChatCompletion objects with tool_calls:
- Phase 1: OpenAI server type (VLLM, SGLang, OpenRouter, OpenAI API)
- Phase 2: ManagedServer with client-side tool call parser
The loop passes tools= and checks response.choices[0].message.tool_calls,
identical to hermes-agent's run_agent.py. Tool execution is dispatched via
handle_function_call() from model_tools.py.
"""
import asyncio
import concurrent.futures
import json
import logging
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set
from model_tools import handle_function_call
# Thread pool for running sync tool calls that internally use asyncio.run()
# (e.g., mini-swe-agent's modal/docker backends). Running them in a separate
# thread gives them a clean event loop so they don't deadlock inside Atropos's loop.
_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=8)
logger = logging.getLogger(__name__)
@dataclass
class ToolError:
"""Record of a tool execution error during the agent loop."""
turn: int # Which turn the error occurred on
tool_name: str # Which tool was called
arguments: str # The arguments passed (truncated)
error: str # The error message
tool_result: str # The raw result returned to the model
@dataclass
class AgentResult:
"""Result of running the agent loop."""
# Full conversation history in OpenAI message format
messages: List[Dict[str, Any]]
# ManagedServer.get_state() if available (Phase 2), None otherwise
managed_state: Optional[Dict[str, Any]] = None
# How many LLM calls were made
turns_used: int = 0
# True if model stopped calling tools naturally (vs hitting max_turns)
finished_naturally: bool = False
# Extracted reasoning content per turn (from PR #297 helpers)
reasoning_per_turn: List[Optional[str]] = field(default_factory=list)
# Tool errors encountered during the loop
tool_errors: List[ToolError] = field(default_factory=list)
def _extract_reasoning_from_message(message) -> Optional[str]:
"""
Extract reasoning content from a ChatCompletion message.
Handles multiple provider formats:
1. message.reasoning_content field (some providers)
2. message.reasoning field (some providers)
3. message.reasoning_details[].text (OpenRouter style)
Note: <think> block extraction from content is NOT done here -- that's
handled by the response already in Phase 1 (server does it) or by
ManagedServer's patch in Phase 2.
Args:
message: The assistant message from ChatCompletion response
Returns:
Extracted reasoning text, or None if not found
"""
# Check reasoning_content field (common across providers)
if hasattr(message, "reasoning_content") and message.reasoning_content:
return message.reasoning_content
# Check reasoning field
if hasattr(message, "reasoning") and message.reasoning:
return message.reasoning
# Check reasoning_details (OpenRouter style)
if hasattr(message, "reasoning_details") and message.reasoning_details:
for detail in message.reasoning_details:
if hasattr(detail, "text") and detail.text:
return detail.text
if isinstance(detail, dict) and detail.get("text"):
return detail["text"]
return None
class HermesAgentLoop:
"""
Runs hermes-agent's tool-calling loop using standard OpenAI-spec tool calling.
Same pattern as run_agent.py:
- Pass tools= to the API
- Check response.choices[0].message.tool_calls
- Dispatch via handle_function_call()
Works identically with any server type -- OpenAI, VLLM, SGLang, OpenRouter,
or ManagedServer with a parser. The server determines how tool_calls get
populated on the response.
"""
def __init__(
self,
server,
tool_schemas: List[Dict[str, Any]],
valid_tool_names: Set[str],
max_turns: int = 30,
task_id: Optional[str] = None,
temperature: float = 1.0,
max_tokens: Optional[int] = None,
):
"""
Initialize the agent loop.
Args:
server: Server object with chat_completion() method (OpenAIServer,
ManagedServer, ServerManager, etc.)
tool_schemas: OpenAI-format tool definitions from get_tool_definitions()
valid_tool_names: Set of tool names the model is allowed to call
max_turns: Maximum number of LLM calls before stopping
task_id: Unique ID for terminal/browser session isolation
temperature: Sampling temperature for generation
max_tokens: Max tokens per generation (None for server default)
"""
self.server = server
self.tool_schemas = tool_schemas
self.valid_tool_names = valid_tool_names
self.max_turns = max_turns
self.task_id = task_id or str(uuid.uuid4())
self.temperature = temperature
self.max_tokens = max_tokens
async def run(self, messages: List[Dict[str, Any]]) -> AgentResult:
"""
Execute the full agent loop using standard OpenAI tool calling.
Args:
messages: Initial conversation messages (system + user).
Modified in-place as the conversation progresses.
Returns:
AgentResult with full conversation history, managed state, and metadata
"""
reasoning_per_turn = []
tool_errors: List[ToolError] = []
for turn in range(self.max_turns):
# Build the chat_completion kwargs
chat_kwargs = {
"messages": messages,
"n": 1,
"temperature": self.temperature,
}
# Only pass tools if we have them
if self.tool_schemas:
chat_kwargs["tools"] = self.tool_schemas
# Only pass max_tokens if explicitly set
if self.max_tokens is not None:
chat_kwargs["max_tokens"] = self.max_tokens
# Make the API call -- standard OpenAI spec
try:
response = await self.server.chat_completion(**chat_kwargs)
except Exception as e:
logger.error("API call failed on turn %d: %s", turn + 1, e)
return AgentResult(
messages=messages,
managed_state=self._get_managed_state(),
turns_used=turn + 1,
finished_naturally=False,
reasoning_per_turn=reasoning_per_turn,
tool_errors=tool_errors,
)
if not response or not response.choices:
logger.warning("Empty response on turn %d", turn + 1)
return AgentResult(
messages=messages,
managed_state=self._get_managed_state(),
turns_used=turn + 1,
finished_naturally=False,
reasoning_per_turn=reasoning_per_turn,
tool_errors=tool_errors,
)
assistant_msg = response.choices[0].message
# Extract reasoning content from the response (all provider formats)
reasoning = _extract_reasoning_from_message(assistant_msg)
reasoning_per_turn.append(reasoning)
# Check for tool calls -- standard OpenAI spec
if assistant_msg.tool_calls:
# Build the assistant message dict for conversation history
msg_dict: Dict[str, Any] = {
"role": "assistant",
"content": assistant_msg.content or "",
"tool_calls": [
{
"id": tc.id,
"type": "function",
"function": {
"name": tc.function.name,
"arguments": tc.function.arguments,
},
}
for tc in assistant_msg.tool_calls
],
}
# Preserve reasoning_content for multi-turn chat template handling
# (e.g., Kimi-K2's template renders <think> blocks differently
# for history vs. the latest turn based on this field)
if reasoning:
msg_dict["reasoning_content"] = reasoning
messages.append(msg_dict)
# Execute each tool call via hermes-agent's dispatch
for tc in assistant_msg.tool_calls:
tool_name = tc.function.name
tool_args_raw = tc.function.arguments
# Validate tool name
if tool_name not in self.valid_tool_names:
tool_result = json.dumps(
{
"error": f"Unknown tool '{tool_name}'. "
f"Available tools: {sorted(self.valid_tool_names)}"
}
)
tool_errors.append(ToolError(
turn=turn + 1, tool_name=tool_name,
arguments=tool_args_raw[:200],
error=f"Unknown tool '{tool_name}'",
tool_result=tool_result,
))
logger.warning(
"Model called unknown tool '%s' on turn %d",
tool_name, turn + 1,
)
else:
# Parse arguments and dispatch
try:
args = json.loads(tool_args_raw)
except json.JSONDecodeError:
args = {}
logger.warning(
"Invalid JSON in tool call arguments for '%s': %s",
tool_name, tool_args_raw[:200],
)
try:
if tool_name == "terminal":
import os
backend = os.getenv("TERMINAL_ENV", "local")
cmd_preview = args.get("command", "")[:80]
print(f" 🖥️ [{backend}] $ {cmd_preview}")
# Run tool calls in a thread pool so backends that use
# asyncio.run() internally (modal, docker) get a clean
# event loop instead of deadlocking inside Atropos's loop.
loop = asyncio.get_event_loop()
tool_result = await loop.run_in_executor(
_tool_executor,
lambda: handle_function_call(
tool_name, args, task_id=self.task_id
),
)
except Exception as e:
tool_result = json.dumps(
{"error": f"Tool execution failed: {type(e).__name__}: {str(e)}"}
)
tool_errors.append(ToolError(
turn=turn + 1, tool_name=tool_name,
arguments=tool_args_raw[:200],
error=f"{type(e).__name__}: {str(e)}",
tool_result=tool_result,
))
logger.error(
"Tool '%s' execution failed on turn %d: %s",
tool_name, turn + 1, e,
)
# Also check if the tool returned an error in its JSON result
try:
result_data = json.loads(tool_result)
if isinstance(result_data, dict):
err = result_data.get("error")
exit_code = result_data.get("exit_code")
if err and exit_code and exit_code < 0:
tool_errors.append(ToolError(
turn=turn + 1, tool_name=tool_name,
arguments=tool_args_raw[:200],
error=str(err),
tool_result=tool_result[:500],
))
except (json.JSONDecodeError, TypeError):
pass
# Add tool response to conversation
messages.append(
{
"role": "tool",
"tool_call_id": tc.id,
"content": tool_result,
}
)
logger.debug(
"Turn %d: %d tool calls executed",
turn + 1,
len(assistant_msg.tool_calls),
)
else:
# No tool calls -- model is done
msg_dict = {
"role": "assistant",
"content": assistant_msg.content or "",
}
if reasoning:
msg_dict["reasoning_content"] = reasoning
messages.append(msg_dict)
logger.debug(
"Turn %d: model finished naturally (no tool calls)", turn + 1
)
return AgentResult(
messages=messages,
managed_state=self._get_managed_state(),
turns_used=turn + 1,
finished_naturally=True,
reasoning_per_turn=reasoning_per_turn,
tool_errors=tool_errors,
)
# Hit max turns without the model stopping
logger.info("Agent hit max_turns (%d) without finishing", self.max_turns)
return AgentResult(
messages=messages,
managed_state=self._get_managed_state(),
turns_used=self.max_turns,
finished_naturally=False,
reasoning_per_turn=reasoning_per_turn,
tool_errors=tool_errors,
)
def _get_managed_state(self) -> Optional[Dict[str, Any]]:
"""
Get ManagedServer state if the server supports it.
Returns state dict with SequenceNodes containing tokens/logprobs/masks,
or None if the server doesn't support get_state() (e.g., regular OpenAI server).
"""
if hasattr(self.server, "get_state"):
return self.server.get_state()
return None

View File

@@ -0,0 +1,33 @@
# SWE Environment -- Default Configuration
#
# SWE-bench style tasks with Modal sandboxes for cloud isolation.
# Uses terminal + file + web toolsets.
#
# Usage:
# python environments/hermes_swe_env.py serve --config environments/configs/swe_default.yaml
env:
enabled_toolsets: ["terminal", "file", "web"]
max_agent_turns: 30
max_token_length: 4096
group_size: 4
terminal_backend: "modal"
tool_call_parser: "hermes"
tokenizer_name: "NousResearch/DeepHermes-3-Llama-3-3B-Preview"
dataset_name: "bigcode/humanevalpack"
dataset_split: "test"
prompt_field: "prompt"
steps_per_eval: 50
total_steps: 500
use_wandb: true
wandb_name: "hermes-swe"
system_prompt: >
You are a skilled software engineer. You have access to a terminal,
file tools, and web search. Use these tools to complete the coding task.
Write clean, working code and verify it runs correctly before finishing.
openai:
base_url: "http://localhost:8000/v1"
model_name: "NousResearch/DeepHermes-3-Llama-3-3B-Preview"
server_type: "openai"
api_key: ""

View File

@@ -0,0 +1,35 @@
# Terminal Test Environment -- Default Configuration
#
# Simple file-creation tasks for validating the full Atropos + hermes-agent stack.
# Uses Modal terminal backend and OpenRouter (Claude) for inference.
# API keys loaded from ~/hermes-agent/.env
#
# Usage:
# run-api
# python environments/terminal_test_env.py serve
# # Or with config file:
# python environments/terminal_test_env.py serve --config environments/configs/terminal_test_default.yaml
env:
enabled_toolsets: ["terminal", "file"]
max_agent_turns: 10
max_token_length: 2048
group_size: 3
total_steps: 3
steps_per_eval: 3
terminal_backend: "modal"
tool_call_parser: "hermes"
tokenizer_name: "NousResearch/DeepHermes-3-Llama-3-3B-Preview"
ensure_scores_are_not_same: false
use_wandb: false
system_prompt: >
You are a helpful assistant with access to a terminal and file tools.
Complete the user's request by using the available tools.
Be precise and follow instructions exactly.
openai:
base_url: "https://openrouter.ai/api/v1"
model_name: "anthropic/claude-opus-4.6"
server_type: "openai"
health_check: false
# api_key loaded from OPENROUTER_API_KEY in .env

View File

@@ -0,0 +1,615 @@
"""
HermesAgentBaseEnv -- Abstract Base Environment for Hermes-Agent + Atropos
Provides the Atropos integration plumbing that all hermes-agent environments share:
- Two-mode operation (OpenAI server for Phase 1, VLLM ManagedServer for Phase 2)
- Per-group toolset/distribution resolution
- Agent loop orchestration via HermesAgentLoop
- ToolContext creation for reward functions
- ScoredDataGroup construction from ManagedServer state
Subclasses only need to implement:
setup() -- Load dataset, initialize state
get_next_item() -- Return the next item from the dataset
format_prompt() -- Convert a dataset item into the user message
compute_reward() -- Score the rollout (has full ToolContext access)
evaluate() -- Periodic evaluation
"""
import asyncio
import json
import logging
import os
import sys
import uuid
from abc import abstractmethod
from pathlib import Path
from typing import Any, Dict, List, Optional, Set, Tuple, Union
# Ensure the hermes-agent repo root is on sys.path so that imports like
# `from model_tools import ...` and `from environments.X import ...` work
# regardless of where the script is invoked from.
_repo_root = Path(__file__).resolve().parent.parent
if str(_repo_root) not in sys.path:
sys.path.insert(0, str(_repo_root))
from dotenv import load_dotenv
from pydantic import Field
# Load API keys from hermes-agent/.env so all environments can access them
_env_path = _repo_root / ".env"
if _env_path.exists():
load_dotenv(dotenv_path=_env_path)
# Apply monkey patches for async-safe tool operation inside Atropos's event loop.
# This patches SwerexModalEnvironment to use a background thread instead of
# asyncio.run(), which would deadlock inside Atropos. Safe for normal CLI too.
from environments.patches import apply_patches
apply_patches()
from atroposlib.envs.base import (
BaseEnv,
BaseEnvConfig,
ScoredDataGroup,
ScoredDataItem,
)
from atroposlib.envs.server_handling.server_manager import (
APIServerConfig,
ServerBaseline,
ServerManager,
)
from atroposlib.type_definitions import Item
from environments.agent_loop import AgentResult, HermesAgentLoop
from environments.tool_context import ToolContext
# Import hermes-agent toolset infrastructure
from model_tools import get_tool_definitions
from toolset_distributions import sample_toolsets_from_distribution
logger = logging.getLogger(__name__)
class HermesAgentEnvConfig(BaseEnvConfig):
"""
Configuration for hermes-agent Atropos environments.
Extends BaseEnvConfig with agent-specific settings for toolsets,
terminal backend, dataset loading, and tool call parsing.
"""
# --- Toolset configuration ---
# Mutually exclusive: use either enabled_toolsets OR distribution
enabled_toolsets: Optional[List[str]] = Field(
default=None,
description="Explicit list of hermes toolsets to enable (e.g., ['terminal', 'file', 'web']). "
"If None and distribution is also None, all available toolsets are enabled.",
)
disabled_toolsets: Optional[List[str]] = Field(
default=None,
description="Toolsets to disable. Applied as a filter on top of enabled_toolsets or distribution.",
)
distribution: Optional[str] = Field(
default=None,
description="Name of a toolset distribution from toolset_distributions.py "
"(e.g., 'development', 'terminal_tasks'). Sampled once per group. "
"Mutually exclusive with enabled_toolsets.",
)
# --- Agent loop configuration ---
max_agent_turns: int = Field(
default=30,
description="Maximum number of LLM calls (tool-calling iterations) per rollout.",
)
system_prompt: Optional[str] = Field(
default=None,
description="System prompt for the agent. Tools are handled via the tools= parameter, "
"not embedded in the prompt text.",
)
agent_temperature: float = Field(
default=1.0,
description="Sampling temperature for agent generation during rollouts.",
)
# --- Terminal backend ---
terminal_backend: str = Field(
default="local",
description="Terminal backend: 'local', 'docker', 'modal', 'ssh', 'singularity'. "
"Modal recommended for production RL (cloud isolation per rollout).",
)
# --- Dataset ---
dataset_name: Optional[str] = Field(
default=None,
description="HuggingFace dataset name. Optional if tasks are defined inline.",
)
dataset_split: str = Field(
default="train",
description="Dataset split to use.",
)
prompt_field: str = Field(
default="prompt",
description="Which field in the dataset contains the prompt.",
)
# --- Phase 2: Tool call parsing ---
tool_call_parser: str = Field(
default="hermes",
description="Tool call parser name for Phase 2 (VLLM server type). "
"Ignored in Phase 1 (OpenAI server type where VLLM parses natively). "
"Options: hermes, mistral, llama3_json, qwen, deepseek_v3, etc.",
)
class HermesAgentBaseEnv(BaseEnv):
"""
Abstract base environment for hermes-agent Atropos integration.
Handles two modes of operation:
- Phase 1 (OpenAI server type): Uses server.chat_completion() directly.
The server (VLLM, SGLang, OpenRouter, OpenAI) handles tool call parsing
and reasoning extraction natively. DummyManagedServer provides placeholder
tokens. Good for SFT data gen, verifier testing, evaluation.
- Phase 2 (VLLM server type): Uses ManagedServer for exact token IDs + logprobs
via /generate. Client-side tool call parser reconstructs structured tool_calls
from raw output. Full RL training capability.
Subclasses must implement:
setup() -- Load dataset, initialize state
get_next_item() -- Return the next item to roll out
format_prompt() -- Convert a dataset item into the user message string
compute_reward() -- Score the rollout using ToolContext
evaluate() -- Periodic evaluation
"""
name: Optional[str] = "hermes-agent"
env_config_cls = HermesAgentEnvConfig
def __init__(
self,
config: HermesAgentEnvConfig,
server_configs: Union[ServerBaseline, List[APIServerConfig]],
slurm=False,
testing=False,
):
super().__init__(config, server_configs, slurm, testing)
# Set terminal backend environment variable so hermes tools pick it up
if config.terminal_backend:
os.environ["TERMINAL_ENV"] = config.terminal_backend
print(f"🖥️ Terminal backend: {config.terminal_backend}")
# Current group's resolved tools (set in collect_trajectories)
self._current_group_tools: Optional[Tuple[List[Dict], Set[str]]] = None
# Tool error tracking for wandb logging
self._tool_error_buffer: List[Dict[str, Any]] = []
# =========================================================================
# Toolset resolution (per-group)
# =========================================================================
def _resolve_tools_for_group(self) -> Tuple[List[Dict[str, Any]], Set[str]]:
"""
Resolve toolsets for a group. Called once in collect_trajectories(),
then shared by all collect_trajectory() calls in the group.
If distribution is set, samples probabilistically.
If enabled_toolsets is set, uses that explicit list.
disabled_toolsets is applied as a filter on top.
Returns:
(tool_schemas, valid_tool_names) tuple
"""
config = self.config
if config.distribution:
group_toolsets = sample_toolsets_from_distribution(config.distribution)
logger.info("Sampled toolsets from '%s': %s", config.distribution, group_toolsets)
else:
group_toolsets = config.enabled_toolsets # None means "all available"
tools = get_tool_definitions(
enabled_toolsets=group_toolsets,
disabled_toolsets=config.disabled_toolsets,
quiet_mode=True,
)
valid_names = {t["function"]["name"] for t in tools} if tools else set()
logger.info("Resolved %d tools for group: %s", len(valid_names), sorted(valid_names))
return tools, valid_names
# =========================================================================
# Server mode detection
# =========================================================================
def _use_managed_server(self) -> bool:
"""
Determine if we should use ManagedServer (Phase 2) or direct server (Phase 1).
Phase 2 (ManagedServer) is used when the server type is 'vllm' or 'sglang',
which go through the /generate endpoint for exact token tracking.
Phase 1 (direct server) is used for 'openai' server type, which uses
/v1/chat/completions with native tool call parsing.
"""
if not self.server.servers:
return False
server = self.server.servers[0]
# If the server is an OpenAI server (not VLLM/SGLang), use direct mode
from atroposlib.envs.server_handling.openai_server import OpenAIServer
return not isinstance(server, OpenAIServer)
# =========================================================================
# Core Atropos integration
# =========================================================================
async def collect_trajectories(
self, item: Item
) -> Tuple[
Union[Optional[ScoredDataGroup], List[Optional[ScoredDataGroup]]],
List[Item],
]:
"""
Override collect_trajectories to resolve toolsets once per group,
then delegate to the standard group-level collection.
The default BaseEnv.collect_trajectories() calls collect_trajectory()
group_size times in parallel. We resolve tools once here and store
them for all those calls to use.
"""
# Resolve toolsets for this group (shared by all rollouts in the group)
self._current_group_tools = self._resolve_tools_for_group()
# Delegate to the default implementation which calls collect_trajectory()
# group_size times via asyncio.gather
return await super().collect_trajectories(item)
# =========================================================================
# Wandb rollout display -- format trajectories nicely
# =========================================================================
@staticmethod
def _format_trajectory_for_display(messages: List[Dict[str, Any]]) -> str:
"""
Format a conversation's messages into a readable trajectory string
for wandb rollout tables. Shows tool calls, tool results, and reasoning
in a structured way instead of raw token decoding.
"""
parts = []
for msg in messages:
role = msg.get("role", "unknown")
content = msg.get("content", "")
if role == "system":
parts.append(f"[SYSTEM]\n{content}")
elif role == "user":
parts.append(f"[USER]\n{content}")
elif role == "assistant":
# Show reasoning if present
reasoning = msg.get("reasoning_content", "")
if reasoning:
# Truncate long reasoning for display
if len(reasoning) > 300:
reasoning = reasoning[:300] + "..."
parts.append(f"[ASSISTANT thinking]\n{reasoning}")
# Show content
if content:
parts.append(f"[ASSISTANT]\n{content}")
# Show tool calls
tool_calls = msg.get("tool_calls", [])
for tc in tool_calls:
func = tc.get("function", {})
name = func.get("name", "?")
args = func.get("arguments", "{}")
# Truncate long arguments for display
if len(args) > 200:
args = args[:200] + "..."
parts.append(f"[TOOL CALL] {name}({args})")
elif role == "tool":
tool_id = msg.get("tool_call_id", "")
result = content
# Truncate long tool results for display
if len(result) > 500:
result = result[:500] + "..."
parts.append(f"[TOOL RESULT] {result}")
return "\n\n".join(parts)
async def add_rollouts_for_wandb(
self,
scored_data,
item=None,
):
"""
Override to show formatted trajectories with tool calls visible,
instead of raw token decoding which loses all structure.
"""
num_keep = self.config.num_rollouts_per_group_for_logging
if num_keep == -1:
num_keep = self.config.group_size
group = []
for i in range(min(num_keep, len(scored_data.get("scores", [])))):
score = scored_data["scores"][i]
# Use messages if available for rich display
messages = None
if scored_data.get("messages") and i < len(scored_data["messages"]):
messages = scored_data["messages"][i]
if messages:
text = self._format_trajectory_for_display(messages)
elif scored_data.get("tokens") and i < len(scored_data["tokens"]):
text = self.tokenizer.decode(scored_data["tokens"][i])
else:
text = "(no data)"
group.append((text, score))
self.rollouts_for_wandb.append(group)
if len(self.rollouts_for_wandb) > self.config.num_rollouts_to_keep:
self.rollouts_for_wandb.pop(0)
async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
"""Log base metrics including tool errors to wandb."""
if wandb_metrics is None:
wandb_metrics = {}
# Log tool error stats
if self._tool_error_buffer:
wandb_metrics["train/tool_errors_count"] = len(self._tool_error_buffer)
# Log error details as a summary string (tables can crash wandb on tmp cleanup)
error_summaries = []
for err in self._tool_error_buffer:
error_summaries.append(
f"[turn {err['turn']}] {err['tool']}({err['args'][:80]}) -> {err['error'][:150]}"
)
wandb_metrics["train/tool_error_details"] = "\n".join(error_summaries)
# Also print to stdout for immediate visibility
for summary in error_summaries:
print(f" Tool Error: {summary}")
self._tool_error_buffer = []
else:
wandb_metrics["train/tool_errors_count"] = 0
await super().wandb_log(wandb_metrics)
async def collect_trajectory(
self, item: Item
) -> Tuple[Optional[Union[ScoredDataItem, Any]], List[Item]]:
"""
Run a single rollout: agent loop + reward computation.
This is called group_size times in parallel by collect_trajectories().
Each call gets its own task_id for terminal/browser session isolation.
"""
task_id = str(uuid.uuid4())
# Get group-level tools (resolved once in collect_trajectories)
if self._current_group_tools is None:
# Fallback: resolve per-trajectory if called outside collect_trajectories
tools, valid_names = self._resolve_tools_for_group()
else:
tools, valid_names = self._current_group_tools
# Build initial messages
messages: List[Dict[str, Any]] = []
if self.config.system_prompt:
messages.append({"role": "system", "content": self.config.system_prompt})
messages.append({"role": "user", "content": self.format_prompt(item)})
# Run the agent loop
result: AgentResult
if self._use_managed_server():
# Phase 2: ManagedServer with parser -- exact tokens + logprobs
# Load the tool call parser from registry based on config
from environments.tool_call_parsers import get_parser
try:
tc_parser = get_parser(self.config.tool_call_parser)
except KeyError:
logger.warning(
"Tool call parser '%s' not found, falling back to 'hermes'",
self.config.tool_call_parser,
)
tc_parser = get_parser("hermes")
try:
async with self.server.managed_server(
tokenizer=self.tokenizer,
tool_call_parser=tc_parser,
) as managed:
agent = HermesAgentLoop(
server=managed,
tool_schemas=tools,
valid_tool_names=valid_names,
max_turns=self.config.max_agent_turns,
task_id=task_id,
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
)
result = await agent.run(messages)
except NotImplementedError:
# DummyManagedServer not allowed -- fall back to Phase 1
logger.warning(
"ManagedServer not available (OpenAI server?). "
"Falling back to direct server mode."
)
agent = HermesAgentLoop(
server=self.server,
tool_schemas=tools,
valid_tool_names=valid_names,
max_turns=self.config.max_agent_turns,
task_id=task_id,
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
)
result = await agent.run(messages)
else:
# Phase 1: OpenAI server -- native tool_calls, placeholder tokens
agent = HermesAgentLoop(
server=self.server,
tool_schemas=tools,
valid_tool_names=valid_names,
max_turns=self.config.max_agent_turns,
task_id=task_id,
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
)
result = await agent.run(messages)
# Skip reward computation if the agent loop produced no meaningful work
# (e.g., API call failed on turn 1). No point spinning up a Modal sandbox
# just to verify files that were never created.
only_system_and_user = all(
msg.get("role") in ("system", "user") for msg in result.messages
)
if result.turns_used == 0 or only_system_and_user:
logger.warning(
"Agent loop produced no output (turns=%d, msgs=%d). Skipping reward.",
result.turns_used, len(result.messages),
)
reward = 0.0
else:
# Compute reward using ToolContext (gives verifier full tool access)
ctx = ToolContext(task_id)
try:
reward = await self.compute_reward(item, result, ctx)
except Exception as e:
logger.error("compute_reward failed: %s", e)
reward = 0.0
finally:
ctx.cleanup()
# Track tool errors for wandb logging
if result.tool_errors:
for err in result.tool_errors:
self._tool_error_buffer.append({
"turn": err.turn,
"tool": err.tool_name,
"args": err.arguments[:150],
"error": err.error[:300],
"result": err.tool_result[:300],
})
# Build ScoredDataItem from ManagedServer state
# Phase 2: real tokens/masks/logprobs from SequenceNodes
# Phase 1: placeholder tokens (still need a valid ScoredDataItem for the pipeline)
nodes = (result.managed_state or {}).get("nodes", [])
if nodes:
# Phase 2 (or DummyManagedServer): use actual node data
node = nodes[-1] # Final sequence node = full trajectory
scored_item: Dict[str, Any] = {
"tokens": node.tokens,
"masks": node.masked_tokens,
"scores": reward,
}
# Include logprobs if available (Phase 2)
if hasattr(node, "logprobs") and node.logprobs:
scored_item["advantages"] = None # Computed by trainer
scored_item["ref_logprobs"] = None
else:
# Phase 1 with no managed state: create placeholder tokens
# so the data pipeline doesn't break. These are NOT suitable
# for training but allow process mode (SFT data gen) to work.
# Tokenize the full conversation to get approximate tokens.
full_text = "\n".join(
msg.get("content", "") for msg in result.messages if msg.get("content")
)
if self.tokenizer:
tokens = self.tokenizer.encode(full_text, add_special_tokens=True)
else:
tokens = list(range(min(len(full_text) // 4, 128)))
scored_item = {
"tokens": tokens,
"masks": [-100] + tokens[1:], # Mask first token as prompt
"scores": reward,
}
# Always include messages for wandb rollout display and data logging
scored_item["messages"] = result.messages
return scored_item, []
# =========================================================================
# Abstract methods -- subclasses must implement
# =========================================================================
@abstractmethod
async def setup(self):
"""
Load dataset, initialize state.
Called once when the environment starts. Typical implementation:
self.dataset = load_dataset(self.config.dataset_name, split=self.config.dataset_split)
self.iter = 0
"""
raise NotImplementedError
@abstractmethod
async def get_next_item(self) -> Item:
"""
Return the next item from the dataset for rollout.
Called by the base env's main loop to get items for workers.
Should cycle through the dataset.
"""
raise NotImplementedError
@abstractmethod
def format_prompt(self, item: Item) -> str:
"""
Convert a dataset item into the user message for the agent.
Args:
item: Dataset item (dict, tuple, etc.)
Returns:
The prompt string to send to the agent
"""
raise NotImplementedError
@abstractmethod
async def compute_reward(
self, item: Item, result: AgentResult, ctx: ToolContext
) -> float:
"""
Score the rollout. Has full access to:
- item: the original dataset item (ground truth, test commands, etc.)
- result: AgentResult with full messages, turn count, reasoning, etc.
- ctx: ToolContext -- call ANY hermes-agent tool (terminal, file, web,
browser, vision...) scoped to this rollout's sandbox. Nothing
is off-limits.
Args:
item: The dataset item that was rolled out
result: The agent's rollout result
ctx: ToolContext with full tool access for verification
Returns:
Reward float (typically 0.0 to 1.0, but any float is valid)
"""
raise NotImplementedError
@abstractmethod
async def evaluate(self, *args, **kwargs):
"""
Periodic evaluation. Called every steps_per_eval steps.
Typical implementation runs the agent on a held-out eval set
and logs metrics via wandb/evaluate_log.
"""
raise NotImplementedError

View File

@@ -0,0 +1,229 @@
"""
HermesSweEnv -- SWE-Bench Style Environment with Modal Sandboxes
A concrete environment for software engineering tasks where the model writes code
and the reward function runs tests to verify correctness. Uses Modal terminal
backend for cloud-isolated sandboxes per rollout.
The reward function uses ToolContext.terminal() to run test commands in the same
Modal sandbox the model used during its agentic loop. All filesystem state from
the model's tool calls is preserved for verification.
Usage:
# Phase 1: OpenAI server type
vllm serve YourModel --tool-parser hermes
run-api
python environments/hermes_swe_env.py serve \\
--openai.base_url http://localhost:8000/v1 \\
--openai.model_name YourModel \\
--openai.server_type openai \\
--env.dataset_name bigcode/humanevalpack \\
--env.terminal_backend modal
# Phase 2: VLLM server type (full RL training)
python environments/hermes_swe_env.py serve \\
--openai.base_url http://localhost:8000/v1 \\
--openai.model_name YourModel \\
--openai.server_type vllm \\
--env.tool_call_parser hermes \\
--env.terminal_backend modal
"""
import logging
import sys
import time
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Union
# Ensure repo root is on sys.path for imports
_repo_root = Path(__file__).resolve().parent.parent
if str(_repo_root) not in sys.path:
sys.path.insert(0, str(_repo_root))
from datasets import load_dataset
from atroposlib.envs.base import ScoredDataGroup
from atroposlib.envs.server_handling.server_manager import APIServerConfig
from atroposlib.type_definitions import Item
from environments.agent_loop import AgentResult
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
from environments.tool_context import ToolContext
logger = logging.getLogger(__name__)
class HermesSweEnvConfig(HermesAgentEnvConfig):
"""Config with defaults for SWE-bench style tasks."""
pass # Inherits all fields, overrides defaults in config_init
class HermesSweEnv(HermesAgentBaseEnv):
"""
SWE-bench style environment using Modal terminal backend.
The model gets a coding task, uses terminal + file + web tools to solve it,
and the reward function runs tests in the same Modal sandbox to verify.
Subclass this for specific SWE datasets (HumanEval, SWE-bench, etc.)
and customize format_prompt() and compute_reward() as needed.
"""
name = "hermes-swe"
env_config_cls = HermesSweEnvConfig
@classmethod
def config_init(cls) -> Tuple[HermesSweEnvConfig, List[APIServerConfig]]:
"""
Default configuration for the SWE environment.
Uses Modal terminal backend for cloud isolation and terminal + file + web toolsets.
"""
env_config = HermesSweEnvConfig(
# Toolsets: terminal for running code, file for reading/writing, web for docs
enabled_toolsets=["terminal", "file", "web"],
disabled_toolsets=None,
distribution=None,
# Agent settings -- SWE tasks need more turns
max_agent_turns=30,
max_token_length=4096,
agent_temperature=1.0,
system_prompt=(
"You are a skilled software engineer. You have access to a terminal, "
"file tools, and web search. Use these tools to complete the coding task. "
"Write clean, working code and verify it runs correctly before finishing."
),
# Modal backend for cloud-isolated sandboxes
terminal_backend="modal",
# Dataset -- override via CLI for your specific SWE dataset
dataset_name="bigcode/humanevalpack",
dataset_split="test",
prompt_field="prompt",
# Atropos settings
group_size=4,
tokenizer_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
tool_call_parser="hermes",
steps_per_eval=50,
total_steps=500,
use_wandb=True,
wandb_name="hermes-swe",
)
server_configs = [
APIServerConfig(
base_url="http://localhost:8000/v1",
model_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
server_type="openai", # Phase 1; switch to "vllm" for Phase 2
api_key="",
)
]
return env_config, server_configs
async def setup(self):
"""Load the SWE dataset."""
if self.config.dataset_name:
self.dataset = load_dataset(
self.config.dataset_name, split=self.config.dataset_split
)
else:
# Placeholder if no dataset specified
self.dataset = []
self.iter = 0
self.reward_buffer: List[float] = []
async def get_next_item(self) -> Dict[str, Any]:
"""Cycle through the SWE dataset."""
if not self.dataset:
raise ValueError("No dataset loaded. Set dataset_name in config.")
item = self.dataset[self.iter % len(self.dataset)]
self.iter += 1
return item
def format_prompt(self, item: Dict[str, Any]) -> str:
"""
Format the SWE task prompt.
Override this in subclasses for different dataset formats.
Default assumes the dataset has a 'prompt' field and optionally a 'test' field.
"""
prompt = item.get(self.config.prompt_field, "")
# If the dataset has test information, include it in the prompt
test_info = item.get("test", item.get("test_code", item.get("tests", "")))
if test_info:
prompt += f"\n\nTests to pass:\n{test_info}"
return prompt
async def compute_reward(
self, item: Dict[str, Any], result: AgentResult, ctx: ToolContext
) -> float:
"""
Score by running tests in the model's Modal sandbox.
Default implementation:
- If the dataset item has a 'test' or 'test_code' field, run it
- Check exit code: 0 = pass, non-zero = fail
- Partial credit for file creation
Override this in subclasses for more sophisticated reward logic.
"""
# Find the test command from the dataset item
test_code = item.get("test", item.get("test_code", item.get("tests", "")))
if test_code:
# Run the test in the model's sandbox
test_result = ctx.terminal(
f'cd /workspace && python3 -c "{test_code}"', timeout=60
)
if test_result["exit_code"] == 0:
self.reward_buffer.append(1.0)
return 1.0
# Partial credit: check if the model created any Python files
file_check = ctx.terminal("find /workspace -name '*.py' -newer /tmp/.start_marker 2>/dev/null | head -5")
if file_check["exit_code"] == 0 and file_check.get("output", "").strip():
self.reward_buffer.append(0.1)
return 0.1
self.reward_buffer.append(0.0)
return 0.0
async def evaluate(self, *args, **kwargs):
"""
Run evaluation on a held-out set.
Override for dataset-specific evaluation logic.
"""
start_time = time.time()
end_time = time.time()
eval_metrics = {"eval/placeholder": 0.0}
await self.evaluate_log(
metrics=eval_metrics,
start_time=start_time,
end_time=end_time,
)
async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
"""Log SWE-specific metrics."""
if wandb_metrics is None:
wandb_metrics = {}
if self.reward_buffer:
wandb_metrics["train/avg_reward"] = sum(self.reward_buffer) / len(
self.reward_buffer
)
wandb_metrics["train/pass_rate"] = sum(
1 for r in self.reward_buffer if r == 1.0
) / len(self.reward_buffer)
self.reward_buffer = []
await super().wandb_log(wandb_metrics)
if __name__ == "__main__":
HermesSweEnv.cli()

188
environments/patches.py Normal file
View File

@@ -0,0 +1,188 @@
"""
Monkey patches for making hermes-agent tools work inside async frameworks (Atropos).
Problem:
Some tools use asyncio.run() internally (e.g., mini-swe-agent's Modal backend,
web_extract). This crashes when called from inside Atropos's event loop because
asyncio.run() can't be nested.
Solution:
Replace the problematic methods with versions that use a dedicated background
thread with its own event loop. The calling code sees the same sync interface --
call a function, get a result -- but internally the async work happens on a
separate thread that doesn't conflict with Atropos's loop.
These patches are safe for normal CLI use too: when there's no running event
loop, the behavior is identical (the background thread approach works regardless).
What gets patched:
- SwerexModalEnvironment.__init__ -- creates Modal deployment on a background thread
- SwerexModalEnvironment.execute -- runs commands on the same background thread
- SwerexModalEnvironment.stop -- stops deployment on the background thread
Usage:
Call apply_patches() once at import time (done automatically by hermes_base_env.py).
This is idempotent -- calling it multiple times is safe.
"""
import asyncio
import logging
import threading
from typing import Any
logger = logging.getLogger(__name__)
_patches_applied = False
class _AsyncWorker:
"""
A dedicated background thread with its own event loop.
Allows sync code to submit async coroutines and block for results,
even when called from inside another running event loop. Used to
bridge sync tool interfaces with async backends (Modal, SWE-ReX).
"""
def __init__(self):
self._loop: asyncio.AbstractEventLoop = None
self._thread: threading.Thread = None
self._started = threading.Event()
def start(self):
"""Start the background event loop thread."""
self._thread = threading.Thread(target=self._run_loop, daemon=True)
self._thread.start()
self._started.wait(timeout=30)
def _run_loop(self):
"""Background thread entry point -- runs the event loop forever."""
self._loop = asyncio.new_event_loop()
asyncio.set_event_loop(self._loop)
self._started.set()
self._loop.run_forever()
def run_coroutine(self, coro, timeout=600):
"""
Submit a coroutine to the background loop and block until it completes.
Safe to call from any thread, including threads that already have
a running event loop.
"""
if self._loop is None or self._loop.is_closed():
raise RuntimeError("AsyncWorker loop is not running")
future = asyncio.run_coroutine_threadsafe(coro, self._loop)
return future.result(timeout=timeout)
def stop(self):
"""Stop the background event loop and join the thread."""
if self._loop and self._loop.is_running():
self._loop.call_soon_threadsafe(self._loop.stop)
if self._thread:
self._thread.join(timeout=10)
def _patch_swerex_modal():
"""
Monkey patch SwerexModalEnvironment to use a background thread event loop
instead of asyncio.run(). This makes it safe to call from inside Atropos's
async event loop.
The patched methods have the exact same interface and behavior -- the only
difference is HOW the async work is executed internally.
"""
try:
from minisweagent.environments.extra.swerex_modal import (
SwerexModalEnvironment,
SwerexModalEnvironmentConfig,
)
from swerex.deployment.modal import ModalDeployment
from swerex.runtime.abstract import Command as RexCommand
except ImportError:
# mini-swe-agent or swe-rex not installed -- nothing to patch
logger.debug("mini-swe-agent Modal backend not available, skipping patch")
return
# Save original methods so we can refer to config handling
_original_init = SwerexModalEnvironment.__init__
def _patched_init(self, **kwargs):
"""Patched __init__: creates Modal deployment on a background thread."""
self.config = SwerexModalEnvironmentConfig(**kwargs)
# Start a dedicated event loop thread for all Modal async operations
self._worker = _AsyncWorker()
self._worker.start()
# Create AND start the deployment entirely on the worker's loop/thread
# so all gRPC channels and async state are bound to that loop
async def _create_and_start():
deployment = ModalDeployment(
image=self.config.image,
startup_timeout=self.config.startup_timeout,
runtime_timeout=self.config.runtime_timeout,
deployment_timeout=self.config.deployment_timeout,
install_pipx=self.config.install_pipx,
modal_sandbox_kwargs=self.config.modal_sandbox_kwargs,
)
await deployment.start()
return deployment
self.deployment = self._worker.run_coroutine(_create_and_start())
def _patched_execute(self, command: str, cwd: str = "", *, timeout: int | None = None) -> dict[str, Any]:
"""Patched execute: runs commands on the background thread's loop."""
async def _do_execute():
return await self.deployment.runtime.execute(
RexCommand(
command=command,
shell=True,
check=False,
cwd=cwd or self.config.cwd,
timeout=timeout or self.config.timeout,
merge_output_streams=True,
env=self.config.env if self.config.env else None,
)
)
output = self._worker.run_coroutine(_do_execute())
return {
"output": output.stdout,
"returncode": output.exit_code,
}
def _patched_stop(self):
"""Patched stop: stops deployment on the background thread, then stops the thread."""
try:
self._worker.run_coroutine(
asyncio.wait_for(self.deployment.stop(), timeout=10),
timeout=15,
)
except Exception:
pass
finally:
self._worker.stop()
# Apply the patches
SwerexModalEnvironment.__init__ = _patched_init
SwerexModalEnvironment.execute = _patched_execute
SwerexModalEnvironment.stop = _patched_stop
logger.debug("Patched SwerexModalEnvironment for async-safe operation")
def apply_patches():
"""
Apply all monkey patches needed for Atropos compatibility.
Safe to call multiple times -- patches are only applied once.
Safe for normal CLI use -- patched code works identically when
there is no running event loop.
"""
global _patches_applied
if _patches_applied:
return
_patch_swerex_modal()
_patches_applied = True

View File

@@ -0,0 +1,292 @@
"""
TerminalTestEnv -- Simple Test Environment for Validating the Stack
A self-contained environment with inline tasks (no external dataset needed).
Each task asks the model to create a file at a known path with specific content.
The reward verifier cats the file and checks if the content matches.
Enables only terminal + file toolsets. Uses Modal terminal backend with
OpenRouter (Claude) by default.
Training tasks (3):
1. Create ~/greeting.txt with "Hello from Hermes Agent"
2. Create ~/count.txt with numbers 1-5, one per line
3. Create ~/answer.txt with the result of 123 + 456
Eval task (1):
1. Create ~/result.txt with the result of 6 * 7
Usage:
# Start Atropos API server
run-api
# Run environment (uses OpenRouter + Modal by default)
python environments/terminal_test_env.py serve
# Process mode (no run-api needed, saves to JSONL)
python environments/terminal_test_env.py process \\
--env.data_path_to_save_groups terminal_test_output.jsonl
"""
import logging
import os
import sys
import time
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Union
# Ensure repo root is on sys.path for imports
_repo_root = Path(__file__).resolve().parent.parent
if str(_repo_root) not in sys.path:
sys.path.insert(0, str(_repo_root))
from atroposlib.envs.base import ScoredDataGroup
from atroposlib.envs.server_handling.server_manager import APIServerConfig
from atroposlib.type_definitions import Item
from environments.agent_loop import AgentResult
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
from environments.tool_context import ToolContext
logger = logging.getLogger(__name__)
# =============================================================================
# Inline task definitions -- no external dataset needed
# =============================================================================
TRAIN_TASKS = [
{
"prompt": "Create a file at ~/greeting.txt containing exactly the text: Hello from Hermes Agent",
"verify_path": "~/greeting.txt",
"expected_content": "Hello from Hermes Agent",
},
{
"prompt": "Create a file at ~/count.txt containing the numbers 1 through 5, one per line",
"verify_path": "~/count.txt",
"expected_content": "1\n2\n3\n4\n5",
},
{
"prompt": "Create a file at ~/answer.txt containing the result of 123 + 456",
"verify_path": "~/answer.txt",
"expected_content": "579",
},
]
EVAL_TASKS = [
{
"prompt": "Create a file at ~/result.txt containing the result of 6 * 7",
"verify_path": "~/result.txt",
"expected_content": "42",
},
]
class TerminalTestEnvConfig(HermesAgentEnvConfig):
"""Config with defaults suitable for terminal testing."""
pass # Inherits all fields, overrides defaults in config_init
class TerminalTestEnv(HermesAgentBaseEnv):
"""
Simple test environment with inline file-creation tasks.
All tasks follow the same pattern: "create a file at ~/X.txt with content Y".
The verifier runs `cat ~/X.txt` in the rollout's terminal and checks the output
against the expected string. Same verifier logic for all tasks.
This environment is designed to validate the full stack end-to-end:
- Agent loop executes tool calls (terminal/file)
- ToolContext provides terminal access to the reward function
- Reward function verifies file content via cat
- Scored data flows through the Atropos pipeline
"""
name = "terminal-test"
env_config_cls = TerminalTestEnvConfig
@classmethod
def config_init(cls) -> Tuple[TerminalTestEnvConfig, List[APIServerConfig]]:
"""
Default configuration for the terminal test environment.
Uses Modal terminal backend for cloud isolation and OpenRouter with
Claude for inference. API keys loaded from ~/hermes-agent/.env.
"""
env_config = TerminalTestEnvConfig(
# Terminal + file tools only
enabled_toolsets=["terminal", "file"],
disabled_toolsets=None,
distribution=None,
# Agent settings
max_agent_turns=10, # Simple tasks, don't need many turns
max_token_length=16000,
agent_temperature=1.0,
system_prompt=(
"You are a helpful assistant with access to a terminal and file tools. "
"Complete the user's request by using the available tools. "
"Be precise and follow instructions exactly."
),
# Modal terminal backend for cloud-isolated sandboxes per rollout
terminal_backend="modal",
# Atropos settings
group_size=3, # 3 rollouts per group
tokenizer_name="NousResearch/q-30b-t-h45-e1",
tool_call_parser="hermes",
steps_per_eval=3, # Eval after all 3 steps
total_steps=3, # 3 groups total (1 group per step)
use_wandb=True,
wandb_name="terminal-test",
ensure_scores_are_not_same=False, # Allow all-same scores for simple tasks
# No external dataset
dataset_name=None,
)
# OpenRouter with Claude -- API key loaded from .env (OPENROUTER_API_KEY)
server_configs = [
APIServerConfig(
base_url="https://openrouter.ai/api/v1",
model_name="anthropic/claude-opus-4.6",
server_type="openai",
api_key=os.getenv("OPENROUTER_API_KEY", ""),
health_check=False, # OpenRouter doesn't have a /health endpoint
)
]
return env_config, server_configs
async def setup(self):
"""Initialize inline task lists."""
self.train_tasks = list(TRAIN_TASKS)
self.eval_tasks = list(EVAL_TASKS)
self.iter = 0
# Track reward stats for wandb logging
self.reward_buffer: List[float] = []
async def get_next_item(self) -> Dict[str, str]:
"""Cycle through training tasks."""
item = self.train_tasks[self.iter % len(self.train_tasks)]
self.iter += 1
return item
def format_prompt(self, item: Dict[str, str]) -> str:
"""The prompt is directly in the task item."""
return item["prompt"]
async def compute_reward(
self, item: Dict[str, str], result: AgentResult, ctx: ToolContext
) -> float:
"""
Verify by cat-ing the expected file path and checking content matches.
Same verifier for all tasks -- they all write a file at a known path.
Scoring:
1.0 = exact match
0.5 = expected content is present but has extra stuff
0.0 = file doesn't exist or content doesn't match
"""
verify_result = ctx.terminal(f"cat {item['verify_path']}")
# File doesn't exist or can't be read
if verify_result["exit_code"] != 0:
self.reward_buffer.append(0.0)
return 0.0
actual = verify_result.get("output", "").strip()
expected = item["expected_content"].strip()
# Exact match
if actual == expected:
self.reward_buffer.append(1.0)
return 1.0
# Partial credit: expected content is present but has extra stuff
if expected in actual:
self.reward_buffer.append(0.5)
return 0.5
self.reward_buffer.append(0.0)
return 0.0
async def evaluate(self, *args, **kwargs):
"""
Run eval tasks using the agent loop and verify results.
Logs accuracy metrics.
"""
start_time = time.time()
correct = 0
total = len(self.eval_tasks)
samples = []
for eval_item in self.eval_tasks:
try:
# For eval, we do a simple single-turn completion (not full agent loop)
# to keep eval fast. The agent loop is tested via training.
completion = await self.server.chat_completion(
messages=[
{"role": "system", "content": self.config.system_prompt or ""},
{"role": "user", "content": eval_item["prompt"]},
],
n=1,
max_tokens=self.config.max_token_length,
temperature=0.0,
split="eval",
)
response_content = (
completion.choices[0].message.content if completion.choices else ""
)
samples.append(
{
"prompt": eval_item["prompt"],
"response": response_content,
"expected": eval_item["expected_content"],
}
)
except Exception as e:
logger.error("Eval failed for item: %s", e)
samples.append(
{
"prompt": eval_item["prompt"],
"response": f"ERROR: {e}",
"expected": eval_item["expected_content"],
}
)
end_time = time.time()
eval_metrics = {
"eval/num_samples": total,
}
await self.evaluate_log(
metrics=eval_metrics,
samples=samples,
start_time=start_time,
end_time=end_time,
)
async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
"""Log training metrics including reward stats and accuracy."""
if wandb_metrics is None:
wandb_metrics = {}
if self.reward_buffer:
total = len(self.reward_buffer)
correct = sum(1 for r in self.reward_buffer if r == 1.0)
partial = sum(1 for r in self.reward_buffer if r == 0.5)
wandb_metrics["train/avg_reward"] = sum(self.reward_buffer) / total
wandb_metrics["train/accuracy"] = correct / total
wandb_metrics["train/partial_match_rate"] = partial / total
wandb_metrics["train/total_rollouts"] = total
self.reward_buffer = []
await super().wandb_log(wandb_metrics)
if __name__ == "__main__":
TerminalTestEnv.cli()

View File

@@ -0,0 +1,120 @@
"""
Tool Call Parser Registry
Client-side parsers that extract structured tool_calls from raw model output text.
Used in Phase 2 (VLLM server type) where ManagedServer's /generate endpoint returns
raw text without tool call parsing.
Each parser is a standalone reimplementation of the corresponding VLLM parser's
non-streaming extract_tool_calls() logic. No VLLM dependency -- only standard library
(re, json, uuid) and openai types.
Usage:
from environments.tool_call_parsers import get_parser
parser = get_parser("hermes")
content, tool_calls = parser.parse(raw_model_output)
# content = text with tool call markup stripped
# tool_calls = list of ChatCompletionMessageToolCall objects, or None
"""
import logging
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple, Type
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
)
logger = logging.getLogger(__name__)
# Type alias for parser return value
ParseResult = Tuple[Optional[str], Optional[List[ChatCompletionMessageToolCall]]]
class ToolCallParser(ABC):
"""
Base class for tool call parsers.
Each parser knows how to extract structured tool_calls from a specific
model family's raw output text format.
"""
@abstractmethod
def parse(self, text: str) -> ParseResult:
"""
Parse raw model output text for tool calls.
Args:
text: Raw decoded text from the model's completion
Returns:
Tuple of (content, tool_calls) where:
- content: text with tool call markup stripped (the message 'content' field),
or None if the entire output was tool calls
- tool_calls: list of ChatCompletionMessageToolCall objects,
or None if no tool calls were found
"""
raise NotImplementedError
# Global parser registry: name -> parser class
PARSER_REGISTRY: Dict[str, Type[ToolCallParser]] = {}
def register_parser(name: str):
"""
Decorator to register a parser class under a given name.
Usage:
@register_parser("hermes")
class HermesToolCallParser(ToolCallParser):
...
"""
def decorator(cls: Type[ToolCallParser]) -> Type[ToolCallParser]:
PARSER_REGISTRY[name] = cls
return cls
return decorator
def get_parser(name: str) -> ToolCallParser:
"""
Get a parser instance by name.
Args:
name: Parser name (e.g., "hermes", "mistral", "llama3_json")
Returns:
Instantiated parser
Raises:
KeyError: If parser name is not found in registry
"""
if name not in PARSER_REGISTRY:
available = sorted(PARSER_REGISTRY.keys())
raise KeyError(
f"Tool call parser '{name}' not found. Available parsers: {available}"
)
return PARSER_REGISTRY[name]()
def list_parsers() -> List[str]:
"""Return sorted list of registered parser names."""
return sorted(PARSER_REGISTRY.keys())
# Import all parser modules to trigger registration via @register_parser decorators
# Each module registers itself when imported
from environments.tool_call_parsers.hermes_parser import HermesToolCallParser # noqa: E402, F401
from environments.tool_call_parsers.longcat_parser import LongcatToolCallParser # noqa: E402, F401
from environments.tool_call_parsers.mistral_parser import MistralToolCallParser # noqa: E402, F401
from environments.tool_call_parsers.llama_parser import LlamaToolCallParser # noqa: E402, F401
from environments.tool_call_parsers.qwen_parser import QwenToolCallParser # noqa: E402, F401
from environments.tool_call_parsers.deepseek_v3_parser import DeepSeekV3ToolCallParser # noqa: E402, F401
from environments.tool_call_parsers.deepseek_v3_1_parser import DeepSeekV31ToolCallParser # noqa: E402, F401
from environments.tool_call_parsers.kimi_k2_parser import KimiK2ToolCallParser # noqa: E402, F401
from environments.tool_call_parsers.glm45_parser import Glm45ToolCallParser # noqa: E402, F401
from environments.tool_call_parsers.glm47_parser import Glm47ToolCallParser # noqa: E402, F401
from environments.tool_call_parsers.qwen3_coder_parser import Qwen3CoderToolCallParser # noqa: E402, F401

View File

@@ -0,0 +1,71 @@
"""
DeepSeek V3.1 tool call parser.
Similar to V3 but with a slightly different format:
<tool▁call▁begin>function_name<tool▁sep>arguments<tool▁call▁end>
Note: V3 has type+name before the separator, V3.1 has name before and args after.
Based on VLLM's DeepSeekV31ToolParser.extract_tool_calls()
"""
import re
import uuid
from typing import List, Optional
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
Function,
)
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
@register_parser("deepseek_v3_1")
@register_parser("deepseek_v31")
class DeepSeekV31ToolCallParser(ToolCallParser):
"""
Parser for DeepSeek V3.1 tool calls.
Slightly different regex than V3: function_name comes before the separator,
arguments come after (no type field, no json code block wrapper).
"""
START_TOKEN = "<tool▁calls▁begin>"
# Regex captures: function_name, function_arguments
PATTERN = re.compile(
r"<tool▁call▁begin>(?P<function_name>.*?)<tool▁sep>(?P<function_arguments>.*?)<tool▁call▁end>"
)
def parse(self, text: str) -> ParseResult:
if self.START_TOKEN not in text:
return text, None
try:
matches = self.PATTERN.findall(text)
if not matches:
return text, None
tool_calls: List[ChatCompletionMessageToolCall] = []
for match in matches:
func_name, func_args = match
tool_calls.append(
ChatCompletionMessageToolCall(
id=f"call_{uuid.uuid4().hex[:8]}",
type="function",
function=Function(
name=func_name.strip(),
arguments=func_args.strip(),
),
)
)
if not tool_calls:
return text, None
content = text[: text.find(self.START_TOKEN)].strip()
return content if content else None, tool_calls
except Exception:
return text, None

View File

@@ -0,0 +1,75 @@
"""
DeepSeek V3 tool call parser.
Format uses special unicode tokens:
<tool▁calls▁begin>
<tool▁call▁begin>type<tool▁sep>function_name
```json
{"arg": "value"}
```
<tool▁call▁end>
<tool▁calls▁end>
Based on VLLM's DeepSeekV3ToolParser.extract_tool_calls()
"""
import re
import uuid
from typing import List, Optional
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
Function,
)
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
@register_parser("deepseek_v3")
class DeepSeekV3ToolCallParser(ToolCallParser):
"""
Parser for DeepSeek V3 tool calls.
Uses special unicode tokens with fullwidth angle brackets and block elements.
Extracts type, function name, and JSON arguments from the structured format.
"""
START_TOKEN = "<tool▁calls▁begin>"
# Regex captures: type, function_name, function_arguments
PATTERN = re.compile(
r"<tool▁call▁begin>(?P<type>.*)<tool▁sep>(?P<function_name>.*)\n```json\n(?P<function_arguments>.*)\n```<tool▁call▁end>"
)
def parse(self, text: str) -> ParseResult:
if self.START_TOKEN not in text:
return text, None
try:
matches = self.PATTERN.findall(text)
if not matches:
return text, None
tool_calls: List[ChatCompletionMessageToolCall] = []
for match in matches:
tc_type, func_name, func_args = match
tool_calls.append(
ChatCompletionMessageToolCall(
id=f"call_{uuid.uuid4().hex[:8]}",
type="function",
function=Function(
name=func_name.strip(),
arguments=func_args.strip(),
),
)
)
if not tool_calls:
return text, None
# Content is everything before the tool calls section
content = text[: text.find(self.START_TOKEN)].strip()
return content if content else None, tool_calls
except Exception:
return text, None

View File

@@ -0,0 +1,109 @@
"""
GLM 4.5 (GLM-4-MoE) tool call parser.
Format uses custom arg_key/arg_value tags rather than standard JSON:
<tool_call>function_name
<arg_key>param1</arg_key><arg_value>value1</arg_value>
<arg_key>param2</arg_key><arg_value>value2</arg_value>
</tool_call>
Values are deserialized using json.loads -> ast.literal_eval -> raw string fallback.
Based on VLLM's Glm4MoeModelToolParser.extract_tool_calls()
"""
import ast
import json
import re
import uuid
from typing import Any, Dict, List, Optional
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
Function,
)
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
def _deserialize_value(value: str) -> Any:
"""
Try to deserialize a string value to its native Python type.
Attempts json.loads, then ast.literal_eval, then returns raw string.
"""
try:
return json.loads(value)
except (json.JSONDecodeError, TypeError):
pass
try:
return ast.literal_eval(value)
except (ValueError, SyntaxError, TypeError):
pass
return value
@register_parser("glm45")
class Glm45ToolCallParser(ToolCallParser):
"""
Parser for GLM 4.5 (GLM-4-MoE) tool calls.
Uses <tool_call>...</tool_call> tags with <arg_key>/<arg_value> pairs
instead of standard JSON arguments.
"""
FUNC_CALL_REGEX = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)
FUNC_DETAIL_REGEX = re.compile(r"<tool_call>([^\n]*)\n(.*)</tool_call>", re.DOTALL)
FUNC_ARG_REGEX = re.compile(
r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)
START_TOKEN = "<tool_call>"
def parse(self, text: str) -> ParseResult:
if self.START_TOKEN not in text:
return text, None
try:
matched_calls = self.FUNC_CALL_REGEX.findall(text)
if not matched_calls:
return text, None
tool_calls: List[ChatCompletionMessageToolCall] = []
for match in matched_calls:
detail = self.FUNC_DETAIL_REGEX.search(match)
if not detail:
continue
func_name = detail.group(1).strip()
func_args_raw = detail.group(2)
# Parse arg_key/arg_value pairs
pairs = self.FUNC_ARG_REGEX.findall(func_args_raw) if func_args_raw else []
arg_dict: Dict[str, Any] = {}
for key, value in pairs:
arg_key = key.strip()
arg_val = _deserialize_value(value.strip())
arg_dict[arg_key] = arg_val
tool_calls.append(
ChatCompletionMessageToolCall(
id=f"call_{uuid.uuid4().hex[:8]}",
type="function",
function=Function(
name=func_name,
arguments=json.dumps(arg_dict, ensure_ascii=False),
),
)
)
if not tool_calls:
return text, None
content = text[: text.find(self.START_TOKEN)].strip()
return content if content else None, tool_calls
except Exception:
return text, None

View File

@@ -0,0 +1,35 @@
"""
GLM 4.7 tool call parser.
Same as GLM 4.5 but with slightly different regex patterns.
The tool_call tags may wrap differently and arg parsing handles
newlines between key/value pairs.
Based on VLLM's Glm47MoeModelToolParser (extends Glm4MoeModelToolParser).
"""
import re
from environments.tool_call_parsers import ParseResult, register_parser
from environments.tool_call_parsers.glm45_parser import Glm45ToolCallParser
@register_parser("glm47")
class Glm47ToolCallParser(Glm45ToolCallParser):
"""
Parser for GLM 4.7 tool calls.
Extends GLM 4.5 with updated regex patterns.
"""
def __init__(self):
super().__init__()
# GLM 4.7 uses a slightly different detail regex that includes
# the <tool_call> wrapper and optional arg_key content
self.FUNC_DETAIL_REGEX = re.compile(
r"<tool_call>(.*?)(<arg_key>.*?)?</tool_call>", re.DOTALL
)
# GLM 4.7 handles newlines between arg_key and arg_value tags
self.FUNC_ARG_REGEX = re.compile(
r"<arg_key>(.*?)</arg_key>(?:\\n|\s)*<arg_value>(.*?)</arg_value>",
re.DOTALL,
)

View File

@@ -0,0 +1,73 @@
"""
Hermes tool call parser.
Format: <tool_call>{"name": "func", "arguments": {...}}</tool_call>
Based on VLLM's Hermes2ProToolParser.extract_tool_calls()
"""
import json
import re
import uuid
from typing import List, Optional, Tuple
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
Function,
)
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
@register_parser("hermes")
class HermesToolCallParser(ToolCallParser):
"""
Parser for Hermes-format tool calls.
Matches <tool_call>...</tool_call> tags containing JSON with "name" and "arguments".
Also handles unclosed <tool_call> at end-of-string (truncated generation).
"""
# Matches both closed and unclosed tool_call tags
PATTERN = re.compile(
r"<tool_call>\s*(.*?)\s*</tool_call>|<tool_call>\s*(.*)", re.DOTALL
)
def parse(self, text: str) -> ParseResult:
if "<tool_call>" not in text:
return text, None
try:
matches = self.PATTERN.findall(text)
if not matches:
return text, None
tool_calls: List[ChatCompletionMessageToolCall] = []
for match in matches:
# match is a tuple: (closed_content, unclosed_content)
raw_json = match[0] if match[0] else match[1]
if not raw_json.strip():
continue
tc_data = json.loads(raw_json)
tool_calls.append(
ChatCompletionMessageToolCall(
id=f"call_{uuid.uuid4().hex[:8]}",
type="function",
function=Function(
name=tc_data["name"],
arguments=json.dumps(
tc_data.get("arguments", {}), ensure_ascii=False
),
),
)
)
if not tool_calls:
return text, None
# Content is everything before the first <tool_call> tag
content = text[: text.find("<tool_call>")].strip()
return content if content else None, tool_calls
except Exception:
return text, None

View File

@@ -0,0 +1,93 @@
"""
Kimi K2 tool call parser.
Format:
<|tool_calls_section_begin|>
<|tool_call_begin|>function_id:0<|tool_call_argument_begin|>{"arg": "val"}<|tool_call_end|>
<|tool_calls_section_end|>
The function_id format is typically "functions.func_name:index" or "func_name:index".
Based on VLLM's KimiK2ToolParser.extract_tool_calls()
"""
import re
import uuid
from typing import List, Optional
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
Function,
)
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
@register_parser("kimi_k2")
class KimiK2ToolCallParser(ToolCallParser):
"""
Parser for Kimi K2 tool calls.
Uses section begin/end tokens wrapping individual tool call begin/end tokens.
The tool_call_id contains the function name (after last dot, before colon).
"""
# Support both singular and plural variants
START_TOKENS = [
"<|tool_calls_section_begin|>",
"<|tool_call_section_begin|>",
]
# Regex captures: tool_call_id (e.g., "functions.get_weather:0"), function_arguments
PATTERN = re.compile(
r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[^<]+:\d+)\s*"
r"<\|tool_call_argument_begin\|>\s*"
r"(?P<function_arguments>(?:(?!<\|tool_call_begin\|>).)*?)\s*"
r"<\|tool_call_end\|>",
re.DOTALL,
)
def parse(self, text: str) -> ParseResult:
# Check for any variant of the start token
has_start = any(token in text for token in self.START_TOKENS)
if not has_start:
return text, None
try:
matches = self.PATTERN.findall(text)
if not matches:
return text, None
tool_calls: List[ChatCompletionMessageToolCall] = []
for match in matches:
function_id, function_args = match
# Extract function name from ID format: "functions.get_weather:0" -> "get_weather"
function_name = function_id.split(":")[0].split(".")[-1]
tool_calls.append(
ChatCompletionMessageToolCall(
id=function_id, # Preserve the original ID format
type="function",
function=Function(
name=function_name,
arguments=function_args.strip(),
),
)
)
if not tool_calls:
return text, None
# Content is everything before the tool calls section
earliest_start = len(text)
for token in self.START_TOKENS:
idx = text.find(token)
if idx >= 0 and idx < earliest_start:
earliest_start = idx
content = text[:earliest_start].strip()
return content if content else None, tool_calls
except Exception:
return text, None

View File

@@ -0,0 +1,96 @@
"""
Llama 3.x / 4 tool call parser.
Format: The model outputs JSON objects with "name" and "arguments" (or "parameters") keys.
May be preceded by <|python_tag|> token. Supports multiple JSON objects separated
by content or semicolons.
Based on VLLM's Llama3JsonToolParser.extract_tool_calls()
"""
import json
import re
import uuid
from typing import List, Optional
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
Function,
)
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
@register_parser("llama3_json")
@register_parser("llama4_json")
class LlamaToolCallParser(ToolCallParser):
"""
Parser for Llama 3.x and 4 JSON-format tool calls.
Finds JSON objects containing "name" + ("arguments" or "parameters") keys.
Uses Python's json.JSONDecoder.raw_decode for robust extraction of
JSON objects from mixed text.
"""
BOT_TOKEN = "<|python_tag|>"
# Regex to find the start of potential JSON objects
JSON_START = re.compile(r"\{")
def parse(self, text: str) -> ParseResult:
# Quick check: need either the bot token or a JSON brace
if self.BOT_TOKEN not in text and "{" not in text:
return text, None
try:
decoder = json.JSONDecoder()
tool_calls: List[ChatCompletionMessageToolCall] = []
end_index = -1 # Track where the last parsed JSON ended
for match in self.JSON_START.finditer(text):
start = match.start()
# Skip if this brace is inside a previously parsed JSON object
if start <= end_index:
continue
try:
obj, json_end = decoder.raw_decode(text[start:])
end_index = start + json_end
# Must have "name" and either "arguments" or "parameters"
name = obj.get("name")
args = obj.get("arguments", obj.get("parameters"))
if not name or args is None:
continue
# Normalize arguments to JSON string
if isinstance(args, dict):
args = json.dumps(args, ensure_ascii=False)
elif not isinstance(args, str):
args = json.dumps(args, ensure_ascii=False)
tool_calls.append(
ChatCompletionMessageToolCall(
id=f"call_{uuid.uuid4().hex[:8]}",
type="function",
function=Function(name=name, arguments=args),
)
)
except (json.JSONDecodeError, KeyError, ValueError):
continue
if not tool_calls:
return text, None
# Content is everything before the first tool call JSON
# Find where the first tool call starts in the text
first_tc_start = text.find("{")
if self.BOT_TOKEN in text:
first_tc_start = text.find(self.BOT_TOKEN)
content = text[:first_tc_start].strip() if first_tc_start > 0 else None
return content, tool_calls
except Exception:
return text, None

View File

@@ -0,0 +1,69 @@
"""
Longcat Flash Chat tool call parser.
Same as Hermes but uses <longcat_tool_call> tags instead of <tool_call>.
Based on VLLM's LongcatFlashToolParser (extends Hermes2ProToolParser).
"""
import json
import re
import uuid
from typing import List, Optional
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
Function,
)
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
@register_parser("longcat")
class LongcatToolCallParser(ToolCallParser):
"""
Parser for Longcat Flash Chat tool calls.
Identical logic to Hermes, just different tag names.
"""
PATTERN = re.compile(
r"<longcat_tool_call>\s*(.*?)\s*</longcat_tool_call>|<longcat_tool_call>\s*(.*)",
re.DOTALL,
)
def parse(self, text: str) -> ParseResult:
if "<longcat_tool_call>" not in text:
return text, None
try:
matches = self.PATTERN.findall(text)
if not matches:
return text, None
tool_calls: List[ChatCompletionMessageToolCall] = []
for match in matches:
raw_json = match[0] if match[0] else match[1]
if not raw_json.strip():
continue
tc_data = json.loads(raw_json)
tool_calls.append(
ChatCompletionMessageToolCall(
id=f"call_{uuid.uuid4().hex[:8]}",
type="function",
function=Function(
name=tc_data["name"],
arguments=json.dumps(
tc_data.get("arguments", {}), ensure_ascii=False
),
),
)
)
if not tool_calls:
return text, None
content = text[: text.find("<longcat_tool_call>")].strip()
return content if content else None, tool_calls
except Exception:
return text, None

View File

@@ -0,0 +1,130 @@
"""
Mistral tool call parser.
Supports two formats depending on tokenizer version:
- Pre-v11: content[TOOL_CALLS] [{"name": ..., "arguments": {...}}, ...]
- v11+: content[TOOL_CALLS]tool_name1{"arg": "val"}[TOOL_CALLS]tool_name2{"arg": "val"}
Based on VLLM's MistralToolParser.extract_tool_calls()
The [TOOL_CALLS] token is the bot_token used by Mistral models.
"""
import json
import re
import uuid
from typing import List, Optional
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
Function,
)
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
def _generate_mistral_id() -> str:
"""Mistral tool call IDs are 9-char alphanumeric strings."""
import random
import string
return "".join(random.choices(string.ascii_letters + string.digits, k=9))
@register_parser("mistral")
class MistralToolCallParser(ToolCallParser):
"""
Parser for Mistral-format tool calls.
Detects format by checking if the content after [TOOL_CALLS] starts with '['
(pre-v11 JSON array) or with a tool name (v11+ format).
"""
# The [TOOL_CALLS] token -- may appear as different strings depending on tokenizer
BOT_TOKEN = "[TOOL_CALLS]"
# Fallback regex for pre-v11 format when JSON parsing fails
TOOL_CALL_REGEX = re.compile(r"\[?\s*(\{.*?\})\s*\]?", re.DOTALL)
def parse(self, text: str) -> ParseResult:
if self.BOT_TOKEN not in text:
return text, None
try:
parts = text.split(self.BOT_TOKEN)
content = parts[0].strip()
raw_tool_calls = parts[1:]
# Detect format: if the first raw part starts with '[', it's pre-v11
first_raw = raw_tool_calls[0].strip() if raw_tool_calls else ""
is_pre_v11 = first_raw.startswith("[") or first_raw.startswith("{")
tool_calls: List[ChatCompletionMessageToolCall] = []
if not is_pre_v11:
# v11+ format: [TOOL_CALLS]tool_name{args}[TOOL_CALLS]tool_name2{args2}
for raw in raw_tool_calls:
raw = raw.strip()
if not raw or "{" not in raw:
continue
brace_idx = raw.find("{")
tool_name = raw[:brace_idx].strip()
args_str = raw[brace_idx:]
tool_calls.append(
ChatCompletionMessageToolCall(
id=_generate_mistral_id(),
type="function",
function=Function(name=tool_name, arguments=args_str),
)
)
else:
# Pre-v11 format: [TOOL_CALLS] [{"name": ..., "arguments": {...}}]
try:
parsed = json.loads(first_raw)
if isinstance(parsed, dict):
parsed = [parsed]
for tc in parsed:
args = tc.get("arguments", {})
if isinstance(args, dict):
args = json.dumps(args, ensure_ascii=False)
tool_calls.append(
ChatCompletionMessageToolCall(
id=_generate_mistral_id(),
type="function",
function=Function(
name=tc["name"], arguments=args
),
)
)
except json.JSONDecodeError:
# Fallback regex extraction
match = self.TOOL_CALL_REGEX.findall(first_raw)
if match:
for raw_json in match:
try:
tc = json.loads(raw_json)
args = tc.get("arguments", {})
if isinstance(args, dict):
args = json.dumps(args, ensure_ascii=False)
tool_calls.append(
ChatCompletionMessageToolCall(
id=_generate_mistral_id(),
type="function",
function=Function(
name=tc["name"], arguments=args
),
)
)
except (json.JSONDecodeError, KeyError):
continue
if not tool_calls:
return text, None
return content if content else None, tool_calls
except Exception:
return text, None

View File

@@ -0,0 +1,163 @@
"""
Qwen3-Coder tool call parser.
Format uses XML-style nested tags:
<tool_call>
<function=function_name>
<parameter=param_name>value</parameter>
<parameter=param_name2>value2</parameter>
</function>
</tool_call>
Parameters are extracted from <parameter=name>value</parameter> tags and
type-converted using the schema if available, otherwise treated as strings.
Based on VLLM's Qwen3CoderToolParser.extract_tool_calls()
"""
import ast
import json
import re
import uuid
from typing import Any, Dict, List, Optional
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
Function,
)
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
def _try_convert_value(value: str) -> Any:
"""
Try to convert a parameter value string to a native Python type.
Handles null, numbers, booleans, JSON objects/arrays, and falls back to string.
"""
stripped = value.strip()
# Handle null
if stripped.lower() == "null":
return None
# Try JSON first (handles objects, arrays, strings, numbers, booleans)
try:
return json.loads(stripped)
except (json.JSONDecodeError, TypeError):
pass
# Try Python literal eval (handles tuples, etc.)
try:
return ast.literal_eval(stripped)
except (ValueError, SyntaxError, TypeError):
pass
# Return as string
return stripped
@register_parser("qwen3_coder")
class Qwen3CoderToolCallParser(ToolCallParser):
"""
Parser for Qwen3-Coder XML-format tool calls.
Uses nested XML tags: <tool_call><function=name><parameter=key>val</parameter></function></tool_call>
"""
START_TOKEN = "<tool_call>"
FUNCTION_PREFIX = "<function="
# Find complete tool_call blocks (or unclosed at end)
TOOL_CALL_REGEX = re.compile(
r"<tool_call>(.*?)</tool_call>|<tool_call>(.*?)$", re.DOTALL
)
# Find function blocks within a tool_call
FUNCTION_REGEX = re.compile(
r"<function=(.*?)</function>|<function=(.*)$", re.DOTALL
)
# Find parameter blocks within a function
PARAMETER_REGEX = re.compile(
r"<parameter=(.*?)(?:</parameter>|(?=<parameter=)|(?=</function>)|$)",
re.DOTALL,
)
def _parse_function_call(self, function_str: str) -> Optional[ChatCompletionMessageToolCall]:
"""Parse a single <function=name>...</function> block into a ToolCall."""
try:
# Extract function name: everything before the first '>'
gt_idx = function_str.index(">")
func_name = function_str[:gt_idx].strip()
params_str = function_str[gt_idx + 1:]
# Extract parameters
param_dict: Dict[str, Any] = {}
for match_text in self.PARAMETER_REGEX.findall(params_str):
if ">" not in match_text:
continue
eq_idx = match_text.index(">")
param_name = match_text[:eq_idx].strip()
param_value = match_text[eq_idx + 1:]
# Clean up whitespace
if param_value.startswith("\n"):
param_value = param_value[1:]
if param_value.endswith("\n"):
param_value = param_value[:-1]
param_dict[param_name] = _try_convert_value(param_value)
return ChatCompletionMessageToolCall(
id=f"call_{uuid.uuid4().hex[:24]}",
type="function",
function=Function(
name=func_name,
arguments=json.dumps(param_dict, ensure_ascii=False),
),
)
except (ValueError, IndexError):
return None
def parse(self, text: str) -> ParseResult:
if self.FUNCTION_PREFIX not in text:
return text, None
try:
# Find all tool_call blocks
tc_matches = self.TOOL_CALL_REGEX.findall(text)
raw_blocks = [m[0] if m[0] else m[1] for m in tc_matches]
# Fallback: if no tool_call tags, try the whole text
if not raw_blocks:
raw_blocks = [text]
# Find function blocks within each tool_call
function_strs: List[str] = []
for block in raw_blocks:
func_matches = self.FUNCTION_REGEX.findall(block)
function_strs.extend(m[0] if m[0] else m[1] for m in func_matches)
if not function_strs:
return text, None
# Parse each function call
tool_calls: List[ChatCompletionMessageToolCall] = []
for func_str in function_strs:
tc = self._parse_function_call(func_str)
if tc is not None:
tool_calls.append(tc)
if not tool_calls:
return text, None
# Content before tool calls
first_tc = text.find(self.START_TOKEN)
if first_tc < 0:
first_tc = text.find(self.FUNCTION_PREFIX)
content = text[:first_tc].strip() if first_tc > 0 else None
return content, tool_calls
except Exception:
return text, None

View File

@@ -0,0 +1,19 @@
"""
Qwen 2.5 tool call parser.
Uses the same <tool_call> format as Hermes.
Registered as a separate parser name for clarity when using --tool-parser=qwen.
"""
from environments.tool_call_parsers import register_parser
from environments.tool_call_parsers.hermes_parser import HermesToolCallParser
@register_parser("qwen")
class QwenToolCallParser(HermesToolCallParser):
"""
Parser for Qwen 2.5 tool calls.
Same <tool_call>{"name": ..., "arguments": ...}</tool_call> format as Hermes.
"""
pass # Identical format -- inherits everything from Hermes

View File

@@ -0,0 +1,289 @@
"""
ToolContext -- Unrestricted Tool Access for Reward Functions
A per-rollout handle that gives reward/verification functions direct access to
ALL hermes-agent tools, scoped to the rollout's task_id. The same task_id means
the terminal/browser session is the SAME one the model used during its rollout --
all state (files, processes, browser tabs) is preserved.
The verifier author decides which tools to use. Nothing is hardcoded or gated.
Example usage in a compute_reward():
async def compute_reward(self, item, result, ctx):
# Run tests in the model's terminal sandbox
test = ctx.terminal("pytest -v")
if test["exit_code"] == 0:
return 1.0
# Check if a file was created
content = ctx.read_file("/workspace/solution.py")
if content.get("content"):
return 0.5
return 0.0
"""
import json
import logging
import os
from typing import Any, Dict, List, Optional
import asyncio
import concurrent.futures
from model_tools import handle_function_call
from tools.terminal_tool import cleanup_vm
from tools.browser_tool import cleanup_browser
logger = logging.getLogger(__name__)
# Thread pool for running sync tool calls that internally use asyncio.run()
_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
def _run_tool_in_thread(tool_name: str, arguments: Dict[str, Any], task_id: str) -> str:
"""
Run a tool call in a thread pool executor so backends that use asyncio.run()
internally (modal, docker) get a clean event loop.
If we're already in an async context, uses run_in_executor.
If not (e.g., called from sync code), runs directly.
"""
try:
loop = asyncio.get_running_loop()
# We're in an async context -- need to run in thread
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
future = pool.submit(
handle_function_call, tool_name, arguments, task_id
)
return future.result(timeout=300)
except RuntimeError:
# No running event loop -- safe to call directly
return handle_function_call(tool_name, arguments, task_id)
class ToolContext:
"""
Open-ended access to all hermes-agent tools for a specific rollout.
Passed to compute_reward() so verifiers can use any tool they need:
terminal commands, file reads/writes, web searches, browser automation, etc.
All calls share the rollout's task_id for session isolation.
"""
def __init__(self, task_id: str):
self.task_id = task_id
# -------------------------------------------------------------------------
# Terminal tools
# -------------------------------------------------------------------------
def terminal(self, command: str, timeout: int = 180) -> Dict[str, Any]:
"""
Run a command in the rollout's terminal session.
Args:
command: Shell command to execute
timeout: Command timeout in seconds
Returns:
Dict with 'exit_code' (int) and 'output' (str)
"""
import os
backend = os.getenv("TERMINAL_ENV", "local")
logger.debug("ToolContext.terminal [%s backend] task=%s: %s", backend, self.task_id[:8], command[:100])
# Run in thread pool so modal/docker backends' asyncio.run() doesn't deadlock
result = _run_tool_in_thread(
"terminal",
{"command": command, "timeout": timeout},
self.task_id,
)
try:
return json.loads(result)
except json.JSONDecodeError:
return {"exit_code": -1, "output": result}
# -------------------------------------------------------------------------
# File tools
# -------------------------------------------------------------------------
def read_file(self, path: str) -> Dict[str, Any]:
"""
Read a file from the rollout's filesystem.
Args:
path: File path to read
Returns:
Dict with file content or error
"""
result = handle_function_call(
"read_file", {"path": path}, task_id=self.task_id
)
try:
return json.loads(result)
except json.JSONDecodeError:
return {"error": result}
def write_file(self, path: str, content: str) -> Dict[str, Any]:
"""
Write a file in the rollout's filesystem.
Args:
path: File path to write
content: Content to write
Returns:
Dict with success status or error
"""
result = handle_function_call(
"write_file", {"path": path, "content": content}, task_id=self.task_id
)
try:
return json.loads(result)
except json.JSONDecodeError:
return {"error": result}
def search(self, query: str, path: str = ".") -> Dict[str, Any]:
"""
Search for text in the rollout's filesystem.
Args:
query: Search query
path: Directory to search in
Returns:
Dict with search results
"""
result = handle_function_call(
"search", {"query": query, "path": path}, task_id=self.task_id
)
try:
return json.loads(result)
except json.JSONDecodeError:
return {"error": result}
# -------------------------------------------------------------------------
# Web tools
# -------------------------------------------------------------------------
def web_search(self, query: str) -> Dict[str, Any]:
"""
Search the web.
Args:
query: Search query
Returns:
Dict with search results
"""
result = handle_function_call("web_search", {"query": query})
try:
return json.loads(result)
except json.JSONDecodeError:
return {"error": result}
def web_extract(self, urls: List[str]) -> Dict[str, Any]:
"""
Extract content from URLs.
Args:
urls: List of URLs to extract content from
Returns:
Dict with extracted content
"""
result = handle_function_call("web_extract", {"urls": urls})
try:
return json.loads(result)
except json.JSONDecodeError:
return {"error": result}
# -------------------------------------------------------------------------
# Browser tools
# -------------------------------------------------------------------------
def browser_navigate(self, url: str) -> Dict[str, Any]:
"""
Navigate the rollout's browser session to a URL.
Args:
url: URL to navigate to
Returns:
Dict with page snapshot or error
"""
result = handle_function_call(
"browser_navigate", {"url": url}, task_id=self.task_id
)
try:
return json.loads(result)
except json.JSONDecodeError:
return {"error": result}
def browser_snapshot(self) -> Dict[str, Any]:
"""
Take a snapshot of the current browser page.
Returns:
Dict with page content/accessibility snapshot
"""
result = handle_function_call(
"browser_snapshot", {}, task_id=self.task_id
)
try:
return json.loads(result)
except json.JSONDecodeError:
return {"error": result}
# -------------------------------------------------------------------------
# Generic tool access
# -------------------------------------------------------------------------
def call_tool(self, tool_name: str, arguments: Dict[str, Any]) -> str:
"""
Call any hermes-agent tool by name.
This is the generic escape hatch -- if a tool doesn't have a convenience
wrapper above, you can call it directly here.
Args:
tool_name: Name of the tool (e.g., "vision_analyze", "skills_list")
arguments: Dict of arguments for the tool
Returns:
Raw JSON string result from the tool
"""
return _run_tool_in_thread(tool_name, arguments, self.task_id)
# -------------------------------------------------------------------------
# Cleanup
# -------------------------------------------------------------------------
def cleanup(self):
"""
Release all resources (terminal VMs, browser sessions) for this rollout.
Called automatically by the base environment via try/finally after
compute_reward() completes. You generally don't need to call this yourself.
"""
try:
cleanup_vm(self.task_id)
except Exception as e:
logger.debug("VM cleanup for task %s: %s", self.task_id, e)
# Suppress browser_tool's noisy debug prints during cleanup.
# The cleanup still runs (safe), it just doesn't spam the console.
_prev_quiet = os.environ.get("HERMES_QUIET")
os.environ["HERMES_QUIET"] = "1"
try:
cleanup_browser(self.task_id)
except Exception as e:
logger.debug("Browser cleanup for task %s: %s", self.task_id, e)
finally:
if _prev_quiet is None:
os.environ.pop("HERMES_QUIET", None)
else:
os.environ["HERMES_QUIET"] = _prev_quiet

View File

@@ -1,70 +0,0 @@
---
name: example-skill
description: An example skill demonstrating the skill file format and structure
---
# Example Skill
This is an example skill file that demonstrates how to create skills for the Hermes Agent.
## Skill File Format
Skills are markdown files with YAML frontmatter at the top:
```yaml
---
name: your-skill-name
description: A brief one-line description of what this skill does
---
```
The frontmatter fields:
- **name**: The identifier used to reference this skill (lowercase, hyphens for spaces)
- **description**: A brief description shown when listing skills (keep under 200 chars)
## Writing Effective Skills
### 1. Be Specific and Actionable
Good skills provide clear, actionable instructions:
```
When reviewing code:
1. Check for security vulnerabilities first
2. Verify error handling is comprehensive
3. Ensure tests cover edge cases
```
### 2. Include Examples
Show concrete examples of what you want:
```python
# Good: Descriptive variable names
user_authentication_token = get_token()
# Bad: Cryptic abbreviations
uat = gt()
```
### 3. Define When to Use
Help the agent understand when this skill applies:
> Use this skill when: reviewing pull requests, auditing security, or checking code quality.
## Skill Categories
Consider organizing skills by purpose:
- **Conventions**: Coding standards, API patterns, naming rules
- **Workflows**: Step-by-step processes for deployments, reviews, releases
- **Knowledge**: Domain-specific information, system architecture, gotchas
- **Templates**: Boilerplate for common tasks, response formats
## Tips
1. Keep the description concise - it's shown in the skills list
2. Use headers to organize longer skills
3. Include code examples where helpful
4. Reference other skills if they're related

View File

@@ -1,868 +0,0 @@
Metadata-Version: 2.4
Name: hermes-agent
Version: 0.1.0
Summary: AI agent with advanced tool-calling and toolsets
Author: Nous Research
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: openai
Requires-Dist: python-dotenv
Requires-Dist: fire
Requires-Dist: httpx
Requires-Dist: rich
Requires-Dist: tenacity
Requires-Dist: pyyaml
Requires-Dist: requests
Requires-Dist: jinja2
Requires-Dist: pydantic>=2.0
Requires-Dist: firecrawl-py
Requires-Dist: fal-client
Requires-Dist: litellm>=1.75.5
Requires-Dist: typer
Requires-Dist: platformdirs
Provides-Extra: modal
Requires-Dist: modal; extra == "modal"
Requires-Dist: boto3; extra == "modal"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Provides-Extra: messaging
Requires-Dist: python-telegram-bot>=20.0; extra == "messaging"
Requires-Dist: discord.py>=2.0; extra == "messaging"
Provides-Extra: cron
Requires-Dist: croniter; extra == "cron"
Provides-Extra: all
Requires-Dist: croniter; extra == "all"
Requires-Dist: python-telegram-bot>=20.0; extra == "all"
Requires-Dist: discord.py>=2.0; extra == "all"
# Hermes Agent
An AI agent with advanced tool-calling capabilities, featuring a flexible toolsets system for organizing and managing tools.
## Features
- **Interactive CLI**: Beautiful terminal interface with animated feedback, personalities, and session management
- **Messaging Gateway**: Connect to Telegram, Discord, and WhatsApp for conversational AI anywhere
- **Web Tools**: Search, extract content, and crawl websites
- **Terminal Tools**: Execute commands via local, Docker, Singularity, Modal, or SSH backends
- **Browser Tools**: Automate web browsers to navigate, click, type, and extract content
- **Vision Tools**: Analyze images from URLs
- **Reasoning Tools**: Advanced multi-model reasoning (Mixture of Agents)
- **Creative Tools**: Generate images from text prompts
- **Skills Tools**: On-demand knowledge documents with progressive disclosure
- **Toolsets System**: Organize tools into logical groups for different scenarios
- **Scheduled Tasks**: Cron jobs for automated agent tasks with delivery to platforms
- **Context Compression**: Automatic summarization when approaching context limits
- **Batch Processing**: Process datasets in parallel with checkpointing and statistics tracking
- **Ephemeral System Prompts**: Guide model behavior without polluting training datasets
## Installation
### Quick Install (Recommended)
**Linux/macOS:**
```bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
```
**Windows (PowerShell):**
```powershell
irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
```
This installer will:
- Clone the repository to `~/.hermes-agent`
- Create a virtual environment and install dependencies
- Set up the `hermes` command in your PATH
- Run an interactive setup wizard to configure API keys
### Manual Installation
If you prefer to install manually:
```bash
# Clone with submodules
git clone --recurse-submodules https://github.com/NousResearch/Hermes-Agent.git
cd Hermes-Agent
# Run the setup script
./setup-hermes.sh
```
Or step-by-step:
```bash
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install in editable mode with all extras
pip install -e ".[all]"
# Or install dependencies manually
pip install -r requirements.txt
pip install -e ./mini-swe-agent
# Copy and configure environment
cp .env.example .env
# Edit .env with your API keys
# Run the setup wizard
hermes setup
```
## Quick Start
Once installed, the `hermes` command is your main entry point:
```bash
hermes # Interactive chat (default)
hermes chat # Same as above
hermes chat -q "Hello" # Single query, then exit
hermes setup # Configure API keys and settings
hermes status # Show configuration status
hermes doctor # Diagnose issues
hermes gateway # Start messaging gateway (Telegram/Discord/WhatsApp)
hermes cron daemon # Run cron job scheduler
hermes version # Show version info
```
**Legacy `./hermes` script:**
```bash
# The old CLI script still works:
./hermes
# Or with options:
./hermes --model "anthropic/claude-sonnet-4" --toolsets "web,terminal"
```
The CLI provides:
- Animated spinners during thinking and tool execution
- Kawaii-style feedback messages
- `/commands` for configuration, history, and session management
- Customizable personalities (`/personality kawaii`, `/personality pirate`, etc.)
- Persistent configuration via `cli-config.yaml`
## Configuration
### Environment Variables
```bash
# Copy the example environment file
cp .env.example .env
# Edit .env and add your API keys
nano .env # or use your preferred editor
```
**Required API Keys:**
- `OPENROUTER_API_KEY` - LLM access via OpenRouter (get at: https://openrouter.ai/keys)
- `FIRECRAWL_API_KEY` - Web tools (get at: https://firecrawl.dev/)
- `NOUS_API_KEY` - Vision & reasoning tools (get at: https://inference-api.nousresearch.com/)
- `FAL_KEY` - Image generation (get at: https://fal.ai/)
**Optional API Keys (for specific features):**
- `BROWSERBASE_API_KEY` - Browser automation (get at: https://browserbase.com/)
- `BROWSERBASE_PROJECT_ID` - From Browserbase dashboard
- `MORPH_API_KEY` - For legacy Hecate terminal backend (get at: https://morph.so/)
### 4. Configure Terminal Backend
The terminal tool uses **mini-swe-agent** environments. Configure in `.env` or `cli-config.yaml`:
```bash
# Backend: "local", "docker", "singularity", "modal", or "ssh"
TERMINAL_ENV=local # Default: runs on host machine (no isolation)
TERMINAL_ENV=ssh # Remote execution via SSH (agent code stays local)
TERMINAL_ENV=singularity # Recommended for HPC: Apptainer/Singularity containers
TERMINAL_ENV=docker # Isolated Docker containers
TERMINAL_ENV=modal # Cloud execution via Modal
# Container image (for docker/singularity/modal backends)
TERMINAL_DOCKER_IMAGE=python:3.11-slim
TERMINAL_SINGULARITY_IMAGE=docker://python:3.11-slim
TERMINAL_TIMEOUT=60
# SSH backend (for ssh)
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=myuser
TERMINAL_SSH_KEY=~/.ssh/id_rsa # Optional, uses ssh-agent if not set
```
**Backend Requirements:**
- **local**: No extra setup (runs directly on your machine, no isolation)
- **ssh**: SSH access to remote machine (great for sandboxing - agent can't touch its own code)
- **singularity**: Requires Apptainer or Singularity installed (common on HPC clusters, no root needed)
- **docker**: Requires Docker installed and user in `docker` group
- **modal**: Requires Modal account (see setup below)
### Singularity/Apptainer Setup (Recommended for HPC)
Singularity/Apptainer provides rootless container execution, ideal for HPC clusters:
```bash
# 1. Verify Apptainer is installed
apptainer --version # or: singularity --version
# 2. Set up cache directories (important for parallel workers)
# Use /scratch if available (HPC), otherwise /tmp
export APPTAINER_CACHEDIR=/scratch/$USER/.apptainer
export APPTAINER_TMPDIR=/scratch/$USER/.apptainer/tmp
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"
# 3. Pre-build SIF image (recommended for parallel batch processing)
# This avoids race conditions when multiple workers start simultaneously
apptainer build $APPTAINER_CACHEDIR/python-nodejs.sif docker://nikolaik/python-nodejs:python3.11-nodejs20
# 4. Configure .env to use the local SIF
TERMINAL_ENV=singularity
TERMINAL_SINGULARITY_IMAGE=/scratch/$USER/.apptainer/python-nodejs.sif
```
**Tip:** The batch scripts in `configs/` automatically handle SIF pre-building if `/scratch` is available.
### Modal Cloud Backend Setup
[Modal](https://modal.com) provides serverless cloud compute for running sandboxed environments at scale.
```bash
# 1. Install Modal and dependencies
pip install modal boto3
# 2. Authenticate with Modal (opens browser)
modal setup
# 3. Set terminal backend to modal in .env
TERMINAL_ENV=modal
```
Modal uses CLI-based authentication (stored in `~/.modal/`), so no API key is needed in `.env`. After running `modal setup`, commands will automatically execute in Modal's cloud sandboxes.
### Browser Tools Setup
Browser tools enable the agent to navigate websites, fill forms, click buttons, and extract content. They use [agent-browser](https://github.com/vercel-labs/agent-browser) CLI with [Browserbase](https://browserbase.com) cloud execution.
```bash
# 1. Install Node.js (if not already installed)
# Use nvm (recommended) or your package manager
# 2. Install agent-browser CLI (choose one option):
npm install -g agent-browser # Option A: Global install (recommended)
npm install # Option B: Local install (uses npx fallback)
# 3. Get Browserbase credentials
# Sign up at https://browserbase.com/ and get your:
# - API Key (from Settings → API Keys)
# - Project ID (from your project dashboard)
# 4. Add to your .env file:
BROWSERBASE_API_KEY=your_api_key_here
BROWSERBASE_PROJECT_ID=your_project_id_here
```
**Available Browser Tools:**
| Tool | Description |
|------|-------------|
| `browser_navigate` | Navigate to a URL |
| `browser_snapshot` | Get text-based page snapshot with element refs |
| `browser_click` | Click an element by ref (e.g., `@e5`) |
| `browser_type` | Type text into an input field |
| `browser_scroll` | Scroll up or down |
| `browser_back` | Go back in browser history |
| `browser_press` | Press a keyboard key (Enter, Tab, etc.) |
| `browser_close` | Close the browser session |
| `browser_get_images` | Get list of images on the page |
**Example Usage:**
```bash
# Use browser tools with web search and vision
python run_agent.py \
--query "Go to amazon.com and find the price of the latest Kindle" \
--enabled_toolsets=browser,web,vision
# Use browser-focused distribution
python batch_runner.py \
--dataset_file=browser_tasks.jsonl \
--distribution=browser_use \
--run_name=browser_run
```
See `.env.example` for all available configuration options including debug settings.
### Skills Tools
Skills are on-demand knowledge documents the agent can load when needed. They follow a **progressive disclosure** pattern to minimize token usage:
```
skills/
├── mlops/ # Category folder
│ ├── axolotl/ # Skill folder
│ │ ├── SKILL.md # Main instructions (required)
│ │ ├── references/ # Additional docs, API specs
│ │ └── templates/ # Output formats, configs
│ └── vllm/
│ └── SKILL.md
```
**Available Skills Tools:**
| Tool | Description |
|------|-------------|
| `skills_categories` | List available skill categories (~50 tokens) |
| `skills_list` | List skills with name + description (~3k tokens for 40 skills) |
| `skill_view` | Load full skill content, tags, and linked files |
**Example Usage:**
```bash
# Use skills tools
python run_agent.py \
--query "What skills do you have for fine-tuning? Show me the axolotl skill." \
--enabled_toolsets=skills
```
**Creating Skills:**
Skills use YAML frontmatter for metadata:
```yaml
---
name: my-skill
description: Brief description shown in skills_list
tags: [tag1, tag2]
related_skills: [other-skill]
version: 1.0.0
---
# Skill Content
Instructions, examples, and guidelines here...
```
Skills can include:
- `references/` - Additional documentation, API specs, examples
- `templates/` - Output formats, config files, boilerplate code
- `scripts/` - Executable helpers (Python, shell scripts)
## Session Logging
Every conversation is automatically logged to `logs/` for debugging and inspection:
```
logs/
├── session_20260201_143052_a1b2c3.json
├── session_20260201_150217_d4e5f6.json
└── ...
```
**Log Format:**
```json
{
"session_id": "20260201_143052_a1b2c3",
"model": "anthropic/claude-sonnet-4",
"session_start": "2026-02-01T14:30:52.123456",
"last_updated": "2026-02-01T14:35:12.789012",
"message_count": 8,
"conversations": [
{"from": "system", "value": "..."},
{"from": "human", "value": "..."},
{"from": "gpt", "value": "..."},
{"from": "tool", "value": "..."}
]
}
```
- **Automatic**: Logs are created and updated automatically after each conversation turn
- **Session ID in Banner**: The CLI displays the session ID in the welcome banner
- **Trajectory Format**: Uses the same format as batch processing for consistency
- **Git Ignored**: `logs/` is in `.gitignore` so logs aren't committed
## Context Compression
Long conversations can exceed the model's context limit. Hermes Agent automatically compresses context when approaching the limit:
**How it works:**
1. Tracks actual token usage from API responses (`usage.prompt_tokens`)
2. When tokens reach 85% of model's context limit, triggers compression
3. Protects first 3 turns (system prompt, initial request, first response)
4. Protects last 4 turns (recent context is most relevant)
5. Summarizes middle turns using a fast/cheap model (Gemini Flash)
6. Inserts summary as a user message, conversation continues seamlessly
**Configuration (`cli-config.yaml`):**
```yaml
compression:
enabled: true # Enable auto-compression (default)
threshold: 0.85 # Compress at 85% of context limit
summary_model: "google/gemini-2.0-flash-001"
```
**Or via environment variables:**
```bash
CONTEXT_COMPRESSION_ENABLED=true
CONTEXT_COMPRESSION_THRESHOLD=0.85
CONTEXT_COMPRESSION_MODEL=google/gemini-2.0-flash-001
```
**When compression triggers, you'll see:**
```
📦 Context compression triggered (170,000 tokens ≥ 170,000 threshold)
📊 Model context limit: 200,000 tokens (85% = 170,000)
🗜️ Summarizing turns 4-15 (12 turns)
✅ Compressed: 20 → 9 messages (~45,000 tokens saved)
```
## Scheduled Tasks (Cron Jobs)
Hermes Agent can schedule automated tasks to run in the future - either one-time reminders or recurring jobs.
### CLI Commands
```bash
# List scheduled jobs
/cron
# Add a one-shot reminder (runs once in 30 minutes)
/cron add 30m Remind me to check the build status
# Add a recurring job (every 2 hours)
/cron add "every 2h" Check server status at 192.168.1.100 and report any issues
# Add a cron expression (daily at 9am)
/cron add "0 9 * * *" Generate a morning briefing summarizing GitHub notifications
# Remove a job
/cron remove abc123def456
```
### Agent Self-Scheduling
The agent can also schedule its own follow-up tasks using tools:
```python
# Available when using hermes-cli toolset (default for CLI)
schedule_cronjob(prompt="...", schedule="30m", repeat=1) # One-shot
schedule_cronjob(prompt="...", schedule="every 2h") # Recurring
list_cronjobs() # View all jobs
remove_cronjob(job_id="...") # Cancel a job
```
**⚠️ Important:** Cronjobs run in **isolated sessions with NO prior context**. The prompt must be completely self-contained with all necessary information (file paths, URLs, server addresses, etc.). The future agent will not remember anything from the current conversation.
### Schedule Formats
| Format | Example | Description |
|--------|---------|-------------|
| Duration | `30m`, `2h`, `1d` | One-shot delay from now |
| Interval | `every 30m`, `every 2h` | Recurring at fixed intervals |
| Cron | `0 9 * * *` | Cron expression (requires `croniter`) |
| Timestamp | `2026-02-03T14:00` | One-shot at specific time |
### Repeat Options
| repeat | Behavior |
|--------|----------|
| (omitted) | One-shot schedules run once; intervals/cron run forever |
| `1` | Run once then auto-delete |
| `N` | Run N times then auto-delete |
### Running the Cron Daemon
Jobs are stored in `~/.hermes/cron/jobs.json` and executed by a scheduler:
```bash
# Option 1: Built-in daemon (checks every 60 seconds)
python cli.py --cron-daemon
# Option 2: System cron integration (run once per minute)
# Add to crontab: crontab -e
*/1 * * * * cd ~/hermes-agent && python cli.py --cron-tick-once >> ~/.hermes/cron/cron.log 2>&1
```
### Job Output
Job outputs are saved to `~/.hermes/cron/output/{job_id}/{timestamp}.md` for review.
## Messaging Gateway (Telegram, Discord, WhatsApp)
Connect Hermes Agent to messaging platforms so you can chat from anywhere.
### Quick Start
```bash
# 1. Add your bot token to .env
echo 'TELEGRAM_BOT_TOKEN="your_token"' >> .env
# 2. Test the gateway (foreground)
./scripts/hermes-gateway run
# 3. Install as a background service
./scripts/hermes-gateway install
# 4. Manage the service
./scripts/hermes-gateway start # Start
./scripts/hermes-gateway stop # Stop
./scripts/hermes-gateway status # Check status
```
### Supported Platforms
| Platform | Setup | Toolset |
|----------|-------|---------|
| Telegram | Bot via @BotFather | `hermes-telegram` |
| Discord | Bot via Developer Portal | `hermes-discord` |
| WhatsApp | Node.js bridge | `hermes-whatsapp` |
### Session Management
- Sessions persist across messages (agent remembers context)
- Reset policies: daily (4am), idle (2 hours), or both
- Manual reset: send `/new` or `/reset`
### Cron Job Delivery
Schedule tasks that deliver to specific platforms:
```python
schedule_cronjob(
prompt="Check server status...",
schedule="every 1h",
deliver="telegram" # or "origin", "discord", etc.
)
```
### CLI Commands
| Command | Description |
|---------|-------------|
| `/platforms` | Show gateway configuration status |
| `--gateway` | Start the gateway (CLI flag) |
See [docs/messaging.md](docs/messaging.md) for full setup instructions.
## Interactive CLI
The CLI provides a rich interactive experience for working with the agent.
### Running the CLI
```bash
# Basic usage
./hermes
# With specific model
./hermes --model "anthropic/claude-sonnet-4"
# With specific toolsets
./hermes --toolsets "web,terminal,skills"
```
### CLI Commands
| Command | Description |
|---------|-------------|
| `/help` | Show available commands |
| `/tools` | List available tools by toolset |
| `/toolsets` | List available toolsets |
| `/model [name]` | Show or change the current model |
| `/prompt [text]` | View/set custom system prompt |
| `/personality [name]` | Set a predefined personality |
| `/clear` | Clear screen and reset conversation |
| `/reset` | Reset conversation only |
| `/history` | Show conversation history |
| `/save` | Save current conversation to file |
| `/config` | Show current configuration |
| `/cron` | Manage scheduled tasks (list, add, remove) |
| `/platforms` | Show gateway/messaging platform status |
| `/quit` | Exit the CLI |
### Configuration
Copy `cli-config.yaml.example` to `cli-config.yaml` and customize:
```yaml
# Model settings
model:
default: "anthropic/claude-sonnet-4"
# Terminal backend (local, docker, singularity, modal, or ssh)
terminal:
env_type: "local"
cwd: "." # Use current directory
# Or use SSH for remote execution (keeps agent code isolated)
# terminal:
# env_type: "ssh"
# ssh_host: "my-server.example.com"
# ssh_user: "myuser"
# ssh_key: "~/.ssh/id_rsa"
# cwd: "/home/myuser/project"
# Enable specific toolsets
toolsets:
- all # or: web, terminal, browser, vision, etc.
# Custom personalities (use with /personality command)
agent:
personalities:
helpful: "You are a helpful assistant."
kawaii: "You are a kawaii assistant! Use cute expressions..."
```
### Personalities
Built-in personalities available via `/personality`:
- `helpful`, `concise`, `technical`, `creative`, `teacher`
- `kawaii`, `catgirl`, `pirate`, `shakespeare`, `surfer`
- `noir`, `uwu`, `philosopher`, `hype`
## Toolsets System
The agent uses a toolsets system for organizing and managing tools. All tools must be part of a toolset to be accessible - individual tool selection is not supported. This ensures consistent and logical grouping of capabilities.
### Key Concepts
- **Toolsets**: Logical groups of tools for specific use cases (e.g., "research", "development", "debugging")
- **Composition**: Toolsets can include other toolsets for powerful combinations
- **Custom Toolsets**: Create your own toolsets at runtime or by editing `toolsets.py`
- **Toolset-Only Access**: Tools are only accessible through toolsets, not individually
### Available Toolsets
See `toolsets.py` for the complete list of predefined toolsets including:
- Basic toolsets (web, terminal, vision, creative, reasoning)
- Composite toolsets (research, development, analysis, etc.)
- Scenario-specific toolsets (debugging, documentation, API testing, etc.)
- Special toolsets (safe mode without terminal, minimal, offline)
### Using Toolsets
```bash
# Use a predefined toolset
python run_agent.py --enabled_toolsets=research --query "Find latest AI papers"
# Combine multiple toolsets
python run_agent.py --enabled_toolsets=web,vision --query "Analyze this website"
# Enable all toolsets explicitly (same as omitting the flag)
python run_agent.py --enabled_toolsets=all --query "Do web research and run commands if helpful"
# Safe mode (no terminal access)
python run_agent.py --enabled_toolsets=safe --query "Help without running commands"
# List all available toolsets and tools
python run_agent.py --list_tools
```
See `toolsets.py` for the complete list of available toolsets and how to create custom ones.
## Basic Usage
### Default (all tools enabled)
```bash
# Uses OpenRouter by default - just set OPENROUTER_API_KEY in .env
python run_agent.py \
--query "search up the latest docs on jit in python 3.13 and write me basic example that's not in their docs. profile its perf" \
--max_turns 20 \
--model anthropic/claude-sonnet-4-20250514
```
### With specific toolset
```bash
python run_agent.py \
--query "Debug this Python error" \
--enabled_toolsets=debugging \
--model anthropic/claude-sonnet-4-20250514
```
### Python API
```python
from run_agent import AIAgent
# Uses OpenRouter by default (reads OPENROUTER_API_KEY from .env)
agent = AIAgent(
model="anthropic/claude-sonnet-4-20250514",
enabled_toolsets=["research"]
)
response = agent.chat("Find information about quantum computing")
# Create custom toolset at runtime
from toolsets import create_custom_toolset
create_custom_toolset(
name="my_tools",
description="My custom toolkit",
tools=["web_search"],
includes=["terminal", "vision"]
)
agent = AIAgent(enabled_toolsets=["my_tools"])
```
## Batch Processing
Process multiple prompts from a dataset in parallel with automatic checkpointing and statistics tracking:
```bash
# Basic batch processing
python batch_runner.py \
--dataset_file=prompts.jsonl \
--batch_size=20 \
--run_name=my_run
# With specific distribution
python batch_runner.py \
--dataset_file=prompts.jsonl \
--batch_size=20 \
--run_name=image_run \
--distribution=image_gen \
--num_workers=4
```
**Key Features:**
- Parallel processing with configurable workers
- Toolset distributions for varied data generation
- Automatic checkpointing and resume capability
- Combined output in `data/<run_name>/trajectories.jsonl`
- Tool usage statistics and success rates
Use `--list_distributions` to see available toolset distributions for varied data generation.
### Trajectory Compression
Post-process trajectories to fit within token budgets for training:
```bash
# Compress a directory of JSONL files
python trajectory_compressor.py --input=data/my_run
# Compress a single JSONL file
python trajectory_compressor.py --input=data/trajectories.jsonl
# Compress a 15% sample (useful for creating smaller training sets)
python trajectory_compressor.py --input=data/trajectories.jsonl --sample_percent=15
# Custom output and token target
python trajectory_compressor.py \
--input=data/trajectories.jsonl \
--output=data/compressed.jsonl \
--target_max_tokens=16000
```
**Features:**
- Protects first turns (system, human, first GPT response, first tool call)
- Protects last N turns (configurable)
- Summarizes middle turns using LLM to fit target token budget
- Supports both directory and single file input
- Optional random sampling with `--sample_percent`
- Configurable via `configs/trajectory_compression.yaml`
### Ephemeral System Prompts
The ephemeral system prompt feature allows you to guide the model's behavior during batch processing **without** saving that prompt to the training dataset trajectories. This is useful for:
- Guiding model behavior during data collection
- Adding task-specific instructions
- Keeping saved trajectories clean and focused on tool-calling format
**Example:**
```bash
python batch_runner.py \
--dataset_file=prompts.jsonl \
--batch_size=10 \
--run_name=my_run \
--ephemeral_system_prompt="You are a helpful assistant focused on image generation."
```
The ephemeral prompt will influence the model's behavior during execution, but **only the standard tool-calling system prompt** will be saved in the trajectory files.
The ephemeral prompt influences model behavior during execution, but **only the standard tool-calling system prompt** is saved in trajectory files.
## Command Line Arguments
**Single Agent (`run_agent.py`):**
- `--query`: The question or task for the agent
- `--model`: Model to use (default: claude-opus-4-20250514)
- `--api_key`: API key for authentication
- `--base_url`: API endpoint URL
- `--max_turns`: Maximum number of tool-calling iterations
- `--enabled_toolsets`: Comma-separated list of toolsets to enable. Use `all` (or `*`) to enable everything. If omitted, all toolsets are enabled by default.
- `--disabled_toolsets`: Comma-separated list of toolsets to disable
- `--list_tools`: List all available toolsets and tools
- `--save_trajectories`: Save conversation trajectories to JSONL files
**Batch Processing (`batch_runner.py`):**
- `--dataset_file`: Path to JSONL file with prompts
- `--batch_size`: Number of prompts per batch
- `--run_name`: Name for this run (for output/checkpointing)
- `--distribution`: Toolset distribution to use (default: "default")
- `--num_workers`: Number of parallel workers (default: 4)
- `--resume`: Resume from checkpoint if interrupted
- `--ephemeral_system_prompt`: System prompt used during execution but NOT saved to trajectories
- `--list_distributions`: List available toolset distributions
## Environment Variables
All environment variables can be configured in the `.env` file (copy from `.env.example`).
**LLM Provider (OpenRouter):**
- `OPENROUTER_API_KEY`: Primary LLM access via OpenRouter (supports Claude, GPT-4, Gemini, etc.)
- `LLM_MODEL`: Default model (e.g., `anthropic/claude-sonnet-4`, `openai/gpt-4o`)
**Tool API Keys:**
- `FIRECRAWL_API_KEY`: Web tools (search, extract, crawl)
- `NOUS_API_KEY`: Vision and reasoning tools
- `FAL_KEY`: Image generation tools
**Terminal Tool Configuration (mini-swe-agent backend):**
- `TERMINAL_ENV`: Backend type - `local`, `docker`, `singularity`, `modal`, or `ssh` (default: `local`)
- `TERMINAL_DOCKER_IMAGE`: Docker image for docker backend (default: `python:3.11-slim`)
- `TERMINAL_SINGULARITY_IMAGE`: Singularity/Apptainer image (can be `docker://...` URL or local `.sif` path)
- `TERMINAL_TIMEOUT`: Command timeout in seconds (default: `60`)
- `TERMINAL_LIFETIME_SECONDS`: Cleanup inactive environments after this time (default: `300`)
- `TERMINAL_CWD`: Working directory inside containers (default: `/tmp`)
- `TERMINAL_SCRATCH_DIR`: Custom scratch directory for sandbox storage (optional, auto-detects `/scratch`)
- `SUDO_PASSWORD`: Enable sudo commands by piping password via `sudo -S` (works with all backends)
- If unset in CLI mode, you'll be prompted interactively when sudo is needed (45s timeout)
**SSH Backend Configuration (for remote execution):**
- `TERMINAL_SSH_HOST`: Remote server hostname or IP
- `TERMINAL_SSH_USER`: SSH username
- `TERMINAL_SSH_PORT`: SSH port (default: `22`)
- `TERMINAL_SSH_KEY`: Path to SSH private key (optional, uses ssh-agent if not set)
**Context Compression (auto-shrinks long conversations):**
- `CONTEXT_COMPRESSION_ENABLED`: Enable auto-compression (default: `true`)
- `CONTEXT_COMPRESSION_THRESHOLD`: Compress at this % of context limit (default: `0.85`)
- `CONTEXT_COMPRESSION_MODEL`: Model for generating summaries (default: `google/gemini-2.0-flash-001`)
**Browser Tool Configuration (agent-browser + Browserbase):**
- `BROWSERBASE_API_KEY`: Browserbase API key for cloud browser execution
- `BROWSERBASE_PROJECT_ID`: Browserbase project ID
- `BROWSER_SESSION_TIMEOUT`: Session timeout in seconds (default: `300`)
**Legacy Hecate Terminal Backend (optional):**
- `MORPH_API_KEY`: For Hecate/MorphCloud terminal backend
- `HECATE_VM_LIFETIME_SECONDS`: VM lifetime (default: 300)
- `HECATE_DEFAULT_SNAPSHOT_ID`: Default snapshot (default: snapshot_p5294qxt)
**Debug Options:**
- `WEB_TOOLS_DEBUG`, `VISION_TOOLS_DEBUG`, `MOA_TOOLS_DEBUG`, `IMAGE_TOOLS_DEBUG`: Enable debug logging
## Key Files
| File | Purpose |
|------|---------|
| `hermes` | CLI launcher script (run with `./hermes`) |
| `cli.py` | Interactive CLI implementation |
| `cli-config.yaml` | CLI configuration (copy from `.example`) |
| `run_agent.py` | Main agent runner - single query execution |
| `batch_runner.py` | Parallel batch processing with checkpointing |
| `model_tools.py` | Core tool definitions and handlers |
| `toolsets.py` | Toolset definitions and composition |
| `toolset_distributions.py` | Probability distributions for data generation |
| `trajectory_compressor.py` | Post-process trajectories for training |
| `tools/` | Individual tool implementations |
| `tools/skills_tool.py` | Skills system with progressive disclosure |
| `skills/` | On-demand knowledge documents |
| `docs/` | Documentation |
| `configs/` | Example batch run scripts |

View File

@@ -1,47 +0,0 @@
README.md
batch_runner.py
cli.py
model_tools.py
pyproject.toml
run_agent.py
toolset_distributions.py
toolsets.py
trajectory_compressor.py
cron/__init__.py
cron/jobs.py
cron/scheduler.py
gateway/__init__.py
gateway/config.py
gateway/delivery.py
gateway/run.py
gateway/session.py
hermes_agent.egg-info/PKG-INFO
hermes_agent.egg-info/SOURCES.txt
hermes_agent.egg-info/dependency_links.txt
hermes_agent.egg-info/entry_points.txt
hermes_agent.egg-info/requires.txt
hermes_agent.egg-info/top_level.txt
hermes_cli/__init__.py
hermes_cli/cron.py
hermes_cli/doctor.py
hermes_cli/gateway.py
hermes_cli/main.py
hermes_cli/setup.py
hermes_cli/status.py
tests/test_batch_runner.py
tests/test_checkpoint_resumption.py
tests/test_modal_terminal.py
tests/test_nous_api_limits.py
tests/test_nous_api_pattern.py
tests/test_temperature_fix.py
tests/test_web_tools.py
tools/__init__.py
tools/browser_tool.py
tools/cronjob_tools.py
tools/image_generation_tool.py
tools/mixture_of_agents_tool.py
tools/skills_tool.py
tools/terminal_hecate.py
tools/terminal_tool.py
tools/vision_tools.py
tools/web_tools.py

View File

@@ -1 +0,0 @@

View File

@@ -1,3 +0,0 @@
[console_scripts]
hermes = hermes_cli.main:main
hermes-agent = run_agent:main

View File

@@ -1,35 +0,0 @@
openai
python-dotenv
fire
httpx
rich
tenacity
pyyaml
requests
jinja2
pydantic>=2.0
firecrawl-py
fal-client
litellm>=1.75.5
typer
platformdirs
[all]
croniter
python-telegram-bot>=20.0
discord.py>=2.0
[cron]
croniter
[dev]
pytest
pytest-asyncio
[messaging]
python-telegram-bot>=20.0
discord.py>=2.0
[modal]
modal
boto3

View File

@@ -1,11 +0,0 @@
batch_runner
cli
cron
gateway
hermes_cli
model_tools
run_agent
tools
toolset_distributions
toolsets
trajectory_compressor

View File

@@ -58,8 +58,11 @@ def run_doctor(args):
print(color("◆ Python Environment", Colors.CYAN, Colors.BOLD))
py_version = sys.version_info
if py_version >= (3, 10):
if py_version >= (3, 11):
check_ok(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}")
elif py_version >= (3, 10):
check_ok(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}")
check_warn("Python 3.11+ recommended for RL Training tools (tinker requires >= 3.11)")
elif py_version >= (3, 8):
check_warn(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}", "(3.10+ recommended)")
else:
@@ -100,7 +103,7 @@ def run_doctor(args):
check_ok(name)
except ImportError:
check_fail(name, "(missing)")
issues.append(f"Install {name}: pip install {module}")
issues.append(f"Install {name}: uv pip install {module}")
for module, name in optional_packages:
try:
@@ -263,6 +266,39 @@ def run_doctor(args):
except Exception as e:
check_warn("Anthropic API", f"({e})")
# =========================================================================
# Check: Submodules
# =========================================================================
print()
print(color("◆ Submodules", Colors.CYAN, Colors.BOLD))
# mini-swe-agent (terminal tool backend)
mini_swe_dir = PROJECT_ROOT / "mini-swe-agent"
if mini_swe_dir.exists() and (mini_swe_dir / "pyproject.toml").exists():
try:
__import__("minisweagent")
check_ok("mini-swe-agent", "(terminal backend)")
except ImportError:
check_warn("mini-swe-agent found but not installed", "(run: uv pip install -e ./mini-swe-agent)")
issues.append("Install mini-swe-agent: uv pip install -e ./mini-swe-agent")
else:
check_warn("mini-swe-agent not found", "(run: git submodule update --init --recursive)")
# tinker-atropos (RL training backend)
tinker_dir = PROJECT_ROOT / "tinker-atropos"
if tinker_dir.exists() and (tinker_dir / "pyproject.toml").exists():
if py_version >= (3, 11):
try:
__import__("tinker_atropos")
check_ok("tinker-atropos", "(RL training backend)")
except ImportError:
check_warn("tinker-atropos found but not installed", "(run: uv pip install -e ./tinker-atropos)")
issues.append("Install tinker-atropos: uv pip install -e ./tinker-atropos")
else:
check_warn("tinker-atropos requires Python 3.11+", f"(current: {py_version.major}.{py_version.minor})")
else:
check_warn("tinker-atropos not found", "(run: git submodule update --init --recursive)")
# =========================================================================
# Check: Tool Availability
# =========================================================================

View File

@@ -119,6 +119,7 @@ def cmd_uninstall(args):
def cmd_update(args):
"""Update Hermes Agent to the latest version."""
import subprocess
import shutil
print("🦋 Updating Hermes Agent...")
print()
@@ -163,13 +164,21 @@ def cmd_update(args):
print("→ Pulling updates...")
subprocess.run(["git", "pull", "origin", branch], cwd=PROJECT_ROOT, check=True)
# Reinstall Python dependencies
# Reinstall Python dependencies (prefer uv for speed, fall back to pip)
print("→ Updating Python dependencies...")
venv_pip = PROJECT_ROOT / "venv" / "bin" / "pip"
if venv_pip.exists():
subprocess.run([str(venv_pip), "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
uv_bin = shutil.which("uv")
if uv_bin:
subprocess.run(
[uv_bin, "pip", "install", "-e", ".", "--quiet"],
cwd=PROJECT_ROOT, check=True,
env={**os.environ, "VIRTUAL_ENV": str(PROJECT_ROOT / "venv")}
)
else:
subprocess.run(["pip", "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
venv_pip = PROJECT_ROOT / "venv" / "bin" / "pip"
if venv_pip.exists():
subprocess.run([str(venv_pip), "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
else:
subprocess.run(["pip", "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
# Check for Node.js deps
if (PROJECT_ROOT / "package.json").exists():

View File

@@ -652,6 +652,32 @@ def run_setup_wizard(args):
print_info("Modal Cloud Configuration:")
print_info("Get credentials at: https://modal.com/settings")
# Check if swe-rex[modal] is installed, install if missing
try:
from swerex.deployment.modal import ModalDeployment
print_info("swe-rex[modal] package: installed ✓")
except ImportError:
print_info("Installing required package: swe-rex[modal]...")
import subprocess
import shutil
# Prefer uv for speed, fall back to pip
uv_bin = shutil.which("uv")
if uv_bin:
result = subprocess.run(
[uv_bin, "pip", "install", "swe-rex[modal]>=1.4.0"],
capture_output=True, text=True
)
else:
result = subprocess.run(
[sys.executable, "-m", "pip", "install", "swe-rex[modal]>=1.4.0"],
capture_output=True, text=True
)
if result.returncode == 0:
print_success("swe-rex[modal] installed (includes modal + boto3)")
else:
print_warning("Failed to install swe-rex[modal] — install manually:")
print_info(' uv pip install "swe-rex[modal]>=1.4.0"')
# Always show current status and allow reconfiguration
current_token = get_env_value('MODAL_TOKEN_ID')
if current_token:
@@ -917,6 +943,24 @@ def run_setup_wizard(args):
save_env_value("BROWSERBASE_API_KEY", api_key)
if project_id:
save_env_value("BROWSERBASE_PROJECT_ID", project_id)
# Check if Node.js dependencies are installed (required for browser tools)
import shutil
node_modules = PROJECT_ROOT / "node_modules" / "agent-browser"
if not node_modules.exists() and shutil.which("npm"):
print_info(" Installing Node.js dependencies for browser tools...")
import subprocess
result = subprocess.run(
["npm", "install", "--silent"],
capture_output=True, text=True, cwd=str(PROJECT_ROOT)
)
if result.returncode == 0:
print_success(" Node.js dependencies installed")
else:
print_warning(" npm install failed — run manually: cd ~/.hermes/hermes-agent && npm install")
elif not node_modules.exists():
print_warning(" Node.js not found — browser tools require: npm install (in the hermes-agent directory)")
print_success(" Configured ✓")
print()
@@ -950,6 +994,11 @@ def run_setup_wizard(args):
tinker_configured = get_env_value('TINKER_API_KEY')
wandb_configured = get_env_value('WANDB_API_KEY')
# Check Python version requirement upfront
rl_python_ok = sys.version_info >= (3, 11)
if not rl_python_ok:
print_warning(f" Requires Python 3.11+ (current: {sys.version_info.major}.{sys.version_info.minor})")
if tinker_configured and wandb_configured:
print_success(" Status: Configured ✓")
if prompt_yes_no(" Update RL training credentials?", False):
@@ -969,18 +1018,55 @@ def run_setup_wizard(args):
print_warning(" Status: Not configured (tools will be disabled)")
if prompt_yes_no(" Set up RL Training?", False):
print_info(" Get Tinker key at: https://tinker-console.thinkingmachines.ai/keys")
print_info(" Get WandB key at: https://wandb.ai/authorize")
api_key = prompt(" Tinker API key", password=True)
if api_key:
save_env_value("TINKER_API_KEY", api_key)
wandb_key = prompt(" WandB API key", password=True)
if wandb_key:
save_env_value("WANDB_API_KEY", wandb_key)
if api_key and wandb_key:
print_success(" Configured ✓")
# Check Python version before proceeding
if not rl_python_ok:
print_error(f" Python 3.11+ required (current: {sys.version_info.major}.{sys.version_info.minor})")
print_info(" Upgrade Python and reinstall to enable RL training tools")
else:
print_warning(" Partially configured (both keys required)")
print_info(" Get Tinker key at: https://tinker-console.thinkingmachines.ai/keys")
print_info(" Get WandB key at: https://wandb.ai/authorize")
api_key = prompt(" Tinker API key", password=True)
if api_key:
save_env_value("TINKER_API_KEY", api_key)
wandb_key = prompt(" WandB API key", password=True)
if wandb_key:
save_env_value("WANDB_API_KEY", wandb_key)
# Check if tinker-atropos submodule is installed
try:
__import__("tinker_atropos")
except ImportError:
tinker_dir = PROJECT_ROOT / "tinker-atropos"
if tinker_dir.exists() and (tinker_dir / "pyproject.toml").exists():
print_info(" Installing tinker-atropos submodule...")
import subprocess
import shutil
# Prefer uv for speed, fall back to pip
uv_bin = shutil.which("uv")
if uv_bin:
result = subprocess.run(
[uv_bin, "pip", "install", "-e", str(tinker_dir)],
capture_output=True, text=True
)
else:
result = subprocess.run(
[sys.executable, "-m", "pip", "install", "-e", str(tinker_dir)],
capture_output=True, text=True
)
if result.returncode == 0:
print_success(" tinker-atropos installed")
else:
print_warning(" tinker-atropos install failed — run manually:")
print_info(' uv pip install -e "./tinker-atropos"')
else:
print_warning(" tinker-atropos submodule not found — run:")
print_info(" git submodule update --init --recursive")
print_info(' uv pip install -e "./tinker-atropos"')
if api_key and wandb_key:
print_success(" Configured ✓")
else:
print_warning(" Partially configured (both keys required)")
# =========================================================================
# Save config and show summary

View File

@@ -1191,8 +1191,19 @@ def handle_web_function_call(function_name: str, function_args: Dict[str, Any])
urls = function_args.get("urls", [])
# Limit URLs to prevent abuse
urls = urls[:5] if isinstance(urls, list) else []
# Run async function in event loop
return asyncio.run(web_extract_tool(urls, "markdown"))
# Run async function -- use existing loop if available (Atropos),
# otherwise create one (normal CLI)
try:
loop = asyncio.get_running_loop()
# Already in an async context (Atropos) -- run in a thread
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
return pool.submit(
lambda: asyncio.run(web_extract_tool(urls, "markdown"))
).result(timeout=120)
except RuntimeError:
# No running loop (normal CLI) -- use asyncio.run directly
return asyncio.run(web_extract_tool(urls, "markdown"))
else:
return json.dumps({"error": f"Unknown web function: {function_name}"}, ensure_ascii=False)

View File

@@ -22,6 +22,8 @@ dependencies = [
"requests",
"jinja2",
"pydantic>=2.0",
# Interactive CLI (prompt_toolkit is used directly by cli.py)
"prompt_toolkit",
# Tools
"firecrawl-py",
"fal-client",
@@ -32,12 +34,18 @@ dependencies = [
]
[project.optional-dependencies]
modal = ["modal", "boto3"]
modal = ["swe-rex[modal]>=1.4.0"]
dev = ["pytest", "pytest-asyncio"]
messaging = ["python-telegram-bot>=20.0", "discord.py>=2.0"]
messaging = ["python-telegram-bot>=20.0", "discord.py>=2.0", "aiohttp>=3.9.0"]
cron = ["croniter"]
cli = ["simple-term-menu"]
all = ["croniter", "python-telegram-bot>=20.0", "discord.py>=2.0", "simple-term-menu"]
all = [
"hermes-agent[modal]",
"hermes-agent[messaging]",
"hermes-agent[cron]",
"hermes-agent[cli]",
"hermes-agent[dev]",
]
[project.scripts]
hermes = "hermes_cli.main:main"

View File

@@ -6,6 +6,10 @@ httpx
rich
tenacity
prompt_toolkit
pyyaml
requests
jinja2
pydantic>=2.0
# Web tools
firecrawl-py
@@ -15,10 +19,6 @@ fal-client
# mini-swe-agent dependencies (for terminal tool)
# Note: Install mini-swe-agent itself with: pip install -e ./mini-swe-agent
pyyaml
requests
jinja2
pydantic>=2.0
litellm>=1.75.5
typer
platformdirs
@@ -27,18 +27,17 @@ platformdirs
# Requires Docker installed and user in 'docker' group
# Optional: For Modal backend (cloud execution)
# modal
# boto3
# swe-rex[modal]>=1.4.0 # Includes modal + boto3 + swe-rex runtime
# Optional: For cron expression parsing (cronjob scheduling)
croniter
# Optional: For messaging platform integrations (gateway)
# Telegram: pip install python-telegram-bot
# Telegram
python-telegram-bot>=20.0
# Discord: pip install discord.py
# Discord
discord.py>=2.0
# WhatsApp: Requires Node.js bridge (see docs/messaging.md)
# aiohttp # For WhatsApp bridge communication
# WhatsApp bridge communication + general async HTTP (used by gateway)
aiohttp>=3.9.0

View File

@@ -2,6 +2,7 @@
# Hermes Agent Installer for Windows
# ============================================================================
# Installation script for Windows (PowerShell).
# Uses uv for fast Python provisioning and package management.
#
# Usage:
# irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
@@ -27,6 +28,7 @@ $ErrorActionPreference = "Stop"
$RepoUrlSsh = "git@github.com:NousResearch/hermes-agent.git"
$RepoUrlHttps = "https://github.com/NousResearch/hermes-agent.git"
$PythonVersion = "3.11"
# ============================================================================
# Helper functions
@@ -52,12 +54,12 @@ function Write-Success {
Write-Host "$Message" -ForegroundColor Green
}
function Write-Warning {
function Write-Warn {
param([string]$Message)
Write-Host "$Message" -ForegroundColor Yellow
}
function Write-Error {
function Write-Err {
param([string]$Message)
Write-Host "$Message" -ForegroundColor Red
}
@@ -66,33 +68,93 @@ function Write-Error {
# Dependency checks
# ============================================================================
function Test-Python {
Write-Info "Checking Python..."
function Install-Uv {
Write-Info "Checking for uv package manager..."
# Try different python commands
$pythonCmds = @("python3", "python", "py -3")
# Check if uv is already available
if (Get-Command uv -ErrorAction SilentlyContinue) {
$version = uv --version
$script:UvCmd = "uv"
Write-Success "uv found ($version)"
return $true
}
foreach ($cmd in $pythonCmds) {
try {
$version = & $cmd.Split()[0] $cmd.Split()[1..99] -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')" 2>$null
if ($version) {
$major, $minor = $version.Split('.')
if ([int]$major -ge 3 -and [int]$minor -ge 10) {
$script:PythonCmd = $cmd
Write-Success "Python $version found"
return $true
}
}
} catch {
# Try next command
# Check common install locations
$uvPaths = @(
"$env:USERPROFILE\.local\bin\uv.exe",
"$env:USERPROFILE\.cargo\bin\uv.exe"
)
foreach ($uvPath in $uvPaths) {
if (Test-Path $uvPath) {
$script:UvCmd = $uvPath
$version = & $uvPath --version
Write-Success "uv found at $uvPath ($version)"
return $true
}
}
Write-Error "Python 3.10+ not found"
Write-Info "Please install Python 3.10 or newer from:"
Write-Info " https://www.python.org/downloads/"
Write-Info ""
Write-Info "Make sure to check 'Add Python to PATH' during installation"
# Install uv
Write-Info "Installing uv (fast Python package manager)..."
try {
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" 2>&1 | Out-Null
# Find the installed binary
$uvExe = "$env:USERPROFILE\.local\bin\uv.exe"
if (-not (Test-Path $uvExe)) {
$uvExe = "$env:USERPROFILE\.cargo\bin\uv.exe"
}
if (-not (Test-Path $uvExe)) {
# Refresh PATH and try again
$env:Path = [Environment]::GetEnvironmentVariable("Path", "User") + ";" + [Environment]::GetEnvironmentVariable("Path", "Machine")
if (Get-Command uv -ErrorAction SilentlyContinue) {
$uvExe = (Get-Command uv).Source
}
}
if (Test-Path $uvExe) {
$script:UvCmd = $uvExe
$version = & $uvExe --version
Write-Success "uv installed ($version)"
return $true
}
Write-Err "uv installed but not found on PATH"
Write-Info "Try restarting your terminal and re-running"
return $false
} catch {
Write-Err "Failed to install uv"
Write-Info "Install manually: https://docs.astral.sh/uv/getting-started/installation/"
return $false
}
}
function Test-Python {
Write-Info "Checking Python $PythonVersion..."
# Let uv find or install Python
try {
$pythonPath = & $UvCmd python find $PythonVersion 2>$null
if ($pythonPath) {
$ver = & $pythonPath --version 2>$null
Write-Success "Python found: $ver"
return $true
}
} catch { }
# Python not found — use uv to install it (no admin needed!)
Write-Info "Python $PythonVersion not found, installing via uv..."
try {
& $UvCmd python install $PythonVersion 2>&1 | Out-Null
$pythonPath = & $UvCmd python find $PythonVersion 2>$null
if ($pythonPath) {
$ver = & $pythonPath --version 2>$null
Write-Success "Python installed: $ver"
return $true
}
} catch { }
Write-Err "Failed to install Python $PythonVersion"
Write-Info "Install Python $PythonVersion manually, then re-run this script"
return $false
}
@@ -105,7 +167,7 @@ function Test-Git {
return $true
}
Write-Error "Git not found"
Write-Err "Git not found"
Write-Info "Please install Git from:"
Write-Info " https://git-scm.com/download/win"
return $false
@@ -121,7 +183,7 @@ function Test-Node {
return $true
}
Write-Warning "Node.js not found (browser tools will be limited)"
Write-Warn "Node.js not found (browser tools will be limited)"
Write-Info "To install Node.js (optional):"
Write-Info " https://nodejs.org/en/download/"
$script:HasNode = $false
@@ -138,7 +200,7 @@ function Test-Ripgrep {
return $true
}
Write-Warning "ripgrep not found (file search will use findstr fallback)"
Write-Warn "ripgrep not found (file search will use findstr fallback)"
# Check what package managers are available
$hasWinget = Get-Command winget -ErrorAction SilentlyContinue
@@ -185,7 +247,7 @@ function Test-Ripgrep {
} catch { }
}
Write-Warning "Auto-install failed. You can install manually:"
Write-Warn "Auto-install failed. You can install manually:"
} else {
Write-Info "Skipping ripgrep installation. To install manually:"
}
@@ -216,13 +278,12 @@ function Install-Repository {
git pull origin $Branch
Pop-Location
} else {
Write-Error "Directory exists but is not a git repository: $InstallDir"
Write-Err "Directory exists but is not a git repository: $InstallDir"
Write-Info "Remove it or choose a different directory with -InstallDir"
exit 1
}
} else {
# Try SSH first (for private repo access), fall back to HTTPS
# Use --recurse-submodules to also clone mini-swe-agent and tinker-atropos
Write-Info "Trying SSH clone..."
$sshResult = git clone --branch $Branch --recurse-submodules $RepoUrlSsh $InstallDir 2>&1
@@ -235,7 +296,7 @@ function Install-Repository {
if ($LASTEXITCODE -eq 0) {
Write-Success "Cloned via HTTPS"
} else {
Write-Error "Failed to clone repository"
Write-Err "Failed to clone repository"
Write-Info "For private repo access, ensure your SSH key is added to GitHub:"
Write-Info " ssh-add ~/.ssh/id_rsa"
Write-Info " ssh -T git@github.com # Test connection"
@@ -244,7 +305,7 @@ function Install-Repository {
}
}
# Ensure submodules are initialized and updated (for existing installs or if --recurse failed)
# Ensure submodules are initialized and updated
Write-Info "Initializing submodules (mini-swe-agent, tinker-atropos)..."
Push-Location $InstallDir
git submodule update --init --recursive
@@ -260,23 +321,21 @@ function Install-Venv {
return
}
Write-Info "Creating virtual environment..."
Write-Info "Creating virtual environment with Python $PythonVersion..."
Push-Location $InstallDir
if (-not (Test-Path "venv")) {
& $PythonCmd -m venv venv
if (Test-Path "venv") {
Write-Info "Virtual environment already exists, recreating..."
Remove-Item -Recurse -Force "venv"
}
# Activate
& .\venv\Scripts\Activate.ps1
# Upgrade pip
pip install --upgrade pip wheel setuptools | Out-Null
# uv creates the venv and pins the Python version in one step
& $UvCmd venv venv --python $PythonVersion
Pop-Location
Write-Success "Virtual environment ready"
Write-Success "Virtual environment ready (Python $PythonVersion)"
}
function Install-Dependencies {
@@ -285,14 +344,15 @@ function Install-Dependencies {
Push-Location $InstallDir
if (-not $NoVenv) {
& .\venv\Scripts\Activate.ps1
# Tell uv to install into our venv (no activation needed)
$env:VIRTUAL_ENV = "$InstallDir\venv"
}
# Install main package
# Install main package with all extras
try {
pip install -e ".[all]" 2>&1 | Out-Null
& $UvCmd pip install -e ".[all]" 2>&1 | Out-Null
} catch {
pip install -e "." | Out-Null
& $UvCmd pip install -e "." | Out-Null
}
Write-Success "Main package installed"
@@ -301,25 +361,25 @@ function Install-Dependencies {
Write-Info "Installing mini-swe-agent (terminal tool backend)..."
if (Test-Path "mini-swe-agent\pyproject.toml") {
try {
pip install -e ".\mini-swe-agent" 2>&1 | Out-Null
& $UvCmd pip install -e ".\mini-swe-agent" 2>&1 | Out-Null
Write-Success "mini-swe-agent installed"
} catch {
Write-Warning "mini-swe-agent install failed (terminal tools may not work)"
Write-Warn "mini-swe-agent install failed (terminal tools may not work)"
}
} else {
Write-Warning "mini-swe-agent not found (run: git submodule update --init)"
Write-Warn "mini-swe-agent not found (run: git submodule update --init)"
}
Write-Info "Installing tinker-atropos (RL training backend)..."
if (Test-Path "tinker-atropos\pyproject.toml") {
try {
pip install -e ".\tinker-atropos" 2>&1 | Out-Null
& $UvCmd pip install -e ".\tinker-atropos" 2>&1 | Out-Null
Write-Success "tinker-atropos installed"
} catch {
Write-Warning "tinker-atropos install failed (RL tools may not work)"
Write-Warn "tinker-atropos install failed (RL tools may not work)"
}
} else {
Write-Warning "tinker-atropos not found (run: git submodule update --init)"
Write-Warn "tinker-atropos not found (run: git submodule update --init)"
}
Pop-Location
@@ -328,41 +388,44 @@ function Install-Dependencies {
}
function Set-PathVariable {
Write-Info "Setting up PATH..."
Write-Info "Setting up hermes command..."
if ($NoVenv) {
$binDir = "$InstallDir"
$hermesBin = "$InstallDir"
} else {
$binDir = "$InstallDir\venv\Scripts"
$hermesBin = "$InstallDir\venv\Scripts"
}
# Add to user PATH
# Add the venv Scripts dir to user PATH so hermes is globally available
# On Windows, the hermes.exe in venv\Scripts\ has the venv Python baked in
$currentPath = [Environment]::GetEnvironmentVariable("Path", "User")
if ($currentPath -notlike "*$binDir*") {
if ($currentPath -notlike "*$hermesBin*") {
[Environment]::SetEnvironmentVariable(
"Path",
"$binDir;$currentPath",
"$hermesBin;$currentPath",
"User"
)
Write-Success "Added to user PATH"
Write-Success "Added to user PATH: $hermesBin"
} else {
Write-Info "PATH already configured"
}
# Update current session
$env:Path = "$binDir;$env:Path"
$env:Path = "$hermesBin;$env:Path"
Write-Success "hermes command ready"
}
function Copy-ConfigTemplates {
Write-Info "Setting up configuration files..."
# Create ~/.hermes directory structure (config at top level, code in subdir)
# Create ~/.hermes directory structure
New-Item -ItemType Directory -Force -Path "$HermesHome\cron" | Out-Null
New-Item -ItemType Directory -Force -Path "$HermesHome\sessions" | Out-Null
New-Item -ItemType Directory -Force -Path "$HermesHome\logs" | Out-Null
# Create .env at ~/.hermes/.env (top level, easy to find)
# Create .env
$envPath = "$HermesHome\.env"
if (-not (Test-Path $envPath)) {
$examplePath = "$InstallDir\.env.example"
@@ -370,7 +433,6 @@ function Copy-ConfigTemplates {
Copy-Item $examplePath $envPath
Write-Success "Created ~/.hermes/.env from template"
} else {
# Create empty .env if no example exists
New-Item -ItemType File -Force -Path $envPath | Out-Null
Write-Success "Created ~/.hermes/.env"
}
@@ -378,7 +440,7 @@ function Copy-ConfigTemplates {
Write-Info "~/.hermes/.env already exists, keeping it"
}
# Create config.yaml at ~/.hermes/config.yaml (top level, easy to find)
# Create config.yaml
$configPath = "$HermesHome\config.yaml"
if (-not (Test-Path $configPath)) {
$examplePath = "$InstallDir\cli-config.yaml.example"
@@ -407,7 +469,7 @@ function Install-NodeDeps {
npm install --silent 2>&1 | Out-Null
Write-Success "Node.js dependencies installed"
} catch {
Write-Warning "npm install failed (browser tools may not work)"
Write-Warn "npm install failed (browser tools may not work)"
}
}
@@ -426,12 +488,13 @@ function Invoke-SetupWizard {
Push-Location $InstallDir
# Run hermes setup using the venv Python directly (no activation needed)
if (-not $NoVenv) {
& .\venv\Scripts\Activate.ps1
& ".\venv\Scripts\python.exe" -m hermes_cli.main setup
} else {
python -m hermes_cli.main setup
}
python -m hermes_cli.main setup
Pop-Location
}
@@ -478,7 +541,6 @@ function Write-Completion {
Write-Host "⚡ Restart your terminal for PATH changes to take effect" -ForegroundColor Yellow
Write-Host ""
# Show notes about optional tools
if (-not $HasNode) {
Write-Host "Note: Node.js was not found. Browser automation tools" -ForegroundColor Yellow
Write-Host "will have limited functionality." -ForegroundColor Yellow
@@ -500,6 +562,7 @@ function Write-Completion {
function Main {
Write-Banner
if (-not (Install-Uv)) { exit 1 }
if (-not (Test-Python)) { exit 1 }
if (-not (Test-Git)) { exit 1 }
Test-Node # Optional, doesn't fail

View File

@@ -3,6 +3,7 @@
# Hermes Agent Installer
# ============================================================================
# Installation script for Linux and macOS.
# Uses uv for fast Python provisioning and package management.
#
# Usage:
# curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
@@ -29,7 +30,7 @@ REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
HERMES_HOME="$HOME/.hermes"
INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
PYTHON_MIN_VERSION="3.10"
PYTHON_VERSION="3.11"
# Options
USE_VENV=true
@@ -64,7 +65,7 @@ while [[ $# -gt 0 ]]; do
echo " --no-venv Don't create virtual environment"
echo " --skip-setup Skip interactive setup wizard"
echo " --branch NAME Git branch to install (default: main)"
echo " --dir PATH Installation directory (default: ~/.hermes-agent)"
echo " --dir PATH Installation directory (default: ~/.hermes/hermes-agent)"
echo " -h, --help Show this help"
exit 0
;;
@@ -146,50 +147,80 @@ detect_os() {
# Dependency checks
# ============================================================================
check_python() {
log_info "Checking Python..."
install_uv() {
log_info "Checking for uv package manager..."
# Try different python commands
for cmd in python3.12 python3.11 python3.10 python3 python; do
if command -v $cmd &> /dev/null; then
PYTHON_CMD=$cmd
PYTHON_VERSION=$($cmd -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
# Check version
if python3 -c "import sys; exit(0 if sys.version_info >= (3, 10) else 1)" 2>/dev/null; then
log_success "Python $PYTHON_VERSION found"
return 0
fi
# Check common locations for uv
if command -v uv &> /dev/null; then
UV_CMD="uv"
UV_VERSION=$($UV_CMD --version 2>/dev/null)
log_success "uv found ($UV_VERSION)"
return 0
fi
# Check ~/.local/bin (default uv install location) even if not on PATH yet
if [ -x "$HOME/.local/bin/uv" ]; then
UV_CMD="$HOME/.local/bin/uv"
UV_VERSION=$($UV_CMD --version 2>/dev/null)
log_success "uv found at ~/.local/bin ($UV_VERSION)"
return 0
fi
# Check ~/.cargo/bin (alternative uv install location)
if [ -x "$HOME/.cargo/bin/uv" ]; then
UV_CMD="$HOME/.cargo/bin/uv"
UV_VERSION=$($UV_CMD --version 2>/dev/null)
log_success "uv found at ~/.cargo/bin ($UV_VERSION)"
return 0
fi
# Install uv
log_info "Installing uv (fast Python package manager)..."
if curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null; then
# uv installs to ~/.local/bin by default
if [ -x "$HOME/.local/bin/uv" ]; then
UV_CMD="$HOME/.local/bin/uv"
elif [ -x "$HOME/.cargo/bin/uv" ]; then
UV_CMD="$HOME/.cargo/bin/uv"
elif command -v uv &> /dev/null; then
UV_CMD="uv"
else
log_error "uv installed but not found on PATH"
log_info "Try adding ~/.local/bin to your PATH and re-running"
exit 1
fi
done
UV_VERSION=$($UV_CMD --version 2>/dev/null)
log_success "uv installed ($UV_VERSION)"
else
log_error "Failed to install uv"
log_info "Install manually: https://docs.astral.sh/uv/getting-started/installation/"
exit 1
fi
}
check_python() {
log_info "Checking Python $PYTHON_VERSION..."
log_error "Python 3.10+ not found"
log_info "Please install Python 3.10 or newer:"
# Let uv handle Python — it can download and manage Python versions
# First check if a suitable Python is already available
if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
log_success "Python found: $PYTHON_FOUND_VERSION"
return 0
fi
case "$OS" in
linux)
case "$DISTRO" in
ubuntu|debian)
log_info " sudo apt update && sudo apt install python3.11 python3.11-venv"
;;
fedora)
log_info " sudo dnf install python3.11"
;;
arch)
log_info " sudo pacman -S python"
;;
*)
log_info " Use your package manager to install Python 3.10+"
;;
esac
;;
macos)
log_info " brew install python@3.11"
log_info " Or download from https://www.python.org/downloads/"
;;
esac
exit 1
# Python not found — use uv to install it (no sudo needed!)
log_info "Python $PYTHON_VERSION not found, installing via uv..."
if $UV_CMD python install "$PYTHON_VERSION"; then
PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
log_success "Python installed: $PYTHON_FOUND_VERSION"
else
log_error "Failed to install Python $PYTHON_VERSION"
log_info "Install Python $PYTHON_VERSION manually, then re-run this script"
exit 1
fi
}
check_git() {
@@ -294,7 +325,6 @@ check_ripgrep() {
# Check if we can use sudo
CAN_SUDO=false
if command -v sudo &> /dev/null; then
# Check if user has sudo access (without actually running sudo)
if sudo -n true 2>/dev/null || sudo -v 2>/dev/null; then
CAN_SUDO=true
fi
@@ -328,7 +358,6 @@ check_ripgrep() {
esac
else
log_warn "sudo not available - cannot auto-install system packages"
# Try cargo as fallback if available
if command -v cargo &> /dev/null; then
log_info "Trying cargo install (no sudo required)..."
if cargo install ripgrep 2>/dev/null; then
@@ -371,7 +400,6 @@ check_ripgrep() {
log_info " https://github.com/BurntSushi/ripgrep#installation"
;;
esac
# Show cargo alternative for users without sudo
if command -v cargo &> /dev/null; then
log_info " Or without sudo: cargo install ripgrep"
fi
@@ -440,39 +468,36 @@ setup_venv() {
return 0
fi
log_info "Creating virtual environment..."
log_info "Creating virtual environment with Python $PYTHON_VERSION..."
if [ -d "venv" ]; then
log_info "Virtual environment already exists"
else
$PYTHON_CMD -m venv venv
log_info "Virtual environment already exists, recreating..."
rm -rf venv
fi
# Activate
source venv/bin/activate
# uv creates the venv and pins the Python version in one step
$UV_CMD venv venv --python "$PYTHON_VERSION"
# Upgrade pip
pip install --upgrade pip wheel setuptools > /dev/null
log_success "Virtual environment ready"
log_success "Virtual environment ready (Python $PYTHON_VERSION)"
}
install_deps() {
log_info "Installing dependencies..."
if [ "$USE_VENV" = true ]; then
source venv/bin/activate
# Tell uv to install into our venv (no need to activate)
export VIRTUAL_ENV="$INSTALL_DIR/venv"
fi
# Install the main package in editable mode with all extras
pip install -e ".[all]" > /dev/null 2>&1 || pip install -e "." > /dev/null
$UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
log_success "Main package installed"
# Install submodules
log_info "Installing mini-swe-agent (terminal tool backend)..."
if [ -d "mini-swe-agent" ] && [ -f "mini-swe-agent/pyproject.toml" ]; then
pip install -e "./mini-swe-agent" > /dev/null 2>&1 || log_warn "mini-swe-agent install failed (terminal tools may not work)"
$UV_CMD pip install -e "./mini-swe-agent" || log_warn "mini-swe-agent install failed (terminal tools may not work)"
log_success "mini-swe-agent installed"
else
log_warn "mini-swe-agent not found (run: git submodule update --init)"
@@ -480,7 +505,7 @@ install_deps() {
log_info "Installing tinker-atropos (RL training backend)..."
if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
pip install -e "./tinker-atropos" > /dev/null 2>&1 || log_warn "tinker-atropos install failed (RL tools may not work)"
$UV_CMD pip install -e "./tinker-atropos" || log_warn "tinker-atropos install failed (RL tools may not work)"
log_success "tinker-atropos installed"
else
log_warn "tinker-atropos not found (run: git submodule update --init)"
@@ -490,53 +515,56 @@ install_deps() {
}
setup_path() {
log_info "Setting up PATH..."
log_info "Setting up hermes command..."
# Determine the bin directory
if [ "$USE_VENV" = true ]; then
BIN_DIR="$INSTALL_DIR/venv/bin"
HERMES_BIN="$INSTALL_DIR/venv/bin/hermes"
else
BIN_DIR="$HOME/.local/bin"
mkdir -p "$BIN_DIR"
HERMES_BIN="$(which hermes 2>/dev/null || echo "")"
if [ -z "$HERMES_BIN" ]; then
log_warn "hermes not found on PATH after install"
return 0
fi
fi
# Create symlink in ~/.local/bin (standard user binary location, usually on PATH)
mkdir -p "$HOME/.local/bin"
ln -sf "$HERMES_BIN" "$HOME/.local/bin/hermes"
log_success "Symlinked hermes → ~/.local/bin/hermes"
# Check if ~/.local/bin is on PATH; if not, add it to shell config
if ! echo "$PATH" | tr ':' '\n' | grep -q "^$HOME/.local/bin$"; then
SHELL_CONFIG=""
if [ -n "$BASH_VERSION" ]; then
if [ -f "$HOME/.bashrc" ]; then
SHELL_CONFIG="$HOME/.bashrc"
elif [ -f "$HOME/.bash_profile" ]; then
SHELL_CONFIG="$HOME/.bash_profile"
fi
elif [ -n "$ZSH_VERSION" ] || [ -f "$HOME/.zshrc" ]; then
SHELL_CONFIG="$HOME/.zshrc"
fi
# Create a wrapper script
cat > "$BIN_DIR/hermes" << EOF
#!/bin/bash
cd "$INSTALL_DIR"
exec python -m hermes_cli.main "\$@"
EOF
chmod +x "$BIN_DIR/hermes"
fi
# Add to PATH in shell config
SHELL_CONFIG=""
if [ -n "$BASH_VERSION" ]; then
if [ -f "$HOME/.bashrc" ]; then
SHELL_CONFIG="$HOME/.bashrc"
elif [ -f "$HOME/.bash_profile" ]; then
SHELL_CONFIG="$HOME/.bash_profile"
PATH_LINE='export PATH="$HOME/.local/bin:$PATH"'
if [ -n "$SHELL_CONFIG" ]; then
if ! grep -q '\.local/bin' "$SHELL_CONFIG" 2>/dev/null; then
echo "" >> "$SHELL_CONFIG"
echo "# Hermes Agent — ensure ~/.local/bin is on PATH" >> "$SHELL_CONFIG"
echo "$PATH_LINE" >> "$SHELL_CONFIG"
log_success "Added ~/.local/bin to PATH in $SHELL_CONFIG"
else
log_info "~/.local/bin already referenced in $SHELL_CONFIG"
fi
fi
elif [ -n "$ZSH_VERSION" ] || [ -f "$HOME/.zshrc" ]; then
SHELL_CONFIG="$HOME/.zshrc"
else
log_info "~/.local/bin already on PATH"
fi
PATH_LINE="export PATH=\"$BIN_DIR:\$PATH\""
# Export for current session so hermes works immediately
export PATH="$HOME/.local/bin:$PATH"
if [ -n "$SHELL_CONFIG" ]; then
if ! grep -q "hermes-agent" "$SHELL_CONFIG" 2>/dev/null; then
echo "" >> "$SHELL_CONFIG"
echo "# Hermes Agent" >> "$SHELL_CONFIG"
echo "$PATH_LINE" >> "$SHELL_CONFIG"
log_success "Added to $SHELL_CONFIG"
else
log_info "PATH already configured in $SHELL_CONFIG"
fi
fi
# Also export for current session
export PATH="$BIN_DIR:$PATH"
log_success "PATH configured"
log_success "hermes command ready"
}
copy_config_templates() {
@@ -553,7 +581,6 @@ copy_config_templates() {
cp "$INSTALL_DIR/.env.example" "$HERMES_HOME/.env"
log_success "Created ~/.hermes/.env from template"
else
# Create empty .env if no example exists
touch "$HERMES_HOME/.env"
log_success "Created ~/.hermes/.env"
fi
@@ -601,12 +628,14 @@ run_setup_wizard() {
log_info "Starting setup wizard..."
echo ""
if [ "$USE_VENV" = true ]; then
source "$INSTALL_DIR/venv/bin/activate"
fi
cd "$INSTALL_DIR"
python -m hermes_cli.main setup
# Run hermes setup using the venv Python directly (no activation needed)
if [ "$USE_VENV" = true ]; then
"$INSTALL_DIR/venv/bin/python" -m hermes_cli.main setup
else
python -m hermes_cli.main setup
fi
}
print_success() {
@@ -673,6 +702,7 @@ main() {
print_banner
detect_os
install_uv
check_python
check_git
check_node

View File

@@ -3,16 +3,18 @@
# Hermes Agent Setup Script
# ============================================================================
# Quick setup for developers who cloned the repo manually.
# Uses uv for fast Python provisioning and package management.
#
# Usage:
# ./setup-hermes.sh
#
# This script:
# 1. Creates a virtual environment (if not exists)
# 2. Installs dependencies
# 3. Creates .env from template (if not exists)
# 4. Installs the 'hermes' CLI command
# 5. Runs the setup wizard (optional)
# 1. Installs uv if not present
# 2. Creates a virtual environment with Python 3.11 via uv
# 3. Installs all dependencies (main package + submodules)
# 4. Creates .env from template (if not exists)
# 5. Symlinks the 'hermes' CLI command into ~/.local/bin
# 6. Runs the setup wizard (optional)
# ============================================================================
set -e
@@ -21,38 +23,75 @@ set -e
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
CYAN='\033[0;36m'
RED='\033[0;31m'
NC='\033[0m'
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
PYTHON_VERSION="3.11"
echo ""
echo -e "${CYAN}🦋 Hermes Agent Setup${NC}"
echo ""
# ============================================================================
# Python check
# Install / locate uv
# ============================================================================
echo -e "${CYAN}${NC} Checking Python..."
echo -e "${CYAN}${NC} Checking for uv..."
PYTHON_CMD=""
for cmd in python3.12 python3.11 python3.10 python3 python; do
if command -v $cmd &> /dev/null; then
if $cmd -c "import sys; exit(0 if sys.version_info >= (3, 10) else 1)" 2>/dev/null; then
PYTHON_CMD=$cmd
break
fi
fi
done
if [ -z "$PYTHON_CMD" ]; then
echo -e "${YELLOW}${NC} Python 3.10+ required"
exit 1
UV_CMD=""
if command -v uv &> /dev/null; then
UV_CMD="uv"
elif [ -x "$HOME/.local/bin/uv" ]; then
UV_CMD="$HOME/.local/bin/uv"
elif [ -x "$HOME/.cargo/bin/uv" ]; then
UV_CMD="$HOME/.cargo/bin/uv"
fi
PYTHON_VERSION=$($PYTHON_CMD -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
echo -e "${GREEN}${NC} Python $PYTHON_VERSION found"
if [ -n "$UV_CMD" ]; then
UV_VERSION=$($UV_CMD --version 2>/dev/null)
echo -e "${GREEN}${NC} uv found ($UV_VERSION)"
else
echo -e "${CYAN}${NC} Installing uv..."
if curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null; then
if [ -x "$HOME/.local/bin/uv" ]; then
UV_CMD="$HOME/.local/bin/uv"
elif [ -x "$HOME/.cargo/bin/uv" ]; then
UV_CMD="$HOME/.cargo/bin/uv"
fi
if [ -n "$UV_CMD" ]; then
UV_VERSION=$($UV_CMD --version 2>/dev/null)
echo -e "${GREEN}${NC} uv installed ($UV_VERSION)"
else
echo -e "${RED}${NC} uv installed but not found. Add ~/.local/bin to PATH and retry."
exit 1
fi
else
echo -e "${RED}${NC} Failed to install uv. Visit https://docs.astral.sh/uv/"
exit 1
fi
fi
# ============================================================================
# Python check (uv can provision it automatically)
# ============================================================================
echo -e "${CYAN}${NC} Checking Python $PYTHON_VERSION..."
if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
echo -e "${GREEN}${NC} $PYTHON_FOUND_VERSION found"
else
echo -e "${CYAN}${NC} Python $PYTHON_VERSION not found, installing via uv..."
$UV_CMD python install "$PYTHON_VERSION"
PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
echo -e "${GREEN}${NC} $PYTHON_FOUND_VERSION installed"
fi
# ============================================================================
# Virtual environment
@@ -60,15 +99,16 @@ echo -e "${GREEN}✓${NC} Python $PYTHON_VERSION found"
echo -e "${CYAN}${NC} Setting up virtual environment..."
if [ ! -d "venv" ]; then
$PYTHON_CMD -m venv venv
echo -e "${GREEN}${NC} Created venv"
else
echo -e "${GREEN}${NC} venv exists"
if [ -d "venv" ]; then
echo -e "${CYAN}${NC} Removing old venv..."
rm -rf venv
fi
source venv/bin/activate
pip install --upgrade pip wheel setuptools > /dev/null
$UV_CMD venv venv --python "$PYTHON_VERSION"
echo -e "${GREEN}${NC} venv created (Python $PYTHON_VERSION)"
# Tell uv to install into this venv (no activation needed for uv)
export VIRTUAL_ENV="$SCRIPT_DIR/venv"
# ============================================================================
# Dependencies
@@ -76,10 +116,34 @@ pip install --upgrade pip wheel setuptools > /dev/null
echo -e "${CYAN}${NC} Installing dependencies..."
pip install -e ".[all]" > /dev/null 2>&1 || pip install -e "." > /dev/null
$UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
echo -e "${GREEN}${NC} Dependencies installed"
# ============================================================================
# Submodules (terminal backend + RL training)
# ============================================================================
echo -e "${CYAN}${NC} Installing submodules..."
# mini-swe-agent (terminal tool backend)
if [ -d "mini-swe-agent" ] && [ -f "mini-swe-agent/pyproject.toml" ]; then
$UV_CMD pip install -e "./mini-swe-agent" && \
echo -e "${GREEN}${NC} mini-swe-agent installed" || \
echo -e "${YELLOW}${NC} mini-swe-agent install failed (terminal tools may not work)"
else
echo -e "${YELLOW}${NC} mini-swe-agent not found (run: git submodule update --init --recursive)"
fi
# tinker-atropos (RL training backend)
if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
$UV_CMD pip install -e "./tinker-atropos" && \
echo -e "${GREEN}${NC} tinker-atropos installed" || \
echo -e "${YELLOW}${NC} tinker-atropos install failed (RL tools may not work)"
else
echo -e "${YELLOW}${NC} tinker-atropos not found (run: git submodule update --init --recursive)"
fi
# ============================================================================
# Optional: ripgrep (for faster file search)
# ============================================================================
@@ -141,14 +205,17 @@ else
fi
# ============================================================================
# PATH setup
# PATH setup — symlink hermes into ~/.local/bin
# ============================================================================
echo -e "${CYAN}${NC} Setting up hermes command..."
BIN_DIR="$SCRIPT_DIR/venv/bin"
HERMES_BIN="$SCRIPT_DIR/venv/bin/hermes"
mkdir -p "$HOME/.local/bin"
ln -sf "$HERMES_BIN" "$HOME/.local/bin/hermes"
echo -e "${GREEN}${NC} Symlinked hermes → ~/.local/bin/hermes"
# Add to shell config if not already there
# Ensure ~/.local/bin is on PATH in shell config
SHELL_CONFIG=""
if [ -f "$HOME/.zshrc" ]; then
SHELL_CONFIG="$HOME/.zshrc"
@@ -159,13 +226,17 @@ elif [ -f "$HOME/.bash_profile" ]; then
fi
if [ -n "$SHELL_CONFIG" ]; then
if ! grep -q "hermes-agent" "$SHELL_CONFIG" 2>/dev/null; then
echo "" >> "$SHELL_CONFIG"
echo "# Hermes Agent" >> "$SHELL_CONFIG"
echo "export PATH=\"$BIN_DIR:\$PATH\"" >> "$SHELL_CONFIG"
echo -e "${GREEN}${NC} Added to $SHELL_CONFIG"
if ! echo "$PATH" | tr ':' '\n' | grep -q "^$HOME/.local/bin$"; then
if ! grep -q '\.local/bin' "$SHELL_CONFIG" 2>/dev/null; then
echo "" >> "$SHELL_CONFIG"
echo "# Hermes Agent — ensure ~/.local/bin is on PATH" >> "$SHELL_CONFIG"
echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$SHELL_CONFIG"
echo -e "${GREEN}${NC} Added ~/.local/bin to PATH in $SHELL_CONFIG"
else
echo -e "${GREEN}${NC} ~/.local/bin already in $SHELL_CONFIG"
fi
else
echo -e "${GREEN}${NC} PATH already in $SHELL_CONFIG"
echo -e "${GREEN}${NC} ~/.local/bin already on PATH"
fi
fi
@@ -199,5 +270,6 @@ read -p "Would you like to run the setup wizard now? [Y/n] " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
echo ""
python -m hermes_cli.main setup
# Run directly with venv Python (no activation needed)
"$SCRIPT_DIR/venv/bin/python" -m hermes_cli.main setup
fi

View File

@@ -2,6 +2,7 @@
"""File Tools Module - LLM agent file manipulation tools."""
import json
import os
import threading
from typing import Optional
from tools.file_operations import ShellFileOperations
@@ -11,23 +12,85 @@ _file_ops_cache: dict = {}
def _get_file_ops(task_id: str = "default") -> ShellFileOperations:
"""Get or create ShellFileOperations for a terminal environment."""
from tools.terminal_tool import _active_environments, _env_lock, _LocalEnvironment
"""Get or create ShellFileOperations for a terminal environment.
Respects the TERMINAL_ENV setting -- if the task_id doesn't have an
environment yet, creates one using the configured backend (local, docker,
modal, etc.) rather than always defaulting to local.
"""
from tools.terminal_tool import (
_active_environments, _env_lock, _create_environment,
_get_env_config, _last_activity, _start_cleanup_thread,
_check_disk_usage_warning,
)
import time
# Fast path: check cache without heavy locks
with _file_ops_lock:
if task_id in _file_ops_cache:
return _file_ops_cache[task_id]
# Check if we need to create a new environment
needs_creation = False
with _env_lock:
if task_id not in _active_environments:
needs_creation = True
# Create environment OUTSIDE locks so we don't block other rollouts
# during slow Modal/Docker startup (~10s)
if needs_creation:
config = _get_env_config()
env_type = config["env_type"]
if env_type == "docker":
image = config["docker_image"]
elif env_type == "singularity":
image = config["singularity_image"]
elif env_type == "modal":
image = config["modal_image"]
else:
image = ""
cwd = config["cwd"]
_check_disk_usage_warning()
if not os.getenv("HERMES_QUIET"):
print(f"[FileTools] Creating new {env_type} environment for task {task_id[:8]}...", flush=True)
new_env = _create_environment(
env_type=env_type,
image=image,
cwd=cwd,
timeout=config["timeout"],
)
# Store under lock (brief) -- do NOT call _start_cleanup_thread inside
# the lock because it also acquires _env_lock (non-reentrant = deadlock)
created = False
with _env_lock:
if task_id not in _active_environments:
import os
env = _LocalEnvironment(cwd=os.getcwd(), timeout=60)
_active_environments[task_id] = env
terminal_env = _active_environments[task_id]
_active_environments[task_id] = new_env
created = True
else:
try:
if hasattr(new_env, 'stop'):
new_env.stop()
except Exception:
pass
file_ops = ShellFileOperations(terminal_env)
if created:
_start_cleanup_thread()
if not os.getenv("HERMES_QUIET"):
print(f"[FileTools] {env_type} environment ready for task {task_id[:8]}", flush=True)
# Now get the environment and build file_ops
with _env_lock:
_last_activity[task_id] = time.time()
terminal_env = _active_environments[task_id]
file_ops = ShellFileOperations(terminal_env)
with _file_ops_lock:
_file_ops_cache[task_id] = file_ops
return file_ops
return file_ops
def clear_file_ops_cache(task_id: str = None):
@@ -56,6 +119,7 @@ def write_file_tool(path: str, content: str, task_id: str = "default") -> str:
result = file_ops.write_file(path, content)
return json.dumps(result.to_dict(), ensure_ascii=False)
except Exception as e:
print(f"[FileTools] write_file error: {type(e).__name__}: {e}", flush=True)
return json.dumps({"error": str(e)}, ensure_ascii=False)

View File

@@ -1300,10 +1300,26 @@ async def rl_test_inference(
# Requirements Check
# ============================================================================
def check_rl_python_version() -> bool:
"""
Check if Python version meets the minimum for RL tools.
tinker-atropos depends on the 'tinker' package which requires Python >= 3.11.
"""
return sys.version_info >= (3, 11)
def check_rl_api_keys() -> bool:
"""
Check if required API keys are available.
Check if required API keys and Python version are available.
RL training requires:
- Python >= 3.11 (tinker package requirement)
- TINKER_API_KEY for the Tinker training API
- WANDB_API_KEY for Weights & Biases metrics
"""
if not check_rl_python_version():
return False
tinker_key = os.getenv("TINKER_API_KEY")
wandb_key = os.getenv("WANDB_API_KEY")
return bool(tinker_key) and bool(wandb_key)
@@ -1311,9 +1327,11 @@ def check_rl_api_keys() -> bool:
def get_missing_keys() -> List[str]:
"""
Get list of missing required API keys.
Get list of missing requirements for RL tools (API keys and Python version).
"""
missing = []
if not check_rl_python_version():
missing.append(f"Python >= 3.11 (current: {sys.version_info.major}.{sys.version_info.minor})")
if not os.getenv("TINKER_API_KEY"):
missing.append("TINKER_API_KEY")
if not os.getenv("WANDB_API_KEY"):

View File

@@ -1347,40 +1347,61 @@ def terminal_tool(
_start_cleanup_thread()
# Get or create environment
# Check under lock, but create OUTSIDE lock so we don't block
# other concurrent rollouts during slow Modal/Docker startup
needs_creation = False
with _env_lock:
if effective_task_id not in _active_environments:
# Check disk usage before creating new environment
_check_disk_usage_warning()
try:
# Build SSH config if using SSH environment
ssh_config = None
if env_type == "ssh":
ssh_config = {
"host": config.get("ssh_host", ""),
"user": config.get("ssh_user", ""),
"port": config.get("ssh_port", 22),
"key": config.get("ssh_key", ""),
}
_active_environments[effective_task_id] = _create_environment(
env_type=env_type,
image=image,
cwd=cwd,
timeout=effective_timeout,
ssh_config=ssh_config
)
except ImportError as e:
return json.dumps({
"output": "",
"exit_code": -1,
"error": f"Terminal tool disabled: mini-swe-agent not available ({e})",
"status": "disabled"
}, ensure_ascii=False)
needs_creation = True
else:
_last_activity[effective_task_id] = time.time()
env = _active_environments[effective_task_id]
# Update last activity time
_last_activity[effective_task_id] = time.time()
env = _active_environments[effective_task_id]
if needs_creation:
_check_disk_usage_warning()
if not os.getenv("HERMES_QUIET"):
print(f"[Terminal] Creating new {env_type} environment for task {effective_task_id[:8]}...", flush=True)
try:
ssh_config = None
if env_type == "ssh":
ssh_config = {
"host": config.get("ssh_host", ""),
"user": config.get("ssh_user", ""),
"port": config.get("ssh_port", 22),
"key": config.get("ssh_key", ""),
}
new_env = _create_environment(
env_type=env_type,
image=image,
cwd=cwd,
timeout=effective_timeout,
ssh_config=ssh_config
)
except ImportError as e:
return json.dumps({
"output": "",
"exit_code": -1,
"error": f"Terminal tool disabled: mini-swe-agent not available ({e})",
"status": "disabled"
}, ensure_ascii=False)
# Store under lock (brief)
with _env_lock:
if effective_task_id not in _active_environments:
_active_environments[effective_task_id] = new_env
else:
# Another thread created it while we were building -- clean up ours
try:
if hasattr(new_env, 'stop'):
new_env.stop()
except Exception:
pass
_last_activity[effective_task_id] = time.time()
env = _active_environments[effective_task_id]
if not os.getenv("HERMES_QUIET"):
print(f"[Terminal] {env_type} environment ready for task {effective_task_id[:8]}", flush=True)
# Check for dangerous commands (only for local/ssh in interactive modes)
# Skip check if force=True (user has confirmed they want to run it)
@@ -1435,13 +1456,20 @@ def terminal_tool(
retry_count += 1
wait_time = 2 ** retry_count
print(f"⚠️ Terminal: execution error, retrying in {wait_time}s (attempt {retry_count}/{max_retries})")
print(f" Command: {command[:200]}")
print(f" Error: {type(e).__name__}: {e}")
print(f" Task ID: {effective_task_id}, Backend: {env_type}")
time.sleep(wait_time)
continue
print(f"❌ Terminal: execution failed after {max_retries} retries")
print(f" Command: {command[:200]}")
print(f" Error: {type(e).__name__}: {e}")
print(f" Task ID: {effective_task_id}, Backend: {env_type}")
return json.dumps({
"output": "",
"exit_code": -1,
"error": f"Command execution failed: {str(e)}"
"error": f"Command execution failed: {type(e).__name__}: {str(e)}"
}, ensure_ascii=False)
# Got a result