Enhance async tool execution and error handling in Hermes agent for Atropos integration

- Updated `.gitignore` to exclude `testlogs` directory.
- Refactored `handle_web_function_call` in `model_tools.py` to support running async functions in existing event loops, improving compatibility with Atropos.
- Introduced a thread pool executor in `agent_loop.py` for running synchronous tool calls that internally use `asyncio.run()`, preventing deadlocks.
- Added `ToolError` class to track tool execution errors, enhancing error reporting during agent loops.
- Updated `wandb_log` method in `hermes_base_env.py` to log tool error statistics for better monitoring.
- Implemented patches in `patches.py` to ensure async-safe operation of tools within Atropos's event loop.
- Enhanced `ToolContext` and `terminal_tool.py` to utilize the new async handling, improving overall tool execution reliability.
Transition installation to uv to simplify Python version management and speed up installs
2026-02-08 05:00:47 +00:00 · 2026-02-07 23:54:53 +00:00 · 2026-02-07 21:11:07 +00:00 · 2026-02-07 21:11:01 +00:00 · 2026-02-07 09:17:16 +00:00 · 2026-02-07 00:05:04 +00:00
44 changed files with 4270 additions and 1344 deletions
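The thread-pool change described in the first commit message can be illustrated in isolation. The sketch below is not the PR's code: `sync_tool` and `_inner` are hypothetical stand-ins for a synchronous tool backend that calls `asyncio.run()` internally. In stock asyncio, calling such a function directly from a coroutine raises `RuntimeError`, while handing it to an executor thread gives it a thread with no running loop, so it completes.

```python
import asyncio
import concurrent.futures


# Hypothetical stand-in for a sync tool backend that uses asyncio.run() internally.
async def _inner() -> str:
    await asyncio.sleep(0)
    return "ok"


def sync_tool() -> str:
    coro = _inner()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        # asyncio.run() refused to start; close the coroutine to avoid
        # a "never awaited" warning, then re-raise.
        coro.close()
        raise


_executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)


async def call_directly() -> str:
    # Runs on the event loop thread, so asyncio.run() sees a running loop.
    try:
        return sync_tool()
    except RuntimeError as exc:
        return f"failed: {type(exc).__name__}"


async def call_in_thread() -> str:
    # The worker thread has no running loop, so asyncio.run() can create one.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, sync_tool)


if __name__ == "__main__":
    print(asyncio.run(call_directly()))
    print(asyncio.run(call_in_thread()))
```

Run as a script, the first call reports the `RuntimeError` and the second returns the tool's result. The cost of this design is one blocked worker thread per in-flight tool call, which is why a bounded pool is a natural fit.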
--- a/.gitignore
+++ b/.gitignore
@@ -39,6 +39,10 @@ agent-browser/
 *.pem
 privvy*
 images/
+__pycache__/
+hermes_agent.egg-info/
+wandb/
+testlogs

 # CLI config (may contain sensitive SSH paths)
 cli-config.yaml
--- a/README.md
+++ b/README.md
@@ -15,11 +15,13 @@ irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/ins
 ```

 The installer will:
- Clone to `~/.hermes-agent` (with submodules: mini-swe-agent, tinker-atropos)
- Create a virtual environment
- Install all dependencies
+- Install [uv](https://docs.astral.sh/uv/) (fast Python package manager) if not present
+- Install Python 3.11 via uv if not already available (no sudo needed)
+- Clone to `~/.hermes/hermes-agent` (with submodules: mini-swe-agent, tinker-atropos)
+- Create a virtual environment with Python 3.11
+- Install all dependencies and submodule packages
+- Symlink `hermes` into `~/.local/bin` so it works globally (no venv activation needed)
 - Run the interactive setup wizard
- Add `hermes` to your PATH

 After installation, reload your shell and run:
 ```bash
@@ -64,13 +66,13 @@ You need at least one LLM provider:
 | Provider | Get Key | Env Variable |
 |----------|---------|--------------|
 | **OpenRouter** (recommended) | [openrouter.ai/keys](https://openrouter.ai/keys) | `OPENROUTER_API_KEY` |
-| Anthropic | [console.anthropic.com](https://console.anthropic.com/) | `ANTHROPIC_API_KEY` |
-| OpenAI | [platform.openai.com](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` |
+

 ### Optional API Keys

 | Feature | Provider | Env Variable |
 |---------|----------|--------------|
+| Custom OpenAI Endpoint (OAI or VLLM/SGLANG) | [platform.openai.com](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` |
 | Web scraping | [Firecrawl](https://firecrawl.dev/) | `FIRECRAWL_API_KEY` |
 | Browser automation | [Browserbase](https://browserbase.com/) | `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID` |
 | Image generation | [FAL](https://fal.ai/) | `FAL_KEY` |
@@ -179,8 +181,8 @@ hermes config set terminal.singularity_image ~/python.sif

 **Modal** (serverless cloud):
 ```bash
-pip install modal boto3
-modal setup  # Authenticate
+uv pip install "swe-rex[modal]"   # Installs swe-rex + modal + boto3
+modal setup                    # Authenticate with Modal
 hermes config set terminal.backend modal
 ```

@@ -275,16 +277,19 @@ See [docs/messaging.md](docs/messaging.md) for WhatsApp and advanced setup.

 Train language models with reinforcement learning using the Tinker API and Atropos framework.

+> **Note:** RL training tools require **Python 3.11+** (the upstream `tinker` package has this requirement). On Python 3.10, the RL toolset will be automatically disabled — all other features work fine.
+
 #### Requirements

-1. **API Keys:** Add to `~/.hermes/.env`:
+1. **Python 3.11+** (check with `python3 --version`)
+2. **API Keys:** Add to `~/.hermes/.env`:
 ```bash
 TINKER_API_KEY=your-tinker-key      # Get from https://tinker-console.thinkingmachines.ai/keys
 WANDB_API_KEY=your-wandb-key        # Get from https://wandb.ai/authorize
 OPENROUTER_API_KEY=your-key         # Optional: for rl_test_inference
 ```

-2. **That's it!** tinker-atropos is included as a submodule - no separate installation needed.
+3. **That's it!** tinker-atropos is included as a submodule — the installer handles it automatically.

 #### Using RL Tools

@@ -320,6 +325,94 @@ For extended RL workflows with longer timeouts:
 python rl_cli.py --model "anthropic/claude-sonnet-4-20250514"
 ```

+### 🧪 Atropos RL Environments
+
+Hermes-Agent integrates with the [Atropos](https://github.com/NousResearch/atropos) RL framework through a layered environment system. This allows training models with reinforcement learning on agentic tasks using hermes-agent's tools.
+
+#### Architecture
+
+The integration has three layers:
+
+| Layer | File | Purpose |
+|-------|------|---------|
+| **Agent Loop** | `environments/agent_loop.py` | Reusable multi-turn tool-calling engine (standard OpenAI spec) |
+| **Base Environment** | `environments/hermes_base_env.py` | Abstract Atropos `BaseEnv` subclass with toolset resolution, ToolContext, scoring |
+| **Concrete Envs** | `environments/terminal_test_env.py`, `environments/hermes_swe_env.py` | Task-specific environments |
+
+#### Two-Phase Operation
+
+- **Phase 1 (OpenAI server type)**: Works with any OpenAI-compatible endpoint (VLLM, SGLang, OpenRouter, OpenAI API). The server handles tool call parsing natively. Good for **SFT data generation**, **verifier testing**, and **evaluation**.
+- **Phase 2 (VLLM server type)**: Uses ManagedServer for exact token IDs + logprobs via `/generate`. Client-side tool call parser registry reconstructs structured `tool_calls` from raw output. Required for **full RL training**.
+
+#### Quick Start
+
+```bash
+# 1. Launch VLLM with tool parser
+vllm serve YourModel --tool-parser hermes
+
+# 2. Start the Atropos API server
+run-api
+
+# 3. Run an environment
+python environments/terminal_test_env.py serve \
+    --openai.base_url http://localhost:8000/v1 \
+    --openai.model_name YourModel \
+    --openai.server_type openai
+```
+
+#### ToolContext (Reward Functions)
+
+Reward functions receive a `ToolContext` with unrestricted access to all hermes-agent tools, scoped to the rollout's sandbox:
+
+```python
+async def compute_reward(self, item, result, ctx: ToolContext) -> float:
+    # Run tests in the model's terminal sandbox
+    test = ctx.terminal("pytest -v")
+    if test["exit_code"] == 0:
+        return 1.0
+    # Or check a file, search the web, navigate a browser...
+    return 0.0
+```
+
+#### Creating Custom Environments
+
+Subclass `HermesAgentBaseEnv` and implement 5 methods:
+
+```python
+from environments.hermes_base_env import HermesAgentBaseEnv
+
+class MyEnv(HermesAgentBaseEnv):
+    name = "my-env"
+    async def setup(self): ...            # Load data
+    async def get_next_item(self): ...    # Return next item
+    def format_prompt(self, item): ...    # Item -> prompt string
+    async def compute_reward(self, item, result, ctx): ...  # Score with ToolContext
+    async def evaluate(self, *args, **kwargs): ...          # Periodic eval
+
+if __name__ == "__main__":
+    MyEnv.cli()
+```
+
+#### Toolset Distributions
+
+Configure which tools are available per group, either explicitly or probabilistically:
+
+```bash
+# Explicit toolsets
+--env.enabled_toolsets '["terminal","file","web"]'
+
+# Probabilistic distribution (sampled per group)
+--env.distribution development
+```
+
+#### Tool Call Parsers (Phase 2)
+
+For VLLM server type, a parser registry extracts structured `tool_calls` from raw model output. Supported parsers: `hermes`, `mistral`, `llama3_json`, `qwen`, `deepseek_v3`, `deepseek_v3_1`, `kimi_k2`, `longcat`, `glm45`, `glm47`, `qwen3_coder`.
+
+```bash
+--env.tool_call_parser hermes  # Match your VLLM --tool-parser flag
+```
+
 ### ⏰ Scheduled Tasks (Cron)

 Schedule tasks to run automatically:
@@ -425,26 +518,332 @@ skills/

 ## Manual Installation

-If you prefer not to use the installer:
+If you prefer full control over the installation process (or the quick-install script doesn't suit your environment), follow these steps to set everything up by hand.
+
+### Prerequisites
+
+| Requirement | Minimum Version | Check Command | Notes |
+|-------------|----------------|---------------|-------|
+| **Git** | Any recent | `git --version` | Required |
+| **Node.js** | 18+ | `node --version` | Optional — needed for browser automation tools |
+| **ripgrep** | Any | `rg --version` | Optional — faster file search in terminal tool (falls back to grep) |
+
+> **Note:** Python and pip are **not** prerequisites. The installer uses [uv](https://docs.astral.sh/uv/) to provision Python 3.11 automatically (no sudo needed). If you already have Python 3.11+ installed, uv will use it.
+
+<details>
+<summary><strong>Installing prerequisites by platform</strong></summary>
+
+**Ubuntu / Debian:**
+```bash
+sudo apt update && sudo apt install git
+# Optional:
+sudo apt install ripgrep nodejs npm
+```
+
+**macOS (Homebrew):**
+```bash
+brew install git
+# Optional:
+brew install ripgrep node
+```
+
+**Windows (WSL recommended):**
+Use the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/install) and follow the Ubuntu instructions above. Alternatively, use the PowerShell quick-install script at the top of this README.
+
+</details>
+
+---
+
+### Step 1: Clone the Repository
+
+Clone with `--recurse-submodules` to pull the required submodules ([mini-swe-agent](https://github.com/SWE-agent/mini-swe-agent) for the terminal tool backend and [tinker-atropos](https://github.com/nousresearch/tinker-atropos) for RL training):

 ```bash
-# Clone the repository (with submodules)
+git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git
+cd hermes-agent
+```
+
+If you already cloned without `--recurse-submodules`, initialize them manually:
+```bash
+git submodule update --init --recursive
+```
+
+---
+
+### Step 2: Install uv & Create Virtual Environment
+
+[uv](https://docs.astral.sh/uv/) is a fast Python package manager that can also provision Python itself. Install it and create the venv in one go:
+
+```bash
+# Install uv (if not already installed)
+curl -LsSf https://astral.sh/uv/install.sh | sh
+
+# Create venv with Python 3.11 (uv downloads it if not present — no sudo needed)
+uv venv venv --python 3.11
+```
+
+> **Tip:** You do **not** need to activate the venv to use `hermes`. The entry point has a hardcoded shebang pointing to the venv Python, so it works globally once symlinked (see Step 8). For installing packages, uv can target the venv directly via `VIRTUAL_ENV`.
+
+---
+
+### Step 3: Install Python Dependencies
+
+Install the main package in editable mode with all optional extras (messaging, cron, CLI menus, modal):
+
+```bash
+# Tell uv which venv to install into
+export VIRTUAL_ENV="$(pwd)/venv"
+
+# Install with all extras
+uv pip install -e ".[all]"
+```
+
+If you only want the core agent (no Telegram/Discord/cron support):
+```bash
+uv pip install -e "."
+```
+
+<details>
+<summary><strong>Optional extras breakdown</strong></summary>
+
+| Extra | What it adds | Install command |
+|-------|-------------|-----------------|
+| `all` | Everything below | `uv pip install -e ".[all]"` |
+| `messaging` | Telegram & Discord gateway | `uv pip install -e ".[messaging]"` |
+| `cron` | Cron expression parsing for scheduled tasks | `uv pip install -e ".[cron]"` |
+| `cli` | Terminal menu UI for setup wizard | `uv pip install -e ".[cli]"` |
+| `modal` | Modal cloud execution backend (swe-rex + modal + boto3) | `uv pip install -e ".[modal]"` |
+| `dev` | pytest & test utilities | `uv pip install -e ".[dev]"` |
+
+You can combine extras: `uv pip install -e ".[messaging,cron]"`
+
+</details>
+
+---
+
+### Step 4: Install Submodule Packages
+
+These are local packages checked out as Git submodules. Install them in editable mode:
+
+```bash
+# Terminal tool backend (required for the terminal/command-execution tool)
+uv pip install -e "./mini-swe-agent"
+
+# RL training backend
+uv pip install -e "./tinker-atropos"
+```
+
+Both are optional — if you skip them, the corresponding toolsets simply won't be available.
+
+---
+
+### Step 5: Install Node.js Dependencies (Optional)
+
+Only needed if you plan to use the **browser automation** toolset (Browserbase-powered):
+
+```bash
+npm install
+```
+
+This installs the `agent-browser` package defined in `package.json`. Skip this step if you don't need browser tools.
+
+---
+
+### Step 6: Create the Configuration Directory
+
+Hermes stores all user configuration in `~/.hermes/`:
+
+```bash
+# Create the directory structure
+mkdir -p ~/.hermes/{cron,sessions,logs}
+
+# Copy the example config file
+cp cli-config.yaml.example ~/.hermes/config.yaml
+
+# Create an empty .env file for API keys
+touch ~/.hermes/.env
+```
+
+Your `~/.hermes/` directory should now look like:
+```
+~/.hermes/
+├── config.yaml     # Agent settings (model, terminal, toolsets, compression, etc.)
+├── .env            # API keys and secrets (one per line: KEY=value)
+├── cron/           # Scheduled job data
+├── sessions/       # Messaging gateway sessions
+└── logs/           # Conversation logs
+```
+
+---
+
+### Step 7: Add Your API Keys
+
+Open `~/.hermes/.env` in your editor and add at minimum an LLM provider key:
+
+```bash
+# Required — at least one LLM provider:
+OPENROUTER_API_KEY=sk-or-v1-your-key-here
+
+# Optional — enable additional tools:
+FIRECRAWL_API_KEY=fc-your-key          # Web search & scraping
+BROWSERBASE_API_KEY=bb-your-key        # Browser automation
+BROWSERBASE_PROJECT_ID=your-project-id # Browser automation
+FAL_KEY=your-fal-key                   # Image generation (FLUX)
+TINKER_API_KEY=your-tinker-key         # RL training
+WANDB_API_KEY=your-wandb-key           # RL training metrics
+
+# Optional — messaging gateway:
+TELEGRAM_BOT_TOKEN=123456:ABC-DEF      # From @BotFather
+TELEGRAM_ALLOWED_USERS=your-user-id    # Comma-separated
+DISCORD_BOT_TOKEN=MTIz...              # From Developer Portal
+DISCORD_ALLOWED_USERS=your-user-id     # Comma-separated
+```
+
+Or set them one at a time via the CLI:
+```bash
+hermes config set OPENROUTER_API_KEY sk-or-v1-your-key-here
+```
+
+---
+
+### Step 8: Add `hermes` to Your PATH
+
+The `hermes` entry point at `venv/bin/hermes` has a hardcoded shebang pointing to the venv's Python, so it works **without activating the venv**. The recommended approach is a symlink into `~/.local/bin` (most distributions already have this on PATH):
+
+```bash
+mkdir -p ~/.local/bin
+ln -sf "$(pwd)/venv/bin/hermes" ~/.local/bin/hermes
+```
+
+If `~/.local/bin` isn't on your PATH yet, add it:
+
+**Bash** (`~/.bashrc`):
+```bash
+echo '' >> ~/.bashrc
+echo '# Hermes Agent' >> ~/.bashrc
+echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
+source ~/.bashrc
+```
+
+**Zsh** (`~/.zshrc`):
+```bash
+echo '' >> ~/.zshrc
+echo '# Hermes Agent' >> ~/.zshrc
+echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc
+source ~/.zshrc
+```
+
+**Fish** (`~/.config/fish/config.fish`):
+```fish
+fish_add_path $HOME/.local/bin
+```
+
+---
+
+### Step 9: Run the Setup Wizard (Optional)
+
+The interactive setup wizard walks you through configuring your API keys and preferences:
+
+```bash
+hermes setup
+```
+
+This is optional if you already configured `~/.hermes/.env` and `~/.hermes/config.yaml` manually in the steps above.
+
+---
+
+### Step 10: Verify the Installation
+
+```bash
+# Check that the command is available
+hermes version
+
+# Run diagnostics to verify everything is working
+hermes doctor
+
+# Check your configuration
+hermes status
+
+# Test with a quick query
+hermes chat -q "Hello! What tools do you have available?"
+```
+
+If `hermes doctor` reports issues, it will tell you exactly what's missing and how to fix it.
+
+---
+
+### Quick-Reference: Manual Install (Condensed)
+
+For those who just want the commands without the explanations:
+
+```bash
+# Install uv (if not already installed)
+curl -LsSf https://astral.sh/uv/install.sh | sh
+
+# Clone & enter
 git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git
 cd hermes-agent

-# Run setup script
-./setup-hermes.sh
+# Create venv with Python 3.11 (uv downloads it if needed)
+uv venv venv --python 3.11
+export VIRTUAL_ENV="$(pwd)/venv"

-# Or manually:
-python3 -m venv venv
-source venv/bin/activate
-pip install -e ".[all]"
+# Install everything
+uv pip install -e ".[all]"
+uv pip install -e "./mini-swe-agent"
+uv pip install -e "./tinker-atropos"
+npm install  # optional, for browser tools

-# Install submodules (required for terminal and RL tools)
-pip install -e "./mini-swe-agent"    # Terminal tool backend
-pip install -e "./tinker-atropos"    # RL training backend
+# Configure
+mkdir -p ~/.hermes/{cron,sessions,logs}
+cp cli-config.yaml.example ~/.hermes/config.yaml
+touch ~/.hermes/.env
+echo 'OPENROUTER_API_KEY=sk-or-v1-your-key' >> ~/.hermes/.env

-hermes setup
+# Make hermes available globally (no venv activation needed)
+mkdir -p ~/.local/bin
+ln -sf "$(pwd)/venv/bin/hermes" ~/.local/bin/hermes
+
+# Verify
+hermes doctor
+hermes
+```
+
+---
+
+### Updating a Manual Installation
+
+To update an existing manual install to the latest version:
+
+```bash
+cd /path/to/hermes-agent
+export VIRTUAL_ENV="$(pwd)/venv"
+
+# Pull latest code and submodules
+git pull origin main
+git submodule update --init --recursive
+
+# Reinstall (picks up new dependencies)
+uv pip install -e ".[all]"
+uv pip install -e "./mini-swe-agent"
+uv pip install -e "./tinker-atropos"
+
+# Check for new config options added since your last update
+hermes config check
+hermes config migrate   # Interactively add any missing options
+```
+
+### Uninstalling a Manual Installation
+
+```bash
+# Remove the hermes symlink
+rm -f ~/.local/bin/hermes
+
+# Remove the cloned repository
+rm -rf /path/to/hermes-agent
+
+# Remove user configuration (optional — keep if you plan to reinstall)
+rm -rf ~/.hermes
 ```

 ---
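The `compute_reward` pattern from the README's ToolContext section can be tried without a real sandbox. In this minimal sketch, `FakeToolContext` is a hypothetical stub (not part of hermes-agent) whose `terminal()` returns a dict with an `exit_code` key, which is the only behaviour the README example relies on. The partial-credit branch and the `path` key on `item` are likewise assumptions made for illustration, not the project's scoring.

```python
import asyncio
from typing import Any, Dict


class FakeToolContext:
    """Hypothetical stand-in for ToolContext: returns canned terminal results."""

    def __init__(self, results: Dict[str, Dict[str, Any]]):
        self._results = results

    def terminal(self, command: str) -> Dict[str, Any]:
        # Commands without a canned result are reported as failures.
        return self._results.get(command, {"exit_code": 127, "output": ""})


async def compute_reward(item: Dict[str, Any], result: Any, ctx: FakeToolContext) -> float:
    # Full credit when the test suite passes in the rollout's sandbox.
    if ctx.terminal("pytest -v")["exit_code"] == 0:
        return 1.0
    # Assumed partial credit: the expected file exists even though tests fail.
    if ctx.terminal(f"test -f {item['path']}")["exit_code"] == 0:
        return 0.5
    return 0.0


if __name__ == "__main__":
    ctx = FakeToolContext({"pytest -v": {"exit_code": 0, "output": ""}})
    print(asyncio.run(compute_reward({"path": "out.txt"}, None, ctx)))
```

Stubbing the context this way lets a reward function be unit-tested before it is attached to a real environment.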
--- a/__pycache__/model_tools.cpython-310.pyc
+++ b/__pycache__/model_tools.cpython-310.pyc
--- a/__pycache__/web_tools.cpython-310.pyc
+++ b/__pycache__/web_tools.cpython-310.pyc
--- a/environments/__init__.py
+++ b/environments/__init__.py
@@ -0,0 +1,28 @@
+"""
+Hermes-Agent Atropos Environments
+
+Provides a layered integration between hermes-agent's tool-calling capabilities
+and the Atropos RL training framework.
+
+Layers:
+    - agent_loop: Reusable multi-turn agent loop with standard OpenAI-spec tool calling
+    - tool_context: Per-rollout tool access handle for reward/verification functions
+    - hermes_base_env: Abstract base environment (BaseEnv subclass) for Atropos
+    - tool_call_parsers: Client-side tool call parser registry for Phase 2 (VLLM /generate)
+
+Concrete environments:
+    - terminal_test_env: Simple file-creation tasks for testing the stack
+    - hermes_swe_env: SWE-bench style tasks with Modal sandboxes
+"""
+
+from environments.agent_loop import AgentResult, HermesAgentLoop
+from environments.tool_context import ToolContext
+from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
+
+__all__ = [
+    "AgentResult",
+    "HermesAgentLoop",
+    "ToolContext",
+    "HermesAgentBaseEnv",
+    "HermesAgentEnvConfig",
+]
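`AgentResult` carries a `tool_errors` list, and the commit message says `wandb_log` reports tool error statistics. How those statistics are computed is not shown in this part of the diff, so the following is only one plausible aggregation. The metric names and the `summarize_tool_errors` helper are hypothetical, and `ToolError` is redeclared locally with the fields from `agent_loop.py` so that the sketch runs standalone.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToolError:
    # Local copy of the fields declared in environments/agent_loop.py.
    turn: int
    tool_name: str
    arguments: str
    error: str
    tool_result: str


def summarize_tool_errors(rollouts: List[List[ToolError]]) -> Dict[str, float]:
    """Flatten per-rollout error lists into scalar metrics (names are assumptions)."""
    total = sum(len(errs) for errs in rollouts)
    with_errors = sum(1 for errs in rollouts if errs)
    metrics: Dict[str, float] = {
        "tool_errors/total": float(total),
        "tool_errors/rollout_error_rate": with_errors / len(rollouts) if rollouts else 0.0,
    }
    # One counter per tool name, so a single flaky tool stands out.
    per_tool = Counter(e.tool_name for errs in rollouts for e in errs)
    for name, count in per_tool.items():
        metrics[f"tool_errors/by_tool/{name}"] = float(count)
    return metrics
```

A flat dict of floats like this could be merged into whatever metrics dict the environment already logs; whether the PR does so in this form is not confirmed by the excerpt.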
--- a/environments/agent_loop.py
+++ b/environments/agent_loop.py
@@ -0,0 +1,372 @@
+"""
+HermesAgentLoop -- Reusable Multi-Turn Agent Engine
+
+Runs the hermes-agent tool-calling loop using standard OpenAI-spec tool calling.
+Works with any server that returns ChatCompletion objects with tool_calls:
+    - Phase 1: OpenAI server type (VLLM, SGLang, OpenRouter, OpenAI API)
+    - Phase 2: ManagedServer with client-side tool call parser
+
+The loop passes tools= and checks response.choices[0].message.tool_calls,
+identical to hermes-agent's run_agent.py. Tool execution is dispatched via
+handle_function_call() from model_tools.py.
+"""
+
+import asyncio
+import concurrent.futures
+import json
+import logging
+import uuid
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional, Set
+
+from model_tools import handle_function_call
+
+# Thread pool for running sync tool calls that internally use asyncio.run()
+# (e.g., mini-swe-agent's modal/docker backends). Running them in a separate
+# thread gives them a clean event loop so they don't deadlock inside Atropos's loop.
+_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=8)
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class ToolError:
+    """Record of a tool execution error during the agent loop."""
+
+    turn: int                  # Which turn the error occurred on
+    tool_name: str             # Which tool was called
+    arguments: str             # The arguments passed (truncated)
+    error: str                 # The error message
+    tool_result: str           # The raw result returned to the model
+
+
+@dataclass
+class AgentResult:
+    """Result of running the agent loop."""
+
+    # Full conversation history in OpenAI message format
+    messages: List[Dict[str, Any]]
+    # ManagedServer.get_state() if available (Phase 2), None otherwise
+    managed_state: Optional[Dict[str, Any]] = None
+    # How many LLM calls were made
+    turns_used: int = 0
+    # True if model stopped calling tools naturally (vs hitting max_turns)
+    finished_naturally: bool = False
+    # Extracted reasoning content per turn (from PR #297 helpers)
+    reasoning_per_turn: List[Optional[str]] = field(default_factory=list)
+    # Tool errors encountered during the loop
+    tool_errors: List[ToolError] = field(default_factory=list)
+
+
+def _extract_reasoning_from_message(message) -> Optional[str]:
+    """
+    Extract reasoning content from a ChatCompletion message.
+
+    Handles multiple provider formats:
+    1. message.reasoning_content field (some providers)
+    2. message.reasoning field (some providers)
+    3. message.reasoning_details[].text (OpenRouter style)
+
+    Note: <think> block extraction from content is NOT done here. In Phase 1
+    the server has already handled it before the response arrives; in Phase 2
+    ManagedServer's patch handles it.
+
+    Args:
+        message: The assistant message from ChatCompletion response
+
+    Returns:
+        Extracted reasoning text, or None if not found
+    """
+    # Check reasoning_content field (common across providers)
+    if hasattr(message, "reasoning_content") and message.reasoning_content:
+        return message.reasoning_content
+
+    # Check reasoning field
+    if hasattr(message, "reasoning") and message.reasoning:
+        return message.reasoning
+
+    # Check reasoning_details (OpenRouter style)
+    if hasattr(message, "reasoning_details") and message.reasoning_details:
+        for detail in message.reasoning_details:
+            if hasattr(detail, "text") and detail.text:
+                return detail.text
+            if isinstance(detail, dict) and detail.get("text"):
+                return detail["text"]
+
+    return None
+
+
+class HermesAgentLoop:
+    """
+    Runs hermes-agent's tool-calling loop using standard OpenAI-spec tool calling.
+
+    Same pattern as run_agent.py:
+    - Pass tools= to the API
+    - Check response.choices[0].message.tool_calls
+    - Dispatch via handle_function_call()
+
+    Works identically with any server type -- OpenAI, VLLM, SGLang, OpenRouter,
+    or ManagedServer with a parser. The server determines how tool_calls get
+    populated on the response.
+    """
+
+    def __init__(
+        self,
+        server,
+        tool_schemas: List[Dict[str, Any]],
+        valid_tool_names: Set[str],
+        max_turns: int = 30,
+        task_id: Optional[str] = None,
+        temperature: float = 1.0,
+        max_tokens: Optional[int] = None,
+    ):
+        """
+        Initialize the agent loop.
+
+        Args:
+            server: Server object with chat_completion() method (OpenAIServer,
+                    ManagedServer, ServerManager, etc.)
+            tool_schemas: OpenAI-format tool definitions from get_tool_definitions()
+            valid_tool_names: Set of tool names the model is allowed to call
+            max_turns: Maximum number of LLM calls before stopping
+            task_id: Unique ID for terminal/browser session isolation
+            temperature: Sampling temperature for generation
+            max_tokens: Max tokens per generation (None for server default)
+        """
+        self.server = server
+        self.tool_schemas = tool_schemas
+        self.valid_tool_names = valid_tool_names
+        self.max_turns = max_turns
+        self.task_id = task_id or str(uuid.uuid4())
+        self.temperature = temperature
+        self.max_tokens = max_tokens
+
+    async def run(self, messages: List[Dict[str, Any]]) -> AgentResult:
+        """
+        Execute the full agent loop using standard OpenAI tool calling.
+
+        Args:
+            messages: Initial conversation messages (system + user).
+                      Modified in-place as the conversation progresses.
+
+        Returns:
+            AgentResult with full conversation history, managed state, and metadata
+        """
+        reasoning_per_turn = []
+        tool_errors: List[ToolError] = []
+
+        for turn in range(self.max_turns):
+            # Build the chat_completion kwargs
+            chat_kwargs = {
+                "messages": messages,
+                "n": 1,
+                "temperature": self.temperature,
+            }
+
+            # Only pass tools if we have them
+            if self.tool_schemas:
+                chat_kwargs["tools"] = self.tool_schemas
+
+            # Only pass max_tokens if explicitly set
+            if self.max_tokens is not None:
+                chat_kwargs["max_tokens"] = self.max_tokens
+
+            # Make the API call -- standard OpenAI spec
+            try:
+                response = await self.server.chat_completion(**chat_kwargs)
+            except Exception as e:
+                logger.error("API call failed on turn %d: %s", turn + 1, e)
+                return AgentResult(
+                    messages=messages,
+                    managed_state=self._get_managed_state(),
+                    turns_used=turn + 1,
+                    finished_naturally=False,
+                    reasoning_per_turn=reasoning_per_turn,
+                    tool_errors=tool_errors,
+                )
+
+            if not response or not response.choices:
+                logger.warning("Empty response on turn %d", turn + 1)
+                return AgentResult(
+                    messages=messages,
+                    managed_state=self._get_managed_state(),
+                    turns_used=turn + 1,
+                    finished_naturally=False,
+                    reasoning_per_turn=reasoning_per_turn,
+                    tool_errors=tool_errors,
+                )
+
+            assistant_msg = response.choices[0].message
+
+            # Extract reasoning content from the response (all provider formats)
+            reasoning = _extract_reasoning_from_message(assistant_msg)
+            reasoning_per_turn.append(reasoning)
+
+            # Check for tool calls -- standard OpenAI spec
+            if assistant_msg.tool_calls:
+                # Build the assistant message dict for conversation history
+                msg_dict: Dict[str, Any] = {
+                    "role": "assistant",
+                    "content": assistant_msg.content or "",
+                    "tool_calls": [
+                        {
+                            "id": tc.id,
+                            "type": "function",
+                            "function": {
+                                "name": tc.function.name,
+                                "arguments": tc.function.arguments,
+                            },
+                        }
+                        for tc in assistant_msg.tool_calls
+                    ],
+                }
+
+                # Preserve reasoning_content for multi-turn chat template handling
+                # (e.g., Kimi-K2's template renders <think> blocks differently
+                # for history vs. the latest turn based on this field)
+                if reasoning:
+                    msg_dict["reasoning_content"] = reasoning
+
+                messages.append(msg_dict)
+
+                # Execute each tool call via hermes-agent's dispatch
+                for tc in assistant_msg.tool_calls:
+                    tool_name = tc.function.name
+                    tool_args_raw = tc.function.arguments
+
+                    # Validate tool name
+                    if tool_name not in self.valid_tool_names:
+                        tool_result = json.dumps(
+                            {
+                                "error": f"Unknown tool '{tool_name}'. "
+                                f"Available tools: {sorted(self.valid_tool_names)}"
+                            }
+                        )
+                        tool_errors.append(ToolError(
+                            turn=turn + 1, tool_name=tool_name,
+                            arguments=tool_args_raw[:200],
+                            error=f"Unknown tool '{tool_name}'",
+                            tool_result=tool_result,
+                        ))
+                        logger.warning(
+                            "Model called unknown tool '%s' on turn %d",
+                            tool_name, turn + 1,
+                        )
+                    else:
+                        # Parse arguments and dispatch
+                        try:
+                            args = json.loads(tool_args_raw)
+                        except json.JSONDecodeError:
+                            args = {}
+                            logger.warning(
+                                "Invalid JSON in tool call arguments for '%s': %s",
+                                tool_name, tool_args_raw[:200],
+                            )
+
+                        try:
+                            if tool_name == "terminal":
+                                import os
+                                backend = os.getenv("TERMINAL_ENV", "local")
+                                cmd_preview = args.get("command", "")[:80]
+                                print(f"  🖥️  [{backend}] $ {cmd_preview}")
+
+                            # Run tool calls in a thread pool so backends that use
+                            # asyncio.run() internally (modal, docker) get a clean
+                            # event loop instead of deadlocking inside Atropos's loop.
+                            loop = asyncio.get_running_loop()
+                            tool_result = await loop.run_in_executor(
+                                _tool_executor,
+                                lambda: handle_function_call(
+                                    tool_name, args, task_id=self.task_id
+                                ),
+                            )
+                        except Exception as e:
+                            tool_result = json.dumps(
+                                {"error": f"Tool execution failed: {type(e).__name__}: {str(e)}"}
+                            )
+                            tool_errors.append(ToolError(
+                                turn=turn + 1, tool_name=tool_name,
+                                arguments=tool_args_raw[:200],
+                                error=f"{type(e).__name__}: {str(e)}",
+                                tool_result=tool_result,
+                            ))
+                            logger.error(
+                                "Tool '%s' execution failed on turn %d: %s",
+                                tool_name, turn + 1, e,
+                            )
+
+                        # Also check if the tool returned an error in its JSON result
+                        try:
+                            result_data = json.loads(tool_result)
+                            if isinstance(result_data, dict):
+                                err = result_data.get("error")
+                                exit_code = result_data.get("exit_code")
+                                if err and exit_code and exit_code < 0:
+                                    tool_errors.append(ToolError(
+                                        turn=turn + 1, tool_name=tool_name,
+                                        arguments=tool_args_raw[:200],
+                                        error=str(err),
+                                        tool_result=tool_result[:500],
+                                    ))
+                        except (json.JSONDecodeError, TypeError):
+                            pass
+
+                    # Add tool response to conversation
+                    messages.append(
+                        {
+                            "role": "tool",
+                            "tool_call_id": tc.id,
+                            "content": tool_result,
+                        }
+                    )
+
+                logger.debug(
+                    "Turn %d: %d tool calls executed",
+                    turn + 1,
+                    len(assistant_msg.tool_calls),
+                )
+
+            else:
+                # No tool calls -- model is done
+                msg_dict = {
+                    "role": "assistant",
+                    "content": assistant_msg.content or "",
+                }
+                if reasoning:
+                    msg_dict["reasoning_content"] = reasoning
+                messages.append(msg_dict)
+
+                logger.debug(
+                    "Turn %d: model finished naturally (no tool calls)", turn + 1
+                )
+
+                return AgentResult(
+                    messages=messages,
+                    managed_state=self._get_managed_state(),
+                    turns_used=turn + 1,
+                    finished_naturally=True,
+                    reasoning_per_turn=reasoning_per_turn,
+                    tool_errors=tool_errors,
+                )
+
+        # Hit max turns without the model stopping
+        logger.info("Agent hit max_turns (%d) without finishing", self.max_turns)
+        return AgentResult(
+            messages=messages,
+            managed_state=self._get_managed_state(),
+            turns_used=self.max_turns,
+            finished_naturally=False,
+            reasoning_per_turn=reasoning_per_turn,
+            tool_errors=tool_errors,
+        )
+
+    def _get_managed_state(self) -> Optional[Dict[str, Any]]:
+        """
+        Get ManagedServer state if the server supports it.
+
+        Returns state dict with SequenceNodes containing tokens/logprobs/masks,
+        or None if the server doesn't support get_state() (e.g., regular OpenAI server).
+        """
+        if hasattr(self.server, "get_state"):
+            return self.server.get_state()
+        return None
--- a/environments/configs/swe_default.yaml
+++ b/environments/configs/swe_default.yaml
@@ -0,0 +1,33 @@
+# SWE Environment -- Default Configuration
+#
+# SWE-bench style tasks with Modal sandboxes for cloud isolation.
+# Uses terminal + file + web toolsets.
+#
+# Usage:
+#   python environments/hermes_swe_env.py serve --config environments/configs/swe_default.yaml
+
+env:
+  enabled_toolsets: ["terminal", "file", "web"]
+  max_agent_turns: 30
+  max_token_length: 4096
+  group_size: 4
+  terminal_backend: "modal"
+  tool_call_parser: "hermes"
+  tokenizer_name: "NousResearch/DeepHermes-3-Llama-3-3B-Preview"
+  dataset_name: "bigcode/humanevalpack"
+  dataset_split: "test"
+  prompt_field: "prompt"
+  steps_per_eval: 50
+  total_steps: 500
+  use_wandb: true
+  wandb_name: "hermes-swe"
+  system_prompt: >
+    You are a skilled software engineer. You have access to a terminal,
+    file tools, and web search. Use these tools to complete the coding task.
+    Write clean, working code and verify it runs correctly before finishing.
+
+openai:
+  base_url: "http://localhost:8000/v1"
+  model_name: "NousResearch/DeepHermes-3-Llama-3-3B-Preview"
+  server_type: "openai"
+  api_key: ""
--- a/environments/configs/terminal_test_default.yaml
+++ b/environments/configs/terminal_test_default.yaml
@@ -0,0 +1,35 @@
+# Terminal Test Environment -- Default Configuration
+#
+# Simple file-creation tasks for validating the full Atropos + hermes-agent stack.
+# Uses Modal terminal backend and OpenRouter (Claude) for inference.
+# API keys are loaded from the .env file in the hermes-agent repo root.
+#
+# Usage:
+#   run-api
+#   python environments/terminal_test_env.py serve
+#   # Or with config file:
+#   python environments/terminal_test_env.py serve --config environments/configs/terminal_test_default.yaml
+
+env:
+  enabled_toolsets: ["terminal", "file"]
+  max_agent_turns: 10
+  max_token_length: 2048
+  group_size: 3
+  total_steps: 3
+  steps_per_eval: 3
+  terminal_backend: "modal"
+  tool_call_parser: "hermes"
+  tokenizer_name: "NousResearch/DeepHermes-3-Llama-3-3B-Preview"
+  ensure_scores_are_not_same: false
+  use_wandb: false
+  system_prompt: >
+    You are a helpful assistant with access to a terminal and file tools.
+    Complete the user's request by using the available tools.
+    Be precise and follow instructions exactly.
+
+openai:
+  base_url: "https://openrouter.ai/api/v1"
+  model_name: "anthropic/claude-opus-4.6"
+  server_type: "openai"
+  health_check: false
+  # api_key loaded from OPENROUTER_API_KEY in .env
--- a/environments/hermes_base_env.py
+++ b/environments/hermes_base_env.py
@@ -0,0 +1,615 @@
+"""
+HermesAgentBaseEnv -- Abstract Base Environment for Hermes-Agent + Atropos
+
+Provides the Atropos integration plumbing that all hermes-agent environments share:
+- Two-mode operation (OpenAI server for Phase 1, VLLM ManagedServer for Phase 2)
+- Per-group toolset/distribution resolution
+- Agent loop orchestration via HermesAgentLoop
+- ToolContext creation for reward functions
+- ScoredDataGroup construction from ManagedServer state
+
+Subclasses only need to implement:
+    setup()           -- Load dataset, initialize state
+    get_next_item()   -- Return the next item from the dataset
+    format_prompt()   -- Convert a dataset item into the user message
+    compute_reward()  -- Score the rollout (has full ToolContext access)
+    evaluate()        -- Periodic evaluation
+"""
+
+import asyncio
+import json
+import logging
+import os
+import sys
+import uuid
+from abc import abstractmethod
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Set, Tuple, Union
+
+# Ensure the hermes-agent repo root is on sys.path so that imports like
+# `from model_tools import ...` and `from environments.X import ...` work
+# regardless of where the script is invoked from.
+_repo_root = Path(__file__).resolve().parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+from dotenv import load_dotenv
+from pydantic import Field
+
+# Load API keys from hermes-agent/.env so all environments can access them
+_env_path = _repo_root / ".env"
+if _env_path.exists():
+    load_dotenv(dotenv_path=_env_path)
+
+# Apply monkey patches for async-safe tool operation inside Atropos's event loop.
+# This patches SwerexModalEnvironment to use a background thread instead of
+# asyncio.run(), which would deadlock inside Atropos. Safe for normal CLI too.
+from environments.patches import apply_patches
+apply_patches()
+
+from atroposlib.envs.base import (
+    BaseEnv,
+    BaseEnvConfig,
+    ScoredDataGroup,
+    ScoredDataItem,
+)
+from atroposlib.envs.server_handling.server_manager import (
+    APIServerConfig,
+    ServerBaseline,
+    ServerManager,
+)
+from atroposlib.type_definitions import Item
+
+from environments.agent_loop import AgentResult, HermesAgentLoop
+from environments.tool_context import ToolContext
+
+# Import hermes-agent toolset infrastructure
+from model_tools import get_tool_definitions
+from toolset_distributions import sample_toolsets_from_distribution
+
+logger = logging.getLogger(__name__)
+
+
+class HermesAgentEnvConfig(BaseEnvConfig):
+    """
+    Configuration for hermes-agent Atropos environments.
+
+    Extends BaseEnvConfig with agent-specific settings for toolsets,
+    terminal backend, dataset loading, and tool call parsing.
+    """
+
+    # --- Toolset configuration ---
+    # Mutually exclusive: use either enabled_toolsets OR distribution
+    enabled_toolsets: Optional[List[str]] = Field(
+        default=None,
+        description="Explicit list of hermes toolsets to enable (e.g., ['terminal', 'file', 'web']). "
+        "If None and distribution is also None, all available toolsets are enabled.",
+    )
+    disabled_toolsets: Optional[List[str]] = Field(
+        default=None,
+        description="Toolsets to disable. Applied as a filter on top of enabled_toolsets or distribution.",
+    )
+    distribution: Optional[str] = Field(
+        default=None,
+        description="Name of a toolset distribution from toolset_distributions.py "
+        "(e.g., 'development', 'terminal_tasks'). Sampled once per group. "
+        "Mutually exclusive with enabled_toolsets.",
+    )
+
+    # --- Agent loop configuration ---
+    max_agent_turns: int = Field(
+        default=30,
+        description="Maximum number of LLM calls (tool-calling iterations) per rollout.",
+    )
+    system_prompt: Optional[str] = Field(
+        default=None,
+        description="System prompt for the agent. Tools are handled via the tools= parameter, "
+        "not embedded in the prompt text.",
+    )
+    agent_temperature: float = Field(
+        default=1.0,
+        description="Sampling temperature for agent generation during rollouts.",
+    )
+
+    # --- Terminal backend ---
+    terminal_backend: str = Field(
+        default="local",
+        description="Terminal backend: 'local', 'docker', 'modal', 'ssh', 'singularity'. "
+        "Modal recommended for production RL (cloud isolation per rollout).",
+    )
+
+    # --- Dataset ---
+    dataset_name: Optional[str] = Field(
+        default=None,
+        description="HuggingFace dataset name. Optional if tasks are defined inline.",
+    )
+    dataset_split: str = Field(
+        default="train",
+        description="Dataset split to use.",
+    )
+    prompt_field: str = Field(
+        default="prompt",
+        description="Which field in the dataset contains the prompt.",
+    )
+
+    # --- Phase 2: Tool call parsing ---
+    tool_call_parser: str = Field(
+        default="hermes",
+        description="Tool call parser name for Phase 2 (VLLM server type). "
+        "Ignored in Phase 1 (OpenAI server type, where the inference server parses tool calls natively). "
+        "Options: hermes, mistral, llama3_json, qwen, deepseek_v3, etc.",
+    )
+
+
+class HermesAgentBaseEnv(BaseEnv):
+    """
+    Abstract base environment for hermes-agent Atropos integration.
+
+    Handles two modes of operation:
+    - Phase 1 (OpenAI server type): Uses server.chat_completion() directly.
+      The server (VLLM, SGLang, OpenRouter, OpenAI) handles tool call parsing
+      and reasoning extraction natively. DummyManagedServer provides placeholder
+      tokens. Good for SFT data gen, verifier testing, evaluation.
+
+    - Phase 2 (VLLM server type): Uses ManagedServer for exact token IDs + logprobs
+      via /generate. Client-side tool call parser reconstructs structured tool_calls
+      from raw output. Full RL training capability.
+
+    Subclasses must implement:
+        setup()           -- Load dataset, initialize state
+        get_next_item()   -- Return the next item to roll out
+        format_prompt()   -- Convert a dataset item into the user message string
+        compute_reward()  -- Score the rollout using ToolContext
+        evaluate()        -- Periodic evaluation
+    """
+
+    name: Optional[str] = "hermes-agent"
+    env_config_cls = HermesAgentEnvConfig
+
+    def __init__(
+        self,
+        config: HermesAgentEnvConfig,
+        server_configs: Union[ServerBaseline, List[APIServerConfig]],
+        slurm=False,
+        testing=False,
+    ):
+        super().__init__(config, server_configs, slurm, testing)
+
+        # Set terminal backend environment variable so hermes tools pick it up
+        if config.terminal_backend:
+            os.environ["TERMINAL_ENV"] = config.terminal_backend
+            print(f"🖥️  Terminal backend: {config.terminal_backend}")
+
+        # Current group's resolved tools (set in collect_trajectories)
+        self._current_group_tools: Optional[Tuple[List[Dict], Set[str]]] = None
+
+        # Tool error tracking for wandb logging
+        self._tool_error_buffer: List[Dict[str, Any]] = []
+
+    # =========================================================================
+    # Toolset resolution (per-group)
+    # =========================================================================
+
+    def _resolve_tools_for_group(self) -> Tuple[List[Dict[str, Any]], Set[str]]:
+        """
+        Resolve toolsets for a group. Called once in collect_trajectories(),
+        then shared by all collect_trajectory() calls in the group.
+
+        If distribution is set, samples probabilistically.
+        If enabled_toolsets is set, uses that explicit list.
+        disabled_toolsets is applied as a filter on top.
+
+        Returns:
+            (tool_schemas, valid_tool_names) tuple
+        """
+        config = self.config
+
+        if config.distribution:
+            group_toolsets = sample_toolsets_from_distribution(config.distribution)
+            logger.info("Sampled toolsets from '%s': %s", config.distribution, group_toolsets)
+        else:
+            group_toolsets = config.enabled_toolsets  # None means "all available"
+
+        tools = get_tool_definitions(
+            enabled_toolsets=group_toolsets,
+            disabled_toolsets=config.disabled_toolsets,
+            quiet_mode=True,
+        )
+
+        valid_names = {t["function"]["name"] for t in tools} if tools else set()
+        logger.info("Resolved %d tools for group: %s", len(valid_names), sorted(valid_names))
+        return tools, valid_names
+
+    # =========================================================================
+    # Server mode detection
+    # =========================================================================
+
+    def _use_managed_server(self) -> bool:
+        """
+        Determine if we should use ManagedServer (Phase 2) or direct server (Phase 1).
+
+        Phase 2 (ManagedServer) is used when the server type is 'vllm' or 'sglang',
+        which go through the /generate endpoint for exact token tracking.
+
+        Phase 1 (direct server) is used for 'openai' server type, which uses
+        /v1/chat/completions with native tool call parsing.
+        """
+        if not self.server.servers:
+            return False
+
+        server = self.server.servers[0]
+        # If the server is an OpenAI server (not VLLM/SGLang), use direct mode
+        from atroposlib.envs.server_handling.openai_server import OpenAIServer
+        return not isinstance(server, OpenAIServer)
+
+    # =========================================================================
+    # Core Atropos integration
+    # =========================================================================
+
+    async def collect_trajectories(
+        self, item: Item
+    ) -> Tuple[
+        Union[Optional[ScoredDataGroup], List[Optional[ScoredDataGroup]]],
+        List[Item],
+    ]:
+        """
+        Override collect_trajectories to resolve toolsets once per group,
+        then delegate to the standard group-level collection.
+
+        The default BaseEnv.collect_trajectories() calls collect_trajectory()
+        group_size times in parallel. We resolve tools once here and store
+        them for all those calls to use.
+        """
+        # Resolve toolsets for this group (shared by all rollouts in the group)
+        self._current_group_tools = self._resolve_tools_for_group()
+
+        # Delegate to the default implementation which calls collect_trajectory()
+        # group_size times via asyncio.gather
+        return await super().collect_trajectories(item)
+
+    # =========================================================================
+    # Wandb rollout display -- format trajectories nicely
+    # =========================================================================
+
+    @staticmethod
+    def _format_trajectory_for_display(messages: List[Dict[str, Any]]) -> str:
+        """
+        Format a conversation's messages into a readable trajectory string
+        for wandb rollout tables. Shows tool calls, tool results, and reasoning
+        in a structured way instead of raw token decoding.
+        """
+        parts = []
+        for msg in messages:
+            role = msg.get("role", "unknown")
+            content = msg.get("content", "")
+
+            if role == "system":
+                parts.append(f"[SYSTEM]\n{content}")
+
+            elif role == "user":
+                parts.append(f"[USER]\n{content}")
+
+            elif role == "assistant":
+                # Show reasoning if present
+                reasoning = msg.get("reasoning_content", "")
+                if reasoning:
+                    # Truncate long reasoning for display
+                    if len(reasoning) > 300:
+                        reasoning = reasoning[:300] + "..."
+                    parts.append(f"[ASSISTANT thinking]\n{reasoning}")
+
+                # Show content
+                if content:
+                    parts.append(f"[ASSISTANT]\n{content}")
+
+                # Show tool calls
+                tool_calls = msg.get("tool_calls", [])
+                for tc in tool_calls:
+                    func = tc.get("function", {})
+                    name = func.get("name", "?")
+                    args = func.get("arguments", "{}")
+                    # Truncate long arguments for display
+                    if len(args) > 200:
+                        args = args[:200] + "..."
+                    parts.append(f"[TOOL CALL] {name}({args})")
+
+            elif role == "tool":
+                tool_id = msg.get("tool_call_id", "")
+                result = content
+                # Truncate long tool results for display
+                if len(result) > 500:
+                    result = result[:500] + "..."
+                parts.append(f"[TOOL RESULT] {result}")
+
+        return "\n\n".join(parts)
+
+    async def add_rollouts_for_wandb(
+        self,
+        scored_data,
+        item=None,
+    ):
+        """
+        Override to show formatted trajectories with tool calls visible,
+        instead of raw token decoding which loses all structure.
+        """
+        num_keep = self.config.num_rollouts_per_group_for_logging
+        if num_keep == -1:
+            num_keep = self.config.group_size
+
+        group = []
+        for i in range(min(num_keep, len(scored_data.get("scores", [])))):
+            score = scored_data["scores"][i]
+
+            # Use messages if available for rich display
+            messages = None
+            if scored_data.get("messages") and i < len(scored_data["messages"]):
+                messages = scored_data["messages"][i]
+
+            if messages:
+                text = self._format_trajectory_for_display(messages)
+            elif scored_data.get("tokens") and i < len(scored_data["tokens"]):
+                text = self.tokenizer.decode(scored_data["tokens"][i])
+            else:
+                text = "(no data)"
+
+            group.append((text, score))
+
+        self.rollouts_for_wandb.append(group)
+        if len(self.rollouts_for_wandb) > self.config.num_rollouts_to_keep:
+            self.rollouts_for_wandb.pop(0)
+
+    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
+        """Log base metrics including tool errors to wandb."""
+        if wandb_metrics is None:
+            wandb_metrics = {}
+
+        # Log tool error stats
+        if self._tool_error_buffer:
+            wandb_metrics["train/tool_errors_count"] = len(self._tool_error_buffer)
+
+            # Log error details as a summary string (tables can crash wandb on tmp cleanup)
+            error_summaries = []
+            for err in self._tool_error_buffer:
+                error_summaries.append(
+                    f"[turn {err['turn']}] {err['tool']}({err['args'][:80]}) -> {err['error'][:150]}"
+                )
+            wandb_metrics["train/tool_error_details"] = "\n".join(error_summaries)
+
+            # Also print to stdout for immediate visibility
+            for summary in error_summaries:
+                print(f"  Tool Error: {summary}")
+
+            self._tool_error_buffer = []
+        else:
+            wandb_metrics["train/tool_errors_count"] = 0
+
+        await super().wandb_log(wandb_metrics)
+
+    async def collect_trajectory(
+        self, item: Item
+    ) -> Tuple[Optional[Union[ScoredDataItem, Any]], List[Item]]:
+        """
+        Run a single rollout: agent loop + reward computation.
+
+        This is called group_size times in parallel by collect_trajectories().
+        Each call gets its own task_id for terminal/browser session isolation.
+        """
+        task_id = str(uuid.uuid4())
+
+        # Get group-level tools (resolved once in collect_trajectories)
+        if self._current_group_tools is None:
+            # Fallback: resolve per-trajectory if called outside collect_trajectories
+            tools, valid_names = self._resolve_tools_for_group()
+        else:
+            tools, valid_names = self._current_group_tools
+
+        # Build initial messages
+        messages: List[Dict[str, Any]] = []
+        if self.config.system_prompt:
+            messages.append({"role": "system", "content": self.config.system_prompt})
+        messages.append({"role": "user", "content": self.format_prompt(item)})
+
+        # Run the agent loop
+        result: AgentResult
+        if self._use_managed_server():
+            # Phase 2: ManagedServer with parser -- exact tokens + logprobs
+            # Load the tool call parser from registry based on config
+            from environments.tool_call_parsers import get_parser
+            try:
+                tc_parser = get_parser(self.config.tool_call_parser)
+            except KeyError:
+                logger.warning(
+                    "Tool call parser '%s' not found, falling back to 'hermes'",
+                    self.config.tool_call_parser,
+                )
+                tc_parser = get_parser("hermes")
+
+            try:
+                async with self.server.managed_server(
+                    tokenizer=self.tokenizer,
+                    tool_call_parser=tc_parser,
+                ) as managed:
+                    agent = HermesAgentLoop(
+                        server=managed,
+                        tool_schemas=tools,
+                        valid_tool_names=valid_names,
+                        max_turns=self.config.max_agent_turns,
+                        task_id=task_id,
+                        temperature=self.config.agent_temperature,
+                        max_tokens=self.config.max_token_length,
+                    )
+                    result = await agent.run(messages)
+            except NotImplementedError:
+                # DummyManagedServer not allowed -- fall back to Phase 1
+                logger.warning(
+                    "ManagedServer not available (OpenAI server?). "
+                    "Falling back to direct server mode."
+                )
+                agent = HermesAgentLoop(
+                    server=self.server,
+                    tool_schemas=tools,
+                    valid_tool_names=valid_names,
+                    max_turns=self.config.max_agent_turns,
+                    task_id=task_id,
+                    temperature=self.config.agent_temperature,
+                    max_tokens=self.config.max_token_length,
+                )
+                result = await agent.run(messages)
+        else:
+            # Phase 1: OpenAI server -- native tool_calls, placeholder tokens
+            agent = HermesAgentLoop(
+                server=self.server,
+                tool_schemas=tools,
+                valid_tool_names=valid_names,
+                max_turns=self.config.max_agent_turns,
+                task_id=task_id,
+                temperature=self.config.agent_temperature,
+                max_tokens=self.config.max_token_length,
+            )
+            result = await agent.run(messages)
+
+        # Skip reward computation if the agent loop produced no meaningful work
+        # (e.g., API call failed on turn 1). No point spinning up a Modal sandbox
+        # just to verify files that were never created.
+        only_system_and_user = all(
+            msg.get("role") in ("system", "user") for msg in result.messages
+        )
+        if result.turns_used == 0 or only_system_and_user:
+            logger.warning(
+                "Agent loop produced no output (turns=%d, msgs=%d). Skipping reward.",
+                result.turns_used, len(result.messages),
+            )
+            reward = 0.0
+        else:
+            # Compute reward using ToolContext (gives verifier full tool access)
+            ctx = ToolContext(task_id)
+            try:
+                reward = await self.compute_reward(item, result, ctx)
+            except Exception as e:
+                logger.error("compute_reward failed: %s", e)
+                reward = 0.0
+            finally:
+                ctx.cleanup()
+
+        # Track tool errors for wandb logging
+        if result.tool_errors:
+            for err in result.tool_errors:
+                self._tool_error_buffer.append({
+                    "turn": err.turn,
+                    "tool": err.tool_name,
+                    "args": err.arguments[:150],
+                    "error": err.error[:300],
+                    "result": err.tool_result[:300],
+                })
+
+        # Build ScoredDataItem from ManagedServer state
+        # Phase 2: real tokens/masks/logprobs from SequenceNodes
+        # Phase 1: placeholder tokens (still need a valid ScoredDataItem for the pipeline)
+        nodes = (result.managed_state or {}).get("nodes", [])
+
+        if nodes:
+            # Phase 2 (or DummyManagedServer): use actual node data
+            node = nodes[-1]  # Final sequence node = full trajectory
+            scored_item: Dict[str, Any] = {
+                "tokens": node.tokens,
+                "masks": node.masked_tokens,
+                "scores": reward,
+            }
+
+            # When logprobs are present (Phase 2), add placeholder fields for the trainer to fill in
+            if hasattr(node, "logprobs") and node.logprobs:
+                scored_item["advantages"] = None  # Computed by trainer
+                scored_item["ref_logprobs"] = None
+        else:
+            # Phase 1 with no managed state: create placeholder tokens
+            # so the data pipeline doesn't break. These are NOT suitable
+            # for training but allow process mode (SFT data gen) to work.
+            # Tokenize the full conversation to get approximate tokens.
+            full_text = "\n".join(
+                msg.get("content", "") for msg in result.messages if msg.get("content")
+            )
+            if self.tokenizer:
+                tokens = self.tokenizer.encode(full_text, add_special_tokens=True)
+            else:
+                tokens = list(range(min(len(full_text) // 4, 128)))
+
+            scored_item = {
+                "tokens": tokens,
+                "masks": ([-100] + tokens[1:]) if tokens else [],  # Mask first token as prompt; keep lengths equal
+                "scores": reward,
+            }
+
+        # Always include messages for wandb rollout display and data logging
+        scored_item["messages"] = result.messages
+
+        return scored_item, []
+
+    # =========================================================================
+    # Abstract methods -- subclasses must implement
+    # =========================================================================
+
+    @abstractmethod
+    async def setup(self):
+        """
+        Load dataset, initialize state.
+
+        Called once when the environment starts. Typical implementation:
+            self.dataset = load_dataset(self.config.dataset_name, split=self.config.dataset_split)
+            self.iter = 0
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    async def get_next_item(self) -> Item:
+        """
+        Return the next item from the dataset for rollout.
+
+        Called by the base env's main loop to get items for workers.
+        Should cycle through the dataset.
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def format_prompt(self, item: Item) -> str:
+        """
+        Convert a dataset item into the user message for the agent.
+
+        Args:
+            item: Dataset item (dict, tuple, etc.)
+
+        Returns:
+            The prompt string to send to the agent
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    async def compute_reward(
+        self, item: Item, result: AgentResult, ctx: ToolContext
+    ) -> float:
+        """
+        Score the rollout. Has full access to:
+        - item: the original dataset item (ground truth, test commands, etc.)
+        - result: AgentResult with full messages, turn count, reasoning, etc.
+        - ctx: ToolContext -- call ANY hermes-agent tool (terminal, file, web,
+               browser, vision...) scoped to this rollout's sandbox. Nothing
+               is off-limits.
+
+        Args:
+            item: The dataset item that was rolled out
+            result: The agent's rollout result
+            ctx: ToolContext with full tool access for verification
+
+        Returns:
+            Reward float (typically 0.0 to 1.0, but any float is valid)
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    async def evaluate(self, *args, **kwargs):
+        """
+        Periodic evaluation. Called every steps_per_eval steps.
+
+        Typical implementation runs the agent on a held-out eval set
+        and logs metrics via wandb/evaluate_log.
+        """
+        raise NotImplementedError
--- a/environments/hermes_swe_env.py
+++ b/environments/hermes_swe_env.py
@@ -0,0 +1,229 @@
+"""
+HermesSweEnv -- SWE-Bench Style Environment with Modal Sandboxes
+
+A concrete environment for software engineering tasks where the model writes code
+and the reward function runs tests to verify correctness. Uses Modal terminal
+backend for cloud-isolated sandboxes per rollout.
+
+The reward function uses ToolContext.terminal() to run test commands in the same
+Modal sandbox the model used during its agentic loop. All filesystem state from
+the model's tool calls is preserved for verification.
+
+Usage:
+    # Phase 1: OpenAI server type
+    vllm serve YourModel --enable-auto-tool-choice --tool-call-parser hermes
+    run-api
+    python environments/hermes_swe_env.py serve \\
+        --openai.base_url http://localhost:8000/v1 \\
+        --openai.model_name YourModel \\
+        --openai.server_type openai \\
+        --env.dataset_name bigcode/humanevalpack \\
+        --env.terminal_backend modal
+
+    # Phase 2: VLLM server type (full RL training)
+    python environments/hermes_swe_env.py serve \\
+        --openai.base_url http://localhost:8000/v1 \\
+        --openai.model_name YourModel \\
+        --openai.server_type vllm \\
+        --env.tool_call_parser hermes \\
+        --env.terminal_backend modal
+"""
+
+import logging
+import sys
+import time
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, Union
+
+# Ensure repo root is on sys.path for imports
+_repo_root = Path(__file__).resolve().parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+from datasets import load_dataset
+
+from atroposlib.envs.base import ScoredDataGroup
+from atroposlib.envs.server_handling.server_manager import APIServerConfig
+from atroposlib.type_definitions import Item
+
+from environments.agent_loop import AgentResult
+from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
+from environments.tool_context import ToolContext
+
+logger = logging.getLogger(__name__)
+
+
+class HermesSweEnvConfig(HermesAgentEnvConfig):
+    """Config with defaults for SWE-bench style tasks."""
+
+    pass  # Inherits all fields, overrides defaults in config_init
+
+
+class HermesSweEnv(HermesAgentBaseEnv):
+    """
+    SWE-bench style environment using Modal terminal backend.
+
+    The model gets a coding task, uses terminal + file + web tools to solve it,
+    and the reward function runs tests in the same Modal sandbox to verify.
+
+    Subclass this for specific SWE datasets (HumanEval, SWE-bench, etc.)
+    and customize format_prompt() and compute_reward() as needed.
+    """
+
+    name = "hermes-swe"
+    env_config_cls = HermesSweEnvConfig
+
+    @classmethod
+    def config_init(cls) -> Tuple[HermesSweEnvConfig, List[APIServerConfig]]:
+        """
+        Default configuration for the SWE environment.
+
+        Uses Modal terminal backend for cloud isolation and terminal + file + web toolsets.
+        """
+        env_config = HermesSweEnvConfig(
+            # Toolsets: terminal for running code, file for reading/writing, web for docs
+            enabled_toolsets=["terminal", "file", "web"],
+            disabled_toolsets=None,
+            distribution=None,
+            # Agent settings -- SWE tasks need more turns
+            max_agent_turns=30,
+            max_token_length=4096,
+            agent_temperature=1.0,
+            system_prompt=(
+                "You are a skilled software engineer. You have access to a terminal, "
+                "file tools, and web search. Use these tools to complete the coding task. "
+                "Write clean, working code and verify it runs correctly before finishing."
+            ),
+            # Modal backend for cloud-isolated sandboxes
+            terminal_backend="modal",
+            # Dataset -- override via CLI for your specific SWE dataset
+            dataset_name="bigcode/humanevalpack",
+            dataset_split="test",
+            prompt_field="prompt",
+            # Atropos settings
+            group_size=4,
+            tokenizer_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
+            tool_call_parser="hermes",
+            steps_per_eval=50,
+            total_steps=500,
+            use_wandb=True,
+            wandb_name="hermes-swe",
+        )
+
+        server_configs = [
+            APIServerConfig(
+                base_url="http://localhost:8000/v1",
+                model_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
+                server_type="openai",  # Phase 1; switch to "vllm" for Phase 2
+                api_key="",
+            )
+        ]
+
+        return env_config, server_configs
+
+    async def setup(self):
+        """Load the SWE dataset."""
+        if self.config.dataset_name:
+            self.dataset = load_dataset(
+                self.config.dataset_name, split=self.config.dataset_split
+            )
+        else:
+            # Placeholder if no dataset specified
+            self.dataset = []
+        self.iter = 0
+        self.reward_buffer: List[float] = []
+
+    async def get_next_item(self) -> Dict[str, Any]:
+        """Cycle through the SWE dataset."""
+        if not self.dataset:
+            raise ValueError("No dataset loaded. Set dataset_name in config.")
+        item = self.dataset[self.iter % len(self.dataset)]
+        self.iter += 1
+        return item
+
+    def format_prompt(self, item: Dict[str, Any]) -> str:
+        """
+        Format the SWE task prompt.
+
+        Override this in subclasses for different dataset formats.
+        Default reads config.prompt_field and appends any 'test', 'test_code', or 'tests' field.
+        """
+        prompt = item.get(self.config.prompt_field, "")
+
+        # If the dataset has test information, include it in the prompt
+        test_info = item.get("test", item.get("test_code", item.get("tests", "")))
+        if test_info:
+            prompt += f"\n\nTests to pass:\n{test_info}"
+
+        return prompt
+
+    async def compute_reward(
+        self, item: Dict[str, Any], result: AgentResult, ctx: ToolContext
+    ) -> float:
+        """
+        Score by running tests in the model's Modal sandbox.
+
+        Default implementation:
+        - If the dataset item has a 'test', 'test_code', or 'tests' field, run it
+        - Exit code 0 scores 1.0; a non-zero exit code falls through to partial credit
+        - Partial credit (0.1) if the model created any Python files
+
+        Override this in subclasses for more sophisticated reward logic.
+        """
+        # Find the test code in the dataset item
+        test_code = item.get("test", item.get("test_code", item.get("tests", "")))
+
+        if test_code:
+            import shlex  # quote the test code so embedded quotes survive the shell
+            test_result = ctx.terminal(
+                f"cd /workspace && python3 -c {shlex.quote(test_code)}", timeout=60
+            )
+
+            if test_result["exit_code"] == 0:
+                self.reward_buffer.append(1.0)
+                return 1.0
+
+        # Partial credit: check if the model created any Python files
+        file_check = ctx.terminal("find /workspace -name '*.py' -newer /tmp/.start_marker 2>/dev/null | head -5")
+        if file_check["exit_code"] == 0 and file_check.get("output", "").strip():
+            self.reward_buffer.append(0.1)
+            return 0.1
+
+        self.reward_buffer.append(0.0)
+        return 0.0
+
+    async def evaluate(self, *args, **kwargs):
+        """
+        Placeholder evaluation: logs a dummy metric and runs no rollouts.
+
+        Override for dataset-specific evaluation on a held-out set.
+        """
+        start_time = time.time()
+        end_time = time.time()
+
+        eval_metrics = {"eval/placeholder": 0.0}
+        await self.evaluate_log(
+            metrics=eval_metrics,
+            start_time=start_time,
+            end_time=end_time,
+        )
+
+    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
+        """Log SWE-specific metrics."""
+        if wandb_metrics is None:
+            wandb_metrics = {}
+
+        if self.reward_buffer:
+            wandb_metrics["train/avg_reward"] = sum(self.reward_buffer) / len(
+                self.reward_buffer
+            )
+            wandb_metrics["train/pass_rate"] = sum(
+                1 for r in self.reward_buffer if r == 1.0
+            ) / len(self.reward_buffer)
+            self.reward_buffer = []
+
+        await super().wandb_log(wandb_metrics)
+
+
+if __name__ == "__main__":
+    HermesSweEnv.cli()
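Note on `compute_reward` in `hermes_swe_env.py`: a minimal sketch, assuming a POSIX `sh` is available, of shell-quoting test code before handing it to `python -c`. `build_test_command` and `run_in_shell` are hypothetical helper names, not part of this change, and `subprocess` only stands in for `ctx.terminal()` here.

```python
import shlex
import subprocess
import sys


def build_test_command(test_code: str, python: str = "python3") -> str:
    """Build a shell command that passes test_code to `python -c` as one argument."""
    # shlex.quote wraps the string in single quotes and escapes embedded ones,
    # so double quotes, $VARS and backticks inside the code are left untouched.
    return f"{shlex.quote(python)} -c {shlex.quote(test_code)}"


def run_in_shell(command: str) -> int:
    """Run the command through sh and return its exit code (stand-in for ctx.terminal)."""
    return subprocess.run(["sh", "-c", command], capture_output=True).returncode


# Test code containing characters that break naive double-quote interpolation.
tricky_code = 'assert "a" + \'b\' == "ab", "$HOME `echo x`"'
passing_exit = run_in_shell(build_test_command(tricky_code, sys.executable))
failing_exit = run_in_shell(build_test_command("assert 1 == 2", sys.executable))
```

With quoting in place the exit code reflects only whether the assertions passed, which is what the reward logic keys on.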
--- a/environments/patches.py
+++ b/environments/patches.py
@@ -0,0 +1,188 @@
+"""
+Monkey patches for making hermes-agent tools work inside async frameworks (Atropos).
+
+Problem:
+    Some tools use asyncio.run() internally (e.g., mini-swe-agent's Modal backend,
+    web_extract). This crashes when called from inside Atropos's event loop because
+    asyncio.run() can't be nested.
+
+Solution:
+    Replace the problematic methods with versions that use a dedicated background
+    thread with its own event loop. The calling code sees the same sync interface --
+    call a function, get a result -- but internally the async work happens on a
+    separate thread that doesn't conflict with Atropos's loop.
+
+    These patches are safe for normal CLI use too: when there's no running event
+    loop, the behavior is identical (the background thread approach works regardless).
+
+What gets patched:
+    - SwerexModalEnvironment.__init__ -- creates Modal deployment on a background thread
+    - SwerexModalEnvironment.execute -- runs commands on the same background thread
+    - SwerexModalEnvironment.stop -- stops deployment on the background thread
+
+Usage:
+    Call apply_patches() once at import time (done automatically by hermes_base_env.py).
+    This is idempotent -- calling it multiple times is safe.
+"""
+
+import asyncio
+import logging
+import threading
+from typing import Any
+
+logger = logging.getLogger(__name__)
+
+_patches_applied = False
+
+
+class _AsyncWorker:
+    """
+    A dedicated background thread with its own event loop.
+
+    Allows sync code to submit async coroutines and block for results,
+    even when called from inside another running event loop. Used to
+    bridge sync tool interfaces with async backends (Modal, SWE-ReX).
+    """
+
+    def __init__(self):
+        self._loop: asyncio.AbstractEventLoop | None = None
+        self._thread: threading.Thread | None = None
+        self._started = threading.Event()
+
+    def start(self):
+        """Start the background event loop thread."""
+        self._thread = threading.Thread(target=self._run_loop, daemon=True)
+        self._thread.start()
+        self._started.wait(timeout=30)
+
+    def _run_loop(self):
+        """Background thread entry point -- runs the event loop forever."""
+        self._loop = asyncio.new_event_loop()
+        asyncio.set_event_loop(self._loop)
+        self._started.set()
+        self._loop.run_forever()
+
+    def run_coroutine(self, coro, timeout=600):
+        """
+        Submit a coroutine to the background loop and block until it completes.
+
+        Safe to call from any thread, including threads that already have
+        a running event loop.
+        """
+        if self._loop is None or self._loop.is_closed():
+            raise RuntimeError("AsyncWorker loop is not running")
+        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
+        return future.result(timeout=timeout)
+
+    def stop(self):
+        """Stop the background event loop and join the thread."""
+        if self._loop and self._loop.is_running():
+            self._loop.call_soon_threadsafe(self._loop.stop)
+        if self._thread:
+            self._thread.join(timeout=10)
+
+
+def _patch_swerex_modal():
+    """
+    Monkey patch SwerexModalEnvironment to use a background thread event loop
+    instead of asyncio.run(). This makes it safe to call from inside Atropos's
+    async event loop.
+
+    The patched methods have the exact same interface and behavior -- the only
+    difference is HOW the async work is executed internally.
+    """
+    try:
+        from minisweagent.environments.extra.swerex_modal import (
+            SwerexModalEnvironment,
+            SwerexModalEnvironmentConfig,
+        )
+        from swerex.deployment.modal import ModalDeployment
+        from swerex.runtime.abstract import Command as RexCommand
+    except ImportError:
+        # mini-swe-agent or swe-rex not installed -- nothing to patch
+        logger.debug("mini-swe-agent Modal backend not available, skipping patch")
+        return
+
+    # Keep a reference to the original __init__ (not called by the patched version)
+    _original_init = SwerexModalEnvironment.__init__
+
+    def _patched_init(self, **kwargs):
+        """Patched __init__: creates Modal deployment on a background thread."""
+        self.config = SwerexModalEnvironmentConfig(**kwargs)
+
+        # Start a dedicated event loop thread for all Modal async operations
+        self._worker = _AsyncWorker()
+        self._worker.start()
+
+        # Create AND start the deployment entirely on the worker's loop/thread
+        # so all gRPC channels and async state are bound to that loop
+        async def _create_and_start():
+            deployment = ModalDeployment(
+                image=self.config.image,
+                startup_timeout=self.config.startup_timeout,
+                runtime_timeout=self.config.runtime_timeout,
+                deployment_timeout=self.config.deployment_timeout,
+                install_pipx=self.config.install_pipx,
+                modal_sandbox_kwargs=self.config.modal_sandbox_kwargs,
+            )
+            await deployment.start()
+            return deployment
+
+        self.deployment = self._worker.run_coroutine(_create_and_start())
+
+    def _patched_execute(self, command: str, cwd: str = "", *, timeout: int | None = None) -> dict[str, Any]:
+        """Patched execute: runs commands on the background thread's loop."""
+        async def _do_execute():
+            return await self.deployment.runtime.execute(
+                RexCommand(
+                    command=command,
+                    shell=True,
+                    check=False,
+                    cwd=cwd or self.config.cwd,
+                    timeout=timeout or self.config.timeout,
+                    merge_output_streams=True,
+                    env=self.config.env if self.config.env else None,
+                )
+            )
+
+        output = self._worker.run_coroutine(_do_execute())
+        return {
+            "output": output.stdout,
+            "returncode": output.exit_code,
+        }
+
+    def _patched_stop(self):
+        """Patched stop: stops deployment on the background thread, then stops the thread."""
+        try:
+            self._worker.run_coroutine(
+                asyncio.wait_for(self.deployment.stop(), timeout=10),
+                timeout=15,
+            )
+        except Exception:
+            logger.debug("Ignoring error while stopping Modal deployment", exc_info=True)
+        finally:
+            self._worker.stop()
+
+    # Apply the patches
+    SwerexModalEnvironment.__init__ = _patched_init
+    SwerexModalEnvironment.execute = _patched_execute
+    SwerexModalEnvironment.stop = _patched_stop
+
+    logger.debug("Patched SwerexModalEnvironment for async-safe operation")
+
+
+def apply_patches():
+    """
+    Apply all monkey patches needed for Atropos compatibility.
+
+    Safe to call multiple times -- patches are only applied once.
+    Safe for normal CLI use -- patched code works identically when
+    there is no running event loop.
+    """
+    global _patches_applied
+    if _patches_applied:
+        return
+
+    _patch_swerex_modal()
+
+    _patches_applied = True
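Note on `patches.py`: the background-loop approach described in the module docstring can be sketched in isolation as below. This is a simplified illustration, not the patched code itself: `BackgroundLoop`, `fake_backend_call`, `sync_tool`, and `demo` are hypothetical names, and the backend call is a placeholder for a real Modal operation.

```python
import asyncio
import threading


class BackgroundLoop:
    """Hypothetical stand-in for _AsyncWorker: one thread running one private loop."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout: float = 5.0):
        # Schedule on the private loop, then block the calling thread for the result.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout=timeout)

    def stop(self) -> None:
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=5)
        self._loop.close()


async def fake_backend_call(x: int) -> int:
    # Placeholder for an async backend operation such as a sandbox command.
    await asyncio.sleep(0.01)
    return x * 2


def sync_tool(worker: BackgroundLoop, x: int) -> int:
    # Sync interface, as a tool would expose it. Calling asyncio.run() here
    # would raise RuntimeError when the caller already has a running loop.
    return worker.run(fake_backend_call(x))


async def outer_loop_caller(worker: BackgroundLoop) -> int:
    # Invoked from inside a running event loop, as an async framework would.
    return sync_tool(worker, 21)


def demo() -> int:
    worker = BackgroundLoop()
    try:
        return asyncio.run(outer_loop_caller(worker))
    finally:
        worker.stop()
```

The sync call still blocks the caller's loop while it waits, which is presumably why the commit description also mentions a thread pool executor in `agent_loop.py`.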
--- a/environments/terminal_test_env.py
+++ b/environments/terminal_test_env.py
@@ -0,0 +1,292 @@
+"""
+TerminalTestEnv -- Simple Test Environment for Validating the Stack
+
+A self-contained environment with inline tasks (no external dataset needed).
+Each task asks the model to create a file at a known path with specific content.
+The reward verifier cats the file and checks if the content matches.
+
+Enables only terminal + file toolsets. Uses Modal terminal backend with
+OpenRouter (Claude) by default.
+
+Training tasks (3):
+    1. Create ~/greeting.txt with "Hello from Hermes Agent"
+    2. Create ~/count.txt with numbers 1-5, one per line
+    3. Create ~/answer.txt with the result of 123 + 456
+
+Eval task (1):
+    1. Create ~/result.txt with the result of 6 * 7
+
+Usage:
+    # Start Atropos API server
+    run-api
+
+    # Run environment (uses OpenRouter + Modal by default)
+    python environments/terminal_test_env.py serve
+
+    # Process mode (no run-api needed, saves to JSONL)
+    python environments/terminal_test_env.py process \\
+        --env.data_path_to_save_groups terminal_test_output.jsonl
+"""
+
+import logging
+import os
+import sys
+import time
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, Union
+
+# Ensure repo root is on sys.path for imports
+_repo_root = Path(__file__).resolve().parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+from atroposlib.envs.base import ScoredDataGroup
+from atroposlib.envs.server_handling.server_manager import APIServerConfig
+from atroposlib.type_definitions import Item
+
+from environments.agent_loop import AgentResult
+from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
+from environments.tool_context import ToolContext
+
+logger = logging.getLogger(__name__)
+
+
+# =============================================================================
+# Inline task definitions -- no external dataset needed
+# =============================================================================
+
+TRAIN_TASKS = [
+    {
+        "prompt": "Create a file at ~/greeting.txt containing exactly the text: Hello from Hermes Agent",
+        "verify_path": "~/greeting.txt",
+        "expected_content": "Hello from Hermes Agent",
+    },
+    {
+        "prompt": "Create a file at ~/count.txt containing the numbers 1 through 5, one per line",
+        "verify_path": "~/count.txt",
+        "expected_content": "1\n2\n3\n4\n5",
+    },
+    {
+        "prompt": "Create a file at ~/answer.txt containing the result of 123 + 456",
+        "verify_path": "~/answer.txt",
+        "expected_content": "579",
+    },
+]
+
+EVAL_TASKS = [
+    {
+        "prompt": "Create a file at ~/result.txt containing the result of 6 * 7",
+        "verify_path": "~/result.txt",
+        "expected_content": "42",
+    },
+]
+
+
+class TerminalTestEnvConfig(HermesAgentEnvConfig):
+    """Config with defaults suitable for terminal testing."""
+
+    pass  # Inherits all fields, overrides defaults in config_init
+
+
+class TerminalTestEnv(HermesAgentBaseEnv):
+    """
+    Simple test environment with inline file-creation tasks.
+
+    All tasks follow the same pattern: "create a file at ~/X.txt with content Y".
+    The verifier runs `cat ~/X.txt` in the rollout's terminal and checks the output
+    against the expected string. Same verifier logic for all tasks.
+
+    This environment is designed to validate the full stack end-to-end:
+    - Agent loop executes tool calls (terminal/file)
+    - ToolContext provides terminal access to the reward function
+    - Reward function verifies file content via cat
+    - Scored data flows through the Atropos pipeline
+    """
+
+    name = "terminal-test"
+    env_config_cls = TerminalTestEnvConfig
+
+    @classmethod
+    def config_init(cls) -> Tuple[TerminalTestEnvConfig, List[APIServerConfig]]:
+        """
+        Default configuration for the terminal test environment.
+
+        Uses Modal terminal backend for cloud isolation and OpenRouter with
+        Claude for inference. API keys loaded from ~/hermes-agent/.env.
+        """
+        env_config = TerminalTestEnvConfig(
+            # Terminal + file tools only
+            enabled_toolsets=["terminal", "file"],
+            disabled_toolsets=None,
+            distribution=None,
+            # Agent settings
+            max_agent_turns=10,  # Simple tasks, don't need many turns
+            max_token_length=16000,
+            agent_temperature=1.0,
+            system_prompt=(
+                "You are a helpful assistant with access to a terminal and file tools. "
+                "Complete the user's request by using the available tools. "
+                "Be precise and follow instructions exactly."
+            ),
+            # Modal terminal backend for cloud-isolated sandboxes per rollout
+            terminal_backend="modal",
+            # Atropos settings
+            group_size=3,              # 3 rollouts per group
+            tokenizer_name="NousResearch/q-30b-t-h45-e1",
+            tool_call_parser="hermes",
+            steps_per_eval=3,          # Eval after all 3 steps
+            total_steps=3,             # 3 groups total (1 group per step)
+            use_wandb=True,
+            wandb_name="terminal-test",
+            ensure_scores_are_not_same=False,  # Allow all-same scores for simple tasks
+            # No external dataset
+            dataset_name=None,
+        )
+
+        # OpenRouter with Claude -- API key loaded from .env (OPENROUTER_API_KEY)
+        server_configs = [
+            APIServerConfig(
+                base_url="https://openrouter.ai/api/v1",
+                model_name="anthropic/claude-opus-4.6",
+                server_type="openai",
+                api_key=os.getenv("OPENROUTER_API_KEY", ""),
+                health_check=False,  # OpenRouter doesn't have a /health endpoint
+            )
+        ]
+
+        return env_config, server_configs
+
+    async def setup(self):
+        """Initialize inline task lists."""
+        self.train_tasks = list(TRAIN_TASKS)
+        self.eval_tasks = list(EVAL_TASKS)
+        self.iter = 0
+        # Track reward stats for wandb logging
+        self.reward_buffer: List[float] = []
+
+    async def get_next_item(self) -> Dict[str, str]:
+        """Cycle through training tasks."""
+        item = self.train_tasks[self.iter % len(self.train_tasks)]
+        self.iter += 1
+        return item
+
+    def format_prompt(self, item: Dict[str, str]) -> str:
+        """The prompt is directly in the task item."""
+        return item["prompt"]
+
+    async def compute_reward(
+        self, item: Dict[str, str], result: AgentResult, ctx: ToolContext
+    ) -> float:
+        """
+        Verify by cat-ing the expected file path and checking content matches.
+        Same verifier for all tasks -- they all write a file at a known path.
+
+        Scoring:
+            1.0 = exact match
+            0.5 = expected content is present but has extra stuff
+            0.0 = file doesn't exist or content doesn't match
+        """
+        verify_result = ctx.terminal(f"cat {item['verify_path']}")
+
+        # File doesn't exist or can't be read
+        if verify_result["exit_code"] != 0:
+            self.reward_buffer.append(0.0)
+            return 0.0
+
+        actual = verify_result.get("output", "").strip()
+        expected = item["expected_content"].strip()
+
+        # Exact match
+        if actual == expected:
+            self.reward_buffer.append(1.0)
+            return 1.0
+
+        # Partial credit: expected content is present but has extra stuff
+        if expected in actual:
+            self.reward_buffer.append(0.5)
+            return 0.5
+
+        self.reward_buffer.append(0.0)
+        return 0.0
+
+    async def evaluate(self, *args, **kwargs):
+        """
+        Run eval tasks as single-turn completions (not the full agent loop).
+        Responses are recorded as samples but not scored; only the sample count is logged.
+        """
+        start_time = time.time()
+        correct = 0
+        total = len(self.eval_tasks)
+        samples = []
+
+        for eval_item in self.eval_tasks:
+            try:
+                # For eval, we do a simple single-turn completion (not full agent loop)
+                # to keep eval fast. The agent loop is tested via training.
+                completion = await self.server.chat_completion(
+                    messages=[
+                        {"role": "system", "content": self.config.system_prompt or ""},
+                        {"role": "user", "content": eval_item["prompt"]},
+                    ],
+                    n=1,
+                    max_tokens=self.config.max_token_length,
+                    temperature=0.0,
+                    split="eval",
+                )
+
+                response_content = (
+                    completion.choices[0].message.content if completion.choices else ""
+                )
+
+                samples.append(
+                    {
+                        "prompt": eval_item["prompt"],
+                        "response": response_content,
+                        "expected": eval_item["expected_content"],
+                    }
+                )
+
+            except Exception as e:
+                logger.error("Eval failed for item: %s", e)
+                samples.append(
+                    {
+                        "prompt": eval_item["prompt"],
+                        "response": f"ERROR: {e}",
+                        "expected": eval_item["expected_content"],
+                    }
+                )
+
+        end_time = time.time()
+
+        eval_metrics = {
+            "eval/num_samples": total,
+        }
+
+        await self.evaluate_log(
+            metrics=eval_metrics,
+            samples=samples,
+            start_time=start_time,
+            end_time=end_time,
+        )
+
+    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
+        """Log training metrics including reward stats and accuracy."""
+        if wandb_metrics is None:
+            wandb_metrics = {}
+
+        if self.reward_buffer:
+            total = len(self.reward_buffer)
+            correct = sum(1 for r in self.reward_buffer if r == 1.0)
+            partial = sum(1 for r in self.reward_buffer if r == 0.5)
+
+            wandb_metrics["train/avg_reward"] = sum(self.reward_buffer) / total
+            wandb_metrics["train/accuracy"] = correct / total
+            wandb_metrics["train/partial_match_rate"] = partial / total
+            wandb_metrics["train/total_rollouts"] = total
+            self.reward_buffer = []
+
+        await super().wandb_log(wandb_metrics)
+
+
+if __name__ == "__main__":
+    TerminalTestEnv.cli()
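Note on `TerminalTestEnv.compute_reward`: the three-tier scoring rule can be sketched as a pure function, which makes its edge cases easy to check. `score_file_content` is a hypothetical name used only for illustration; it mirrors the comparison logic but takes the `cat` output as a plain string.

```python
def score_file_content(actual_output: str, expected_content: str) -> float:
    """Score file content: 1.0 exact, 0.5 if expected text is embedded, else 0.0."""
    actual = actual_output.strip()
    expected = expected_content.strip()
    if actual == expected:
        return 1.0
    # Substring check: any output that merely contains the expected text
    # earns partial credit, including accidental matches such as "1579".
    if expected in actual:
        return 0.5
    return 0.0


exact = score_file_content("Hello from Hermes Agent\n", "Hello from Hermes Agent")
partial = score_file_content("The answer is 579.", "579")
accidental = score_file_content("1579", "579")
missing = score_file_content("", "42")
```

The `accidental` case shows that a substring match cannot tell a near-miss from a coincidence, so the 0.5 tier is best read as a rough signal.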
--- a/environments/tool_call_parsers/__init__.py
+++ b/environments/tool_call_parsers/__init__.py
@@ -0,0 +1,120 @@
+"""
+Tool Call Parser Registry
+
+Client-side parsers that extract structured tool_calls from raw model output text.
+Used in Phase 2 (VLLM server type) where ManagedServer's /generate endpoint returns
+raw text without tool call parsing.
+
+Each parser is a standalone reimplementation of the corresponding VLLM parser's
+non-streaming extract_tool_calls() logic. No VLLM dependency -- only standard library
+(re, json, uuid) and openai types.
+
+Usage:
+    from environments.tool_call_parsers import get_parser
+
+    parser = get_parser("hermes")
+    content, tool_calls = parser.parse(raw_model_output)
+    # content = text with tool call markup stripped
+    # tool_calls = list of ChatCompletionMessageToolCall objects, or None
+"""
+
+import logging
+from abc import ABC, abstractmethod
+from typing import Dict, List, Optional, Tuple, Type
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+)
+
+logger = logging.getLogger(__name__)
+
+# Type alias for parser return value
+ParseResult = Tuple[Optional[str], Optional[List[ChatCompletionMessageToolCall]]]
+
+
+class ToolCallParser(ABC):
+    """
+    Base class for tool call parsers.
+
+    Each parser knows how to extract structured tool_calls from a specific
+    model family's raw output text format.
+    """
+
+    @abstractmethod
+    def parse(self, text: str) -> ParseResult:
+        """
+        Parse raw model output text for tool calls.
+
+        Args:
+            text: Raw decoded text from the model's completion
+
+        Returns:
+            Tuple of (content, tool_calls) where:
+            - content: text with tool call markup stripped (the message 'content' field),
+                       or None if the entire output was tool calls
+            - tool_calls: list of ChatCompletionMessageToolCall objects,
+                          or None if no tool calls were found
+        """
+        raise NotImplementedError
+
+
+# Global parser registry: name -> parser class
+PARSER_REGISTRY: Dict[str, Type[ToolCallParser]] = {}
+
+
+def register_parser(name: str):
+    """
+    Decorator to register a parser class under a given name.
+
+    Usage:
+        @register_parser("hermes")
+        class HermesToolCallParser(ToolCallParser):
+            ...
+    """
+
+    def decorator(cls: Type[ToolCallParser]) -> Type[ToolCallParser]:
+        PARSER_REGISTRY[name] = cls
+        return cls
+
+    return decorator
+
+
+def get_parser(name: str) -> ToolCallParser:
+    """
+    Get a parser instance by name.
+
+    Args:
+        name: Parser name (e.g., "hermes", "mistral", "llama3_json")
+
+    Returns:
+        Instantiated parser
+
+    Raises:
+        KeyError: If parser name is not found in registry
+    """
+    if name not in PARSER_REGISTRY:
+        available = sorted(PARSER_REGISTRY.keys())
+        raise KeyError(
+            f"Tool call parser '{name}' not found. Available parsers: {available}"
+        )
+    return PARSER_REGISTRY[name]()
+
+
+def list_parsers() -> List[str]:
+    """Return sorted list of registered parser names."""
+    return sorted(PARSER_REGISTRY.keys())
+
+
+# Import all parser modules to trigger registration via @register_parser decorators
+# Each module registers itself when imported
+from environments.tool_call_parsers.hermes_parser import HermesToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.longcat_parser import LongcatToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.mistral_parser import MistralToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.llama_parser import LlamaToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.qwen_parser import QwenToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.deepseek_v3_parser import DeepSeekV3ToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.deepseek_v3_1_parser import DeepSeekV31ToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.kimi_k2_parser import KimiK2ToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.glm45_parser import Glm45ToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.glm47_parser import Glm47ToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.qwen3_coder_parser import Qwen3CoderToolCallParser  # noqa: E402, F401
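Note on the parser registry: a self-contained sketch of the decorator-based registration pattern, assuming a made-up `CALL name args` text format. `Parser`, `ToyParser`, `register`, and `get` are hypothetical names, and plain dicts stand in for `ChatCompletionMessageToolCall` so the example has no `openai` dependency.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple, Type

ParseResult = Tuple[Optional[str], Optional[List[dict]]]


class Parser(ABC):
    @abstractmethod
    def parse(self, text: str) -> ParseResult:
        """Return (content, tool_calls) extracted from raw model output."""


REGISTRY: Dict[str, Type[Parser]] = {}


def register(name: str):
    """Class decorator that records the class in REGISTRY under `name`."""

    def decorator(cls: Type[Parser]) -> Type[Parser]:
        REGISTRY[name] = cls
        return cls

    return decorator


@register("toy")
@register("toy_alias")  # stacking decorators registers the same class twice
class ToyParser(Parser):
    """Toy format (an assumption for this sketch): 'CALL <name> <json args>'."""

    PREFIX = "CALL "

    def parse(self, text: str) -> ParseResult:
        head, sep, tail = text.rpartition(self.PREFIX)
        if not sep:
            return text, None  # no tool call markup: pass the text through
        name, _, args = tail.partition(" ")
        calls = [{"name": name.strip(), "arguments": args.strip()}]
        return (head.strip() or None), calls


def get(name: str) -> Parser:
    if name not in REGISTRY:
        raise KeyError(f"Parser '{name}' not found. Available: {sorted(REGISTRY)}")
    return REGISTRY[name]()


content, calls = get("toy").parse('Checking.\nCALL terminal {"command": "ls"}')
```

Registration happens as a side effect of importing the module that defines the class, which is why the package imports every parser module at the bottom of `__init__.py`.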
--- a/environments/tool_call_parsers/deepseek_v3_1_parser.py
+++ b/environments/tool_call_parsers/deepseek_v3_1_parser.py
@@ -0,0 +1,71 @@
+"""
+DeepSeek V3.1 tool call parser.
+
+Similar to V3 but with a slightly different format:
+    <｜tool▁call▁begin｜>function_name<｜tool▁sep｜>arguments<｜tool▁call▁end｜>
+
+Note: V3 has type+name before the separator, V3.1 has name before and args after.
+
+Based on VLLM's DeepSeekV31ToolParser.extract_tool_calls()
+"""
+
+import re
+import uuid
+from typing import List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+@register_parser("deepseek_v3_1")
+@register_parser("deepseek_v31")
+class DeepSeekV31ToolCallParser(ToolCallParser):
+    """
+    Parser for DeepSeek V3.1 tool calls.
+
+    Slightly different regex than V3: function_name comes before the separator,
+    arguments come after (no type field, no json code block wrapper).
+    """
+
+    START_TOKEN = "<｜tool▁calls▁begin｜>"
+
+    # Regex captures: function_name, function_arguments
+    PATTERN = re.compile(
+        r"<｜tool▁call▁begin｜>(?P<function_name>.*?)<｜tool▁sep｜>(?P<function_arguments>.*?)<｜tool▁call▁end｜>"
+    )
+
+    def parse(self, text: str) -> ParseResult:
+        if self.START_TOKEN not in text:
+            return text, None
+
+        try:
+            matches = self.PATTERN.findall(text)
+            if not matches:
+                return text, None
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            for match in matches:
+                func_name, func_args = match
+                tool_calls.append(
+                    ChatCompletionMessageToolCall(
+                        id=f"call_{uuid.uuid4().hex[:8]}",
+                        type="function",
+                        function=Function(
+                            name=func_name.strip(),
+                            arguments=func_args.strip(),
+                        ),
+                    )
+                )
+
+            if not tool_calls:
+                return text, None
+
+            content = text[: text.find(self.START_TOKEN)].strip()
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
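Reviewer note: a minimal standalone sketch of how the V3.1 pattern above splits a completion, using only `re`. The sample completion and the `get_weather` tool name are made up for illustration; the parser in the diff additionally wraps each match in a `ChatCompletionMessageToolCall`.

```python
import re

# Same pattern as DeepSeekV31ToolCallParser.PATTERN in the diff above.
PATTERN = re.compile(
    r"<｜tool▁call▁begin｜>(?P<function_name>.*?)<｜tool▁sep｜>(?P<function_arguments>.*?)<｜tool▁call▁end｜>"
)
START_TOKEN = "<｜tool▁calls▁begin｜>"

# Hypothetical model output: free text followed by one tool call.
sample = (
    "Let me check."
    "<｜tool▁calls▁begin｜>"
    '<｜tool▁call▁begin｜>get_weather<｜tool▁sep｜>{"city": "Paris"}<｜tool▁call▁end｜>'
    "<｜tool▁calls▁end｜>"
)

# findall() returns one (function_name, function_arguments) tuple per call.
calls = [(name.strip(), args.strip()) for name, args in PATTERN.findall(sample)]

# Content is whatever precedes the tool-calls section.
content = sample[: sample.find(START_TOKEN)].strip()

# The pattern is compiled without re.DOTALL, so "." does not cross newlines:
# arguments that span several lines produce no match with this pattern.
multiline = '<｜tool▁call▁begin｜>f<｜tool▁sep｜>{\n"a": 1}<｜tool▁call▁end｜>'
multiline_matches = PATTERN.findall(multiline)
```

One consequence worth keeping in mind when reviewing: pretty-printed (multi-line) JSON arguments fall through to the "no tool calls" path.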
--- a/environments/tool_call_parsers/deepseek_v3_parser.py
+++ b/environments/tool_call_parsers/deepseek_v3_parser.py
@@ -0,0 +1,75 @@
+"""
+DeepSeek V3 tool call parser.
+
+Format uses special unicode tokens:
+    <｜tool▁calls▁begin｜>
+    <｜tool▁call▁begin｜>type<｜tool▁sep｜>function_name
+    ```json
+    {"arg": "value"}
+    ```
+    <｜tool▁call▁end｜>
+    <｜tool▁calls▁end｜>
+
+Based on vLLM's DeepSeekV3ToolParser.extract_tool_calls()
+"""
+
+import re
+import uuid
+from typing import List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+@register_parser("deepseek_v3")
+class DeepSeekV3ToolCallParser(ToolCallParser):
+    """
+    Parser for DeepSeek V3 tool calls.
+
+    Uses special unicode tokens with fullwidth vertical bars (｜) and lower-block characters (▁).
+    Extracts type, function name, and JSON arguments from the structured format.
+    """
+
+    START_TOKEN = "<｜tool▁calls▁begin｜>"
+
+    # Regex captures: type, function_name, function_arguments
+    PATTERN = re.compile(
+        r"<｜tool▁call▁begin｜>(?P<type>.*)<｜tool▁sep｜>(?P<function_name>.*)\n```json\n(?P<function_arguments>.*)\n```<｜tool▁call▁end｜>"
+    )
+
+    def parse(self, text: str) -> ParseResult:
+        if self.START_TOKEN not in text:
+            return text, None
+
+        try:
+            matches = self.PATTERN.findall(text)
+            if not matches:
+                return text, None
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            for match in matches:
+                tc_type, func_name, func_args = match
+                tool_calls.append(
+                    ChatCompletionMessageToolCall(
+                        id=f"call_{uuid.uuid4().hex[:8]}",
+                        type="function",
+                        function=Function(
+                            name=func_name.strip(),
+                            arguments=func_args.strip(),
+                        ),
+                    )
+                )
+
+            if not tool_calls:
+                return text, None
+
+            # Content is everything before the tool calls section
+            content = text[: text.find(self.START_TOKEN)].strip()
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
--- a/environments/tool_call_parsers/glm45_parser.py
+++ b/environments/tool_call_parsers/glm45_parser.py
@@ -0,0 +1,109 @@
+"""
+GLM 4.5 (GLM-4-MoE) tool call parser.
+
+Format uses custom arg_key/arg_value tags rather than standard JSON:
+    <tool_call>function_name
+    <arg_key>param1</arg_key><arg_value>value1</arg_value>
+    <arg_key>param2</arg_key><arg_value>value2</arg_value>
+    </tool_call>
+
+Values are deserialized using json.loads -> ast.literal_eval -> raw string fallback.
+
+Based on vLLM's Glm4MoeModelToolParser.extract_tool_calls()
+"""
+
+import ast
+import json
+import re
+import uuid
+from typing import Any, Dict, List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+def _deserialize_value(value: str) -> Any:
+    """
+    Try to deserialize a string value to its native Python type.
+    Attempts json.loads, then ast.literal_eval, then returns raw string.
+    """
+    try:
+        return json.loads(value)
+    except (json.JSONDecodeError, TypeError):
+        pass
+
+    try:
+        return ast.literal_eval(value)
+    except (ValueError, SyntaxError, TypeError):
+        pass
+
+    return value
+
+
+@register_parser("glm45")
+class Glm45ToolCallParser(ToolCallParser):
+    """
+    Parser for GLM 4.5 (GLM-4-MoE) tool calls.
+
+    Uses <tool_call>...</tool_call> tags with <arg_key>/<arg_value> pairs
+    instead of standard JSON arguments.
+    """
+
+    FUNC_CALL_REGEX = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)
+    FUNC_DETAIL_REGEX = re.compile(r"<tool_call>([^\n]*)\n(.*)</tool_call>", re.DOTALL)
+    FUNC_ARG_REGEX = re.compile(
+        r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
+    )
+
+    START_TOKEN = "<tool_call>"
+
+    def parse(self, text: str) -> ParseResult:
+        if self.START_TOKEN not in text:
+            return text, None
+
+        try:
+            matched_calls = self.FUNC_CALL_REGEX.findall(text)
+            if not matched_calls:
+                return text, None
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+
+            for match in matched_calls:
+                detail = self.FUNC_DETAIL_REGEX.search(match)
+                if not detail:
+                    continue
+
+                func_name = detail.group(1).strip()
+                func_args_raw = detail.group(2)
+
+                # Parse arg_key/arg_value pairs
+                pairs = self.FUNC_ARG_REGEX.findall(func_args_raw) if func_args_raw else []
+                arg_dict: Dict[str, Any] = {}
+                for key, value in pairs:
+                    arg_key = key.strip()
+                    arg_val = _deserialize_value(value.strip())
+                    arg_dict[arg_key] = arg_val
+
+                tool_calls.append(
+                    ChatCompletionMessageToolCall(
+                        id=f"call_{uuid.uuid4().hex[:8]}",
+                        type="function",
+                        function=Function(
+                            name=func_name,
+                            arguments=json.dumps(arg_dict, ensure_ascii=False),
+                        ),
+                    )
+                )
+
+            if not tool_calls:
+                return text, None
+
+            content = text[: text.find(self.START_TOKEN)].strip()
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
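Reviewer note: the `json.loads` -> `ast.literal_eval` -> raw string fallback used for GLM 4.5 argument values can be exercised in isolation. The sketch below is a hedged illustration using the same argument regex; the argument block and its keys are invented, and `deserialize_value` is a local copy of the helper rather than an import from the package.

```python
import ast
import json
import re
from typing import Any

# Same pattern as Glm45ToolCallParser.FUNC_ARG_REGEX in the diff above.
ARG_REGEX = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)


def deserialize_value(value: str) -> Any:
    """Try json.loads, then ast.literal_eval, then fall back to the raw string."""
    try:
        return json.loads(value)
    except (json.JSONDecodeError, TypeError):
        pass
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        pass
    return value


# Hypothetical argument block taken from inside a <tool_call> element.
block = (
    "<arg_key>city</arg_key><arg_value>Paris</arg_value>\n"
    "<arg_key>days</arg_key><arg_value>3</arg_value>\n"
    "<arg_key>units</arg_key><arg_value>('C', 'F')</arg_value>\n"
)

args = {k.strip(): deserialize_value(v.strip()) for k, v in ARG_REGEX.findall(block)}

# json.dumps turns the tuple into a JSON array when the arguments are serialized.
serialized = json.dumps(args, ensure_ascii=False)
```

Bare words such as `Paris` fail both decoders and stay strings, `3` becomes an integer via JSON, and the tuple literal is only recovered by `ast.literal_eval`.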
--- a/environments/tool_call_parsers/glm47_parser.py
+++ b/environments/tool_call_parsers/glm47_parser.py
@@ -0,0 +1,35 @@
+"""
+GLM 4.7 tool call parser.
+
+Same format as GLM 4.5, with two regex changes: the function name need not be
+followed by a newline, and arg parsing also tolerates literal backslash-n
+sequences between the key and value tags.
+
+Based on vLLM's Glm47MoeModelToolParser (extends Glm4MoeModelToolParser).
+"""
+
+import re
+
+from environments.tool_call_parsers import ParseResult, register_parser
+from environments.tool_call_parsers.glm45_parser import Glm45ToolCallParser
+
+
+@register_parser("glm47")
+class Glm47ToolCallParser(Glm45ToolCallParser):
+    """
+    Parser for GLM 4.7 tool calls.
+    Extends GLM 4.5 with updated regex patterns.
+    """
+
+    def __init__(self):
+        super().__init__()
+        # GLM 4.7 uses a slightly different detail regex that includes
+        # the <tool_call> wrapper and optional arg_key content
+        self.FUNC_DETAIL_REGEX = re.compile(
+            r"<tool_call>(.*?)(<arg_key>.*?)?</tool_call>", re.DOTALL
+        )
+        # GLM 4.7 also accepts a literal backslash-n between arg_key and arg_value tags
+        self.FUNC_ARG_REGEX = re.compile(
+            r"<arg_key>(.*?)</arg_key>(?:\\n|\s)*<arg_value>(.*?)</arg_value>",
+            re.DOTALL,
+        )
--- a/environments/tool_call_parsers/hermes_parser.py
+++ b/environments/tool_call_parsers/hermes_parser.py
@@ -0,0 +1,73 @@
+"""
+Hermes tool call parser.
+
+Format: <tool_call>{"name": "func", "arguments": {...}}</tool_call>
+Based on vLLM's Hermes2ProToolParser.extract_tool_calls()
+"""
+
+import json
+import re
+import uuid
+from typing import List, Optional, Tuple
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+@register_parser("hermes")
+class HermesToolCallParser(ToolCallParser):
+    """
+    Parser for Hermes-format tool calls.
+
+    Matches <tool_call>...</tool_call> tags containing JSON with "name" and "arguments".
+    Also handles unclosed <tool_call> at end-of-string (truncated generation).
+    """
+
+    # Matches both closed and unclosed tool_call tags
+    PATTERN = re.compile(
+        r"<tool_call>\s*(.*?)\s*</tool_call>|<tool_call>\s*(.*)", re.DOTALL
+    )
+
+    def parse(self, text: str) -> ParseResult:
+        if "<tool_call>" not in text:
+            return text, None
+
+        try:
+            matches = self.PATTERN.findall(text)
+            if not matches:
+                return text, None
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            for match in matches:
+                # match is a tuple: (closed_content, unclosed_content)
+                raw_json = match[0] if match[0] else match[1]
+                if not raw_json.strip():
+                    continue
+
+                tc_data = json.loads(raw_json)
+                tool_calls.append(
+                    ChatCompletionMessageToolCall(
+                        id=f"call_{uuid.uuid4().hex[:8]}",
+                        type="function",
+                        function=Function(
+                            name=tc_data["name"],
+                            arguments=json.dumps(
+                                tc_data.get("arguments", {}), ensure_ascii=False
+                            ),
+                        ),
+                    )
+                )
+
+            if not tool_calls:
+                return text, None
+
+            # Content is everything before the first <tool_call> tag
+            content = text[: text.find("<tool_call>")].strip()
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
--- a/environments/tool_call_parsers/kimi_k2_parser.py
+++ b/environments/tool_call_parsers/kimi_k2_parser.py
@@ -0,0 +1,93 @@
+"""
+Kimi K2 tool call parser.
+
+Format:
+    <|tool_calls_section_begin|>
+    <|tool_call_begin|>function_id:0<|tool_call_argument_begin|>{"arg": "val"}<|tool_call_end|>
+    <|tool_calls_section_end|>
+
+The function_id format is typically "functions.func_name:index" or "func_name:index".
+
+Based on vLLM's KimiK2ToolParser.extract_tool_calls()
+"""
+
+import re
+import uuid
+from typing import List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+@register_parser("kimi_k2")
+class KimiK2ToolCallParser(ToolCallParser):
+    """
+    Parser for Kimi K2 tool calls.
+
+    Uses section begin/end tokens wrapping individual tool call begin/end tokens.
+    The tool_call_id contains the function name (after last dot, before colon).
+    """
+
+    # Support both singular and plural variants
+    START_TOKENS = [
+        "<|tool_calls_section_begin|>",
+        "<|tool_call_section_begin|>",
+    ]
+
+    # Regex captures: tool_call_id (e.g., "functions.get_weather:0"), function_arguments
+    PATTERN = re.compile(
+        r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[^<]+:\d+)\s*"
+        r"<\|tool_call_argument_begin\|>\s*"
+        r"(?P<function_arguments>(?:(?!<\|tool_call_begin\|>).)*?)\s*"
+        r"<\|tool_call_end\|>",
+        re.DOTALL,
+    )
+
+    def parse(self, text: str) -> ParseResult:
+        # Check for any variant of the start token
+        has_start = any(token in text for token in self.START_TOKENS)
+        if not has_start:
+            return text, None
+
+        try:
+            matches = self.PATTERN.findall(text)
+            if not matches:
+                return text, None
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            for match in matches:
+                function_id, function_args = match
+
+                # Extract function name from ID format: "functions.get_weather:0" -> "get_weather"
+                function_name = function_id.split(":")[0].split(".")[-1]
+
+                tool_calls.append(
+                    ChatCompletionMessageToolCall(
+                        id=function_id,  # Preserve the original ID format
+                        type="function",
+                        function=Function(
+                            name=function_name,
+                            arguments=function_args.strip(),
+                        ),
+                    )
+                )
+
+            if not tool_calls:
+                return text, None
+
+            # Content is everything before the tool calls section
+            earliest_start = len(text)
+            for token in self.START_TOKENS:
+                idx = text.find(token)
+                if idx >= 0 and idx < earliest_start:
+                    earliest_start = idx
+
+            content = text[:earliest_start].strip()
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
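Reviewer note: a small standalone sketch of how the Kimi K2 pattern and the `split(":")[0].split(".")[-1]` step recover function names from tool call IDs. The sample section and tool names are hypothetical; only `re` is used, so nothing from the package is imported.

```python
import re

# Same pattern as KimiK2ToolCallParser.PATTERN in the diff above.
PATTERN = re.compile(
    r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[^<]+:\d+)\s*"
    r"<\|tool_call_argument_begin\|>\s*"
    r"(?P<function_arguments>(?:(?!<\|tool_call_begin\|>).)*?)\s*"
    r"<\|tool_call_end\|>",
    re.DOTALL,
)

# Hypothetical output with one namespaced ID and one bare ID.
sample = (
    "<|tool_calls_section_begin|>"
    '<|tool_call_begin|>functions.get_weather:0<|tool_call_argument_begin|>{"city": "Paris"}<|tool_call_end|>'
    '<|tool_call_begin|>search:1<|tool_call_argument_begin|>{"q": "uv"}<|tool_call_end|>'
    "<|tool_calls_section_end|>"
)

# (original id, derived function name, raw argument string) per call.
parsed = [
    (call_id, call_id.split(":")[0].split(".")[-1], args)
    for call_id, args in PATTERN.findall(sample)
]
```

The original ID is kept alongside the derived name because the parser in the diff reuses it as the tool call `id`.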
--- a/environments/tool_call_parsers/llama_parser.py
+++ b/environments/tool_call_parsers/llama_parser.py
@@ -0,0 +1,96 @@
+"""
+Llama 3.x / 4 tool call parser.
+
+Format: The model outputs JSON objects with "name" and "arguments" (or "parameters") keys.
+May be preceded by <|python_tag|> token. Supports multiple JSON objects separated
+by content or semicolons.
+
+Based on vLLM's Llama3JsonToolParser.extract_tool_calls()
+"""
+
+import json
+import re
+import uuid
+from typing import List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+@register_parser("llama3_json")
+@register_parser("llama4_json")
+class LlamaToolCallParser(ToolCallParser):
+    """
+    Parser for Llama 3.x and 4 JSON-format tool calls.
+
+    Finds JSON objects containing "name" + ("arguments" or "parameters") keys.
+    Uses Python's json.JSONDecoder.raw_decode for robust extraction of
+    JSON objects from mixed text.
+    """
+
+    BOT_TOKEN = "<|python_tag|>"
+
+    # Regex to find the start of potential JSON objects
+    JSON_START = re.compile(r"\{")
+
+    def parse(self, text: str) -> ParseResult:
+        # Quick check: need either the bot token or a JSON brace
+        if self.BOT_TOKEN not in text and "{" not in text:
+            return text, None
+
+        try:
+            decoder = json.JSONDecoder()
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            end_index = -1  # Track where the last parsed JSON ended
+
+            for match in self.JSON_START.finditer(text):
+                start = match.start()
+                # Skip if this brace is inside a previously parsed JSON object
+                if start < end_index:
+                    continue
+
+                try:
+                    obj, json_end = decoder.raw_decode(text[start:])
+                    end_index = start + json_end
+
+                    # Must have "name" and either "arguments" or "parameters"
+                    name = obj.get("name")
+                    args = obj.get("arguments", obj.get("parameters"))
+
+                    if not name or args is None:
+                        continue
+
+                    # Normalize arguments to JSON string
+                    if isinstance(args, dict):
+                        args = json.dumps(args, ensure_ascii=False)
+                    elif not isinstance(args, str):
+                        args = json.dumps(args, ensure_ascii=False)
+
+                    tool_calls.append(
+                        ChatCompletionMessageToolCall(
+                            id=f"call_{uuid.uuid4().hex[:8]}",
+                            type="function",
+                            function=Function(name=name, arguments=args),
+                        )
+                    )
+                except (json.JSONDecodeError, KeyError, ValueError):
+                    continue
+
+            if not tool_calls:
+                return text, None
+
+            # Content is everything before the first tool call JSON
+            # Find where the first tool call starts in the text
+            first_tc_start = text.find("{")
+            if self.BOT_TOKEN in text:
+                first_tc_start = text.find(self.BOT_TOKEN)
+            content = text[:first_tc_start].strip() if first_tc_start > 0 else None
+
+            return content, tool_calls
+
+        except Exception:
+            return text, None
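Reviewer note: the brace-scanning approach can be tried on its own with `json.JSONDecoder.raw_decode`, which returns the decoded object together with the index just past it. The sketch below is an illustration under that reading of `end_index`; `extract_json_objects` is a hypothetical helper name and the sample strings are invented.

```python
import json
import re


def extract_json_objects(text: str) -> list:
    """Collect every top-level JSON object embedded in free text."""
    decoder = json.JSONDecoder()
    found = []
    end_index = -1  # index just past the last decoded object
    for match in re.finditer(r"\{", text):
        start = match.start()
        if start < end_index:  # brace lies inside an object already decoded
            continue
        try:
            obj, length = decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; try the next one
        end_index = start + length
        found.append(obj)
    return found


sample = 'Sure. {"name": "a", "arguments": {"x": 1}}; {"name": "b", "parameters": {}}'
objs = extract_json_objects(sample)

# Two objects with no separator between them are both picked up.
adjacent = extract_json_objects('{"name": "a", "arguments": {}}{"name": "b", "arguments": {}}')
```

Nested braces such as `{"x": 1}` are skipped because they start before `end_index`, so only top-level objects are considered as tool call candidates.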
--- a/environments/tool_call_parsers/longcat_parser.py
+++ b/environments/tool_call_parsers/longcat_parser.py
@@ -0,0 +1,69 @@
+"""
+Longcat Flash Chat tool call parser.
+
+Same as Hermes but uses <longcat_tool_call> tags instead of <tool_call>.
+Based on vLLM's LongcatFlashToolParser (extends Hermes2ProToolParser).
+"""
+
+import json
+import re
+import uuid
+from typing import List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+@register_parser("longcat")
+class LongcatToolCallParser(ToolCallParser):
+    """
+    Parser for Longcat Flash Chat tool calls.
+    Identical logic to Hermes, just different tag names.
+    """
+
+    PATTERN = re.compile(
+        r"<longcat_tool_call>\s*(.*?)\s*</longcat_tool_call>|<longcat_tool_call>\s*(.*)",
+        re.DOTALL,
+    )
+
+    def parse(self, text: str) -> ParseResult:
+        if "<longcat_tool_call>" not in text:
+            return text, None
+
+        try:
+            matches = self.PATTERN.findall(text)
+            if not matches:
+                return text, None
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            for match in matches:
+                raw_json = match[0] if match[0] else match[1]
+                if not raw_json.strip():
+                    continue
+
+                tc_data = json.loads(raw_json)
+                tool_calls.append(
+                    ChatCompletionMessageToolCall(
+                        id=f"call_{uuid.uuid4().hex[:8]}",
+                        type="function",
+                        function=Function(
+                            name=tc_data["name"],
+                            arguments=json.dumps(
+                                tc_data.get("arguments", {}), ensure_ascii=False
+                            ),
+                        ),
+                    )
+                )
+
+            if not tool_calls:
+                return text, None
+
+            content = text[: text.find("<longcat_tool_call>")].strip()
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
--- a/environments/tool_call_parsers/mistral_parser.py
+++ b/environments/tool_call_parsers/mistral_parser.py
@@ -0,0 +1,130 @@
+"""
+Mistral tool call parser.
+
+Supports two formats depending on tokenizer version:
+- Pre-v11: content[TOOL_CALLS] [{"name": ..., "arguments": {...}}, ...]
+- v11+:    content[TOOL_CALLS]tool_name1{"arg": "val"}[TOOL_CALLS]tool_name2{"arg": "val"}
+
+Based on vLLM's MistralToolParser.extract_tool_calls()
+The [TOOL_CALLS] token is the bot_token used by Mistral models.
+"""
+
+import json
+import re
+import uuid
+from typing import List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+def _generate_mistral_id() -> str:
+    """Mistral tool call IDs are 9-char alphanumeric strings."""
+    import random
+    import string
+
+    return "".join(random.choices(string.ascii_letters + string.digits, k=9))
+
+
+@register_parser("mistral")
+class MistralToolCallParser(ToolCallParser):
+    """
+    Parser for Mistral-format tool calls.
+
+    Detects format by checking if the content after [TOOL_CALLS] starts with '['
+    or '{' (pre-v11 JSON array or object) or with a tool name (v11+ format).
+    """
+
+    # The [TOOL_CALLS] token -- may appear as different strings depending on tokenizer
+    BOT_TOKEN = "[TOOL_CALLS]"
+
+    # Fallback regex for pre-v11 format when JSON parsing fails
+    TOOL_CALL_REGEX = re.compile(r"\[?\s*(\{.*?\})\s*\]?", re.DOTALL)
+
+    def parse(self, text: str) -> ParseResult:
+        if self.BOT_TOKEN not in text:
+            return text, None
+
+        try:
+            parts = text.split(self.BOT_TOKEN)
+            content = parts[0].strip()
+            raw_tool_calls = parts[1:]
+
+            # Detect format: if the first raw part starts with '[' or '{', it's pre-v11
+            first_raw = raw_tool_calls[0].strip() if raw_tool_calls else ""
+            is_pre_v11 = first_raw.startswith("[") or first_raw.startswith("{")
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+
+            if not is_pre_v11:
+                # v11+ format: [TOOL_CALLS]tool_name{args}[TOOL_CALLS]tool_name2{args2}
+                for raw in raw_tool_calls:
+                    raw = raw.strip()
+                    if not raw or "{" not in raw:
+                        continue
+
+                    brace_idx = raw.find("{")
+                    tool_name = raw[:brace_idx].strip()
+                    args_str = raw[brace_idx:]
+
+                    tool_calls.append(
+                        ChatCompletionMessageToolCall(
+                            id=_generate_mistral_id(),
+                            type="function",
+                            function=Function(name=tool_name, arguments=args_str),
+                        )
+                    )
+            else:
+                # Pre-v11 format: [TOOL_CALLS] [{"name": ..., "arguments": {...}}]
+                try:
+                    parsed = json.loads(first_raw)
+                    if isinstance(parsed, dict):
+                        parsed = [parsed]
+
+                    for tc in parsed:
+                        args = tc.get("arguments", {})
+                        if isinstance(args, dict):
+                            args = json.dumps(args, ensure_ascii=False)
+
+                        tool_calls.append(
+                            ChatCompletionMessageToolCall(
+                                id=_generate_mistral_id(),
+                                type="function",
+                                function=Function(
+                                    name=tc["name"], arguments=args
+                                ),
+                            )
+                        )
+                except json.JSONDecodeError:
+                    # Fallback regex extraction
+                    match = self.TOOL_CALL_REGEX.findall(first_raw)
+                    if match:
+                        for raw_json in match:
+                            try:
+                                tc = json.loads(raw_json)
+                                args = tc.get("arguments", {})
+                                if isinstance(args, dict):
+                                    args = json.dumps(args, ensure_ascii=False)
+                                tool_calls.append(
+                                    ChatCompletionMessageToolCall(
+                                        id=_generate_mistral_id(),
+                                        type="function",
+                                        function=Function(
+                                            name=tc["name"], arguments=args
+                                        ),
+                                    )
+                                )
+                            except (json.JSONDecodeError, KeyError):
+                                continue
+
+            if not tool_calls:
+                return text, None
+
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
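Reviewer note: the v11+ branch boils down to splitting on the bot token and cutting each piece at its first `{`. A minimal sketch of just that step follows; `split_v11` is a hypothetical helper name and the sample completion is invented.

```python
BOT_TOKEN = "[TOOL_CALLS]"


def split_v11(text: str):
    """Return (content, [(tool_name, raw_args), ...]) for v11+ style output."""
    parts = text.split(BOT_TOKEN)
    content = parts[0].strip()  # everything before the first bot token
    calls = []
    for raw in parts[1:]:
        raw = raw.strip()
        if not raw or "{" not in raw:
            continue  # no argument object, so nothing to call
        brace = raw.find("{")
        calls.append((raw[:brace].strip(), raw[brace:]))
    return content, calls


sample = 'Checking.[TOOL_CALLS]get_weather{"city": "Paris"}[TOOL_CALLS]search{"q": "uv"}'
content, calls = split_v11(sample)
```

The argument string is passed through unparsed here, which matches the v11+ branch in the diff: it is not validated as JSON at this stage.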
--- a/environments/tool_call_parsers/qwen3_coder_parser.py
+++ b/environments/tool_call_parsers/qwen3_coder_parser.py
@@ -0,0 +1,163 @@
+"""
+Qwen3-Coder tool call parser.
+
+Format uses XML-style nested tags:
+    <tool_call>
+    <function=function_name>
+    <parameter=param_name>value</parameter>
+    <parameter=param_name2>value2</parameter>
+    </function>
+    </tool_call>
+
+Parameters are extracted from <parameter=name>value</parameter> tags and
+type-converted using the schema if available, otherwise treated as strings.
+
+Based on vLLM's Qwen3CoderToolParser.extract_tool_calls()
+"""
+
+import ast
+import json
+import re
+import uuid
+from typing import Any, Dict, List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+def _try_convert_value(value: str) -> Any:
+    """
+    Try to convert a parameter value string to a native Python type.
+    Handles null, numbers, booleans, JSON objects/arrays, and falls back to string.
+    """
+    stripped = value.strip()
+
+    # Handle null
+    if stripped.lower() == "null":
+        return None
+
+    # Try JSON first (handles objects, arrays, strings, numbers, booleans)
+    try:
+        return json.loads(stripped)
+    except (json.JSONDecodeError, TypeError):
+        pass
+
+    # Try Python literal eval (handles tuples, etc.)
+    try:
+        return ast.literal_eval(stripped)
+    except (ValueError, SyntaxError, TypeError):
+        pass
+
+    # Return as string
+    return stripped
+
+
+@register_parser("qwen3_coder")
+class Qwen3CoderToolCallParser(ToolCallParser):
+    """
+    Parser for Qwen3-Coder XML-format tool calls.
+
+    Uses nested XML tags: <tool_call><function=name><parameter=key>val</parameter></function></tool_call>
+    """
+
+    START_TOKEN = "<tool_call>"
+    FUNCTION_PREFIX = "<function="
+
+    # Find complete tool_call blocks (or unclosed at end)
+    TOOL_CALL_REGEX = re.compile(
+        r"<tool_call>(.*?)</tool_call>|<tool_call>(.*?)$", re.DOTALL
+    )
+
+    # Find function blocks within a tool_call
+    FUNCTION_REGEX = re.compile(
+        r"<function=(.*?)</function>|<function=(.*)$", re.DOTALL
+    )
+
+    # Find parameter blocks within a function
+    PARAMETER_REGEX = re.compile(
+        r"<parameter=(.*?)(?:</parameter>|(?=<parameter=)|(?=</function>)|$)",
+        re.DOTALL,
+    )
+
+    def _parse_function_call(self, function_str: str) -> Optional[ChatCompletionMessageToolCall]:
+        """Parse a single <function=name>...</function> block into a ToolCall."""
+        try:
+            # Extract function name: everything before the first '>'
+            gt_idx = function_str.index(">")
+            func_name = function_str[:gt_idx].strip()
+            params_str = function_str[gt_idx + 1:]
+
+            # Extract parameters
+            param_dict: Dict[str, Any] = {}
+            for match_text in self.PARAMETER_REGEX.findall(params_str):
+                if ">" not in match_text:
+                    continue
+                eq_idx = match_text.index(">")
+                param_name = match_text[:eq_idx].strip()
+                param_value = match_text[eq_idx + 1:]
+
+                # Clean up whitespace
+                if param_value.startswith("\n"):
+                    param_value = param_value[1:]
+                if param_value.endswith("\n"):
+                    param_value = param_value[:-1]
+
+                param_dict[param_name] = _try_convert_value(param_value)
+
+            return ChatCompletionMessageToolCall(
+                id=f"call_{uuid.uuid4().hex[:24]}",
+                type="function",
+                function=Function(
+                    name=func_name,
+                    arguments=json.dumps(param_dict, ensure_ascii=False),
+                ),
+            )
+        except (ValueError, IndexError):
+            return None
+
+    def parse(self, text: str) -> ParseResult:
+        if self.FUNCTION_PREFIX not in text:
+            return text, None
+
+        try:
+            # Find all tool_call blocks
+            tc_matches = self.TOOL_CALL_REGEX.findall(text)
+            raw_blocks = [m[0] if m[0] else m[1] for m in tc_matches]
+
+            # Fallback: if no tool_call tags, try the whole text
+            if not raw_blocks:
+                raw_blocks = [text]
+
+            # Find function blocks within each tool_call
+            function_strs: List[str] = []
+            for block in raw_blocks:
+                func_matches = self.FUNCTION_REGEX.findall(block)
+                function_strs.extend(m[0] if m[0] else m[1] for m in func_matches)
+
+            if not function_strs:
+                return text, None
+
+            # Parse each function call
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            for func_str in function_strs:
+                tc = self._parse_function_call(func_str)
+                if tc is not None:
+                    tool_calls.append(tc)
+
+            if not tool_calls:
+                return text, None
+
+            # Content before tool calls
+            first_tc = text.find(self.START_TOKEN)
+            if first_tc < 0:
+                first_tc = text.find(self.FUNCTION_PREFIX)
+            content = text[:first_tc].strip() if first_tc > 0 else None
+
+            return content, tool_calls
+
+        except Exception:
+            return text, None
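Reviewer note: a compact standalone walk-through of the nested-tag extraction for Qwen3-Coder, reusing the function and parameter regexes from the diff. The sample call is invented, and `convert` is a simplified stand-in for `_try_convert_value` (JSON or string only, with all surrounding whitespace stripped), so treat it as a sketch rather than the parser's exact behaviour.

```python
import json
import re

# Same patterns as FUNCTION_REGEX and PARAMETER_REGEX in the diff above.
FUNCTION_REGEX = re.compile(r"<function=(.*?)</function>|<function=(.*)$", re.DOTALL)
PARAMETER_REGEX = re.compile(
    r"<parameter=(.*?)(?:</parameter>|(?=<parameter=)|(?=</function>)|$)",
    re.DOTALL,
)


def convert(value: str):
    """Simplified stand-in: decode as JSON if possible, else keep the string."""
    try:
        return json.loads(value.strip())
    except json.JSONDecodeError:
        return value.strip()


# Hypothetical model output with one function call and two parameters.
sample = (
    "<tool_call>\n<function=get_weather>\n"
    "<parameter=city>\nParis\n</parameter>\n"
    "<parameter=days>3</parameter>\n"
    "</function>\n</tool_call>"
)

calls = []
for closed, unclosed in FUNCTION_REGEX.findall(sample):
    body = closed or unclosed
    # The function name is everything before the first '>'.
    name, _, params = body.partition(">")
    args = {}
    for chunk in PARAMETER_REGEX.findall(params):
        key, sep, value = chunk.partition(">")
        if sep:  # skip malformed chunks with no '>' after the parameter name
            args[key.strip()] = convert(value)
    calls.append((name.strip(), args))
```

Each captured chunk has the shape `name>value`, which is why both the function name and the parameter name are recovered by cutting at the first `>`.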
--- a/environments/tool_call_parsers/qwen_parser.py
+++ b/environments/tool_call_parsers/qwen_parser.py
@@ -0,0 +1,19 @@
+"""
+Qwen 2.5 tool call parser.
+
+Uses the same <tool_call> format as Hermes.
+Registered as a separate parser name for clarity when using --tool-parser=qwen.
+"""
+
+from environments.tool_call_parsers import register_parser
+from environments.tool_call_parsers.hermes_parser import HermesToolCallParser
+
+
+@register_parser("qwen")
+class QwenToolCallParser(HermesToolCallParser):
+    """
+    Parser for Qwen 2.5 tool calls.
+    Same <tool_call>{"name": ..., "arguments": ...}</tool_call> format as Hermes.
+    """
+
+    pass  # Identical format -- inherits everything from Hermes
--- a/environments/tool_context.py
+++ b/environments/tool_context.py
@@ -0,0 +1,289 @@
+"""
+ToolContext -- Unrestricted Tool Access for Reward Functions
+
+A per-rollout handle that gives reward/verification functions direct access to
+ALL hermes-agent tools, scoped to the rollout's task_id. The same task_id means
+the terminal/browser session is the SAME one the model used during its rollout --
+all state (files, processes, browser tabs) is preserved.
+
+The verifier author decides which tools to use. Nothing is hardcoded or gated.
+
+Example usage in a compute_reward():
+    async def compute_reward(self, item, result, ctx):
+        # Run tests in the model's terminal sandbox
+        test = ctx.terminal("pytest -v")
+        if test["exit_code"] == 0:
+            return 1.0
+
+        # Check if a file was created
+        content = ctx.read_file("/workspace/solution.py")
+        if content.get("content"):
+            return 0.5
+
+        return 0.0
+"""
+
+import json
+import logging
+import os
+from typing import Any, Dict, List, Optional
+
+import asyncio
+import concurrent.futures
+
+from model_tools import handle_function_call
+from tools.terminal_tool import cleanup_vm
+from tools.browser_tool import cleanup_browser
+
+logger = logging.getLogger(__name__)
+
+# Thread pool for running sync tool calls that internally use asyncio.run()
+_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
+
+
+def _run_tool_in_thread(tool_name: str, arguments: Dict[str, Any], task_id: str) -> str:
+    """
+    Run a tool call in a worker thread so backends that use asyncio.run()
+    internally (modal, docker) get a clean event loop.
+
+    If an event loop is running in this thread, the call is submitted to the
+    shared _tool_executor pool and this function blocks for the result
+    (timeout: 300 s). Otherwise (e.g., called from sync code) the tool runs
+    directly.
+    """
+    try:
+        asyncio.get_running_loop()
+    except RuntimeError:
+        # No running event loop -- safe to call directly
+        return handle_function_call(tool_name, arguments, task_id)
+
+    # The tool call stays outside the try block so that a RuntimeError raised
+    # by the tool itself propagates instead of triggering a second direct call.
+    future = _tool_executor.submit(handle_function_call, tool_name, arguments, task_id)
+    return future.result(timeout=300)
+
+
+class ToolContext:
+    """
+    Open-ended access to all hermes-agent tools for a specific rollout.
+
+    Passed to compute_reward() so verifiers can use any tool they need:
+    terminal commands, file reads/writes, web searches, browser automation, etc.
+    All calls except web_search and web_extract share the rollout's task_id for session isolation.
+    """
+
+    def __init__(self, task_id: str):
+        self.task_id = task_id
+
+    # -------------------------------------------------------------------------
+    # Terminal tools
+    # -------------------------------------------------------------------------
+
+    def terminal(self, command: str, timeout: int = 180) -> Dict[str, Any]:
+        """
+        Run a command in the rollout's terminal session.
+
+        Args:
+            command: Shell command to execute
+            timeout: Command timeout in seconds
+
+        Returns:
+            Dict with 'exit_code' (int) and 'output' (str)
+        """
+        backend = os.getenv("TERMINAL_ENV", "local")
+        logger.debug("ToolContext.terminal [%s backend] task=%s: %s",
+                     backend, self.task_id[:8], command[:100])
+
+        # Run in thread pool so modal/docker backends' asyncio.run() doesn't deadlock
+        result = _run_tool_in_thread(
+            "terminal",
+            {"command": command, "timeout": timeout},
+            self.task_id,
+        )
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"exit_code": -1, "output": result}
+
+    # -------------------------------------------------------------------------
+    # File tools
+    # -------------------------------------------------------------------------
+
+    def read_file(self, path: str) -> Dict[str, Any]:
+        """
+        Read a file from the rollout's filesystem.
+
+        Args:
+            path: File path to read
+
+        Returns:
+            Dict with file content or error
+        """
+        result = handle_function_call(
+            "read_file", {"path": path}, task_id=self.task_id
+        )
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    def write_file(self, path: str, content: str) -> Dict[str, Any]:
+        """
+        Write a file in the rollout's filesystem.
+
+        Args:
+            path: File path to write
+            content: Content to write
+
+        Returns:
+            Dict with success status or error
+        """
+        result = handle_function_call(
+            "write_file", {"path": path, "content": content}, task_id=self.task_id
+        )
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    def search(self, query: str, path: str = ".") -> Dict[str, Any]:
+        """
+        Search for text in the rollout's filesystem.
+
+        Args:
+            query: Search query
+            path: Directory to search in
+
+        Returns:
+            Dict with search results
+        """
+        result = handle_function_call(
+            "search", {"query": query, "path": path}, task_id=self.task_id
+        )
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    # -------------------------------------------------------------------------
+    # Web tools
+    # -------------------------------------------------------------------------
+
+    def web_search(self, query: str) -> Dict[str, Any]:
+        """
+        Search the web.
+
+        Args:
+            query: Search query
+
+        Returns:
+            Dict with search results
+        """
+        result = handle_function_call("web_search", {"query": query})
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    def web_extract(self, urls: List[str]) -> Dict[str, Any]:
+        """
+        Extract content from URLs.
+
+        Args:
+            urls: List of URLs to extract content from
+
+        Returns:
+            Dict with extracted content
+        """
+        result = handle_function_call("web_extract", {"urls": urls})
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    # -------------------------------------------------------------------------
+    # Browser tools
+    # -------------------------------------------------------------------------
+
+    def browser_navigate(self, url: str) -> Dict[str, Any]:
+        """
+        Navigate the rollout's browser session to a URL.
+
+        Args:
+            url: URL to navigate to
+
+        Returns:
+            Dict with page snapshot or error
+        """
+        result = handle_function_call(
+            "browser_navigate", {"url": url}, task_id=self.task_id
+        )
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    def browser_snapshot(self) -> Dict[str, Any]:
+        """
+        Take a snapshot of the current browser page.
+
+        Returns:
+            Dict with page content/accessibility snapshot
+        """
+        result = handle_function_call(
+            "browser_snapshot", {}, task_id=self.task_id
+        )
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    # -------------------------------------------------------------------------
+    # Generic tool access
+    # -------------------------------------------------------------------------
+
+    def call_tool(self, tool_name: str, arguments: Dict[str, Any]) -> str:
+        """
+        Call any hermes-agent tool by name.
+
+        This is the generic escape hatch -- if a tool doesn't have a convenience
+        wrapper above, you can call it directly here.
+
+        Args:
+            tool_name: Name of the tool (e.g., "vision_analyze", "skills_list")
+            arguments: Dict of arguments for the tool
+
+        Returns:
+            Raw JSON string result from the tool
+        """
+        return _run_tool_in_thread(tool_name, arguments, self.task_id)
+
+    # -------------------------------------------------------------------------
+    # Cleanup
+    # -------------------------------------------------------------------------
+
+    def cleanup(self):
+        """
+        Release all resources (terminal VMs, browser sessions) for this rollout.
+
+        Called automatically by the base environment via try/finally after
+        compute_reward() completes. You generally don't need to call this yourself.
+        """
+        try:
+            cleanup_vm(self.task_id)
+        except Exception as e:
+            logger.debug("VM cleanup for task %s: %s", self.task_id, e)
+
+        # Suppress browser_tool's noisy debug prints during cleanup.
+        # The cleanup still runs (safe), it just doesn't spam the console.
+        _prev_quiet = os.environ.get("HERMES_QUIET")
+        os.environ["HERMES_QUIET"] = "1"
+        try:
+            cleanup_browser(self.task_id)
+        except Exception as e:
+            logger.debug("Browser cleanup for task %s: %s", self.task_id, e)
+        finally:
+            if _prev_quiet is None:
+                os.environ.pop("HERMES_QUIET", None)
+            else:
+                os.environ["HERMES_QUIET"] = _prev_quiet
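The thread hand-off performed by `_run_tool_in_thread` can be illustrated in isolation. The following is a minimal sketch rather than the hermes-agent code: `blocking_tool` and `run_tool` are hypothetical stand-ins for a backend that calls `asyncio.run()` internally and for the dispatch helper, respectively. It shows why the hand-off matters: `asyncio.run()` refuses to start inside a thread that already has a running loop, but works in a worker thread that has none.

```python
import asyncio
import concurrent.futures

# Shared pool, mirroring the module-level executor in tool_context.py.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def blocking_tool(x: int) -> int:
    """Hypothetical sync tool whose backend calls asyncio.run() internally."""
    async def _inner() -> int:
        await asyncio.sleep(0)
        return x * 2
    return asyncio.run(_inner())


def run_tool(x: int) -> int:
    """Call blocking_tool directly, or via a worker thread if a loop is running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: asyncio.run() inside the tool is safe.
        return blocking_tool(x)
    # A loop is running here, so give the tool a thread without one.
    return _executor.submit(blocking_tool, x).result(timeout=30)


async def main() -> int:
    # Called from inside a running event loop.
    return run_tool(21)


result_sync = run_tool(1)          # direct path
result_async = asyncio.run(main()) # worker-thread path
```

Note that `run_tool` still blocks the calling thread while it waits, so the event loop makes no progress until the tool returns; the sketch avoids the `asyncio.run()` conflict, not the blocking itself.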
--- a/example-skill/SKILL.md
+++ b/example-skill/SKILL.md
@@ -1,70 +0,0 @@
---
-name: example-skill
-description: An example skill demonstrating the skill file format and structure
---
-
-# Example Skill
-
-This is an example skill file that demonstrates how to create skills for the Hermes Agent.
-
-## Skill File Format
-
-Skills are markdown files with YAML frontmatter at the top:
-
-```yaml
---
-name: your-skill-name
-description: A brief one-line description of what this skill does
---
-```
-
-The frontmatter fields:
- **name**: The identifier used to reference this skill (lowercase, hyphens for spaces)
- **description**: A brief description shown when listing skills (keep under 200 chars)
-
-## Writing Effective Skills
-
-### 1. Be Specific and Actionable
-
-Good skills provide clear, actionable instructions:
-
-```
-When reviewing code:
-1. Check for security vulnerabilities first
-2. Verify error handling is comprehensive
-3. Ensure tests cover edge cases
-```
-
-### 2. Include Examples
-
-Show concrete examples of what you want:
-
-```python
-# Good: Descriptive variable names
-user_authentication_token = get_token()
-
-# Bad: Cryptic abbreviations  
-uat = gt()
-```
-
-### 3. Define When to Use
-
-Help the agent understand when this skill applies:
-
-> Use this skill when: reviewing pull requests, auditing security, or checking code quality.
-
-## Skill Categories
-
-Consider organizing skills by purpose:
-
- **Conventions**: Coding standards, API patterns, naming rules
- **Workflows**: Step-by-step processes for deployments, reviews, releases
- **Knowledge**: Domain-specific information, system architecture, gotchas
- **Templates**: Boilerplate for common tasks, response formats
-
-## Tips
-
-1. Keep the description concise - it's shown in the skills list
-2. Use headers to organize longer skills
-3. Include code examples where helpful
-4. Reference other skills if they're related
--- a/hermes_agent.egg-info/PKG-INFO
+++ b/hermes_agent.egg-info/PKG-INFO
@@ -1,868 +0,0 @@
-Metadata-Version: 2.4
-Name: hermes-agent
-Version: 0.1.0
-Summary: AI agent with advanced tool-calling and toolsets
-Author: Nous Research
-License: MIT
-Requires-Python: >=3.10
-Description-Content-Type: text/markdown
-Requires-Dist: openai
-Requires-Dist: python-dotenv
-Requires-Dist: fire
-Requires-Dist: httpx
-Requires-Dist: rich
-Requires-Dist: tenacity
-Requires-Dist: pyyaml
-Requires-Dist: requests
-Requires-Dist: jinja2
-Requires-Dist: pydantic>=2.0
-Requires-Dist: firecrawl-py
-Requires-Dist: fal-client
-Requires-Dist: litellm>=1.75.5
-Requires-Dist: typer
-Requires-Dist: platformdirs
-Provides-Extra: modal
-Requires-Dist: modal; extra == "modal"
-Requires-Dist: boto3; extra == "modal"
-Provides-Extra: dev
-Requires-Dist: pytest; extra == "dev"
-Requires-Dist: pytest-asyncio; extra == "dev"
-Provides-Extra: messaging
-Requires-Dist: python-telegram-bot>=20.0; extra == "messaging"
-Requires-Dist: discord.py>=2.0; extra == "messaging"
-Provides-Extra: cron
-Requires-Dist: croniter; extra == "cron"
-Provides-Extra: all
-Requires-Dist: croniter; extra == "all"
-Requires-Dist: python-telegram-bot>=20.0; extra == "all"
-Requires-Dist: discord.py>=2.0; extra == "all"
-
-# Hermes Agent
-
-An AI agent with advanced tool-calling capabilities, featuring a flexible toolsets system for organizing and managing tools.
-
-## Features
-
- **Interactive CLI**: Beautiful terminal interface with animated feedback, personalities, and session management
- **Messaging Gateway**: Connect to Telegram, Discord, and WhatsApp for conversational AI anywhere
- **Web Tools**: Search, extract content, and crawl websites
- **Terminal Tools**: Execute commands via local, Docker, Singularity, Modal, or SSH backends
- **Browser Tools**: Automate web browsers to navigate, click, type, and extract content
- **Vision Tools**: Analyze images from URLs
- **Reasoning Tools**: Advanced multi-model reasoning (Mixture of Agents)
- **Creative Tools**: Generate images from text prompts
- **Skills Tools**: On-demand knowledge documents with progressive disclosure
- **Toolsets System**: Organize tools into logical groups for different scenarios
- **Scheduled Tasks**: Cron jobs for automated agent tasks with delivery to platforms
- **Context Compression**: Automatic summarization when approaching context limits
- **Batch Processing**: Process datasets in parallel with checkpointing and statistics tracking
- **Ephemeral System Prompts**: Guide model behavior without polluting training datasets
-
-## Installation
-
-### Quick Install (Recommended)
-
-**Linux/macOS:**
-```bash
-curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
-```
-
-**Windows (PowerShell):**
-```powershell
-irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
-```
-
-This installer will:
- Clone the repository to `~/.hermes-agent`
- Create a virtual environment and install dependencies
- Set up the `hermes` command in your PATH
- Run an interactive setup wizard to configure API keys
-
-### Manual Installation
-
-If you prefer to install manually:
-
-```bash
-# Clone with submodules
-git clone --recurse-submodules https://github.com/NousResearch/Hermes-Agent.git
-cd Hermes-Agent
-
-# Run the setup script
-./setup-hermes.sh
-```
-
-Or step-by-step:
-
-```bash
-# Create and activate virtual environment
-python3 -m venv venv
-source venv/bin/activate  # Windows: venv\Scripts\activate
-
-# Install in editable mode with all extras
-pip install -e ".[all]"
-
-# Or install dependencies manually
-pip install -r requirements.txt
-pip install -e ./mini-swe-agent
-
-# Copy and configure environment
-cp .env.example .env
-# Edit .env with your API keys
-
-# Run the setup wizard
-hermes setup
-```
-
-## Quick Start
-
-Once installed, the `hermes` command is your main entry point:
-
-```bash
-hermes                    # Interactive chat (default)
-hermes chat               # Same as above
-hermes chat -q "Hello"    # Single query, then exit
-hermes setup              # Configure API keys and settings
-hermes status             # Show configuration status
-hermes doctor             # Diagnose issues
-hermes gateway            # Start messaging gateway (Telegram/Discord/WhatsApp)
-hermes cron daemon        # Run cron job scheduler
-hermes version            # Show version info
-```
-
-**Legacy `./hermes` script:**
-```bash
-# The old CLI script still works:
-./hermes
-
-# Or with options:
-./hermes --model "anthropic/claude-sonnet-4" --toolsets "web,terminal"
-```
-
-The CLI provides:
- Animated spinners during thinking and tool execution
- Kawaii-style feedback messages
- `/commands` for configuration, history, and session management
- Customizable personalities (`/personality kawaii`, `/personality pirate`, etc.)
- Persistent configuration via `cli-config.yaml`
-
-## Configuration
-
-### Environment Variables
-```bash
-# Copy the example environment file
-cp .env.example .env
-
-# Edit .env and add your API keys
-nano .env  # or use your preferred editor
-```
-
-**Required API Keys:**
- `OPENROUTER_API_KEY` - LLM access via OpenRouter (get at: https://openrouter.ai/keys)
- `FIRECRAWL_API_KEY` - Web tools (get at: https://firecrawl.dev/)
- `NOUS_API_KEY` - Vision & reasoning tools (get at: https://inference-api.nousresearch.com/)
- `FAL_KEY` - Image generation (get at: https://fal.ai/)
-
-**Optional API Keys (for specific features):**
- `BROWSERBASE_API_KEY` - Browser automation (get at: https://browserbase.com/)
- `BROWSERBASE_PROJECT_ID` - From Browserbase dashboard
- `MORPH_API_KEY` - For legacy Hecate terminal backend (get at: https://morph.so/)
-
-### 4. Configure Terminal Backend
-
-The terminal tool uses **mini-swe-agent** environments. Configure in `.env` or `cli-config.yaml`:
-
-```bash
-# Backend: "local", "docker", "singularity", "modal", or "ssh"
-TERMINAL_ENV=local          # Default: runs on host machine (no isolation)
-TERMINAL_ENV=ssh            # Remote execution via SSH (agent code stays local)
-TERMINAL_ENV=singularity    # Recommended for HPC: Apptainer/Singularity containers
-TERMINAL_ENV=docker         # Isolated Docker containers
-TERMINAL_ENV=modal          # Cloud execution via Modal
-
-# Container image (for docker/singularity/modal backends)
-TERMINAL_DOCKER_IMAGE=python:3.11-slim
-TERMINAL_SINGULARITY_IMAGE=docker://python:3.11-slim
-TERMINAL_TIMEOUT=60
-
-# SSH backend (for ssh)
-TERMINAL_SSH_HOST=my-server.example.com
-TERMINAL_SSH_USER=myuser
-TERMINAL_SSH_KEY=~/.ssh/id_rsa  # Optional, uses ssh-agent if not set
-```
-
-**Backend Requirements:**
- **local**: No extra setup (runs directly on your machine, no isolation)
- **ssh**: SSH access to remote machine (great for sandboxing - agent can't touch its own code)
- **singularity**: Requires Apptainer or Singularity installed (common on HPC clusters, no root needed)
- **docker**: Requires Docker installed and user in `docker` group
- **modal**: Requires Modal account (see setup below)
-
-### Singularity/Apptainer Setup (Recommended for HPC)
-
-Singularity/Apptainer provides rootless container execution, ideal for HPC clusters:
-
-```bash
-# 1. Verify Apptainer is installed
-apptainer --version  # or: singularity --version
-
-# 2. Set up cache directories (important for parallel workers)
-# Use /scratch if available (HPC), otherwise /tmp
-export APPTAINER_CACHEDIR=/scratch/$USER/.apptainer
-export APPTAINER_TMPDIR=/scratch/$USER/.apptainer/tmp
-mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"
-
-# 3. Pre-build SIF image (recommended for parallel batch processing)
-# This avoids race conditions when multiple workers start simultaneously
-apptainer build $APPTAINER_CACHEDIR/python-nodejs.sif docker://nikolaik/python-nodejs:python3.11-nodejs20
-
-# 4. Configure .env to use the local SIF
-TERMINAL_ENV=singularity
-TERMINAL_SINGULARITY_IMAGE=/scratch/$USER/.apptainer/python-nodejs.sif
-```
-
-**Tip:** The batch scripts in `configs/` automatically handle SIF pre-building if `/scratch` is available.
-
-### Modal Cloud Backend Setup
-
-[Modal](https://modal.com) provides serverless cloud compute for running sandboxed environments at scale.
-
-```bash
-# 1. Install Modal and dependencies
-pip install modal boto3
-
-# 2. Authenticate with Modal (opens browser)
-modal setup
-
-# 3. Set terminal backend to modal in .env
-TERMINAL_ENV=modal
-```
-
-Modal uses CLI-based authentication (stored in `~/.modal/`), so no API key is needed in `.env`. After running `modal setup`, commands will automatically execute in Modal's cloud sandboxes.
-
-### Browser Tools Setup
-
-Browser tools enable the agent to navigate websites, fill forms, click buttons, and extract content. They use [agent-browser](https://github.com/vercel-labs/agent-browser) CLI with [Browserbase](https://browserbase.com) cloud execution.
-
-```bash
-# 1. Install Node.js (if not already installed)
-# Use nvm (recommended) or your package manager
-
-# 2. Install agent-browser CLI (choose one option):
-npm install -g agent-browser     # Option A: Global install (recommended)
-npm install                      # Option B: Local install (uses npx fallback)
-
-# 3. Get Browserbase credentials
-# Sign up at https://browserbase.com/ and get your:
-# - API Key (from Settings → API Keys)
-# - Project ID (from your project dashboard)
-
-# 4. Add to your .env file:
-BROWSERBASE_API_KEY=your_api_key_here
-BROWSERBASE_PROJECT_ID=your_project_id_here
-```
-
-**Available Browser Tools:**
-
-| Tool | Description |
-|------|-------------|
-| `browser_navigate` | Navigate to a URL |
-| `browser_snapshot` | Get text-based page snapshot with element refs |
-| `browser_click` | Click an element by ref (e.g., `@e5`) |
-| `browser_type` | Type text into an input field |
-| `browser_scroll` | Scroll up or down |
-| `browser_back` | Go back in browser history |
-| `browser_press` | Press a keyboard key (Enter, Tab, etc.) |
-| `browser_close` | Close the browser session |
-| `browser_get_images` | Get list of images on the page |
-
-**Example Usage:**
-```bash
-# Use browser tools with web search and vision
-python run_agent.py \
-  --query "Go to amazon.com and find the price of the latest Kindle" \
-  --enabled_toolsets=browser,web,vision
-
-# Use browser-focused distribution
-python batch_runner.py \
-  --dataset_file=browser_tasks.jsonl \
-  --distribution=browser_use \
-  --run_name=browser_run
-```
-
-See `.env.example` for all available configuration options including debug settings.
-
-### Skills Tools
-
-Skills are on-demand knowledge documents the agent can load when needed. They follow a **progressive disclosure** pattern to minimize token usage:
-
-```
-skills/
-├── mlops/                    # Category folder
-│   ├── axolotl/             # Skill folder
-│   │   ├── SKILL.md         # Main instructions (required)
-│   │   ├── references/      # Additional docs, API specs
-│   │   └── templates/       # Output formats, configs
-│   └── vllm/
-│       └── SKILL.md
-```
-
-**Available Skills Tools:**
-
-| Tool | Description |
-|------|-------------|
-| `skills_categories` | List available skill categories (~50 tokens) |
-| `skills_list` | List skills with name + description (~3k tokens for 40 skills) |
-| `skill_view` | Load full skill content, tags, and linked files |
-
-**Example Usage:**
-```bash
-# Use skills tools
-python run_agent.py \
-  --query "What skills do you have for fine-tuning? Show me the axolotl skill." \
-  --enabled_toolsets=skills
-```
-
-**Creating Skills:**
-
-Skills use YAML frontmatter for metadata:
-```yaml
---
-name: my-skill
-description: Brief description shown in skills_list
-tags: [tag1, tag2]
-related_skills: [other-skill]
-version: 1.0.0
---
-# Skill Content
-
-Instructions, examples, and guidelines here...
-```
-
-Skills can include:
- `references/` - Additional documentation, API specs, examples
- `templates/` - Output formats, config files, boilerplate code
- `scripts/` - Executable helpers (Python, shell scripts)
-
-## Session Logging
-
-Every conversation is automatically logged to `logs/` for debugging and inspection:
-
-```
-logs/
-├── session_20260201_143052_a1b2c3.json
-├── session_20260201_150217_d4e5f6.json
-└── ...
-```
-
-**Log Format:**
-```json
-{
-  "session_id": "20260201_143052_a1b2c3",
-  "model": "anthropic/claude-sonnet-4",
-  "session_start": "2026-02-01T14:30:52.123456",
-  "last_updated": "2026-02-01T14:35:12.789012",
-  "message_count": 8,
-  "conversations": [
-    {"from": "system", "value": "..."},
-    {"from": "human", "value": "..."},
-    {"from": "gpt", "value": "..."},
-    {"from": "tool", "value": "..."}
-  ]
-}
-```
-
- **Automatic**: Logs are created and updated automatically after each conversation turn
- **Session ID in Banner**: The CLI displays the session ID in the welcome banner
- **Trajectory Format**: Uses the same format as batch processing for consistency
- **Git Ignored**: `logs/` is in `.gitignore` so logs aren't committed
-
-## Context Compression
-
-Long conversations can exceed the model's context limit. Hermes Agent automatically compresses context when approaching the limit:
-
-**How it works:**
-1. Tracks actual token usage from API responses (`usage.prompt_tokens`)
-2. When tokens reach 85% of model's context limit, triggers compression
-3. Protects first 3 turns (system prompt, initial request, first response)
-4. Protects last 4 turns (recent context is most relevant)
-5. Summarizes middle turns using a fast/cheap model (Gemini Flash)
-6. Inserts summary as a user message, conversation continues seamlessly
-
-**Configuration (`cli-config.yaml`):**
-```yaml
-compression:
-  enabled: true                    # Enable auto-compression (default)
-  threshold: 0.85                  # Compress at 85% of context limit
-  summary_model: "google/gemini-2.0-flash-001"
-```
-
-**Or via environment variables:**
-```bash
-CONTEXT_COMPRESSION_ENABLED=true
-CONTEXT_COMPRESSION_THRESHOLD=0.85
-CONTEXT_COMPRESSION_MODEL=google/gemini-2.0-flash-001
-```
-
-**When compression triggers, you'll see:**
-```
-📦 Context compression triggered (170,000 tokens ≥ 170,000 threshold)
-   📊 Model context limit: 200,000 tokens (85% = 170,000)
-   🗜️  Summarizing turns 4-15 (12 turns)
-   ✅ Compressed: 20 → 9 messages (~45,000 tokens saved)
-```
-
-## Scheduled Tasks (Cron Jobs)
-
-Hermes Agent can schedule automated tasks to run in the future - either one-time reminders or recurring jobs.
-
-### CLI Commands
-
-```bash
-# List scheduled jobs
-/cron
-
-# Add a one-shot reminder (runs once in 30 minutes)
-/cron add 30m Remind me to check the build status
-
-# Add a recurring job (every 2 hours)
-/cron add "every 2h" Check server status at 192.168.1.100 and report any issues
-
-# Add a cron expression (daily at 9am)
-/cron add "0 9 * * *" Generate a morning briefing summarizing GitHub notifications
-
-# Remove a job
-/cron remove abc123def456
-```
-
-### Agent Self-Scheduling
-
-The agent can also schedule its own follow-up tasks using tools:
-
-```python
-# Available when using hermes-cli toolset (default for CLI)
-schedule_cronjob(prompt="...", schedule="30m", repeat=1)  # One-shot
-schedule_cronjob(prompt="...", schedule="every 2h")       # Recurring
-list_cronjobs()                                            # View all jobs
-remove_cronjob(job_id="...")                              # Cancel a job
-```
-
-**⚠️ Important:** Cronjobs run in **isolated sessions with NO prior context**. The prompt must be completely self-contained with all necessary information (file paths, URLs, server addresses, etc.). The future agent will not remember anything from the current conversation.
-
-### Schedule Formats
-
-| Format | Example | Description |
-|--------|---------|-------------|
-| Duration | `30m`, `2h`, `1d` | One-shot delay from now |
-| Interval | `every 30m`, `every 2h` | Recurring at fixed intervals |
-| Cron | `0 9 * * *` | Cron expression (requires `croniter`) |
-| Timestamp | `2026-02-03T14:00` | One-shot at specific time |
-
-### Repeat Options
-
-| repeat | Behavior |
-|--------|----------|
-| (omitted) | One-shot schedules run once; intervals/cron run forever |
-| `1` | Run once then auto-delete |
-| `N` | Run N times then auto-delete |
-
-### Running the Cron Daemon
-
-Jobs are stored in `~/.hermes/cron/jobs.json` and executed by a scheduler:
-
-```bash
-# Option 1: Built-in daemon (checks every 60 seconds)
-python cli.py --cron-daemon
-
-# Option 2: System cron integration (run once per minute)
-# Add to crontab: crontab -e
-*/1 * * * * cd ~/hermes-agent && python cli.py --cron-tick-once >> ~/.hermes/cron/cron.log 2>&1
-```
-
-### Job Output
-
-Job outputs are saved to `~/.hermes/cron/output/{job_id}/{timestamp}.md` for review.
-
-## Messaging Gateway (Telegram, Discord, WhatsApp)
-
-Connect Hermes Agent to messaging platforms so you can chat from anywhere.
-
-### Quick Start
-
-```bash
-# 1. Add your bot token to .env
-echo 'TELEGRAM_BOT_TOKEN="your_token"' >> .env
-
-# 2. Test the gateway (foreground)
-./scripts/hermes-gateway run
-
-# 3. Install as a background service
-./scripts/hermes-gateway install
-
-# 4. Manage the service
-./scripts/hermes-gateway start   # Start
-./scripts/hermes-gateway stop    # Stop
-./scripts/hermes-gateway status  # Check status
-```
-
-### Supported Platforms
-
-| Platform | Setup | Toolset |
-|----------|-------|---------|
-| Telegram | Bot via @BotFather | `hermes-telegram` |
-| Discord | Bot via Developer Portal | `hermes-discord` |
-| WhatsApp | Node.js bridge | `hermes-whatsapp` |
-
-### Session Management
-
- Sessions persist across messages (agent remembers context)
- Reset policies: daily (4am), idle (2 hours), or both
- Manual reset: send `/new` or `/reset`
-
-### Cron Job Delivery
-
-Schedule tasks that deliver to specific platforms:
-
-```python
-schedule_cronjob(
-    prompt="Check server status...",
-    schedule="every 1h",
-    deliver="telegram"  # or "origin", "discord", etc.
-)
-```
-
-### CLI Commands
-
-| Command | Description |
-|---------|-------------|
-| `/platforms` | Show gateway configuration status |
-| `--gateway` | Start the gateway (CLI flag) |
-
-See [docs/messaging.md](docs/messaging.md) for full setup instructions.
-
-## Interactive CLI
-
-The CLI provides a rich interactive experience for working with the agent.
-
-### Running the CLI
-
-```bash
-# Basic usage
-./hermes
-
-# With specific model
-./hermes --model "anthropic/claude-sonnet-4"
-
-# With specific toolsets
-./hermes --toolsets "web,terminal,skills"
-```
-
-### CLI Commands
-
-| Command | Description |
-|---------|-------------|
-| `/help` | Show available commands |
-| `/tools` | List available tools by toolset |
-| `/toolsets` | List available toolsets |
-| `/model [name]` | Show or change the current model |
-| `/prompt [text]` | View/set custom system prompt |
-| `/personality [name]` | Set a predefined personality |
-| `/clear` | Clear screen and reset conversation |
-| `/reset` | Reset conversation only |
-| `/history` | Show conversation history |
-| `/save` | Save current conversation to file |
-| `/config` | Show current configuration |
-| `/cron` | Manage scheduled tasks (list, add, remove) |
-| `/platforms` | Show gateway/messaging platform status |
-| `/quit` | Exit the CLI |
-
-### Configuration
-
-Copy `cli-config.yaml.example` to `cli-config.yaml` and customize:
-
-```yaml
-# Model settings
-model:
-  default: "anthropic/claude-sonnet-4"
-
-# Terminal backend (local, docker, singularity, modal, or ssh)
-terminal:
-  env_type: "local"
-  cwd: "."  # Use current directory
-
-# Or use SSH for remote execution (keeps agent code isolated)
-# terminal:
-#   env_type: "ssh"
-#   ssh_host: "my-server.example.com"
-#   ssh_user: "myuser"
-#   ssh_key: "~/.ssh/id_rsa"
-#   cwd: "/home/myuser/project"
-
-# Enable specific toolsets
-toolsets:
-  - all  # or: web, terminal, browser, vision, etc.
-
-# Custom personalities (use with /personality command)
-agent:
-  personalities:
-    helpful: "You are a helpful assistant."
-    kawaii: "You are a kawaii assistant! Use cute expressions..."
-```
-
-### Personalities
-
-Built-in personalities available via `/personality`:
- `helpful`, `concise`, `technical`, `creative`, `teacher`
- `kawaii`, `catgirl`, `pirate`, `shakespeare`, `surfer`
- `noir`, `uwu`, `philosopher`, `hype`
-
-## Toolsets System
-
-The agent uses a toolsets system for organizing and managing tools. All tools must be part of a toolset to be accessible - individual tool selection is not supported. This ensures consistent and logical grouping of capabilities.
-
-### Key Concepts
-
- **Toolsets**: Logical groups of tools for specific use cases (e.g., "research", "development", "debugging")
- **Composition**: Toolsets can include other toolsets for powerful combinations
- **Custom Toolsets**: Create your own toolsets at runtime or by editing `toolsets.py`
- **Toolset-Only Access**: Tools are only accessible through toolsets, not individually
-
-### Available Toolsets
-
-See `toolsets.py` for the complete list of predefined toolsets including:
- Basic toolsets (web, terminal, vision, creative, reasoning)
- Composite toolsets (research, development, analysis, etc.)
- Scenario-specific toolsets (debugging, documentation, API testing, etc.)
- Special toolsets (safe mode without terminal, minimal, offline)
-
-### Using Toolsets
-
-```bash
-# Use a predefined toolset
-python run_agent.py --enabled_toolsets=research --query "Find latest AI papers"
-
-# Combine multiple toolsets
-python run_agent.py --enabled_toolsets=web,vision --query "Analyze this website"
-
-# Enable all toolsets explicitly (same as omitting the flag)
-python run_agent.py --enabled_toolsets=all --query "Do web research and run commands if helpful"
-
-# Safe mode (no terminal access)
-python run_agent.py --enabled_toolsets=safe --query "Help without running commands"
-
-# List all available toolsets and tools
-python run_agent.py --list_tools
-```
-
-See `toolsets.py` for the complete list of available toolsets and how to create custom ones.
-
-## Basic Usage
-
-### Default (all tools enabled)
-```bash
-# Uses OpenRouter by default - just set OPENROUTER_API_KEY in .env
-python run_agent.py \
-  --query "search up the latest docs on jit in python 3.13 and write me basic example that's not in their docs. profile its perf" \
-  --max_turns 20 \
-  --model anthropic/claude-sonnet-4-20250514
-```
-
-### With specific toolset
-```bash
-python run_agent.py \
-  --query "Debug this Python error" \
-  --enabled_toolsets=debugging \
-  --model anthropic/claude-sonnet-4-20250514
-```
-
-### Python API
-```python
-from run_agent import AIAgent
-
-# Uses OpenRouter by default (reads OPENROUTER_API_KEY from .env)
-agent = AIAgent(
-    model="anthropic/claude-sonnet-4-20250514",
-    enabled_toolsets=["research"]
-)
-response = agent.chat("Find information about quantum computing")
-
-# Create custom toolset at runtime
-from toolsets import create_custom_toolset
-
-create_custom_toolset(
-    name="my_tools",
-    description="My custom toolkit",
-    tools=["web_search"],
-    includes=["terminal", "vision"]
-)
-
-agent = AIAgent(enabled_toolsets=["my_tools"])
-```
-
-## Batch Processing
-
-Process multiple prompts from a dataset in parallel with automatic checkpointing and statistics tracking:
-
-```bash
-# Basic batch processing
-python batch_runner.py \
-  --dataset_file=prompts.jsonl \
-  --batch_size=20 \
-  --run_name=my_run
-
-# With specific distribution
-python batch_runner.py \
-  --dataset_file=prompts.jsonl \
-  --batch_size=20 \
-  --run_name=image_run \
-  --distribution=image_gen \
-  --num_workers=4
-```
-
-**Key Features:**
- Parallel processing with configurable workers
- Toolset distributions for varied data generation
- Automatic checkpointing and resume capability
- Combined output in `data/<run_name>/trajectories.jsonl`
- Tool usage statistics and success rates
-
-Use `--list_distributions` to see available toolset distributions for varied data generation.
-
-### Trajectory Compression
-
-Post-process trajectories to fit within token budgets for training:
-
-```bash
-# Compress a directory of JSONL files
-python trajectory_compressor.py --input=data/my_run
-
-# Compress a single JSONL file
-python trajectory_compressor.py --input=data/trajectories.jsonl
-
-# Compress a 15% sample (useful for creating smaller training sets)
-python trajectory_compressor.py --input=data/trajectories.jsonl --sample_percent=15
-
-# Custom output and token target
-python trajectory_compressor.py \
-  --input=data/trajectories.jsonl \
-  --output=data/compressed.jsonl \
-  --target_max_tokens=16000
-```
-
-**Features:**
- Protects first turns (system, human, first GPT response, first tool call)
- Protects last N turns (configurable)
- Summarizes middle turns using LLM to fit target token budget
- Supports both directory and single file input
- Optional random sampling with `--sample_percent`
- Configurable via `configs/trajectory_compression.yaml`
-
-### Ephemeral System Prompts
-
-The ephemeral system prompt feature allows you to guide the model's behavior during batch processing **without** saving that prompt to the training dataset trajectories. This is useful for:
-
- Guiding model behavior during data collection
- Adding task-specific instructions 
- Keeping saved trajectories clean and focused on tool-calling format
-
-**Example:**
-```bash
-python batch_runner.py \
-  --dataset_file=prompts.jsonl \
-  --batch_size=10 \
-  --run_name=my_run \
-  --ephemeral_system_prompt="You are a helpful assistant focused on image generation."
-```
-
-The ephemeral prompt will influence the model's behavior during execution, but **only the standard tool-calling system prompt** will be saved in the trajectory files.
-
-The ephemeral prompt influences model behavior during execution, but **only the standard tool-calling system prompt** is saved in trajectory files.
-
-## Command Line Arguments
-
-**Single Agent (`run_agent.py`):**
- `--query`: The question or task for the agent
- `--model`: Model to use (default: claude-opus-4-20250514)
- `--api_key`: API key for authentication
- `--base_url`: API endpoint URL
- `--max_turns`: Maximum number of tool-calling iterations
- `--enabled_toolsets`: Comma-separated list of toolsets to enable. Use `all` (or `*`) to enable everything. If omitted, all toolsets are enabled by default.
- `--disabled_toolsets`: Comma-separated list of toolsets to disable
- `--list_tools`: List all available toolsets and tools
- `--save_trajectories`: Save conversation trajectories to JSONL files
-
-**Batch Processing (`batch_runner.py`):**
- `--dataset_file`: Path to JSONL file with prompts
- `--batch_size`: Number of prompts per batch
- `--run_name`: Name for this run (for output/checkpointing)
- `--distribution`: Toolset distribution to use (default: "default")
- `--num_workers`: Number of parallel workers (default: 4)
- `--resume`: Resume from checkpoint if interrupted
- `--ephemeral_system_prompt`: System prompt used during execution but NOT saved to trajectories
- `--list_distributions`: List available toolset distributions
-
-## Environment Variables
-
-All environment variables can be configured in the `.env` file (copy from `.env.example`).
-
-**LLM Provider (OpenRouter):**
- `OPENROUTER_API_KEY`: Primary LLM access via OpenRouter (supports Claude, GPT-4, Gemini, etc.)
- `LLM_MODEL`: Default model (e.g., `anthropic/claude-sonnet-4`, `openai/gpt-4o`)
-
-**Tool API Keys:**
- `FIRECRAWL_API_KEY`: Web tools (search, extract, crawl)
- `NOUS_API_KEY`: Vision and reasoning tools
- `FAL_KEY`: Image generation tools
-
-**Terminal Tool Configuration (mini-swe-agent backend):**
- `TERMINAL_ENV`: Backend type - `local`, `docker`, `singularity`, `modal`, or `ssh` (default: `local`)
- `TERMINAL_DOCKER_IMAGE`: Docker image for docker backend (default: `python:3.11-slim`)
- `TERMINAL_SINGULARITY_IMAGE`: Singularity/Apptainer image (can be `docker://...` URL or local `.sif` path)
- `TERMINAL_TIMEOUT`: Command timeout in seconds (default: `60`)
- `TERMINAL_LIFETIME_SECONDS`: Cleanup inactive environments after this time (default: `300`)
- `TERMINAL_CWD`: Working directory inside containers (default: `/tmp`)
- `TERMINAL_SCRATCH_DIR`: Custom scratch directory for sandbox storage (optional, auto-detects `/scratch`)
- `SUDO_PASSWORD`: Enable sudo commands by piping password via `sudo -S` (works with all backends)
-  - If unset in CLI mode, you'll be prompted interactively when sudo is needed (45s timeout)
-
-**SSH Backend Configuration (for remote execution):**
- `TERMINAL_SSH_HOST`: Remote server hostname or IP
- `TERMINAL_SSH_USER`: SSH username
- `TERMINAL_SSH_PORT`: SSH port (default: `22`)
- `TERMINAL_SSH_KEY`: Path to SSH private key (optional, uses ssh-agent if not set)
-
-**Context Compression (auto-shrinks long conversations):**
- `CONTEXT_COMPRESSION_ENABLED`: Enable auto-compression (default: `true`)
- `CONTEXT_COMPRESSION_THRESHOLD`: Compress at this % of context limit (default: `0.85`)
- `CONTEXT_COMPRESSION_MODEL`: Model for generating summaries (default: `google/gemini-2.0-flash-001`)
-
-**Browser Tool Configuration (agent-browser + Browserbase):**
- `BROWSERBASE_API_KEY`: Browserbase API key for cloud browser execution
- `BROWSERBASE_PROJECT_ID`: Browserbase project ID
- `BROWSER_SESSION_TIMEOUT`: Session timeout in seconds (default: `300`)
-
-**Legacy Hecate Terminal Backend (optional):**
- `MORPH_API_KEY`: For Hecate/MorphCloud terminal backend
- `HECATE_VM_LIFETIME_SECONDS`: VM lifetime (default: 300)
- `HECATE_DEFAULT_SNAPSHOT_ID`: Default snapshot (default: snapshot_p5294qxt)
-
-**Debug Options:**
- `WEB_TOOLS_DEBUG`, `VISION_TOOLS_DEBUG`, `MOA_TOOLS_DEBUG`, `IMAGE_TOOLS_DEBUG`: Enable debug logging
-
-## Key Files
-
-| File | Purpose |
-|------|---------|
-| `hermes` | CLI launcher script (run with `./hermes`) |
-| `cli.py` | Interactive CLI implementation |
-| `cli-config.yaml` | CLI configuration (copy from `.example`) |
-| `run_agent.py` | Main agent runner - single query execution |
-| `batch_runner.py` | Parallel batch processing with checkpointing |
-| `model_tools.py` | Core tool definitions and handlers |
-| `toolsets.py` | Toolset definitions and composition |
-| `toolset_distributions.py` | Probability distributions for data generation |
-| `trajectory_compressor.py` | Post-process trajectories for training |
-| `tools/` | Individual tool implementations |
-| `tools/skills_tool.py` | Skills system with progressive disclosure |
-| `skills/` | On-demand knowledge documents |
-| `docs/` | Documentation |
-| `configs/` | Example batch run scripts |
--- a/hermes_agent.egg-info/SOURCES.txt
+++ b/hermes_agent.egg-info/SOURCES.txt
@@ -1,47 +0,0 @@
-README.md
-batch_runner.py
-cli.py
-model_tools.py
-pyproject.toml
-run_agent.py
-toolset_distributions.py
-toolsets.py
-trajectory_compressor.py
-cron/__init__.py
-cron/jobs.py
-cron/scheduler.py
-gateway/__init__.py
-gateway/config.py
-gateway/delivery.py
-gateway/run.py
-gateway/session.py
-hermes_agent.egg-info/PKG-INFO
-hermes_agent.egg-info/SOURCES.txt
-hermes_agent.egg-info/dependency_links.txt
-hermes_agent.egg-info/entry_points.txt
-hermes_agent.egg-info/requires.txt
-hermes_agent.egg-info/top_level.txt
-hermes_cli/__init__.py
-hermes_cli/cron.py
-hermes_cli/doctor.py
-hermes_cli/gateway.py
-hermes_cli/main.py
-hermes_cli/setup.py
-hermes_cli/status.py
-tests/test_batch_runner.py
-tests/test_checkpoint_resumption.py
-tests/test_modal_terminal.py
-tests/test_nous_api_limits.py
-tests/test_nous_api_pattern.py
-tests/test_temperature_fix.py
-tests/test_web_tools.py
-tools/__init__.py
-tools/browser_tool.py
-tools/cronjob_tools.py
-tools/image_generation_tool.py
-tools/mixture_of_agents_tool.py
-tools/skills_tool.py
-tools/terminal_hecate.py
-tools/terminal_tool.py
-tools/vision_tools.py
-tools/web_tools.py
--- a/hermes_agent.egg-info/dependency_links.txt
+++ b/hermes_agent.egg-info/dependency_links.txt
@@ -1 +0,0 @@
-
--- a/hermes_agent.egg-info/entry_points.txt
+++ b/hermes_agent.egg-info/entry_points.txt
@@ -1,3 +0,0 @@
-[console_scripts]
-hermes = hermes_cli.main:main
-hermes-agent = run_agent:main
--- a/hermes_agent.egg-info/requires.txt
+++ b/hermes_agent.egg-info/requires.txt
@@ -1,35 +0,0 @@
-openai
-python-dotenv
-fire
-httpx
-rich
-tenacity
-pyyaml
-requests
-jinja2
-pydantic>=2.0
-firecrawl-py
-fal-client
-litellm>=1.75.5
-typer
-platformdirs
-
-[all]
-croniter
-python-telegram-bot>=20.0
-discord.py>=2.0
-
-[cron]
-croniter
-
-[dev]
-pytest
-pytest-asyncio
-
-[messaging]
-python-telegram-bot>=20.0
-discord.py>=2.0
-
-[modal]
-modal
-boto3
--- a/hermes_agent.egg-info/top_level.txt
+++ b/hermes_agent.egg-info/top_level.txt
@@ -1,11 +0,0 @@
-batch_runner
-cli
-cron
-gateway
-hermes_cli
-model_tools
-run_agent
-tools
-toolset_distributions
-toolsets
-trajectory_compressor
--- a/hermes_cli/doctor.py
+++ b/hermes_cli/doctor.py
@@ -58,8 +58,11 @@ def run_doctor(args):
    print(color("◆ Python Environment", Colors.CYAN, Colors.BOLD))
    
    py_version = sys.version_info
-    if py_version >= (3, 10):
+    if py_version >= (3, 11):
        check_ok(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}")
+    elif py_version >= (3, 10):
+        check_ok(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}")
+        check_warn("Python 3.11+ recommended for RL Training tools (tinker requires >= 3.11)")
    elif py_version >= (3, 8):
        check_warn(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}", "(3.10+ recommended)")
    else:
@@ -100,7 +103,7 @@ def run_doctor(args):
            check_ok(name)
        except ImportError:
            check_fail(name, "(missing)")
-            issues.append(f"Install {name}: pip install {module}")
+            issues.append(f"Install {name}: uv pip install {module}")
    
    for module, name in optional_packages:
        try:
@@ -263,6 +266,39 @@ def run_doctor(args):
        except Exception as e:
            check_warn("Anthropic API", f"({e})")
    
+    # =========================================================================
+    # Check: Submodules
+    # =========================================================================
+    print()
+    print(color("◆ Submodules", Colors.CYAN, Colors.BOLD))
+    
+    # mini-swe-agent (terminal tool backend)
+    mini_swe_dir = PROJECT_ROOT / "mini-swe-agent"
+    if mini_swe_dir.exists() and (mini_swe_dir / "pyproject.toml").exists():
+        try:
+            __import__("minisweagent")
+            check_ok("mini-swe-agent", "(terminal backend)")
+        except ImportError:
+            check_warn("mini-swe-agent found but not installed", "(run: uv pip install -e ./mini-swe-agent)")
+            issues.append("Install mini-swe-agent: uv pip install -e ./mini-swe-agent")
+    else:
+        check_warn("mini-swe-agent not found", "(run: git submodule update --init --recursive)")
+    
+    # tinker-atropos (RL training backend)
+    tinker_dir = PROJECT_ROOT / "tinker-atropos"
+    if tinker_dir.exists() and (tinker_dir / "pyproject.toml").exists():
+        if py_version >= (3, 11):
+            try:
+                __import__("tinker_atropos")
+                check_ok("tinker-atropos", "(RL training backend)")
+            except ImportError:
+                check_warn("tinker-atropos found but not installed", "(run: uv pip install -e ./tinker-atropos)")
+                issues.append("Install tinker-atropos: uv pip install -e ./tinker-atropos")
+        else:
+            check_warn("tinker-atropos requires Python 3.11+", f"(current: {py_version.major}.{py_version.minor})")
+    else:
+        check_warn("tinker-atropos not found", "(run: git submodule update --init --recursive)")
+    
    # =========================================================================
    # Check: Tool Availability
    # =========================================================================
--- a/hermes_cli/main.py
+++ b/hermes_cli/main.py
@@ -119,6 +119,7 @@ def cmd_uninstall(args):
 def cmd_update(args):
    """Update Hermes Agent to the latest version."""
    import subprocess
+    import shutil
    
    print("🦋 Updating Hermes Agent...")
    print()
@@ -163,13 +164,21 @@ def cmd_update(args):
        print("→ Pulling updates...")
        subprocess.run(["git", "pull", "origin", branch], cwd=PROJECT_ROOT, check=True)
        
-        # Reinstall Python dependencies
+        # Reinstall Python dependencies (prefer uv for speed, fall back to pip)
        print("→ Updating Python dependencies...")
-        venv_pip = PROJECT_ROOT / "venv" / "bin" / "pip"
-        if venv_pip.exists():
-            subprocess.run([str(venv_pip), "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
+        uv_bin = shutil.which("uv")
+        if uv_bin:
+            subprocess.run(
+                [uv_bin, "pip", "install", "-e", ".", "--quiet"],
+                cwd=PROJECT_ROOT, check=True,
+                env={**os.environ, "VIRTUAL_ENV": str(PROJECT_ROOT / "venv")}
+            )
        else:
-            subprocess.run(["pip", "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
+            venv_pip = PROJECT_ROOT / "venv" / "bin" / "pip"
+            if venv_pip.exists():
+                subprocess.run([str(venv_pip), "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
+            else:
+                subprocess.run(["pip", "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
        
        # Check for Node.js deps
        if (PROJECT_ROOT / "package.json").exists():
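
Reviewer note: the uv-first, pip-fallback selection in `cmd_update` above can be read as a small pure function. The sketch below is illustrative only: `build_install_command` and its `which` parameter are hypothetical names, not part of `hermes_cli/main.py`, and it assumes the virtual environment lives at `<project_root>/venv` as in the hunk.

```python
import os
import shutil
from pathlib import Path


def build_install_command(project_root, which=shutil.which):
    """Return (argv, env) for an editable install, preferring uv over pip.

    Hypothetical helper for illustration; cmd_update inlines this logic.
    `which` is injectable so the lookup can be stubbed in tests.
    """
    project_root = Path(project_root)
    env = dict(os.environ)

    uv_bin = which("uv")
    if uv_bin:
        # Point uv at the project's venv via VIRTUAL_ENV, as the hunk does.
        env["VIRTUAL_ENV"] = str(project_root / "venv")
        return [uv_bin, "pip", "install", "-e", ".", "--quiet"], env

    # uv is absent: prefer the venv's own pip, then whatever pip is on PATH.
    venv_pip = project_root / "venv" / "bin" / "pip"
    if venv_pip.exists():
        return [str(venv_pip), "install", "-e", ".", "--quiet"], env
    return ["pip", "install", "-e", ".", "--quiet"], env
```

A caller would pass the result to `subprocess.run(argv, env=env, cwd=project_root, check=True)`. Keeping the selection separate from the subprocess call makes the fallback order testable without installing anything.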
--- a/hermes_cli/setup.py
+++ b/hermes_cli/setup.py
@@ -652,6 +652,32 @@ def run_setup_wizard(args):
        print_info("Modal Cloud Configuration:")
        print_info("Get credentials at: https://modal.com/settings")
        
+        # Check if swe-rex[modal] is installed, install if missing
+        try:
+            from swerex.deployment.modal import ModalDeployment
+            print_info("swe-rex[modal] package: installed ✓")
+        except ImportError:
+            print_info("Installing required package: swe-rex[modal]...")
+            import subprocess
+            import shutil
+            # Prefer uv for speed, fall back to pip
+            uv_bin = shutil.which("uv")
+            if uv_bin:
+                result = subprocess.run(
+                    [uv_bin, "pip", "install", "swe-rex[modal]>=1.4.0"],
+                    capture_output=True, text=True
+                )
+            else:
+                result = subprocess.run(
+                    [sys.executable, "-m", "pip", "install", "swe-rex[modal]>=1.4.0"],
+                    capture_output=True, text=True
+                )
+            if result.returncode == 0:
+                print_success("swe-rex[modal] installed (includes modal + boto3)")
+            else:
+                print_warning("Failed to install swe-rex[modal] — install manually:")
+                print_info('  uv pip install "swe-rex[modal]>=1.4.0"')
+        
        # Always show current status and allow reconfiguration
        current_token = get_env_value('MODAL_TOKEN_ID')
        if current_token:
@@ -917,6 +943,24 @@ def run_setup_wizard(args):
                save_env_value("BROWSERBASE_API_KEY", api_key)
            if project_id:
                save_env_value("BROWSERBASE_PROJECT_ID", project_id)
+            
+            # Check if Node.js dependencies are installed (required for browser tools)
+            import shutil
+            node_modules = PROJECT_ROOT / "node_modules" / "agent-browser"
+            if not node_modules.exists() and shutil.which("npm"):
+                print_info("    Installing Node.js dependencies for browser tools...")
+                import subprocess
+                result = subprocess.run(
+                    ["npm", "install", "--silent"],
+                    capture_output=True, text=True, cwd=str(PROJECT_ROOT)
+                )
+                if result.returncode == 0:
+                    print_success("    Node.js dependencies installed")
+                else:
+                    print_warning("    npm install failed — run manually: cd ~/.hermes/hermes-agent && npm install")
+            elif not node_modules.exists():
+                print_warning("    Node.js not found — browser tools require: npm install (in the hermes-agent directory)")
+            
            print_success("    Configured ✓")
    print()
    
@@ -950,6 +994,11 @@ def run_setup_wizard(args):
    tinker_configured = get_env_value('TINKER_API_KEY')
    wandb_configured = get_env_value('WANDB_API_KEY')
    
+    # Check Python version requirement upfront
+    rl_python_ok = sys.version_info >= (3, 11)
+    if not rl_python_ok:
+        print_warning(f"  Requires Python 3.11+ (current: {sys.version_info.major}.{sys.version_info.minor})")
+    
    if tinker_configured and wandb_configured:
        print_success("  Status: Configured ✓")
        if prompt_yes_no("  Update RL training credentials?", False):
@@ -969,18 +1018,55 @@ def run_setup_wizard(args):
            print_warning("  Status: Not configured (tools will be disabled)")
        
        if prompt_yes_no("  Set up RL Training?", False):
-            print_info("    Get Tinker key at: https://tinker-console.thinkingmachines.ai/keys")
-            print_info("    Get WandB key at: https://wandb.ai/authorize")
-            api_key = prompt("    Tinker API key", password=True)
-            if api_key:
-                save_env_value("TINKER_API_KEY", api_key)
-            wandb_key = prompt("    WandB API key", password=True)
-            if wandb_key:
-                save_env_value("WANDB_API_KEY", wandb_key)
-            if api_key and wandb_key:
-                print_success("    Configured ✓")
+            # Check Python version before proceeding
+            if not rl_python_ok:
+                print_error(f"    Python 3.11+ required (current: {sys.version_info.major}.{sys.version_info.minor})")
+                print_info("    Upgrade Python and reinstall to enable RL training tools")
            else:
-                print_warning("    Partially configured (both keys required)")
+                print_info("    Get Tinker key at: https://tinker-console.thinkingmachines.ai/keys")
+                print_info("    Get WandB key at: https://wandb.ai/authorize")
+                api_key = prompt("    Tinker API key", password=True)
+                if api_key:
+                    save_env_value("TINKER_API_KEY", api_key)
+                wandb_key = prompt("    WandB API key", password=True)
+                if wandb_key:
+                    save_env_value("WANDB_API_KEY", wandb_key)
+                
+                # Check if tinker-atropos submodule is installed
+                try:
+                    __import__("tinker_atropos")
+                except ImportError:
+                    tinker_dir = PROJECT_ROOT / "tinker-atropos"
+                    if tinker_dir.exists() and (tinker_dir / "pyproject.toml").exists():
+                        print_info("    Installing tinker-atropos submodule...")
+                        import subprocess
+                        import shutil
+                        # Prefer uv for speed, fall back to pip
+                        uv_bin = shutil.which("uv")
+                        if uv_bin:
+                            result = subprocess.run(
+                                [uv_bin, "pip", "install", "-e", str(tinker_dir)],
+                                capture_output=True, text=True
+                            )
+                        else:
+                            result = subprocess.run(
+                                [sys.executable, "-m", "pip", "install", "-e", str(tinker_dir)],
+                                capture_output=True, text=True
+                            )
+                        if result.returncode == 0:
+                            print_success("    tinker-atropos installed")
+                        else:
+                            print_warning("    tinker-atropos install failed — run manually:")
+                            print_info('      uv pip install -e "./tinker-atropos"')
+                    else:
+                        print_warning("    tinker-atropos submodule not found — run:")
+                        print_info("      git submodule update --init --recursive")
+                        print_info('      uv pip install -e "./tinker-atropos"')
+                
+                if api_key and wandb_key:
+                    print_success("    Configured ✓")
+                else:
+                    print_warning("    Partially configured (both keys required)")
    
    # =========================================================================
    # Save config and show summary
--- a/model_tools.py
+++ b/model_tools.py
@@ -1191,8 +1191,19 @@ def handle_web_function_call(function_name: str, function_args: Dict[str, Any])
        urls = function_args.get("urls", [])
        # Limit URLs to prevent abuse
        urls = urls[:5] if isinstance(urls, list) else []
-        # Run async function in event loop
-        return asyncio.run(web_extract_tool(urls, "markdown"))
+        # Run the async tool. Inside a running event loop (Atropos), run it on
+        # a fresh loop in a worker thread; otherwise (normal CLI) run directly.
+        try:
+            asyncio.get_running_loop()
+        except RuntimeError:
+            # No running loop (normal CLI) -- use asyncio.run directly
+            return asyncio.run(web_extract_tool(urls, "markdown"))
+        # Already in an async context (Atropos) -- run in a thread
+        import concurrent.futures
+        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
+            return pool.submit(
+                lambda: asyncio.run(web_extract_tool(urls, "markdown"))
+            ).result(timeout=120)
    
    else:
        return json.dumps({"error": f"Unknown web function: {function_name}"}, ensure_ascii=False)
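
Reviewer note: the pattern in the `model_tools.py` hunk above (call `asyncio.run` directly when no loop is running, otherwise run on a fresh loop in a worker thread) can be exercised in isolation. This is a minimal sketch under stated assumptions: `run_coroutine_sync`, `fake_extract`, and `called_from_loop` are hypothetical names, and `fake_extract` merely stands in for `web_extract_tool`.

```python
import asyncio
import concurrent.futures


def run_coroutine_sync(coro_factory, timeout=120):
    """Run an async callable to completion from synchronous code.

    coro_factory is a zero-argument callable returning a fresh coroutine
    (hypothetical helper; not part of model_tools.py).
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro_factory())

    # A loop is already running here, so asyncio.run would raise.
    # Run the coroutine on a fresh loop in a worker thread instead.
    # Note: .result() blocks the calling thread (and its loop) until done.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(lambda: asyncio.run(coro_factory()))
        return future.result(timeout=timeout)


async def fake_extract(urls):
    # Stand-in for web_extract_tool; it just echoes its input.
    await asyncio.sleep(0)
    return {"urls": urls}


async def called_from_loop():
    # Simulates the Atropos case: sync code invoked while a loop is running.
    return run_coroutine_sync(lambda: fake_extract(["https://example.com"]))
```

Only the loop probe sits inside the `try`, so a `RuntimeError` raised by the tool itself propagates to the caller instead of being mistaken for "no running loop".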
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -22,6 +22,8 @@ dependencies = [
  "requests",
  "jinja2",
  "pydantic>=2.0",
+  # Interactive CLI (prompt_toolkit is used directly by cli.py)
+  "prompt_toolkit",
  # Tools
  "firecrawl-py",
  "fal-client",
@@ -32,12 +34,18 @@ dependencies = [
 ]

 [project.optional-dependencies]
-modal = ["modal", "boto3"]
+modal = ["swe-rex[modal]>=1.4.0"]
 dev = ["pytest", "pytest-asyncio"]
-messaging = ["python-telegram-bot>=20.0", "discord.py>=2.0"]
+messaging = ["python-telegram-bot>=20.0", "discord.py>=2.0", "aiohttp>=3.9.0"]
 cron = ["croniter"]
 cli = ["simple-term-menu"]
-all = ["croniter", "python-telegram-bot>=20.0", "discord.py>=2.0", "simple-term-menu"]
+all = [
+  "hermes-agent[modal]",
+  "hermes-agent[messaging]",
+  "hermes-agent[cron]",
+  "hermes-agent[cli]",
+  "hermes-agent[dev]",
+]

 [project.scripts]
 hermes = "hermes_cli.main:main"
--- a/requirements.txt
+++ b/requirements.txt
@@ -6,6 +6,10 @@ httpx
 rich
 tenacity
 prompt_toolkit
+pyyaml
+requests
+jinja2
+pydantic>=2.0

 # Web tools
 firecrawl-py
@@ -15,10 +19,6 @@ fal-client

 # mini-swe-agent dependencies (for terminal tool)
 # Note: Install mini-swe-agent itself with: pip install -e ./mini-swe-agent
-pyyaml
-requests
-jinja2
-pydantic>=2.0
 litellm>=1.75.5
 typer
 platformdirs
@@ -27,18 +27,17 @@ platformdirs
 # Requires Docker installed and user in 'docker' group

 # Optional: For Modal backend (cloud execution)
-# modal
-# boto3
+# swe-rex[modal]>=1.4.0  # Includes modal + boto3 + swe-rex runtime

 # Optional: For cron expression parsing (cronjob scheduling)
 croniter

 # Optional: For messaging platform integrations (gateway)
-# Telegram: pip install python-telegram-bot
+# Telegram
 python-telegram-bot>=20.0

-# Discord: pip install discord.py
+# Discord
 discord.py>=2.0

-# WhatsApp: Requires Node.js bridge (see docs/messaging.md)
-# aiohttp  # For WhatsApp bridge communication
+# WhatsApp bridge communication + general async HTTP (used by gateway)
+aiohttp>=3.9.0
--- a/scripts/install.ps1
+++ b/scripts/install.ps1
@@ -2,6 +2,7 @@
 # Hermes Agent Installer for Windows
 # ============================================================================
 # Installation script for Windows (PowerShell).
+# Uses uv for fast Python provisioning and package management.
 #
 # Usage:
 #   irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
@@ -27,6 +28,7 @@ $ErrorActionPreference = "Stop"

 $RepoUrlSsh = "git@github.com:NousResearch/hermes-agent.git"
 $RepoUrlHttps = "https://github.com/NousResearch/hermes-agent.git"
+$PythonVersion = "3.11"

 # ============================================================================
 # Helper functions
@@ -52,12 +54,12 @@ function Write-Success {
    Write-Host "✓ $Message" -ForegroundColor Green
 }

-function Write-Warning {
+function Write-Warn {
    param([string]$Message)
    Write-Host "⚠ $Message" -ForegroundColor Yellow
 }

-function Write-Error {
+function Write-Err {
    param([string]$Message)
    Write-Host "✗ $Message" -ForegroundColor Red
 }
@@ -66,33 +68,93 @@ function Write-Error {
 # Dependency checks
 # ============================================================================

-function Test-Python {
-    Write-Info "Checking Python..."
+function Install-Uv {
+    Write-Info "Checking for uv package manager..."
    
-    # Try different python commands
-    $pythonCmds = @("python3", "python", "py -3")
+    # Check if uv is already available
+    if (Get-Command uv -ErrorAction SilentlyContinue) {
+        $version = uv --version
+        $script:UvCmd = "uv"
+        Write-Success "uv found ($version)"
+        return $true
+    }
    
-    foreach ($cmd in $pythonCmds) {
-        try {
-            $version = & $cmd.Split()[0] $cmd.Split()[1..99] -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')" 2>$null
-            if ($version) {
-                $major, $minor = $version.Split('.')
-                if ([int]$major -ge 3 -and [int]$minor -ge 10) {
-                    $script:PythonCmd = $cmd
-                    Write-Success "Python $version found"
-                    return $true
-                }
-            }
-        } catch {
-            # Try next command
+    # Check common install locations
+    $uvPaths = @(
+        "$env:USERPROFILE\.local\bin\uv.exe",
+        "$env:USERPROFILE\.cargo\bin\uv.exe"
+    )
+    foreach ($uvPath in $uvPaths) {
+        if (Test-Path $uvPath) {
+            $script:UvCmd = $uvPath
+            $version = & $uvPath --version
+            Write-Success "uv found at $uvPath ($version)"
+            return $true
        }
    }
    
-    Write-Error "Python 3.10+ not found"
-    Write-Info "Please install Python 3.10 or newer from:"
-    Write-Info "  https://www.python.org/downloads/"
-    Write-Info ""
-    Write-Info "Make sure to check 'Add Python to PATH' during installation"
+    # Install uv
+    Write-Info "Installing uv (fast Python package manager)..."
+    try {
+        powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" 2>&1 | Out-Null
+        
+        # Find the installed binary
+        $uvExe = "$env:USERPROFILE\.local\bin\uv.exe"
+        if (-not (Test-Path $uvExe)) {
+            $uvExe = "$env:USERPROFILE\.cargo\bin\uv.exe"
+        }
+        if (-not (Test-Path $uvExe)) {
+            # Refresh PATH and try again
+            $env:Path = [Environment]::GetEnvironmentVariable("Path", "User") + ";" + [Environment]::GetEnvironmentVariable("Path", "Machine")
+            if (Get-Command uv -ErrorAction SilentlyContinue) {
+                $uvExe = (Get-Command uv).Source
+            }
+        }
+        
+        if (Test-Path $uvExe) {
+            $script:UvCmd = $uvExe
+            $version = & $uvExe --version
+            Write-Success "uv installed ($version)"
+            return $true
+        }
+        
+        Write-Err "uv installed but not found on PATH"
+        Write-Info "Try restarting your terminal and re-running"
+        return $false
+    } catch {
+        Write-Err "Failed to install uv"
+        Write-Info "Install manually: https://docs.astral.sh/uv/getting-started/installation/"
+        return $false
+    }
+}
+
+function Test-Python {
+    Write-Info "Checking Python $PythonVersion..."
+    
+    # Let uv find or install Python
+    try {
+        $pythonPath = & $UvCmd python find $PythonVersion 2>$null
+        if ($pythonPath) {
+            $ver = & $pythonPath --version 2>$null
+            Write-Success "Python found: $ver"
+            return $true
+        }
+    } catch { }
+    
+    # Python not found — use uv to install it (no admin needed!)
+    Write-Info "Python $PythonVersion not found, installing via uv..."
+    try {
+        & $UvCmd python install $PythonVersion 2>&1 | Out-Null
+        $pythonPath = & $UvCmd python find $PythonVersion 2>$null
+        if ($pythonPath) {
+            $ver = & $pythonPath --version 2>$null
+            Write-Success "Python installed: $ver"
+            return $true
+        }
+    } catch { }
+    
+    Write-Err "Failed to install Python $PythonVersion"
+    Write-Info "Install Python $PythonVersion manually, then re-run this script"
    return $false
 }

@@ -105,7 +167,7 @@ function Test-Git {
        return $true
    }
    
-    Write-Error "Git not found"
+    Write-Err "Git not found"
    Write-Info "Please install Git from:"
    Write-Info "  https://git-scm.com/download/win"
    return $false
@@ -121,7 +183,7 @@ function Test-Node {
        return $true
    }
    
-    Write-Warning "Node.js not found (browser tools will be limited)"
+    Write-Warn "Node.js not found (browser tools will be limited)"
    Write-Info "To install Node.js (optional):"
    Write-Info "  https://nodejs.org/en/download/"
    $script:HasNode = $false
@@ -138,7 +200,7 @@ function Test-Ripgrep {
        return $true
    }
    
-    Write-Warning "ripgrep not found (file search will use findstr fallback)"
+    Write-Warn "ripgrep not found (file search will use findstr fallback)"
    
    # Check what package managers are available
    $hasWinget = Get-Command winget -ErrorAction SilentlyContinue
@@ -185,7 +247,7 @@ function Test-Ripgrep {
            } catch { }
        }
        
-        Write-Warning "Auto-install failed. You can install manually:"
+        Write-Warn "Auto-install failed. You can install manually:"
    } else {
        Write-Info "Skipping ripgrep installation. To install manually:"
    }
@@ -216,13 +278,12 @@ function Install-Repository {
            git pull origin $Branch
            Pop-Location
        } else {
-            Write-Error "Directory exists but is not a git repository: $InstallDir"
+            Write-Err "Directory exists but is not a git repository: $InstallDir"
            Write-Info "Remove it or choose a different directory with -InstallDir"
            exit 1
        }
    } else {
        # Try SSH first (for private repo access), fall back to HTTPS
-        # Use --recurse-submodules to also clone mini-swe-agent and tinker-atropos
        Write-Info "Trying SSH clone..."
        $sshResult = git clone --branch $Branch --recurse-submodules $RepoUrlSsh $InstallDir 2>&1
        
@@ -235,7 +296,7 @@ function Install-Repository {
            if ($LASTEXITCODE -eq 0) {
                Write-Success "Cloned via HTTPS"
            } else {
-                Write-Error "Failed to clone repository"
+                Write-Err "Failed to clone repository"
                Write-Info "For private repo access, ensure your SSH key is added to GitHub:"
                Write-Info "  ssh-add ~/.ssh/id_rsa"
                Write-Info "  ssh -T git@github.com  # Test connection"
@@ -244,7 +305,7 @@ function Install-Repository {
        }
    }
    
-    # Ensure submodules are initialized and updated (for existing installs or if --recurse failed)
+    # Ensure submodules are initialized and updated
    Write-Info "Initializing submodules (mini-swe-agent, tinker-atropos)..."
    Push-Location $InstallDir
    git submodule update --init --recursive
@@ -260,23 +321,21 @@ function Install-Venv {
        return
    }
    
-    Write-Info "Creating virtual environment..."
+    Write-Info "Creating virtual environment with Python $PythonVersion..."
    
    Push-Location $InstallDir
    
-    if (-not (Test-Path "venv")) {
-        & $PythonCmd -m venv venv
+    if (Test-Path "venv") {
+        Write-Info "Virtual environment already exists, recreating..."
+        Remove-Item -Recurse -Force "venv"
    }
    
-    # Activate
-    & .\venv\Scripts\Activate.ps1
-    
-    # Upgrade pip
-    pip install --upgrade pip wheel setuptools | Out-Null
+    # uv creates the venv and pins the Python version in one step
+    & $UvCmd venv venv --python $PythonVersion
    
    Pop-Location
    
-    Write-Success "Virtual environment ready"
+    Write-Success "Virtual environment ready (Python $PythonVersion)"
 }

 function Install-Dependencies {
@@ -285,14 +344,15 @@ function Install-Dependencies {
    Push-Location $InstallDir
    
    if (-not $NoVenv) {
-        & .\venv\Scripts\Activate.ps1
+        # Tell uv to install into our venv (no activation needed)
+        $env:VIRTUAL_ENV = "$InstallDir\venv"
    }
    
-    # Install main package
+    # Install main package with all extras
    try {
-        pip install -e ".[all]" 2>&1 | Out-Null
+        & $UvCmd pip install -e ".[all]" 2>&1 | Out-Null; if ($LASTEXITCODE -ne 0) { throw "uv pip install .[all] failed" }
    } catch {
-        pip install -e "." | Out-Null
+        & $UvCmd pip install -e "." | Out-Null
    }
    
    Write-Success "Main package installed"
@@ -301,25 +361,25 @@ function Install-Dependencies {
    Write-Info "Installing mini-swe-agent (terminal tool backend)..."
    if (Test-Path "mini-swe-agent\pyproject.toml") {
        try {
-            pip install -e ".\mini-swe-agent" 2>&1 | Out-Null
+            & $UvCmd pip install -e ".\mini-swe-agent" 2>&1 | Out-Null; if ($LASTEXITCODE -ne 0) { throw "mini-swe-agent install failed" }
            Write-Success "mini-swe-agent installed"
        } catch {
-            Write-Warning "mini-swe-agent install failed (terminal tools may not work)"
+            Write-Warn "mini-swe-agent install failed (terminal tools may not work)"
        }
    } else {
-        Write-Warning "mini-swe-agent not found (run: git submodule update --init)"
+        Write-Warn "mini-swe-agent not found (run: git submodule update --init)"
    }
    
    Write-Info "Installing tinker-atropos (RL training backend)..."
    if (Test-Path "tinker-atropos\pyproject.toml") {
        try {
-            pip install -e ".\tinker-atropos" 2>&1 | Out-Null
+            & $UvCmd pip install -e ".\tinker-atropos" 2>&1 | Out-Null; if ($LASTEXITCODE -ne 0) { throw "tinker-atropos install failed" }
            Write-Success "tinker-atropos installed"
        } catch {
-            Write-Warning "tinker-atropos install failed (RL tools may not work)"
+            Write-Warn "tinker-atropos install failed (RL tools may not work)"
        }
    } else {
-        Write-Warning "tinker-atropos not found (run: git submodule update --init)"
+        Write-Warn "tinker-atropos not found (run: git submodule update --init)"
    }
    
    Pop-Location
@@ -328,41 +388,44 @@ function Install-Dependencies {
 }

 function Set-PathVariable {
-    Write-Info "Setting up PATH..."
+    Write-Info "Setting up hermes command..."
    
    if ($NoVenv) {
-        $binDir = "$InstallDir"
+        $hermesBin = "$InstallDir"
    } else {
-        $binDir = "$InstallDir\venv\Scripts"
+        $hermesBin = "$InstallDir\venv\Scripts"
    }
    
-    # Add to user PATH
+    # Add the venv Scripts dir to user PATH so hermes is globally available
+    # On Windows, the hermes.exe in venv\Scripts\ has the venv Python baked in
    $currentPath = [Environment]::GetEnvironmentVariable("Path", "User")
    
-    if ($currentPath -notlike "*$binDir*") {
+    if ($currentPath -notlike "*$hermesBin*") {
        [Environment]::SetEnvironmentVariable(
            "Path",
-            "$binDir;$currentPath",
+            "$hermesBin;$currentPath",
            "User"
        )
-        Write-Success "Added to user PATH"
+        Write-Success "Added to user PATH: $hermesBin"
    } else {
        Write-Info "PATH already configured"
    }
    
    # Update current session
-    $env:Path = "$binDir;$env:Path"
+    $env:Path = "$hermesBin;$env:Path"
+    
+    Write-Success "hermes command ready"
 }

 function Copy-ConfigTemplates {
    Write-Info "Setting up configuration files..."
    
-    # Create ~/.hermes directory structure (config at top level, code in subdir)
+    # Create ~/.hermes directory structure
    New-Item -ItemType Directory -Force -Path "$HermesHome\cron" | Out-Null
    New-Item -ItemType Directory -Force -Path "$HermesHome\sessions" | Out-Null
    New-Item -ItemType Directory -Force -Path "$HermesHome\logs" | Out-Null
    
-    # Create .env at ~/.hermes/.env (top level, easy to find)
+    # Create .env
    $envPath = "$HermesHome\.env"
    if (-not (Test-Path $envPath)) {
        $examplePath = "$InstallDir\.env.example"
@@ -370,7 +433,6 @@ function Copy-ConfigTemplates {
            Copy-Item $examplePath $envPath
            Write-Success "Created ~/.hermes/.env from template"
        } else {
-            # Create empty .env if no example exists
            New-Item -ItemType File -Force -Path $envPath | Out-Null
            Write-Success "Created ~/.hermes/.env"
        }
@@ -378,7 +440,7 @@ function Copy-ConfigTemplates {
        Write-Info "~/.hermes/.env already exists, keeping it"
    }
    
-    # Create config.yaml at ~/.hermes/config.yaml (top level, easy to find)
+    # Create config.yaml
    $configPath = "$HermesHome\config.yaml"
    if (-not (Test-Path $configPath)) {
        $examplePath = "$InstallDir\cli-config.yaml.example"
@@ -407,7 +469,7 @@ function Install-NodeDeps {
            npm install --silent 2>&1 | Out-Null
            Write-Success "Node.js dependencies installed"
        } catch {
-            Write-Warning "npm install failed (browser tools may not work)"
+            Write-Warn "npm install failed (browser tools may not work)"
        }
    }
    
@@ -426,12 +488,13 @@ function Invoke-SetupWizard {
    
    Push-Location $InstallDir
    
+    # Run hermes setup using the venv Python directly (no activation needed)
    if (-not $NoVenv) {
-        & .\venv\Scripts\Activate.ps1
+        & ".\venv\Scripts\python.exe" -m hermes_cli.main setup
+    } else {
+        python -m hermes_cli.main setup
    }
    
-    python -m hermes_cli.main setup
-    
    Pop-Location
 }

@@ -478,7 +541,6 @@ function Write-Completion {
    Write-Host "⚡ Restart your terminal for PATH changes to take effect" -ForegroundColor Yellow
    Write-Host ""
    
-    # Show notes about optional tools
    if (-not $HasNode) {
        Write-Host "Note: Node.js was not found. Browser automation tools" -ForegroundColor Yellow
        Write-Host "will have limited functionality." -ForegroundColor Yellow
@@ -500,6 +562,7 @@ function Write-Completion {
 function Main {
    Write-Banner
    
+    if (-not (Install-Uv)) { exit 1 }
    if (-not (Test-Python)) { exit 1 }
    if (-not (Test-Git)) { exit 1 }
    Test-Node      # Optional, doesn't fail
--- a/scripts/install.sh
+++ b/scripts/install.sh
@@ -3,6 +3,7 @@
 # Hermes Agent Installer
 # ============================================================================
 # Installation script for Linux and macOS.
+# Uses uv for fast Python provisioning and package management.
 #
 # Usage:
 #   curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
@@ -29,7 +30,7 @@ REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
 REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
 HERMES_HOME="$HOME/.hermes"
 INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
-PYTHON_MIN_VERSION="3.10"
+PYTHON_VERSION="3.11"

 # Options
 USE_VENV=true
@@ -64,7 +65,7 @@ while [[ $# -gt 0 ]]; do
            echo "  --no-venv      Don't create virtual environment"
            echo "  --skip-setup   Skip interactive setup wizard"
            echo "  --branch NAME  Git branch to install (default: main)"
-            echo "  --dir PATH     Installation directory (default: ~/.hermes-agent)"
+            echo "  --dir PATH     Installation directory (default: ~/.hermes/hermes-agent)"
            echo "  -h, --help     Show this help"
            exit 0
            ;;
@@ -146,50 +147,80 @@ detect_os() {
 # Dependency checks
 # ============================================================================

-check_python() {
-    log_info "Checking Python..."
+install_uv() {
+    log_info "Checking for uv package manager..."
    
-    # Try different python commands
-    for cmd in python3.12 python3.11 python3.10 python3 python; do
-        if command -v $cmd &> /dev/null; then
-            PYTHON_CMD=$cmd
-            PYTHON_VERSION=$($cmd -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
-            
-            # Check version
-            if python3 -c "import sys; exit(0 if sys.version_info >= (3, 10) else 1)" 2>/dev/null; then
-                log_success "Python $PYTHON_VERSION found"
-                return 0
-            fi
+    # Check common locations for uv
+    if command -v uv &> /dev/null; then
+        UV_CMD="uv"
+        UV_VERSION=$($UV_CMD --version 2>/dev/null)
+        log_success "uv found ($UV_VERSION)"
+        return 0
+    fi
+    
+    # Check ~/.local/bin (default uv install location) even if not on PATH yet
+    if [ -x "$HOME/.local/bin/uv" ]; then
+        UV_CMD="$HOME/.local/bin/uv"
+        UV_VERSION=$($UV_CMD --version 2>/dev/null)
+        log_success "uv found at ~/.local/bin ($UV_VERSION)"
+        return 0
+    fi
+    
+    # Check ~/.cargo/bin (alternative uv install location)
+    if [ -x "$HOME/.cargo/bin/uv" ]; then
+        UV_CMD="$HOME/.cargo/bin/uv"
+        UV_VERSION=$($UV_CMD --version 2>/dev/null)
+        log_success "uv found at ~/.cargo/bin ($UV_VERSION)"
+        return 0
+    fi
+    
+    # Install uv
+    log_info "Installing uv (fast Python package manager)..."
+    if curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null; then
+        # uv installs to ~/.local/bin by default
+        if [ -x "$HOME/.local/bin/uv" ]; then
+            UV_CMD="$HOME/.local/bin/uv"
+        elif [ -x "$HOME/.cargo/bin/uv" ]; then
+            UV_CMD="$HOME/.cargo/bin/uv"
+        elif command -v uv &> /dev/null; then
+            UV_CMD="uv"
+        else
+            log_error "uv installed but not found on PATH"
+            log_info "Try adding ~/.local/bin to your PATH and re-running"
+            exit 1
        fi
-    done
+        UV_VERSION=$($UV_CMD --version 2>/dev/null)
+        log_success "uv installed ($UV_VERSION)"
+    else
+        log_error "Failed to install uv"
+        log_info "Install manually: https://docs.astral.sh/uv/getting-started/installation/"
+        exit 1
+    fi
+}
+
+check_python() {
+    log_info "Checking Python $PYTHON_VERSION..."
    
-    log_error "Python 3.10+ not found"
-    log_info "Please install Python 3.10 or newer:"
+    # Let uv handle Python — it can download and manage Python versions
+    # First check if a suitable Python is already available
+    if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
+        PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
+        PYTHON_FOUND_VERSION=$("$PYTHON_PATH" --version 2>/dev/null)
+        log_success "Python found: $PYTHON_FOUND_VERSION"
+        return 0
+    fi
    
-    case "$OS" in
-        linux)
-            case "$DISTRO" in
-                ubuntu|debian)
-                    log_info "  sudo apt update && sudo apt install python3.11 python3.11-venv"
-                    ;;
-                fedora)
-                    log_info "  sudo dnf install python3.11"
-                    ;;
-                arch)
-                    log_info "  sudo pacman -S python"
-                    ;;
-                *)
-                    log_info "  Use your package manager to install Python 3.10+"
-                    ;;
-            esac
-            ;;
-        macos)
-            log_info "  brew install python@3.11"
-            log_info "  Or download from https://www.python.org/downloads/"
-            ;;
-    esac
-    
-    exit 1
+    # Python not found — use uv to install it (no sudo needed!)
+    log_info "Python $PYTHON_VERSION not found, installing via uv..."
+    if $UV_CMD python install "$PYTHON_VERSION"; then
+        PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
+        PYTHON_FOUND_VERSION=$("$PYTHON_PATH" --version 2>/dev/null)
+        log_success "Python installed: $PYTHON_FOUND_VERSION"
+    else
+        log_error "Failed to install Python $PYTHON_VERSION"
+        log_info "Install Python $PYTHON_VERSION manually, then re-run this script"
+        exit 1
+    fi
 }

 check_git() {
@@ -294,7 +325,6 @@ check_ripgrep() {
        # Check if we can use sudo
        CAN_SUDO=false
        if command -v sudo &> /dev/null; then
-            # Check if user has sudo access (without actually running sudo)
            if sudo -n true 2>/dev/null || sudo -v 2>/dev/null; then
                CAN_SUDO=true
            fi
@@ -328,7 +358,6 @@ check_ripgrep() {
                    esac
                else
                    log_warn "sudo not available - cannot auto-install system packages"
-                    # Try cargo as fallback if available
                    if command -v cargo &> /dev/null; then
                        log_info "Trying cargo install (no sudo required)..."
                        if cargo install ripgrep 2>/dev/null; then
@@ -371,7 +400,6 @@ check_ripgrep() {
                    log_info "  https://github.com/BurntSushi/ripgrep#installation"
                    ;;
            esac
-            # Show cargo alternative for users without sudo
            if command -v cargo &> /dev/null; then
                log_info "  Or without sudo: cargo install ripgrep"
            fi
@@ -440,39 +468,36 @@ setup_venv() {
        return 0
    fi
    
-    log_info "Creating virtual environment..."
+    log_info "Creating virtual environment with Python $PYTHON_VERSION..."
    
    if [ -d "venv" ]; then
-        log_info "Virtual environment already exists"
-    else
-        $PYTHON_CMD -m venv venv
+        log_info "Virtual environment already exists, recreating..."
+        rm -rf venv
    fi
    
-    # Activate
-    source venv/bin/activate
+    # uv creates the venv and pins the Python version in one step
+    $UV_CMD venv venv --python "$PYTHON_VERSION"
    
-    # Upgrade pip
-    pip install --upgrade pip wheel setuptools > /dev/null
-    
-    log_success "Virtual environment ready"
+    log_success "Virtual environment ready (Python $PYTHON_VERSION)"
 }

 install_deps() {
    log_info "Installing dependencies..."
    
    if [ "$USE_VENV" = true ]; then
-        source venv/bin/activate
+        # Tell uv to install into our venv (no need to activate)
+        export VIRTUAL_ENV="$INSTALL_DIR/venv"
    fi
    
    # Install the main package in editable mode with all extras
-    pip install -e ".[all]" > /dev/null 2>&1 || pip install -e "." > /dev/null
+    $UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
    
    log_success "Main package installed"
    
    # Install submodules
    log_info "Installing mini-swe-agent (terminal tool backend)..."
    if [ -d "mini-swe-agent" ] && [ -f "mini-swe-agent/pyproject.toml" ]; then
-        pip install -e "./mini-swe-agent" > /dev/null 2>&1 || log_warn "mini-swe-agent install failed (terminal tools may not work)"
+        $UV_CMD pip install -e "./mini-swe-agent" || log_warn "mini-swe-agent install failed (terminal tools may not work)"
        log_success "mini-swe-agent installed"
    else
        log_warn "mini-swe-agent not found (run: git submodule update --init)"
@@ -480,7 +505,7 @@ install_deps() {
    
    log_info "Installing tinker-atropos (RL training backend)..."
    if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
-        pip install -e "./tinker-atropos" > /dev/null 2>&1 || log_warn "tinker-atropos install failed (RL tools may not work)"
+        $UV_CMD pip install -e "./tinker-atropos" || log_warn "tinker-atropos install failed (RL tools may not work)"
        log_success "tinker-atropos installed"
    else
        log_warn "tinker-atropos not found (run: git submodule update --init)"
@@ -490,53 +515,56 @@ install_deps() {
 }

 setup_path() {
-    log_info "Setting up PATH..."
+    log_info "Setting up hermes command..."
    
-    # Determine the bin directory
    if [ "$USE_VENV" = true ]; then
-        BIN_DIR="$INSTALL_DIR/venv/bin"
+        HERMES_BIN="$INSTALL_DIR/venv/bin/hermes"
    else
-        BIN_DIR="$HOME/.local/bin"
-        mkdir -p "$BIN_DIR"
+        HERMES_BIN="$(command -v hermes 2>/dev/null || echo "")"
+        if [ -z "$HERMES_BIN" ]; then
+            log_warn "hermes not found on PATH after install"
+            return 0
+        fi
+    fi
+    
+    # Create symlink in ~/.local/bin (standard user binary location, usually on PATH)
+    mkdir -p "$HOME/.local/bin"
+    ln -sf "$HERMES_BIN" "$HOME/.local/bin/hermes"
+    log_success "Symlinked hermes → ~/.local/bin/hermes"
+    
+    # Check if ~/.local/bin is on PATH; if not, add it to shell config
+    if ! echo "$PATH" | tr ':' '\n' | grep -qxF "$HOME/.local/bin"; then
+        SHELL_CONFIG=""
+        if [ -n "$BASH_VERSION" ]; then
+            if [ -f "$HOME/.bashrc" ]; then
+                SHELL_CONFIG="$HOME/.bashrc"
+            elif [ -f "$HOME/.bash_profile" ]; then
+                SHELL_CONFIG="$HOME/.bash_profile"
+            fi
+        elif [ -n "$ZSH_VERSION" ] || [ -f "$HOME/.zshrc" ]; then
+            SHELL_CONFIG="$HOME/.zshrc"
+        fi
        
-        # Create a wrapper script
-        cat > "$BIN_DIR/hermes" << EOF
-#!/bin/bash
-cd "$INSTALL_DIR"
-exec python -m hermes_cli.main "\$@"
-EOF
-        chmod +x "$BIN_DIR/hermes"
-    fi
-    
-    # Add to PATH in shell config
-    SHELL_CONFIG=""
-    if [ -n "$BASH_VERSION" ]; then
-        if [ -f "$HOME/.bashrc" ]; then
-            SHELL_CONFIG="$HOME/.bashrc"
-        elif [ -f "$HOME/.bash_profile" ]; then
-            SHELL_CONFIG="$HOME/.bash_profile"
+        PATH_LINE='export PATH="$HOME/.local/bin:$PATH"'
+        
+        if [ -n "$SHELL_CONFIG" ]; then
+            if ! grep -q '\.local/bin' "$SHELL_CONFIG" 2>/dev/null; then
+                echo "" >> "$SHELL_CONFIG"
+                echo "# Hermes Agent — ensure ~/.local/bin is on PATH" >> "$SHELL_CONFIG"
+                echo "$PATH_LINE" >> "$SHELL_CONFIG"
+                log_success "Added ~/.local/bin to PATH in $SHELL_CONFIG"
+            else
+                log_info "~/.local/bin already referenced in $SHELL_CONFIG"
+            fi
        fi
-    elif [ -n "$ZSH_VERSION" ] || [ -f "$HOME/.zshrc" ]; then
-        SHELL_CONFIG="$HOME/.zshrc"
+    else
+        log_info "~/.local/bin already on PATH"
    fi
    
-    PATH_LINE="export PATH=\"$BIN_DIR:\$PATH\""
+    # Export for current session so hermes works immediately
+    export PATH="$HOME/.local/bin:$PATH"
    
-    if [ -n "$SHELL_CONFIG" ]; then
-        if ! grep -q "hermes-agent" "$SHELL_CONFIG" 2>/dev/null; then
-            echo "" >> "$SHELL_CONFIG"
-            echo "# Hermes Agent" >> "$SHELL_CONFIG"
-            echo "$PATH_LINE" >> "$SHELL_CONFIG"
-            log_success "Added to $SHELL_CONFIG"
-        else
-            log_info "PATH already configured in $SHELL_CONFIG"
-        fi
-    fi
-    
-    # Also export for current session
-    export PATH="$BIN_DIR:$PATH"
-    
-    log_success "PATH configured"
+    log_success "hermes command ready"
 }

 copy_config_templates() {
@@ -553,7 +581,6 @@ copy_config_templates() {
            cp "$INSTALL_DIR/.env.example" "$HERMES_HOME/.env"
            log_success "Created ~/.hermes/.env from template"
        else
-            # Create empty .env if no example exists
            touch "$HERMES_HOME/.env"
            log_success "Created ~/.hermes/.env"
        fi
@@ -601,12 +628,14 @@ run_setup_wizard() {
    log_info "Starting setup wizard..."
    echo ""
    
-    if [ "$USE_VENV" = true ]; then
-        source "$INSTALL_DIR/venv/bin/activate"
-    fi
-    
    cd "$INSTALL_DIR"
-    python -m hermes_cli.main setup
+    
+    # Run hermes setup using the venv Python directly (no activation needed)
+    if [ "$USE_VENV" = true ]; then
+        "$INSTALL_DIR/venv/bin/python" -m hermes_cli.main setup
+    else
+        python -m hermes_cli.main setup
+    fi
 }

 print_success() {
@@ -673,6 +702,7 @@ main() {
    print_banner
    
    detect_os
+    install_uv
    check_python
    check_git
    check_node
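Review note on `setup_path`: the PATH membership test can also be written without a `tr | grep` pipeline. The sketch below is a hypothetical helper (`path_contains` is not part of this diff) that compares entries literally, so characters such as `.` in the directory name are never treated as pattern characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the diff: succeed only when DIR is an
# exact entry of a colon-separated PATH-style string (defaults to $PATH).
path_contains() {
    local dir="$1"
    local path_value="${2:-$PATH}"
    # Wrapping both sides in colons lets one pattern cover first, middle,
    # and last entries. The quoted part of the pattern is matched literally.
    case ":$path_value:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage, mirroring the check made before editing the shell config.
if path_contains "$HOME/.local/bin"; then
    echo "~/.local/bin is on PATH"
else
    echo "~/.local/bin is missing from PATH"
fi
```

Passing the PATH string as an optional second argument is only there to make the helper easy to exercise in isolation; the installer itself would call it with one argument.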
--- a/setup-hermes.sh
+++ b/setup-hermes.sh
@@ -3,16 +3,18 @@
 # Hermes Agent Setup Script
 # ============================================================================
 # Quick setup for developers who cloned the repo manually.
+# Uses uv for fast Python provisioning and package management.
 #
 # Usage:
 #   ./setup-hermes.sh
 #
 # This script:
-# 1. Creates a virtual environment (if not exists)
-# 2. Installs dependencies
-# 3. Creates .env from template (if not exists)
-# 4. Installs the 'hermes' CLI command
-# 5. Runs the setup wizard (optional)
+# 1. Installs uv if not present
+# 2. Creates a virtual environment with Python 3.11 via uv
+# 3. Installs all dependencies (main package + submodules)
+# 4. Creates .env from template (if not exists)
+# 5. Symlinks the 'hermes' CLI command into ~/.local/bin
+# 6. Runs the setup wizard (optional)
 # ============================================================================

 set -e
@@ -21,38 +23,75 @@ set -e
 GREEN='\033[0;32m'
 YELLOW='\033[0;33m'
 CYAN='\033[0;36m'
+RED='\033[0;31m'
 NC='\033[0m'

 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 cd "$SCRIPT_DIR"

+PYTHON_VERSION="3.11"
+
 echo ""
 echo -e "${CYAN}🦋 Hermes Agent Setup${NC}"
 echo ""

 # ============================================================================
-# Python check
+# Install / locate uv
 # ============================================================================

-echo -e "${CYAN}→${NC} Checking Python..."
+echo -e "${CYAN}→${NC} Checking for uv..."

-PYTHON_CMD=""
-for cmd in python3.12 python3.11 python3.10 python3 python; do
-    if command -v $cmd &> /dev/null; then
-        if $cmd -c "import sys; exit(0 if sys.version_info >= (3, 10) else 1)" 2>/dev/null; then
-            PYTHON_CMD=$cmd
-            break
-        fi
-    fi
-done
-
-if [ -z "$PYTHON_CMD" ]; then
-    echo -e "${YELLOW}✗${NC} Python 3.10+ required"
-    exit 1
+UV_CMD=""
+if command -v uv &> /dev/null; then
+    UV_CMD="uv"
+elif [ -x "$HOME/.local/bin/uv" ]; then
+    UV_CMD="$HOME/.local/bin/uv"
+elif [ -x "$HOME/.cargo/bin/uv" ]; then
+    UV_CMD="$HOME/.cargo/bin/uv"
 fi

-PYTHON_VERSION=$($PYTHON_CMD -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
-echo -e "${GREEN}✓${NC} Python $PYTHON_VERSION found"
+if [ -n "$UV_CMD" ]; then
+    UV_VERSION=$($UV_CMD --version 2>/dev/null)
+    echo -e "${GREEN}✓${NC} uv found ($UV_VERSION)"
+else
+    echo -e "${CYAN}→${NC} Installing uv..."
+    if curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null; then
+        if [ -x "$HOME/.local/bin/uv" ]; then
+            UV_CMD="$HOME/.local/bin/uv"
+        elif [ -x "$HOME/.cargo/bin/uv" ]; then
+            UV_CMD="$HOME/.cargo/bin/uv"
+        fi
+        
+        if [ -n "$UV_CMD" ]; then
+            UV_VERSION=$($UV_CMD --version 2>/dev/null)
+            echo -e "${GREEN}✓${NC} uv installed ($UV_VERSION)"
+        else
+            echo -e "${RED}✗${NC} uv installed but not found. Add ~/.local/bin to PATH and retry."
+            exit 1
+        fi
+    else
+        echo -e "${RED}✗${NC} Failed to install uv. Visit https://docs.astral.sh/uv/"
+        exit 1
+    fi
+fi
+
+# ============================================================================
+# Python check (uv can provision it automatically)
+# ============================================================================
+
+echo -e "${CYAN}→${NC} Checking Python $PYTHON_VERSION..."
+
+if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
+    PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
+    PYTHON_FOUND_VERSION=$("$PYTHON_PATH" --version 2>/dev/null)
+    echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION found"
+else
+    echo -e "${CYAN}→${NC} Python $PYTHON_VERSION not found, installing via uv..."
+    $UV_CMD python install "$PYTHON_VERSION"
+    PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
+    PYTHON_FOUND_VERSION=$("$PYTHON_PATH" --version 2>/dev/null)
+    echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION installed"
+fi

 # ============================================================================
 # Virtual environment
@@ -60,15 +99,16 @@ echo -e "${GREEN}✓${NC} Python $PYTHON_VERSION found"

 echo -e "${CYAN}→${NC} Setting up virtual environment..."

-if [ ! -d "venv" ]; then
-    $PYTHON_CMD -m venv venv
-    echo -e "${GREEN}✓${NC} Created venv"
-else
-    echo -e "${GREEN}✓${NC} venv exists"
+if [ -d "venv" ]; then
+    echo -e "${CYAN}→${NC} Removing old venv..."
+    rm -rf venv
 fi

-source venv/bin/activate
-pip install --upgrade pip wheel setuptools > /dev/null
+$UV_CMD venv venv --python "$PYTHON_VERSION"
+echo -e "${GREEN}✓${NC} venv created (Python $PYTHON_VERSION)"
+
+# Tell uv to install into this venv (no activation needed for uv)
+export VIRTUAL_ENV="$SCRIPT_DIR/venv"

 # ============================================================================
 # Dependencies
@@ -76,10 +116,34 @@ pip install --upgrade pip wheel setuptools > /dev/null

 echo -e "${CYAN}→${NC} Installing dependencies..."

-pip install -e ".[all]" > /dev/null 2>&1 || pip install -e "." > /dev/null
+$UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."

 echo -e "${GREEN}✓${NC} Dependencies installed"

+# ============================================================================
+# Submodules (terminal backend + RL training)
+# ============================================================================
+
+echo -e "${CYAN}→${NC} Installing submodules..."
+
+# mini-swe-agent (terminal tool backend)
+if [ -d "mini-swe-agent" ] && [ -f "mini-swe-agent/pyproject.toml" ]; then
+    $UV_CMD pip install -e "./mini-swe-agent" && \
+        echo -e "${GREEN}✓${NC} mini-swe-agent installed" || \
+        echo -e "${YELLOW}⚠${NC} mini-swe-agent install failed (terminal tools may not work)"
+else
+    echo -e "${YELLOW}⚠${NC} mini-swe-agent not found (run: git submodule update --init --recursive)"
+fi
+
+# tinker-atropos (RL training backend)
+if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
+    $UV_CMD pip install -e "./tinker-atropos" && \
+        echo -e "${GREEN}✓${NC} tinker-atropos installed" || \
+        echo -e "${YELLOW}⚠${NC} tinker-atropos install failed (RL tools may not work)"
+else
+    echo -e "${YELLOW}⚠${NC} tinker-atropos not found (run: git submodule update --init --recursive)"
+fi
+
 # ============================================================================
 # Optional: ripgrep (for faster file search)
 # ============================================================================
@@ -141,14 +205,17 @@ else
 fi

 # ============================================================================
-# PATH setup
+# PATH setup — symlink hermes into ~/.local/bin
 # ============================================================================

 echo -e "${CYAN}→${NC} Setting up hermes command..."

-BIN_DIR="$SCRIPT_DIR/venv/bin"
+HERMES_BIN="$SCRIPT_DIR/venv/bin/hermes"
+mkdir -p "$HOME/.local/bin"
+ln -sf "$HERMES_BIN" "$HOME/.local/bin/hermes"
+echo -e "${GREEN}✓${NC} Symlinked hermes → ~/.local/bin/hermes"

-# Add to shell config if not already there
+# Ensure ~/.local/bin is on PATH in shell config
 SHELL_CONFIG=""
 if [ -f "$HOME/.zshrc" ]; then
    SHELL_CONFIG="$HOME/.zshrc"
@@ -159,13 +226,17 @@ elif [ -f "$HOME/.bash_profile" ]; then
 fi

 if [ -n "$SHELL_CONFIG" ]; then
-    if ! grep -q "hermes-agent" "$SHELL_CONFIG" 2>/dev/null; then
-        echo "" >> "$SHELL_CONFIG"
-        echo "# Hermes Agent" >> "$SHELL_CONFIG"
-        echo "export PATH=\"$BIN_DIR:\$PATH\"" >> "$SHELL_CONFIG"
-        echo -e "${GREEN}✓${NC} Added to $SHELL_CONFIG"
+    if ! echo "$PATH" | tr ':' '\n' | grep -qxF "$HOME/.local/bin"; then
+        if ! grep -q '\.local/bin' "$SHELL_CONFIG" 2>/dev/null; then
+            echo "" >> "$SHELL_CONFIG"
+            echo "# Hermes Agent — ensure ~/.local/bin is on PATH" >> "$SHELL_CONFIG"
+            echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$SHELL_CONFIG"
+            echo -e "${GREEN}✓${NC} Added ~/.local/bin to PATH in $SHELL_CONFIG"
+        else
+            echo -e "${GREEN}✓${NC} ~/.local/bin already in $SHELL_CONFIG"
+        fi
    else
-        echo -e "${GREEN}✓${NC} PATH already in $SHELL_CONFIG"
+        echo -e "${GREEN}✓${NC} ~/.local/bin already on PATH"
    fi
 fi

@@ -199,5 +270,6 @@ read -p "Would you like to run the setup wizard now? [Y/n] " -n 1 -r
 echo
 if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
    echo ""
-    python -m hermes_cli.main setup
+    # Run directly with venv Python (no activation needed)
+    "$SCRIPT_DIR/venv/bin/python" -m hermes_cli.main setup
 fi
--- a/tools/file_tools.py
+++ b/tools/file_tools.py
@@ -2,6 +2,7 @@
 """File Tools Module - LLM agent file manipulation tools."""

 import json
+import os
 import threading
 from typing import Optional
 from tools.file_operations import ShellFileOperations
@@ -11,23 +12,85 @@ _file_ops_cache: dict = {}


 def _get_file_ops(task_id: str = "default") -> ShellFileOperations:
-    """Get or create ShellFileOperations for a terminal environment."""
-    from tools.terminal_tool import _active_environments, _env_lock, _LocalEnvironment
+    """Get or create ShellFileOperations for a terminal environment.
    
+    Respects the TERMINAL_ENV setting -- if the task_id doesn't have an
+    environment yet, creates one using the configured backend (local, docker,
+    modal, etc.) rather than always defaulting to local.
+    """
+    from tools.terminal_tool import (
+        _active_environments, _env_lock, _create_environment,
+        _get_env_config, _last_activity, _start_cleanup_thread,
+        _check_disk_usage_warning,
+    )
+    import time
+    
+    # Fast path: check cache without heavy locks
    with _file_ops_lock:
        if task_id in _file_ops_cache:
            return _file_ops_cache[task_id]
+    
+    # Check if we need to create a new environment
+    needs_creation = False
+    with _env_lock:
+        if task_id not in _active_environments:
+            needs_creation = True
+    
+    # Create environment OUTSIDE locks so we don't block other rollouts
+    # during slow Modal/Docker startup (~10s)
+    if needs_creation:
+        config = _get_env_config()
+        env_type = config["env_type"]
        
+        if env_type == "docker":
+            image = config["docker_image"]
+        elif env_type == "singularity":
+            image = config["singularity_image"]
+        elif env_type == "modal":
+            image = config["modal_image"]
+        else:
+            image = ""
+        
+        cwd = config["cwd"]
+        _check_disk_usage_warning()
+        if not os.getenv("HERMES_QUIET"):
+            print(f"[FileTools] Creating new {env_type} environment for task {task_id[:8]}...", flush=True)
+        
+        new_env = _create_environment(
+            env_type=env_type,
+            image=image,
+            cwd=cwd,
+            timeout=config["timeout"],
+        )
+        
+        # Store under lock (brief) -- do NOT call _start_cleanup_thread inside
+        # the lock because it also acquires _env_lock (non-reentrant = deadlock)
+        created = False
        with _env_lock:
            if task_id not in _active_environments:
-                import os
-                env = _LocalEnvironment(cwd=os.getcwd(), timeout=60)
-                _active_environments[task_id] = env
-            terminal_env = _active_environments[task_id]
+                _active_environments[task_id] = new_env
+                created = True
+            else:
+                try:
+                    if hasattr(new_env, 'stop'):
+                        new_env.stop()
+                except Exception:
+                    pass
        
-        file_ops = ShellFileOperations(terminal_env)
+        if created:
+            _start_cleanup_thread()
+            if not os.getenv("HERMES_QUIET"):
+                print(f"[FileTools] {env_type} environment ready for task {task_id[:8]}", flush=True)
+    
+    # Now get the environment and build file_ops
+    with _env_lock:
+        _last_activity[task_id] = time.time()
+        terminal_env = _active_environments[task_id]
+    
+    file_ops = ShellFileOperations(terminal_env)
+    with _file_ops_lock:
        _file_ops_cache[task_id] = file_ops
-        return file_ops
+    return file_ops


 def clear_file_ops_cache(task_id: str = None):
@@ -56,6 +119,7 @@ def write_file_tool(path: str, content: str, task_id: str = "default") -> str:
        result = file_ops.write_file(path, content)
        return json.dumps(result.to_dict(), ensure_ascii=False)
    except Exception as e:
+        print(f"[FileTools] write_file error: {type(e).__name__}: {e}", flush=True)
        return json.dumps({"error": str(e)}, ensure_ascii=False)


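Review note: the `_get_file_ops` change above follows a three-step pattern: check the registry under the lock, build the slow resource with no lock held, then store it under the lock and discard the duplicate if another thread won the race. The sketch below isolates that pattern so it can be run on its own. `FakeEnv`, `get_or_create`, `run_demo`, and the module-level `_registry` are hypothetical stand-ins rather than names from the repository, and unlike the diff this sketch stops the losing instance after releasing the lock.

```python
import threading
import time

_lock = threading.Lock()   # plays the role of _env_lock (non-reentrant)
_registry = {}             # plays the role of _active_environments


class FakeEnv:
    """Hypothetical stand-in for a terminal environment with a stop() method."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


def get_or_create(key, factory):
    # Step 1: cheap existence check under the lock.
    with _lock:
        existing = _registry.get(key)
    if existing is not None:
        return existing

    # Step 2: slow construction with NO lock held, so other keys are not blocked.
    candidate = factory()

    # Step 3: brief lock to publish; setdefault keeps whichever instance came first.
    with _lock:
        winner = _registry.setdefault(key, candidate)

    # A thread that lost the race cleans up its own unused instance.
    if winner is not candidate:
        try:
            candidate.stop()
        except Exception:
            pass
    return winner


def run_demo(key="task-1", n_threads=4):
    """Race several threads on one key; return (results, all instances created)."""
    created = []
    results = []

    def slow_factory():
        time.sleep(0.05)          # simulate slow Modal/Docker startup
        env = FakeEnv()
        created.append(env)
        return env

    def worker():
        results.append(get_or_create(key, slow_factory))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, created


if __name__ == "__main__":
    results, created = run_demo()
    print(f"{len(created)} built, {sum(not e.stopped for e in created)} kept")
```

The trade-off is that several threads may each pay the startup cost for the same key, but no thread ever waits on the lock while a backend is starting, which is the stated goal of the change.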
--- a/tools/rl_training_tool.py
+++ b/tools/rl_training_tool.py
@@ -1300,10 +1300,26 @@ async def rl_test_inference(
 # Requirements Check
 # ============================================================================

+def check_rl_python_version() -> bool:
+    """
+    Check if Python version meets the minimum for RL tools.
+    
+    tinker-atropos depends on the 'tinker' package which requires Python >= 3.11.
+    """
+    return sys.version_info >= (3, 11)
+
+
 def check_rl_api_keys() -> bool:
    """
-    Check if required API keys are available.
+    Check if required API keys and Python version are available.
+    
+    RL training requires:
+    - Python >= 3.11 (tinker package requirement)
+    - TINKER_API_KEY for the Tinker training API
+    - WANDB_API_KEY for Weights & Biases metrics
    """
+    if not check_rl_python_version():
+        return False
    tinker_key = os.getenv("TINKER_API_KEY")
    wandb_key = os.getenv("WANDB_API_KEY")
    return bool(tinker_key) and bool(wandb_key)
@@ -1311,9 +1327,11 @@ def check_rl_api_keys() -> bool:

 def get_missing_keys() -> List[str]:
    """
-    Get list of missing required API keys.
+    Get list of missing requirements for RL tools (API keys and Python version).
    """
    missing = []
+    if not check_rl_python_version():
+        missing.append(f"Python >= 3.11 (current: {sys.version_info.major}.{sys.version_info.minor})")
    if not os.getenv("TINKER_API_KEY"):
        missing.append("TINKER_API_KEY")
    if not os.getenv("WANDB_API_KEY"):
--- a/tools/terminal_tool.py
+++ b/tools/terminal_tool.py
@@ -1347,40 +1347,61 @@ def terminal_tool(
        _start_cleanup_thread()

        # Get or create environment
+        # Check under lock, but create OUTSIDE lock so we don't block
+        # other concurrent rollouts during slow Modal/Docker startup
+        needs_creation = False
        with _env_lock:
            if effective_task_id not in _active_environments:
-                # Check disk usage before creating new environment
-                _check_disk_usage_warning()
-                
-                try:
-                    # Build SSH config if using SSH environment
-                    ssh_config = None
-                    if env_type == "ssh":
-                        ssh_config = {
-                            "host": config.get("ssh_host", ""),
-                            "user": config.get("ssh_user", ""),
-                            "port": config.get("ssh_port", 22),
-                            "key": config.get("ssh_key", ""),
-                        }
-                    
-                    _active_environments[effective_task_id] = _create_environment(
-                        env_type=env_type,
-                        image=image,
-                        cwd=cwd,
-                        timeout=effective_timeout,
-                        ssh_config=ssh_config
-                    )
-                except ImportError as e:
-                    return json.dumps({
-                        "output": "",
-                        "exit_code": -1,
-                        "error": f"Terminal tool disabled: mini-swe-agent not available ({e})",
-                        "status": "disabled"
-                    }, ensure_ascii=False)
+                needs_creation = True
+            else:
+                _last_activity[effective_task_id] = time.time()
+                env = _active_environments[effective_task_id]

-            # Update last activity time
-            _last_activity[effective_task_id] = time.time()
-            env = _active_environments[effective_task_id]
+        if needs_creation:
+            _check_disk_usage_warning()
+            if not os.getenv("HERMES_QUIET"):
+                print(f"[Terminal] Creating new {env_type} environment for task {effective_task_id[:8]}...", flush=True)
+            try:
+                ssh_config = None
+                if env_type == "ssh":
+                    ssh_config = {
+                        "host": config.get("ssh_host", ""),
+                        "user": config.get("ssh_user", ""),
+                        "port": config.get("ssh_port", 22),
+                        "key": config.get("ssh_key", ""),
+                    }
+
+                new_env = _create_environment(
+                    env_type=env_type,
+                    image=image,
+                    cwd=cwd,
+                    timeout=effective_timeout,
+                    ssh_config=ssh_config
+                )
+            except ImportError as e:
+                return json.dumps({
+                    "output": "",
+                    "exit_code": -1,
+                    "error": f"Terminal tool disabled: mini-swe-agent not available ({e})",
+                    "status": "disabled"
+                }, ensure_ascii=False)
+
+            # Store under lock (brief)
+            with _env_lock:
+                if effective_task_id not in _active_environments:
+                    _active_environments[effective_task_id] = new_env
+                else:
+                    # Another thread created it while we were building -- clean up ours
+                    try:
+                        if hasattr(new_env, 'stop'):
+                            new_env.stop()
+                    except Exception:
+                        pass
+
+                _last_activity[effective_task_id] = time.time()
+                env = _active_environments[effective_task_id]
+                if not os.getenv("HERMES_QUIET"):
+                    print(f"[Terminal] {env_type} environment ready for task {effective_task_id[:8]}", flush=True)

        # Check for dangerous commands (only for local/ssh in interactive modes)
        # Skip check if force=True (user has confirmed they want to run it)
@@ -1435,13 +1456,20 @@ def terminal_tool(
                        retry_count += 1
                        wait_time = 2 ** retry_count
                        print(f"⚠️  Terminal: execution error, retrying in {wait_time}s (attempt {retry_count}/{max_retries})")
+                        print(f"   Command: {command[:200]}")
+                        print(f"   Error: {type(e).__name__}: {e}")
+                        print(f"   Task ID: {effective_task_id}, Backend: {env_type}")
                        time.sleep(wait_time)
                        continue
                    
+                    print(f"❌ Terminal: execution failed after {max_retries} retries")
+                    print(f"   Command: {command[:200]}")
+                    print(f"   Error: {type(e).__name__}: {e}")
+                    print(f"   Task ID: {effective_task_id}, Backend: {env_type}")
                    return json.dumps({
                        "output": "",
                        "exit_code": -1,
-                        "error": f"Command execution failed: {str(e)}"
+                        "error": f"Command execution failed: {type(e).__name__}: {str(e)}"
                    }, ensure_ascii=False)
                
                # Got a result
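Review note: the retry hunk above waits `2 ** retry_count` seconds after incrementing the counter, so the delays are 2s, 4s, 8s, and the final error now carries the exception type name. The following is a minimal sketch of that behavior, not the code in `terminal_tool.py`: `run_with_retries` and its injectable `sleep` parameter are hypothetical, and the loop condition (`retry_count < max_retries`) is an assumption because the hunk does not show it.

```python
import json
import time


def run_with_retries(fn, max_retries=3, sleep=time.sleep):
    """Call fn(); on failure retry with exponential backoff, then report the error.

    Hypothetical helper: mirrors the wait_time formula and error format shown
    in the diff, with an assumed retry condition.
    """
    retry_count = 0
    while True:
        try:
            output = fn()
            return json.dumps({"output": output, "exit_code": 0, "error": None})
        except Exception as e:
            if retry_count < max_retries:      # assumed condition
                retry_count += 1
                wait_time = 2 ** retry_count   # 2, 4, 8, ...
                sleep(wait_time)
                continue
            # Retries exhausted: include the exception type, as the diff does.
            return json.dumps({
                "output": "",
                "exit_code": -1,
                "error": f"Command execution failed: {type(e).__name__}: {e}",
            })


def make_flaky(failures):
    """Return a function that raises ValueError `failures` times, then succeeds."""
    state = {"left": failures}

    def flaky():
        if state["left"] > 0:
            state["left"] -= 1
            raise ValueError("boom")
        return "ok"

    return flaky
```

Passing a recording function as `sleep` (for example `delays.append`) lets the backoff schedule be checked without actually waiting.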
Author	SHA1	Message	Date
teknium	d999d9876d	Enhance async tool execution and error handling in Hermes agent for Atropos integration - Updated `.gitignore` to exclude `testlogs` directory. - Refactored `handle_web_function_call` in `model_tools.py` to support running async functions in existing event loops, improving compatibility with Atropos. - Introduced a thread pool executor in `agent_loop.py` for running synchronous tool calls that internally use `asyncio.run()`, preventing deadlocks. - Added `ToolError` class to track tool execution errors, enhancing error reporting during agent loops. - Updated `wandb_log` method in `hermes_base_env.py` to log tool error statistics for better monitoring. - Implemented patches in `patches.py` to ensure async-safe operation of tools within Atropos's event loop. - Enhanced `ToolContext` and `terminal_tool.py` to utilize the new async handling, improving overall tool execution reliability.	2026-02-08 05:00:47 +00:00
teknium	a8809bbd3e	Transition installation to uv for py version and speed to be easier to streamline - Integrated `uv` as a fast Python package manager for automatic Python provisioning and dependency management. - Updated installation scripts (`setup-hermes.sh`, `install.sh`, `install.ps1`) to utilize `uv` for installing Python and packages, streamlining the setup process. - Revised `README.md` to reflect changes in installation steps, including symlinking `hermes` for global access and clarifying Python version requirements. - Adjusted commands in `doctor.py` and other scripts to recommend `uv` for package installations, ensuring consistency across the project.	2026-02-07 23:54:53 +00:00
teknium	a478e44585	Increase max_token_length in TerminalTestEnv to 16000 for enhanced processing capacity	2026-02-07 21:11:07 +00:00
teknium	c0494b3558	Update pyproject.toml to refine dependency management - Reorganized the 'all' dependencies to include specific optional groups for better modularity. - Added support for 'hermes-agent' with distinct categories: modal, messaging, cron, cli, and dev.	2026-02-07 21:11:01 +00:00
teknium	07b615e96e	Add support for Atropos Agentic RL environments (requires branch tool_call_support in Atropos atm) - Added new environments for reinforcement learning, including `HermesSweEnv` for software engineering tasks and `TerminalTestEnv` for inline testing. - Introduced `ToolContext` for unrestricted access to tools during reward computation. - Updated `.gitignore` to exclude `wandb/` directory. - Enhanced `README.md` with detailed architecture and usage instructions for Atropos environments. - Added configuration files for SWE and terminal test environments to streamline setup. - Removed unnecessary compiled Python files from `__pycache__`.	2026-02-07 09:17:16 +00:00
teknium	ac79725923	Update dependencies and enhance installation scripts - Added `prompt_toolkit` as a direct dependency for interactive CLI support. - Updated `modal` optional dependency to require `swe-rex[modal]>=1.4.0` for improved cloud execution capabilities. - Enhanced `messaging` optional dependencies to include `aiohttp>=3.9.0` for WhatsApp bridge communication. - Refined installation scripts to check for Python version requirements, emphasizing the need for Python 3.11+ for RL training tools. - Improved setup scripts to ensure proper installation of submodules and dependencies, enhancing user experience during setup.	2026-02-07 00:05:04 +00:00
Teknium	8dd38318fc	Merge pull request #15 from NousResearch/rl-capabilities Rl capabilities && File Operator Tools	2026-02-05 03:50:42 -08:00
Teknium	8380895ae3	Update README.md	2026-02-04 00:35:45 -08:00