Eval splits for holdout sets

Changing return type to be ScoredDataGroup to account for multiple trajectories
Added task-specific metrics and evals
2026-03-03 14:42:45 -05:00 · 2026-03-02 11:35:06 -08:00 · 2026-02-27 11:20:18 -08:00 · 2026-02-26 10:41:24 -08:00 · 2026-02-24 19:23:05 -08:00 · 2026-02-24 19:19:39 -08:00
112 changed files with 32532 additions and 1609 deletions
--- a/.cursorrules
+++ b/.cursorrules
@@ -1,201 +0,0 @@
-Hermes-Agent is an agent harness for LLMs with an interactive CLI.
-
-## Development Environment
-
-**IMPORTANT**: Always use the virtual environment if it exists:
-```bash
-source venv/bin/activate  # Before running any Python commands
-```
-
-## Project Structure
-
- `hermes` - CLI launcher script (run with `./hermes`)
- `cli.py` - Interactive CLI with Rich UI, prompt_toolkit, animated spinners
- `cli-config.yaml` - CLI configuration (model, terminal, toolsets, personalities)
- `tools/` - Individual tool implementations (web, terminal, browser, vision, etc.)
- `tools/__init__.py` - Exports all tools for importing
- `model_tools.py` - Consolidates tool schemas and handlers for the agent
- `toolsets.py` - Groups tools into logical toolsets (web, terminal, browser, etc.)
- `toolset_distributions.py` - Probability-based tool selection for data generation
- `run_agent.py` - Primary agent runner with AIAgent class and KawaiiSpinner
- `batch_runner.py` - Parallel batch processing with checkpointing
- `tests/` - Test scripts
-
-## File Dependency Chain
-
-```
-tools/*.py → tools/__init__.py → model_tools.py → toolsets.py → toolset_distributions.py
-                                       ↑
-run_agent.py ──────────────────────────┘
-cli.py → run_agent.py (uses AIAgent with quiet_mode=True)
-batch_runner.py → run_agent.py + toolset_distributions.py
-```
-
-Always ensure consistency between tools, model_tools.py, and toolsets.py when changing any of them.
-
-## CLI Architecture (cli.py)
-
-The interactive CLI uses:
- **Rich** - For the welcome banner and styled panels
- **prompt_toolkit** - For fixed input area with history and `patch_stdout`
- **KawaiiSpinner** (in run_agent.py) - Animated feedback during API calls and tool execution
-
-Key components:
- `HermesCLI` class - Main CLI controller with commands and conversation loop
- `load_cli_config()` - Loads `cli-config.yaml`, sets environment variables for terminal
- `build_welcome_banner()` - Displays ASCII art logo, tools, and skills summary
- `/commands` - Process user commands like `/help`, `/clear`, `/personality`, etc.
-
-CLI uses `quiet_mode=True` when creating AIAgent to suppress verbose logging and enable kawaii-style feedback instead.
-
-### Adding CLI Commands
-
-1. Add to `COMMANDS` dict with description
-2. Add handler in `process_command()` method
-3. For persistent settings, use `save_config_value()` to update `cli-config.yaml`
-
-## Adding a New Tool
-
-Follow this strict order to maintain consistency:
-
-1. Create `tools/your_tool.py` with:
-   - Handler function (sync or async) returning a JSON string via `json.dumps()`
-   - `check_*_requirements()` function to verify dependencies (e.g., API keys)
-   - Schema definition following OpenAI function-calling format
-
-2. Export in `tools/__init__.py`:
-   - Import the handler and check function
-   - Add to `__all__` list
-
-3. Register in `model_tools.py`:
-   - Create `get_*_tool_definitions()` function or add to existing
-   - Add routing in `handle_function_call()` dispatcher
-   - Update `get_all_tool_names()` with the tool name
-   - Update `get_toolset_for_tool()` mapping
-   - Update `get_available_toolsets()` and `check_toolset_requirements()`
-
-4. Add to toolset in `toolsets.py`:
-   - Add to existing toolset or create new one in TOOLSETS dict
-
-5. Optionally add to `toolset_distributions.py` for batch processing
-
-## Tool Implementation Pattern
-
-```python
-# tools/example_tool.py
-import json
-import os
-
-def check_example_requirements() -> bool:
-    """Check if required API keys/dependencies are available."""
-    return bool(os.getenv("EXAMPLE_API_KEY"))
-
-def example_tool(param: str, task_id: str = None) -> str:
-    """Execute the tool and return JSON string result."""
-    try:
-        result = {"success": True, "data": "..."}
-        return json.dumps(result, ensure_ascii=False)
-    except Exception as e:
-        return json.dumps({"error": str(e)}, ensure_ascii=False)
-```
-
-All tool handlers MUST return a JSON string. Never return raw dicts.
-
-## Stateful Tools
-
-Tools that maintain state (terminal, browser) require:
- `task_id` parameter for session isolation between concurrent tasks
- `cleanup_*()` function to release resources
- Cleanup is called automatically in run_agent.py after conversation completes
-
-## Environment Variables
-
-API keys are loaded from `.env` file in repo root:
- `OPENROUTER_API_KEY` - Main LLM API access (primary provider)
- `FIRECRAWL_API_KEY` - Web search/extract tools
- `BROWSERBASE_API_KEY` / `BROWSERBASE_PROJECT_ID` - Browser automation
- `FAL_KEY` - Image generation (FLUX model)
- `NOUS_API_KEY` - Vision and Mixture-of-Agents tools
-
-Terminal tool configuration (can also be set in `cli-config.yaml`):
- `TERMINAL_ENV` - Backend: local, docker, singularity, modal, or ssh
- `TERMINAL_CWD` - Working directory
- `TERMINAL_SSH_HOST`, `TERMINAL_SSH_USER`, `TERMINAL_SSH_KEY` - For SSH backend
-
-## Agent Loop (run_agent.py)
-
-The AIAgent class handles:
- Processing enabled toolsets to provide to the model
- Piping prompts to the agent
- Looping LLM calls when tools are invoked, until natural language response
- Returning the final response
-
-Uses OpenAI-compatible API (primarily OpenRouter) with the OpenAI Python SDK.
-
-## Reasoning Model Support
-
-For models that support chain-of-thought reasoning:
- Extract `reasoning_content` from API responses
- Store in `assistant_msg["reasoning"]` for trajectory export
- Pass back via `reasoning_content` field on subsequent turns
-
-## Trajectory Format
-
-Conversations are saved in ShareGPT format for training:
-```json
-{"from": "system", "value": "System prompt with <tools>...</tools>"}
-{"from": "human", "value": "User message"}
-{"from": "gpt", "value": "<think>reasoning</think>\n<tool_call>{...}</tool_call>"}
-{"from": "tool", "value": "<tool_response>{...}</tool_response>"}
-{"from": "gpt", "value": "Final response"}
-```
-
-Tool calls use `<tool_call>` XML tags, responses use `<tool_response>` tags, reasoning uses `<think>` tags.
-
-## Batch Processing (batch_runner.py)
-
-For processing multiple prompts:
- Parallel execution with multiprocessing
- Content-based resume for fault tolerance (matches on prompt text, not indices)
- Toolset distributions control probabilistic tool availability per prompt
- Output: `data/<run_name>/trajectories.jsonl` (combined) + individual batch files
-
-## Logging
-
-Trajectories restructure tools as a system prompt for storage in a format suitable for later training use.
-
-## Skills System
-
-Skills are on-demand knowledge documents the agent can load. Located in `skills/` directory:
-
-```
-skills/
-├── mlops/                    # Category folder
-│   ├── axolotl/             # Skill folder
-│   │   ├── SKILL.md         # Main instructions (required)
-│   │   ├── references/      # Additional docs, API specs
-│   │   └── templates/       # Output formats, configs
-│   └── vllm/
-│       └── SKILL.md
-└── example-skill/
-    └── SKILL.md
-```
-
-**Progressive disclosure** (token-efficient):
-1. `skills_categories()` - List category names (~50 tokens)
-2. `skills_list(category)` - Name + description per skill (~3k tokens)
-3. `skill_view(name)` - Full content + tags + linked files
-
-SKILL.md files use YAML frontmatter:
-```yaml
---
-name: skill-name
-description: Brief description for listing
-tags: [tag1, tag2]
-related_skills: [other-skill]
-version: 1.0.0
---
-# Skill Content...
-```
-
-Tool files: `tools/skills_tool.py` → `model_tools.py` → `toolsets.py`
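
The Trajectory Format section removed above maps OpenAI-style roles onto ShareGPT `from` values and wraps reasoning, tool calls, and tool results in `<think>`, `<tool_call>`, and `<tool_response>` tags. A minimal sketch of that mapping, assuming message dicts shaped like the ones the agent keeps in memory; `to_sharegpt` and the way each tool call is serialized are illustrative assumptions, not code from this repository:

```python
import json

# Role names on the left follow the OpenAI chat format; values on the right
# are the ShareGPT "from" labels shown in the Trajectory Format section.
ROLE_MAP = {"system": "system", "user": "human", "assistant": "gpt", "tool": "tool"}


def to_sharegpt(messages):
    """Convert OpenAI-style messages to ShareGPT-style records (hypothetical helper)."""
    records = []
    for msg in messages:
        role = msg["role"]
        if role == "assistant":
            parts = []
            # Reasoning is stored under "reasoning" for trajectory export.
            if msg.get("reasoning"):
                parts.append(f"<think>{msg['reasoning']}</think>")
            # Assumption: each tool call is dumped as plain JSON inside the tag.
            for call in msg.get("tool_calls") or []:
                parts.append(f"<tool_call>{json.dumps(call)}</tool_call>")
            if msg.get("content"):
                parts.append(msg["content"])
            value = "\n".join(parts)
        elif role == "tool":
            value = f"<tool_response>{msg['content']}</tool_response>"
        else:
            value = msg.get("content") or ""
        records.append({"from": ROLE_MAP[role], "value": value})
    return records
```

Each returned record can be written as one JSON line, which matches the `trajectories.jsonl` output described in the Batch Processing section.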
--- a/.env.example
+++ b/.env.example
@@ -10,8 +10,8 @@
 OPENROUTER_API_KEY=

 # Default model to use (OpenRouter format: provider/model)
-# Examples: anthropic/claude-sonnet-4, openai/gpt-4o, google/gemini-2.0-flash, zhipuai/glm-4-plus
-LLM_MODEL=anthropic/claude-sonnet-4
+# Examples: anthropic/claude-opus-4.6, openai/gpt-4o, google/gemini-2.0-flash, zhipuai/glm-4-plus
+LLM_MODEL=anthropic/claude-opus-4.6

 # =============================================================================
 # TOOL API KEYS
@@ -30,58 +30,77 @@ NOUS_API_KEY=
 FAL_KEY=

 # =============================================================================
-# TERMINAL TOOL CONFIGURATION
+# TERMINAL TOOL CONFIGURATION (mini-swe-agent backend)
 # =============================================================================
-# Backend type: "local", "singularity", "docker", or "modal"
-# Uncomment ONE configuration block below based on your preferred backend.
+# Backend type: "local", "singularity", "docker", "modal", or "ssh"
+# - local: Runs directly on your machine (fastest, no isolation)
+# - ssh: Runs on remote server via SSH (great for sandboxing - agent can't touch its own code)
+# - singularity: Runs in Apptainer/Singularity containers (HPC clusters, no root needed)
+# - docker: Runs in Docker containers (isolated, requires Docker + docker group)
+# - modal: Runs in Modal cloud sandboxes (scalable, requires Modal account)
+TERMINAL_ENV=local

-# -----------------------------------------------------------------------------
-# OPTION 1: Singularity/Apptainer (RECOMMENDED for HPC clusters)
-# - No root required, common on shared systems
-# - Auto-builds and caches SIF images from docker:// URLs
-# - Uses /scratch if available, otherwise /tmp
-# -----------------------------------------------------------------------------
-TERMINAL_ENV=singularity
+
+# Container images (for singularity/docker/modal backends)
+TERMINAL_DOCKER_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
 TERMINAL_SINGULARITY_IMAGE=docker://nikolaik/python-nodejs:python3.11-nodejs20
-TERMINAL_CWD=/workspace
+TERMINAL_MODAL_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
+
+
+# Working directory for terminal commands
+# For local backend: "." means current directory (resolved automatically)
+# For remote backends (ssh/docker/modal/singularity): use an absolute path
+#   INSIDE the target environment, or leave unset for the backend's default
+#   (/root for modal, / for docker, ~ for ssh). Do NOT use a host-local path.
+# Usually managed by config.yaml (terminal.cwd) — uncomment to override
+# TERMINAL_CWD=.
+
+# Default command timeout in seconds
 TERMINAL_TIMEOUT=60
-# Optional: Override scratch directory (auto-detects /scratch or /tmp)
-# TERMINAL_SCRATCH_DIR=/scratch/myuser/hermes

-# -----------------------------------------------------------------------------
-# OPTION 2: Local execution (FASTEST, but no isolation)
-# - Runs directly on your machine
-# - No containers, no setup required
-# - WARNING: Commands run with your user permissions
-# -----------------------------------------------------------------------------
-# TERMINAL_ENV=local
-# TERMINAL_CWD=/tmp
-# TERMINAL_TIMEOUT=60
-
-# -----------------------------------------------------------------------------
-# OPTION 3: Docker (good isolation, requires Docker)
-# - Requires Docker installed and user in 'docker' group
-# - Each task gets an isolated container
-# -----------------------------------------------------------------------------
-# TERMINAL_ENV=docker
-# TERMINAL_DOCKER_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
-# TERMINAL_CWD=/workspace
-# TERMINAL_TIMEOUT=60
-
-# -----------------------------------------------------------------------------
-# OPTION 4: Modal (cloud execution, scalable)
-# - Requires Modal account: pip install modal && modal setup
-# - Runs in Modal's cloud sandboxes
-# - Good for scaling to many parallel workers
-# -----------------------------------------------------------------------------
-# TERMINAL_ENV=modal
-# TERMINAL_MODAL_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
-# TERMINAL_CWD=/workspace
-# TERMINAL_TIMEOUT=60
-
-# Common settings for all backends
+# Cleanup inactive environments after this many seconds
 TERMINAL_LIFETIME_SECONDS=300
-TERMINAL_DISK_WARNING_GB=500
+
+# =============================================================================
+# SSH REMOTE EXECUTION (for TERMINAL_ENV=ssh)
+# =============================================================================
+# Run terminal commands on a remote server via SSH.
+# Agent code stays on your machine, commands execute remotely.
+#
+# SECURITY BENEFITS:
+# - Agent cannot read your .env file (API keys protected)
+# - Agent cannot modify its own code
+# - Remote server acts as isolated sandbox
+# - Can safely configure passwordless sudo on remote
+#
+# TERMINAL_SSH_HOST=192.168.1.100
+# TERMINAL_SSH_USER=agent
+# TERMINAL_SSH_PORT=22
+# TERMINAL_SSH_KEY=~/.ssh/id_rsa
+
+# =============================================================================
+# SUDO SUPPORT (works with ALL terminal backends)
+# =============================================================================
+# If set, enables sudo commands by piping password via `sudo -S`.
+# Works with: local, docker, singularity, modal, and ssh backends.
+# 
+# SECURITY WARNING: Password stored in plaintext. Only use on trusted machines.
+# 
+# ALTERNATIVES:
+# - For SSH backend: Configure passwordless sudo on the remote server
+# - For containers: Run as root inside the container (no sudo needed)
+# - For local: Configure /etc/sudoers for specific commands
+# - For CLI: Leave unset - you'll be prompted interactively with 45s timeout
+#
+# SUDO_PASSWORD=your_password_here
+
+# =============================================================================
+# MODAL CLOUD BACKEND (Optional - for TERMINAL_ENV=modal)
+# =============================================================================
+# Modal uses CLI authentication, not environment variables.
+# Run: pip install modal && modal setup
+# This will authenticate via browser and store credentials locally.
+# No API key needed in .env - Modal handles auth automatically.

 # =============================================================================
 # BROWSER TOOL CONFIGURATION (agent-browser + Browserbase)
@@ -101,25 +120,70 @@ BROWSERBASE_API_KEY=
 BROWSERBASE_PROJECT_ID=

 # Enable residential proxies for better CAPTCHA solving (default: true)
+# Routes traffic through residential IPs, significantly improves success rate
 BROWSERBASE_PROXIES=true

 # Enable advanced stealth mode (default: false, requires Scale Plan)
+# Uses custom Chromium build to avoid bot detection altogether
 BROWSERBASE_ADVANCED_STEALTH=false

 # Browser session timeout in seconds (default: 300)
+# Sessions are cleaned up after this duration of inactivity
 BROWSER_SESSION_TIMEOUT=300

+# Browser inactivity timeout - auto-cleanup inactive sessions (default: 120 = 2 min)
+# Browser sessions are automatically closed after this period of no activity
+BROWSER_INACTIVITY_TIMEOUT=120
+
 # =============================================================================
-# LEGACY/OPTIONAL
+# SESSION LOGGING
+# =============================================================================
+# Session trajectories are automatically saved to logs/ directory
+# Format: logs/session_YYYYMMDD_HHMMSS_UUID.json
+# Contains full conversation history in trajectory format for debugging/replay
+
+# =============================================================================
+# VOICE TRANSCRIPTION & OPENAI TTS
+# =============================================================================
+# Required for voice message transcription (Whisper) and OpenAI TTS voices.
+# Uses OpenAI's API directly (not via OpenRouter).
+# Named HERMES_OPENAI_API_KEY to avoid interference with OpenRouter.
+# Get at: https://platform.openai.com/api-keys
+HERMES_OPENAI_API_KEY=
+
+# =============================================================================
+# SLACK INTEGRATION
+# =============================================================================
+# Slack Bot Token - From Slack App settings (OAuth & Permissions)
+# Get at: https://api.slack.com/apps
+# SLACK_BOT_TOKEN=xoxb-...
+
+# Slack App Token - For Socket Mode (App-Level Tokens in Slack App settings)
+# SLACK_APP_TOKEN=xapp-...
+
+# Slack allowed users (comma-separated Slack user IDs)
+# SLACK_ALLOWED_USERS=
+
+# =============================================================================
+# RESPONSE PACING
+# =============================================================================
+# Human-like delays between message chunks on messaging platforms.
+# Makes the bot feel less robotic.
+# HERMES_HUMAN_DELAY_MODE=off     # off | natural | custom
+# HERMES_HUMAN_DELAY_MIN_MS=800   # Min delay in ms (custom mode)
+# HERMES_HUMAN_DELAY_MAX_MS=2500  # Max delay in ms (custom mode)
+
+# =============================================================================
+# LEGACY/OPTIONAL API KEYS
 # =============================================================================

-# Morph API Key - For legacy Hecate terminal backend
+# Morph API Key - For legacy Hecate terminal backend (terminal-hecate tool)
 # Get at: https://morph.so/
-# MORPH_API_KEY=
+MORPH_API_KEY=

 # Hecate VM Settings (only if using terminal-hecate tool)
-# HECATE_VM_LIFETIME_SECONDS=300
-# HECATE_DEFAULT_SNAPSHOT_ID=snapshot_p5294qxt
+HECATE_VM_LIFETIME_SECONDS=300
+HECATE_DEFAULT_SNAPSHOT_ID=snapshot_p5294qxt

 # =============================================================================
 # DEBUG OPTIONS
@@ -128,3 +192,31 @@ WEB_TOOLS_DEBUG=false
 VISION_TOOLS_DEBUG=false
 MOA_TOOLS_DEBUG=false
 IMAGE_TOOLS_DEBUG=false
+
+# =============================================================================
+# CONTEXT COMPRESSION (Auto-shrinks long conversations)
+# =============================================================================
+# When conversation approaches model's context limit, middle turns are
+# automatically summarized to free up space.
+#
+# CONTEXT_COMPRESSION_ENABLED=true        # Enable auto-compression (default: true)
+# CONTEXT_COMPRESSION_THRESHOLD=0.85      # Compress at 85% of context limit
+# CONTEXT_COMPRESSION_MODEL=google/gemini-2.0-flash-001  # Fast model for summaries
+
+# =============================================================================
+# RL TRAINING (Tinker + Atropos)
+# =============================================================================
+# Run reinforcement learning training on language models using the Tinker API.
+# Requires the rl-server to be running (from tinker-atropos package).
+
+# Tinker API Key - RL training service
+# Get at: https://tinker-console.thinkingmachines.ai/keys
+TINKER_API_KEY=
+
+# Weights & Biases API Key - Experiment tracking and metrics
+# Get at: https://wandb.ai/authorize
+WANDB_API_KEY=
+
+# RL API Server URL (default: http://localhost:8080)
+# Change if running the rl-server on a different host/port
+# RL_API_URL=http://localhost:8080
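
The working-directory rules in the `.env.example` comments above (`.` resolves to the current directory for the local backend; remote backends need an absolute path inside the target or fall back to a backend default) can be sketched as follows. This is a minimal illustration under stated assumptions: `resolve_terminal_cwd` is a hypothetical helper, not a function from this repository, and it returns `None` for singularity because the comments do not state a default for that backend.

```python
import os
import posixpath

# Backend names and per-backend defaults are taken from the .env.example comments.
VALID_BACKENDS = {"local", "singularity", "docker", "modal", "ssh"}
REMOTE_DEFAULT_CWD = {"modal": "/root", "docker": "/", "ssh": "~"}


def resolve_terminal_cwd(env=None):
    """Pick a working directory from TERMINAL_ENV / TERMINAL_CWD (hypothetical helper)."""
    env = os.environ if env is None else env
    backend = env.get("TERMINAL_ENV", "local")
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown TERMINAL_ENV: {backend!r}")

    cwd = env.get("TERMINAL_CWD")
    if backend == "local":
        # "." (or unset) means the directory the process was started from.
        return os.path.abspath(cwd or ".")

    # Remote backends: accept only an absolute path inside the target
    # environment, so a host-local or relative path is never passed through.
    if cwd and posixpath.isabs(cwd):
        return cwd
    return REMOTE_DEFAULT_CWD.get(backend)  # None when no default is documented
```

For example, `resolve_terminal_cwd({"TERMINAL_ENV": "modal", "TERMINAL_CWD": "."})` falls back to `/root` instead of passing a host-relative path into the sandbox.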
--- a/.gitignore
+++ b/.gitignore
@@ -33,11 +33,16 @@ run_datagen_megascience_glm4-6.sh
 data/*
 node_modules/
 browser-use/
-agent-browser/
-# Private keys
-*.ppk
-*.pem
-privvy*
-
-# CLI config (may contain sensitive SSH paths)
-cli-config.yaml
+agent-browser/
+# Private keys
+*.ppk
+*.pem
+privvy*
+images/
+__pycache__/
+hermes_agent.egg-info/
+wandb/
+testlogs
+
+# CLI config (may contain sensitive SSH paths)
+cli-config.yaml
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,3 +1,6 @@
 [submodule "mini-swe-agent"]
 	path = mini-swe-agent
 	url = https://github.com/SWE-agent/mini-swe-agent
+[submodule "tinker-atropos"]
+	path = tinker-atropos
+	url = https://github.com/nousresearch/tinker-atropos
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,609 @@
+# Hermes Agent - Development Guide
+
+Instructions for AI coding assistants (GitHub Copilot, Cursor, etc.) and human developers.
+
+Hermes-Agent is an AI agent harness with tool-calling capabilities, interactive CLI, messaging integrations, and scheduled tasks.
+
+## Development Environment
+
+**IMPORTANT**: Always use the virtual environment if it exists:
+```bash
+source venv/bin/activate  # Before running any Python commands
+```
+
+## Project Structure
+
+```
+hermes-agent/
+├── hermes_cli/           # Unified CLI commands
+│   ├── main.py           # Entry point, command dispatcher
+│   ├── setup.py          # Interactive setup wizard
+│   ├── config.py         # Config management & migration
+│   ├── status.py         # Status display
+│   ├── doctor.py         # Diagnostics
+│   ├── gateway.py        # Gateway management
+│   ├── uninstall.py      # Uninstaller
+│   └── cron.py           # Cron job management
+├── tools/                # Tool implementations
+│   ├── process_registry.py     # Background process management (spawn, poll, wait, kill)
+│   ├── transcription_tools.py  # Speech-to-text (Whisper API)
+├── gateway/              # Messaging platform adapters
+│   ├── pairing.py        # DM pairing code system
+│   ├── hooks.py          # Event hook system
+│   ├── sticker_cache.py  # Telegram sticker vision cache
+│   ├── platforms/
+│   │   └── slack.py          # Slack adapter (slack-bolt)
+├── cron/                 # Scheduler implementation
+├── skills/               # Knowledge documents
+├── cli.py                # Interactive CLI (Rich UI)
+├── run_agent.py          # Agent runner with AIAgent class
+├── model_tools.py        # Tool schemas and handlers
+├── toolsets.py           # Tool groupings
+├── toolset_distributions.py  # Probability-based tool selection
+└── batch_runner.py       # Parallel batch processing
+```
+
+**User Configuration** (stored in `~/.hermes/`):
+- `~/.hermes/config.yaml` - Settings (model, terminal, toolsets, etc.)
+- `~/.hermes/.env` - API keys and secrets
+- `~/.hermes/pairing/` - DM pairing data
+- `~/.hermes/hooks/` - Custom event hooks
+- `~/.hermes/image_cache/` - Cached user images
+- `~/.hermes/audio_cache/` - Cached user voice messages
+- `~/.hermes/sticker_cache.json` - Telegram sticker descriptions
+
+## File Dependency Chain
+
+```
+tools/*.py → tools/__init__.py → model_tools.py → toolsets.py → toolset_distributions.py
+                                       ↑
+run_agent.py ──────────────────────────┘
+cli.py → run_agent.py (uses AIAgent with quiet_mode=True)
+batch_runner.py → run_agent.py + toolset_distributions.py
+```
+
+Always ensure consistency between tools, model_tools.py, and toolsets.py when changing any of them.
+
+---
+
+## AIAgent Class
+
+The main agent is implemented in `run_agent.py`:
+
+```python
+class AIAgent:
+    def __init__(
+        self,
+        model: str = "anthropic/claude-sonnet-4",
+        api_key: str = None,
+        base_url: str = "https://openrouter.ai/api/v1",
+        max_iterations: int = 60,        # Max tool-calling loops
+        enabled_toolsets: list = None,
+        disabled_toolsets: list = None,
+        verbose_logging: bool = False,
+        quiet_mode: bool = False,         # Suppress progress output
+        tool_progress_callback: callable = None,  # Called on each tool use
+    ):
+        # Initialize OpenAI client, load tools based on toolsets
+        ...
+    
+    def chat(self, user_message: str, task_id: str = None) -> str:
+        # Main entry point - runs the agent loop
+        ...
+```
+
+### Agent Loop
+
+The core loop in `_run_agent_loop()`:
+
+```
+1. Add user message to conversation
+2. Call LLM with tools
+3. If LLM returns tool calls:
+   - Execute each tool
+   - Add tool results to conversation
+   - Go to step 2
+4. If LLM returns text response:
+   - Return response to user
+```
+
+```python
+while turns < max_turns:
+    response = client.chat.completions.create(
+        model=model,
+        messages=messages,
+        tools=tool_schemas,
+    )
+    message = response.choices[0].message  # OpenAI SDK puts the reply on choices[0]
+    if message.tool_calls:
+        for tool_call in message.tool_calls:
+            result = await execute_tool(tool_call)
+            messages.append(tool_result_message(result))
+        turns += 1
+    else:
+        return message.content
+```
+
+### Conversation Management
+
+Messages are stored as a list of dicts following OpenAI format:
+
+```python
+messages = [
+    {"role": "system", "content": "You are a helpful assistant..."},
+    {"role": "user", "content": "Search for Python tutorials"},
+    {"role": "assistant", "content": None, "tool_calls": [...]},
+    {"role": "tool", "tool_call_id": "...", "content": "..."},
+    {"role": "assistant", "content": "Here's what I found..."},
+]
+```
+
+### Reasoning Model Support
+
+For models that support chain-of-thought reasoning:
+- Extract `reasoning_content` from API responses
+- Store in `assistant_msg["reasoning"]` for trajectory export
+- Pass back via `reasoning_content` field on subsequent turns
+
+---
+
+## CLI Architecture (cli.py)
+
+The interactive CLI uses:
+- **Rich** - For the welcome banner and styled panels
+- **prompt_toolkit** - For fixed input area with history and `patch_stdout`
+- **KawaiiSpinner** (in run_agent.py) - Animated feedback during API calls and tool execution
+
+Key components:
+- `HermesCLI` class - Main CLI controller with commands and conversation loop
+- `load_cli_config()` - Loads config, sets environment variables for terminal
+- `build_welcome_banner()` - Displays ASCII art logo, tools, and skills summary
+- `/commands` - Process user commands like `/help`, `/clear`, `/personality`, etc.
+
+CLI uses `quiet_mode=True` when creating AIAgent to suppress verbose logging.
+
+### Adding CLI Commands
+
+1. Add to `COMMANDS` dict with description
+2. Add handler in `process_command()` method
+3. For persistent settings, use `save_config_value()` to update config
+
+---
+
+## Hermes CLI Commands
+
+The unified `hermes` command provides all functionality:
+
+| Command | Description |
+|---------|-------------|
+| `hermes` | Interactive chat (default) |
+| `hermes chat -q "..."` | Single query mode |
+| `hermes setup` | Configure API keys and settings |
+| `hermes config` | View current configuration |
+| `hermes config edit` | Open config in editor |
+| `hermes config set KEY VAL` | Set a specific value |
+| `hermes config check` | Check for missing config |
+| `hermes config migrate` | Prompt for missing config interactively |
+| `hermes status` | Show configuration status |
+| `hermes doctor` | Diagnose issues |
+| `hermes update` | Update to latest (checks for new config) |
+| `hermes uninstall` | Uninstall (can keep configs for reinstall) |
+| `hermes gateway` | Start messaging gateway |
+| `hermes cron list` | View scheduled jobs |
+| `hermes version` | Show version info |
+| `hermes pairing list/approve/revoke` | Manage DM pairing codes |
+
+---
+
+## Messaging Gateway
+
+The gateway connects Hermes to Telegram, Discord, and WhatsApp.
+
+### Configuration (in `~/.hermes/.env`):
+
+```bash
+# Telegram
+TELEGRAM_BOT_TOKEN=123456:ABC-DEF...      # From @BotFather
+TELEGRAM_ALLOWED_USERS=123456789,987654   # Comma-separated user IDs (from @userinfobot)
+
+# Discord  
+DISCORD_BOT_TOKEN=MTIz...                 # From Developer Portal
+DISCORD_ALLOWED_USERS=123456789012345678  # Comma-separated user IDs
+
+# Agent Behavior
+HERMES_MAX_ITERATIONS=60                  # Max tool-calling iterations
+MESSAGING_CWD=/home/myuser                # Terminal working directory for messaging
+
+# Tool Progress (optional)
+HERMES_TOOL_PROGRESS=true                 # Send progress messages
+HERMES_TOOL_PROGRESS_MODE=new             # "new" or "all"
+```
+
+### Working Directory Behavior
+
+- **CLI (`hermes` command)**: Uses current directory (`.` → `os.getcwd()`)
+- **Messaging (Telegram/Discord)**: Uses `MESSAGING_CWD` (default: home directory)
+
+This is intentional: CLI users are in a terminal and expect the agent to work in their current directory, while messaging users need a consistent starting location.
+
+### Security (User Allowlists):
+
+**IMPORTANT**: Without an allowlist, anyone who finds your bot can use it!
+
+The gateway checks `{PLATFORM}_ALLOWED_USERS` environment variables:
+- If set: Only listed user IDs can interact with the bot
+- If unset: All users are allowed (dangerous with terminal access!)
+
+Users can find their IDs:
+- **Telegram**: Message [@userinfobot](https://t.me/userinfobot)
+- **Discord**: Enable Developer Mode, right-click name → Copy ID
+
+### DM Pairing System
+
+Instead of static allowlists, users can pair via one-time codes:
+1. Unknown user DMs the bot → receives pairing code
+2. Owner runs `hermes pairing approve <platform> <code>`
+3. User is permanently authorized
+
+Security: 8-char codes, 1-hour expiry, rate-limited (1/10min/user), max 3 pending per platform, lockout after 5 failed attempts, `chmod 0600` on data files.
+
+Files: `gateway/pairing.py`, `hermes_cli/pairing.py`
+
+### Event Hooks
+
+Hooks fire at lifecycle points. Place hook directories in `~/.hermes/hooks/`:
+
+```
+~/.hermes/hooks/my-hook/
+├── HOOK.yaml    # name, description, events list
+└── handler.py   # async def handle(event_type, context): ...
+```
+
+Events: `gateway:startup`, `session:start`, `session:reset`, `agent:start`, `agent:step`, `agent:end`, `command:*`
+
+The `agent:step` event fires each iteration of the tool-calling loop with tool names and results.
+
+Files: `gateway/hooks.py`
+
+### Tool Progress Notifications
+
+When `HERMES_TOOL_PROGRESS=true`, the bot sends status messages as it works:
+- `💻 \`ls -la\`...` (terminal commands show the actual command)
+- `🔍 web_search...`
+- `📄 web_extract...`
+
+Modes:
+- `new`: Only when switching to a different tool (less spam)
+- `all`: Every single tool call
+
+### Typing Indicator
+
+The gateway keeps the "typing..." indicator active throughout processing, refreshing every 4 seconds. This lets users know the bot is working even during long tool-calling sequences.
+
+### Platform Toolsets:
+
+Each platform has a dedicated toolset in `toolsets.py`:
+- `hermes-telegram`: Full tools including terminal (with safety checks)
+- `hermes-discord`: Full tools including terminal
+- `hermes-whatsapp`: Full tools including terminal
+
+---
+
+## Configuration System
+
+Configuration files are stored in `~/.hermes/` for easy user access:
+- `~/.hermes/config.yaml` - All settings (model, terminal, compression, etc.)
+- `~/.hermes/.env` - API keys and secrets
+
+### Adding New Configuration Options
+
+When adding new configuration variables, you MUST follow this process:
+
+#### For config.yaml options:
+
+1. Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`
+2. **CRITICAL**: Bump `_config_version` in `DEFAULT_CONFIG` when adding required fields
+3. This triggers migration prompts for existing users on next `hermes update` or `hermes setup`
+
+Example:
+```python
+DEFAULT_CONFIG = {
+    # ... existing config ...
+    
+    "new_feature": {
+        "enabled": True,
+        "option": "default_value",
+    },
+    
+    # BUMP THIS when adding required fields
+    "_config_version": 2,  # Was 1, now 2
+}
+```
+
+#### For .env variables (API keys/secrets):
+
+1. Add to `REQUIRED_ENV_VARS` or `OPTIONAL_ENV_VARS` in `hermes_cli/config.py`
+2. Include metadata for the migration system:
+
+```python
+OPTIONAL_ENV_VARS = {
+    # ... existing vars ...
+    "NEW_API_KEY": {
+        "description": "What this key is for",
+        "prompt": "Display name in prompts",
+        "url": "https://where-to-get-it.com/",
+        "tools": ["tools_it_enables"],  # What tools need this
+        "password": True,  # Mask input
+    },
+}
+```
+
+#### Update related files:
+
+- `hermes_cli/setup.py` - Add prompts in the setup wizard
+- `cli-config.yaml.example` - Add example with comments
+- Update README.md if user-facing
+
+### Config Version Migration
+
+The system uses `_config_version` to detect outdated configs:
+
+1. `check_for_missing_config()` compares user config to `DEFAULT_CONFIG`
+2. `migrate_config()` interactively prompts for missing values
+3. The migration runs automatically on `hermes update` and optionally on `hermes setup`
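
The comparison step above can be sketched as follows. This is a minimal illustration, not the code in `hermes_cli/config.py`: `find_missing_keys`, `needs_migration`, and the sample dicts are hypothetical names introduced here, and the real `check_for_missing_config()` may behave differently.

```python
from typing import Any, Dict, List


def find_missing_keys(user: Dict[str, Any], defaults: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of keys present in defaults but absent from user."""
    missing: List[str] = []
    for key, default_value in defaults.items():
        path = f"{prefix}{key}"
        if key not in user:
            missing.append(path)
        elif isinstance(default_value, dict) and isinstance(user[key], dict):
            # Recurse into nested sections such as "new_feature".
            missing.extend(find_missing_keys(user[key], default_value, prefix=f"{path}."))
    return missing


def needs_migration(user: Dict[str, Any], defaults: Dict[str, Any]) -> bool:
    """Treat a config as outdated when its version is lower than the default's."""
    return user.get("_config_version", 0) < defaults.get("_config_version", 0)


# Hypothetical sample data, for illustration only.
defaults = {
    "_config_version": 2,
    "model": "m",
    "new_feature": {"enabled": True, "option": "default_value"},
}
user_config = {"_config_version": 1, "model": "m", "new_feature": {"enabled": False}}
```

With this sample data, the user config counts as outdated and `new_feature.option` is the only key a migration prompt would need to ask about.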
+
+---
+
+## Environment Variables
+
+API keys are loaded from `~/.hermes/.env`:
+- `OPENROUTER_API_KEY` - Main LLM API access (primary provider)
+- `FIRECRAWL_API_KEY` - Web search/extract tools
+- `BROWSERBASE_API_KEY` / `BROWSERBASE_PROJECT_ID` - Browser automation
+- `FAL_KEY` - Image generation (FLUX model)
+- `NOUS_API_KEY` - Vision and Mixture-of-Agents tools
+
+Terminal tool configuration (in `~/.hermes/config.yaml`):
+- `terminal.backend` - Backend: local, docker, singularity, modal, or ssh
+- `terminal.cwd` - Working directory ("." = host CWD for local only; for remote backends set an absolute path inside the target, or omit to use the backend's default)
+- `terminal.docker_image` - Image for Docker backend
+- `terminal.singularity_image` - Image for Singularity backend
+- `terminal.modal_image` - Image for Modal backend
+- SSH: `TERMINAL_SSH_HOST`, `TERMINAL_SSH_USER`, `TERMINAL_SSH_KEY` in .env
+
+Agent behavior (in `~/.hermes/.env`):
+- `HERMES_MAX_ITERATIONS` - Max tool-calling iterations (default: 60)
+- `MESSAGING_CWD` - Working directory for messaging platforms (default: ~)
+- `HERMES_TOOL_PROGRESS` - Enable tool progress messages (`true`/`false`)
+- `HERMES_TOOL_PROGRESS_MODE` - Progress mode: `new` (tool changes) or `all`
+- `OPENAI_API_KEY` - Voice transcription (Whisper STT)
+- `SLACK_BOT_TOKEN` / `SLACK_APP_TOKEN` - Slack integration (Socket Mode)
+- `SLACK_ALLOWED_USERS` - Comma-separated Slack user IDs
+- `HERMES_HUMAN_DELAY_MODE` - Response pacing: off/natural/custom
+- `HERMES_HUMAN_DELAY_MIN_MS` / `HERMES_HUMAN_DELAY_MAX_MS` - Custom delay range
+
+### Dangerous Command Approval
+
+The terminal tool includes safety checks for potentially destructive commands (e.g., `rm -rf`, `DROP TABLE`, `chmod 777`):
+
+**Behavior by Backend:**
+- **Docker/Singularity/Modal**: Commands run unrestricted (isolated containers)
+- **Local/SSH**: Dangerous commands trigger approval flow
+
+**Approval Flow (CLI):**
+```
+⚠️  Potentially dangerous command detected: recursive delete
+    rm -rf /tmp/test
+
+    [o]nce  |  [s]ession  |  [a]lways  |  [d]eny
+    Choice [o/s/a/D]: 
+```
+
+**Approval Flow (Messaging):**
+- Command is blocked with explanation
+- Agent explains the command was blocked for safety
+- User must add the pattern to their allowlist via `hermes config edit` or run the command directly on their machine
+
+**Configuration:**
+- `command_allowlist` in `~/.hermes/config.yaml` stores permanently allowed patterns
+- Add patterns via "always" approval or edit directly
+
+**Sudo Handling (Messaging):**
+- If sudo fails over messaging, the output includes a tip to add `SUDO_PASSWORD` to `~/.hermes/.env`
+
+---
+
+## Background Process Management
+
+The `process` tool works alongside `terminal` for managing long-running background processes:
+
+**Starting a background process:**
+```python
+terminal(command="pytest -v tests/", background=True)
+# Returns: {"session_id": "proc_abc123", "pid": 12345, ...}
+```
+
+**Managing it with the process tool:**
+- `process(action="list")` -- show all running/recent processes
+- `process(action="poll", session_id="proc_abc123")` -- check status + new output
+- `process(action="log", session_id="proc_abc123")` -- full output with pagination
+- `process(action="wait", session_id="proc_abc123", timeout=600)` -- block until done
+- `process(action="kill", session_id="proc_abc123")` -- terminate
+- `process(action="write", session_id="proc_abc123", data="y")` -- send stdin
+- `process(action="submit", session_id="proc_abc123", data="yes")` -- send + Enter
+
+**Key behaviors:**
+- Background processes execute through the configured terminal backend (local/Docker/Modal/SSH/Singularity) -- never directly on the host unless `TERMINAL_ENV=local`
+- The `wait` action blocks the tool call until the process finishes, times out, or is interrupted by a new user message
+- PTY mode (`pty=true` on terminal) enables interactive CLI tools (Codex, Claude Code)
+- In RL training, background processes are auto-killed when the episode ends (`tool_context.cleanup()`)
+- In the gateway, sessions with active background processes are exempt from idle reset
+- The process registry checkpoints to `~/.hermes/processes.json` for crash recovery
+
+Files: `tools/process_registry.py` (registry), `model_tools.py` (tool definition + handler), `tools/terminal_tool.py` (spawn integration)
+
+---
+
+## Adding New Tools
+
+Follow this strict order to maintain consistency:
+
+1. Create `tools/your_tool.py` with:
+   - Handler function (sync or async) returning a JSON string via `json.dumps()`
+   - `check_*_requirements()` function to verify dependencies (e.g., API keys)
+   - Schema definition following OpenAI function-calling format
+
+2. Export in `tools/__init__.py`:
+   - Import the handler and check function
+   - Add to `__all__` list
+
+3. Register in `model_tools.py`:
+   - Add to `TOOLSET_REQUIREMENTS` if it needs API keys
+   - Create `get_*_tool_definitions()` function or add to existing
+   - Add routing in `handle_function_call()` dispatcher
+   - Update `get_all_tool_names()` with the tool name
+   - Update `get_toolset_for_tool()` mapping
+   - Update `get_available_toolsets()` and `check_toolset_requirements()`
+
+4. Add to toolset in `toolsets.py`:
+   - Add to existing toolset or create new one in TOOLSETS dict
+
+5. If the tool requires an API key:
+   - Add to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py`
+   - The tool will be auto-disabled if the key is missing
+
+6. Optionally add to `toolset_distributions.py` for batch processing
+
+### Tool Implementation Pattern
+
+```python
+# tools/example_tool.py
+import json
+import os
+from typing import Optional
+
+def check_example_requirements() -> bool:
+    """Check if required API keys/dependencies are available."""
+    return bool(os.getenv("EXAMPLE_API_KEY"))
+
+def example_tool(param: str, task_id: Optional[str] = None) -> str:
+    """Execute the tool and return JSON string result."""
+    try:
+        result = {"success": True, "data": "..."}
+        return json.dumps(result, ensure_ascii=False)
+    except Exception as e:
+        return json.dumps({"error": str(e)}, ensure_ascii=False)
+```
+
+All tool handlers MUST return a JSON string. Never return raw dicts.
+
+### Dynamic Tool Availability
+
+Tools are automatically disabled when their API keys are missing:
+
+```python
+# In model_tools.py
+TOOLSET_REQUIREMENTS = {
+    "web": {"env_vars": ["FIRECRAWL_API_KEY"]},
+    "browser": {"env_vars": ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]},
+    "creative": {"env_vars": ["FAL_KEY"]},
+}
+```
+
+The `check_tool_availability()` function determines which tools to include.
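
A rough sketch of how such a requirements table could be evaluated is shown below. It is an assumption about the general approach, not the actual `check_tool_availability()` implementation; `available_toolsets` is a hypothetical helper, and the environment is passed in explicitly so the check is easy to exercise.

```python
import os
from typing import Dict, List, Mapping

# Same shape as the TOOLSET_REQUIREMENTS example above.
TOOLSET_REQUIREMENTS: Dict[str, Dict[str, List[str]]] = {
    "web": {"env_vars": ["FIRECRAWL_API_KEY"]},
    "browser": {"env_vars": ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]},
    "creative": {"env_vars": ["FAL_KEY"]},
}


def available_toolsets(
    requirements: Dict[str, Dict[str, List[str]]],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the toolsets whose required environment variables are all non-empty."""
    return [
        name
        for name, spec in requirements.items()
        if all(env.get(var) for var in spec.get("env_vars", []))
    ]
```

A toolset with only some of its variables set (for example `BROWSERBASE_API_KEY` without `BROWSERBASE_PROJECT_ID`) is treated as unavailable in this sketch.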
+
+### Stateful Tools
+
+Tools that maintain state (terminal, browser) require:
+- `task_id` parameter for session isolation between concurrent tasks
+- `cleanup_*()` function to release resources
+- Cleanup is called automatically in `run_agent.py` after the conversation completes
+
+---
+
+## Trajectory Format
+
+Conversations are saved in ShareGPT format for training:
+```json
+{"from": "system", "value": "System prompt with <tools>...</tools>"}
+{"from": "human", "value": "User message"}
+{"from": "gpt", "value": "<think>reasoning</think>\n<tool_call>{...}</tool_call>"}
+{"from": "tool", "value": "<tool_response>{...}</tool_response>"}
+{"from": "gpt", "value": "Final response"}
+```
+
+Tool calls use `<tool_call>` XML tags, responses use `<tool_response>` tags, reasoning uses `<think>` tags.
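
To illustrate the tag convention, here is a minimal sketch that pulls tool calls out of a `gpt` turn. The `extract_tool_calls` helper is hypothetical, and the payload shape (`name` plus `arguments`) is an assumption, since the format above only shows `{...}`.

```python
import json
import re
from typing import Any, Dict, List

# Non-greedy match so multiple <tool_call> blocks in one turn are split correctly.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(value: str) -> List[Dict[str, Any]]:
    """Parse every JSON payload wrapped in <tool_call> tags, skipping malformed ones."""
    calls: List[Dict[str, Any]] = []
    for payload in TOOL_CALL_RE.findall(value):
        try:
            calls.append(json.loads(payload.strip()))
        except json.JSONDecodeError:
            continue
    return calls


# Hypothetical turn, for illustration only.
turn = {
    "from": "gpt",
    "value": '<think>check the directory</think>\n'
             '<tool_call>{"name": "terminal", "arguments": {"command": "ls"}}</tool_call>',
}
calls = extract_tool_calls(turn["value"])
```

Skipping malformed payloads rather than raising is a design choice for this sketch; a stricter consumer might prefer to reject the whole trajectory.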
+
+### Trajectory Export
+
+```python
+agent = AIAgent(save_trajectories=True)
+agent.chat("Do something")
+# Saves to trajectories/*.jsonl in ShareGPT format
+```
+
+---
+
+## Batch Processing (batch_runner.py)
+
+For processing multiple prompts:
+- Parallel execution with multiprocessing
+- Content-based resume for fault tolerance (matches on prompt text, not indices)
+- Toolset distributions control probabilistic tool availability per prompt
+- Output: `data/<run_name>/trajectories.jsonl` (combined) + individual batch files
+
+```bash
+python batch_runner.py \
+    --dataset_file=prompts.jsonl \
+    --batch_size=20 \
+    --num_workers=4 \
+    --run_name=my_run
+```
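
The content-based resume mentioned above can be pictured like this. It is a sketch under stated assumptions rather than the logic in `batch_runner.py`: `pending_prompts` is a hypothetical helper, and the `"prompt"` field name is assumed.

```python
from typing import Any, Dict, List, Set


def pending_prompts(
    dataset: List[Dict[str, Any]],
    completed_texts: Set[str],
    key: str = "prompt",
) -> List[Dict[str, Any]]:
    """Keep only entries whose prompt text has not been completed yet."""
    return [entry for entry in dataset if entry.get(key) not in completed_texts]


# Hypothetical data: "b" finished in an earlier run.
dataset = [{"prompt": "a"}, {"prompt": "b"}, {"prompt": "c"}]
remaining = pending_prompts(dataset, {"b"})
```

Because matching is on text rather than position, reordering or prepending entries in the dataset file does not cause completed prompts to be re-run.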
+
+---
+
+## Skills System
+
+Skills are on-demand knowledge documents the agent can load. They live in the `skills/` directory:
+
+```
+skills/
+├── mlops/                    # Category folder
+│   ├── axolotl/             # Skill folder
+│   │   ├── SKILL.md         # Main instructions (required)
+│   │   ├── references/      # Additional docs, API specs
+│   │   └── templates/       # Output formats, configs
+│   └── vllm/
+│       └── SKILL.md
+└── example-skill/
+    └── SKILL.md
+```
+
+**Progressive disclosure** (token-efficient):
+1. `skills_categories()` - List category names (~50 tokens)
+2. `skills_list(category)` - Name + description per skill (~3k tokens)
+3. `skill_view(name)` - Full content + tags + linked files
+
+SKILL.md files use YAML frontmatter:
+```yaml
+---
+name: skill-name
+description: Brief description for listing
+tags: [tag1, tag2]
+related_skills: [other-skill]
+version: 1.0.0
+---
+# Skill Content...
+```
+
+Tool files: `tools/skills_tool.py` → `model_tools.py` → `toolsets.py`
+
+---
+
+## Testing Changes
+
+After making changes:
+
+1. Run `hermes doctor` to check setup
+2. Run `hermes config check` to verify config
+3. Test with `hermes chat -q "test message"`
+4. For new config options, test fresh install: `rm -rf ~/.hermes && hermes setup`
--- a/README.md
+++ b/README.md
--- a/TODO.md
+++ b/TODO.md
@@ -1,305 +1,63 @@
 # Hermes Agent - Future Improvements

-> Ideas for enhancing the agent's capabilities, generated from self-analysis of the codebase.
-
 ---

-## 1. Memory & Context Management 🧠
+## 1. Subagent Architecture (Context Isolation) 🎯

-**Problem:** Context grows unbounded during long conversations. Trajectory compression exists for training data post-hoc, but live conversations lack intelligent context management.
+The main agent becomes an orchestrator that delegates context-heavy tasks to subagents with isolated context. Each subagent returns a summary, keeping the orchestrator's context clean. `delegate_task(goal, context, toolsets=[])` with fresh conversation, limited toolset, task-specific system prompt.

-**Ideas:**
- [ ] **Incremental summarization** - Compress old tool outputs on-the-fly during conversations
-  - Trigger when context exceeds threshold (e.g., 80% of max tokens)
-  - Preserve recent turns fully, summarize older tool responses
-  - Could reuse logic from `trajectory_compressor.py`
-  
- [ ] **Semantic memory retrieval** - Vector store for long conversation recall
-  - Embed important facts/findings as conversation progresses
-  - Retrieve relevant memories when needed instead of keeping everything in context
-  - Consider lightweight solutions: ChromaDB, FAISS, or even a simple embedding cache
-  
- [ ] **Working vs. episodic memory** distinction
-  - Working memory: Current task state, recent tool results (always in context)
-  - Episodic memory: Past findings, tried approaches (retrieved on demand)
-  - Clear eviction policies for each
+## 2. Planning & Task Management 📋

-**Files to modify:** `run_agent.py` (add memory manager), possibly new `tools/memory_tool.py`
+Task decomposition tool, progress checkpoints after N tool calls, persistent plan storage that survives context compression, failure recovery with replanning.

---
+## 3. Dynamic Skills Expansion 📚

-## 2. Self-Reflection & Course Correction 🔄
+Skill acquisition from successful tasks, parameterized skill templates, skill chaining with dependency graphs.

-**Problem:** Current retry logic handles malformed outputs but not semantic failures. Agent doesn't reason about *why* something failed.
+## 4. Interactive Clarifying Questions ❓

-**Ideas:**
- [ ] **Meta-reasoning after failures** - When a tool returns an error or unexpected result:
-  ```
-  Tool failed → Reflect: "Why did this fail? What assumptions were wrong?"
-  → Adjust approach → Retry with new strategy
-  ```
-  - Could be a lightweight LLM call or structured self-prompt
-  
- [ ] **Planning/replanning module** - For complex multi-step tasks:
-  - Generate plan before execution
-  - After each step, evaluate: "Am I on track? Should I revise the plan?"
-  - Store plan in working memory, update as needed
-  
- [ ] **Approach memory** - Remember what didn't work:
-  - "I tried X for this type of problem and it failed because Y"
-  - Prevents repeating failed strategies in the same conversation
+Multiple-choice prompt tool with rich terminal UI. Up to 4 choices + free-text. CLI-only with graceful fallback for non-interactive modes.

-**Files to modify:** `run_agent.py` (add reflection hooks in tool loop), new `tools/reflection_tool.py`
+## 5. Memory System 🧠

---
+Daily memory logs, long-term curated MEMORY.md, vector/semantic search, pre-compaction memory flush, user profile, learning store for error patterns and discovered fixes. *Inspired by ClawdBot's memory system.*

-## 3. Tool Composition & Learning 🔧
+## 6. Heartbeat System 💓

-**Problem:** Tools are atomic. Complex tasks require repeated manual orchestration of the same tool sequences.
+Periodic agent wake-up that reads HEARTBEAT.md for instructions. Runs inside the main session with full context. Triggers on interval, exec completion, cron events, or manual wake. HEARTBEAT_OK suppression when nothing needs attention. *Inspired by ClawdBot's heartbeat.*

-**Ideas:**
- [ ] **Macro tools / Tool chains** - Define reusable tool sequences:
-  ```yaml
-  research_topic:
-    description: "Deep research on a topic"
-    steps:
-      - web_search: {query: "$topic"}
-      - web_extract: {urls: "$search_results.urls[:3]"}
-      - summarize: {content: "$extracted"}
-  ```
-  - Could be defined in skills or a new `macros/` directory
-  - Agent can invoke macro as single tool call
-  
- [ ] **Tool failure patterns** - Learn from failures:
-  - Track: tool, input pattern, error type, what worked instead
-  - Before calling a tool, check: "Has this pattern failed before?"
-  - Persistent across sessions (stored in skills or separate DB)
-  
- [ ] **Parallel tool execution** - When tools are independent, run concurrently:
-  - Detect independence (no data dependencies between calls)
-  - Use `asyncio.gather()` for parallel execution
-  - Already have async support in some tools, just need orchestration
+## 7. Local Browser Control via CDP 🌐

-**Files to modify:** `model_tools.py`, `toolsets.py`, new `tool_macros.py`
+Support both local Chrome (via CDP, free) and Browserbase (cloud, paid) as browser backends. Local gives persistent login sessions but lacks CAPTCHA solving.

---
+## 8. Signal Integration 📡

-## 4. Dynamic Skills Expansion 📚
+New platform adapter using signal-cli daemon (JSON-RPC HTTP + SSE). Requires Java runtime and phone number registration.

-**Problem:** Skills system is elegant but static. Skills must be manually created and added.
+## 9. Session Transcript Search 🔍

-**Ideas:**
- [ ] **Skill acquisition from successful tasks** - After completing a complex task:
-  - "This approach worked well. Save as a skill?"
-  - Extract: goal, steps taken, tools used, key decisions
-  - Generate SKILL.md automatically
-  - Store in user's skills directory
-  
- [ ] **Skill templates** - Common patterns that can be parameterized:
-  ```markdown
-  # Debug {language} Error
-  1. Reproduce the error
-  2. Search for error message: `web_search("{error_message} {language}")`
-  3. Check common causes: {common_causes}
-  4. Apply fix and verify
-  ```
-  
- [ ] **Skill chaining** - Combine skills for complex workflows:
-  - Skills can reference other skills as dependencies
-  - "To do X, first apply skill Y, then skill Z"
-  - Directed graph of skill dependencies
+`hermes sessions search <query>` CLI command and `session_search` agent tool. Text-based first (ripgrep over JSONL), vector search later.

-**Files to modify:** `tools/skills_tool.py`, `skills/` directory structure, new `skill_generator.py`
+## 10. Plugin/Extension System 🔌

---
+Python plugin interface with `plugin.yaml` + `handler.py`. Discovery from `~/.hermes/plugins/`. Plugins can register tools, hooks, and CLI commands. *Inspired by ClawdBot's 36-plugin extension system.*

-## 5. Task Continuation Hints 🎯
+## 11. Native Companion Apps 📱

-**Problem:** Could be more helpful by suggesting logical next steps.
+macOS (Swift/SwiftUI), iOS, Android apps connecting via WebSocket. Prerequisite: WS API on gateway. MVP: web UI with Flask/FastAPI. *Inspired by ClawdBot's companion apps.*

-**Ideas:**
- [ ] **Suggest next steps** - At end of a task, suggest logical continuations:
-  - "Code is written. Want me to also write tests / docs / deploy?"
-  - Based on common workflows for task type
-  - Non-intrusive, just offer options
+## 12. Evaluation System 📏

-**Files to modify:** `run_agent.py`, response generation logic
+LLM grader mode for batch_runner, action comparison against expected tool calls, string matching baselines.

---
+## 13. Layered Context Architecture 📊

-## 6. Uncertainty & Honesty Calibration 🎚️
+Structured hierarchy: project context > skills > user profile > learnings > external knowledge > runtime introspection.

-**Problem:** Sometimes confidently wrong. Should be better calibrated about what I know vs. don't know.
+## 14. Tools Wishlist 🧰

-**Ideas:**
- [ ] **Source attribution** - Track where information came from:
-  - "According to the docs I just fetched..." vs "From my training data (may be outdated)..."
-  - Let user assess reliability themselves
-
- [ ] **Cross-reference high-stakes claims** - Self-check for made-up details:
-  - When stakes are high, verify with tools before presenting as fact
-  - "Let me verify that before you act on it..."
-
-**Files to modify:** `run_agent.py`, response generation logic
-
---
-
-## 7. Resource Awareness & Efficiency 💰
-
-**Problem:** No awareness of costs, time, or resource usage. Could be smarter about efficiency.
-
-**Ideas:**
- [ ] **Tool result caching** - Don't repeat identical operations:
-  - Cache web searches, extractions within a session
-  - Invalidation based on time-sensitivity of query
-  - Hash-based lookup: same input → cached output
-
- [ ] **Lazy evaluation** - Don't fetch everything upfront:
-  - Get summaries first, full content only if needed
-  - "I found 5 relevant pages. Want me to deep-dive on any?"
-
-**Files to modify:** `model_tools.py`, new `resource_tracker.py`
-
---
-
-## 8. Collaborative Problem Solving 🤝
-
-**Problem:** Interaction is command/response. Complex problems benefit from dialogue.
-
-**Ideas:**
- [ ] **Assumption surfacing** - Make implicit assumptions explicit:
-  - "I'm assuming you want Python 3.11+. Correct?"
-  - "This solution assumes you have sudo access..."
-  - Let user correct before going down wrong path
-
- [ ] **Checkpoint & confirm** - For high-stakes operations:
-  - "About to delete 47 files. Here's the list - proceed?"
-  - "This will modify your database. Want a backup first?"
-  - Configurable threshold for when to ask
-
-**Files to modify:** `run_agent.py`, system prompt configuration
-
---
-
-## 9. Project-Local Context 💾
-
-**Problem:** Valuable context lost between sessions.
-
-**Ideas:**
- [ ] **Project awareness** - Remember project-specific context:
-  - Store `.hermes/context.md` in project directory
-  - "This is a Django project using PostgreSQL"
-  - Coding style preferences, deployment setup, etc.
-  - Load automatically when working in that directory
-
- [ ] **Handoff notes** - Leave notes for future sessions:
-  - Write to `.hermes/notes.md` in project
-  - "TODO for next session: finish implementing X"
-  - "Known issues: Y doesn't work on Windows"
-
-**Files to modify:** New `project_context.py`, auto-load in `run_agent.py`
-
---
-
-## 10. Graceful Degradation & Robustness 🛡️
-
-**Problem:** When things go wrong, recovery is limited. Should fail gracefully.
-
-**Ideas:**
- [ ] **Fallback chains** - When primary approach fails, have backups:
-  - `web_extract` fails → try `browser_navigate` → try `web_search` for cached version
-  - Define fallback order per tool type
-  
- [ ] **Partial progress preservation** - Don't lose work on failure:
-  - Long task fails midway → save what we've got
-  - "I completed 3/5 steps before the error. Here's what I have..."
-  
- [ ] **Self-healing** - Detect and recover from bad states:
-  - Browser stuck → close and retry
-  - Terminal hung → timeout and reset
-
-**Files to modify:** `model_tools.py`, tool implementations, new `fallback_manager.py`
-
---
-
-## 11. Tools & Skills Wishlist 🧰
-
-*Things that would need new tool implementations (can't do well with current tools):*
-
-### High-Impact
-
- [ ] **Audio/Video Transcription** 🎬
-  - Transcribe audio files, podcasts, YouTube videos
-  - Extract key moments from video
-  - Currently blind to multimedia content
-  - *Could potentially use whisper via terminal, but native tool would be cleaner*
-  
- [ ] **Diagram Rendering** 📊
-  - Render Mermaid/PlantUML to actual images
-  - Can generate the code, but rendering requires external service or tool
-  - "Show me how these components connect" → actual visual diagram
-
-### Medium-Impact
-
- [ ] **Document Generation** 📄
-  - Create styled PDFs, Word docs, presentations
-  - *Can do basic PDF via terminal tools, but limited*
-
- [ ] **Diff/Patch Tool** 📝
-  - Surgical code modifications with preview
-  - "Change line 45-50 to X" without rewriting whole file
-  - Show diffs before applying
-  - *Can use `diff`/`patch` but a native tool would be safer*
-
-### Skills to Create
-
- [ ] **Domain-specific skill packs:**
-  - DevOps/Infrastructure (Terraform, K8s, AWS)
-  - Data Science workflows (EDA, model training)
-  - Security/pentesting procedures
-  
- [ ] **Framework-specific skills:**
-  - React/Vue/Angular patterns
-  - Django/Rails/Express conventions
-  - Database optimization playbooks
-
- [ ] **Troubleshooting flowcharts:**
-  - "Docker container won't start" → decision tree
-  - "Production is slow" → systematic diagnosis
-
---
-
-## Priority Order (Suggested)
-
-1. **Memory & Context Management** - Biggest impact on complex tasks
-2. **Self-Reflection** - Improves reliability and reduces wasted tool calls  
-3. **Project-Local Context** - Practical win, keeps useful info across sessions
-4. **Tool Composition** - Quality of life, builds on other improvements
-5. **Dynamic Skills** - Force multiplier for repeated tasks
-
---
-
-## Removed Items (Unrealistic)
-
-The following were removed because they're architecturally impossible:
-
- ~~Proactive suggestions / Prefetching~~ - Agent only runs on user request, can't interject
- ~~Session save/restore across conversations~~ - Agent doesn't control session persistence
- ~~User preference learning across sessions~~ - Same issue
- ~~Clipboard integration~~ - No access to user's local system clipboard
- ~~Voice/TTS playback~~ - Can generate audio but can't play it to user
- ~~Set reminders~~ - No persistent background execution
-
-The following were removed because they're **already possible**:
-
- ~~HTTP/API Client~~ → Use `curl` or Python `requests` in terminal
- ~~Structured Data Manipulation~~ → Use `pandas` in terminal
- ~~Git-Native Operations~~ → Use `git` CLI in terminal
- ~~Symbolic Math~~ → Use `SymPy` in terminal
- ~~Code Quality Tools~~ → Run linters (`eslint`, `black`, `mypy`) in terminal
- ~~Testing Framework~~ → Run `pytest`, `jest`, etc. in terminal
- ~~Translation~~ → LLM handles this fine, or use translation APIs
-
---
-
-*Last updated: $(date +%Y-%m-%d)* 🤖
+- Diagram rendering (Mermaid/PlantUML to images)
+- Document generation (PDFs, Word, presentations)
+- Canvas / visual workspace
+- Coding agent skill (Codex, Claude Code orchestration via PTY)
+- Domain skill packs (DevOps, data science, security)
--- a/pycache/model_tools.cpython-310.pyc
+++ b/pycache/model_tools.cpython-310.pyc
--- a/pycache/web_tools.cpython-310.pyc
+++ b/pycache/web_tools.cpython-310.pyc
--- a/batch_runner.py
+++ b/batch_runner.py
@@ -41,24 +41,17 @@ from toolset_distributions import (
    sample_toolsets_from_distribution,
    validate_distribution
 )
+from model_tools import TOOL_TO_TOOLSET_MAP


 # Global configuration for worker processes
 _WORKER_CONFIG = {}

-# All possible tools - used to ensure consistent schema across all trajectory entries
-# This is required because Arrow/Parquet (used by HuggingFace datasets) needs identical schemas
-ALL_POSSIBLE_TOOLS = {
-    'terminal', 'web_search', 'web_extract',
-    'vision_analyze', 'image_generate', 'mixture_of_agents',
-    # Skills tools
-    'skills_categories', 'skills_list', 'skill_view',
-    # Browser automation tools
-    'browser_navigate', 'browser_snapshot', 'browser_click',
-    'browser_type', 'browser_scroll', 'browser_back',
-    'browser_press', 'browser_close', 'browser_get_images',
-    'browser_vision'
-}
+# All possible tools - auto-derived from the master mapping in model_tools.py.
+# This stays in sync automatically when new tools are added to TOOL_TO_TOOLSET_MAP.
+# Used for consistent schema in Arrow/Parquet (HuggingFace datasets) and for
+# filtering corrupted entries during trajectory combination.
+ALL_POSSIBLE_TOOLS = set(TOOL_TO_TOOLSET_MAP.keys())

 # Default stats for tools that weren't used
 DEFAULT_TOOL_STATS = {'count': 0, 'success': 0, 'failure': 0}
@@ -200,6 +193,42 @@ def _extract_tool_stats(messages: List[Dict[str, Any]]) -> Dict[str, Dict[str, i
    return tool_stats


+def _extract_reasoning_stats(messages: List[Dict[str, Any]]) -> Dict[str, int]:
+    """
+    Count how many assistant turns have reasoning vs no reasoning.
+    
+    Checks for <REASONING_SCRATCHPAD> in content or a non-empty 'reasoning' field
+    (native thinking tokens). Returns counts for tracking reasoning coverage.
+    
+    Args:
+        messages: Message history
+        
+    Returns:
+        Dict with 'total_assistant_turns', 'turns_with_reasoning', 'turns_without_reasoning', and 'has_any_reasoning'
+    """
+    total = 0
+    with_reasoning = 0
+    
+    for msg in messages:
+        if msg.get("role") != "assistant":
+            continue
+        total += 1
+        
+        content = msg.get("content", "") or ""
+        has_scratchpad = "<REASONING_SCRATCHPAD>" in content
+        has_native_reasoning = bool((msg.get("reasoning") or "").strip())
+        
+        if has_scratchpad or has_native_reasoning:
+            with_reasoning += 1
+    
+    return {
+        "total_assistant_turns": total,
+        "turns_with_reasoning": with_reasoning,
+        "turns_without_reasoning": total - with_reasoning,
+        "has_any_reasoning": with_reasoning > 0,
+    }
+
+
 def _process_single_prompt(
    prompt_index: int,
    prompt_data: Dict[str, Any],
@@ -244,6 +273,10 @@ def _process_single_prompt(
            providers_ignored=config.get("providers_ignored"),
            providers_order=config.get("providers_order"),
            provider_sort=config.get("provider_sort"),
+            max_tokens=config.get("max_tokens"),
+            reasoning_config=config.get("reasoning_config"),
+            prefill_messages=config.get("prefill_messages"),
+            skip_context_files=True,  # Don't pollute trajectories with SOUL.md/AGENTS.md
        )

        # Run the agent with task_id to ensure each task gets its own isolated VM
@@ -252,6 +285,9 @@ def _process_single_prompt(
        # Extract tool usage statistics
        tool_stats = _extract_tool_stats(result["messages"])
        
+        # Extract reasoning coverage stats
+        reasoning_stats = _extract_reasoning_stats(result["messages"])
+        
        # Convert to trajectory format (using existing method)
        trajectory = agent._convert_to_trajectory_format(
            result["messages"],
@@ -264,6 +300,7 @@ def _process_single_prompt(
            "prompt_index": prompt_index,
            "trajectory": trajectory,
            "tool_stats": tool_stats,
+            "reasoning_stats": reasoning_stats,
            "completed": result["completed"],
            "partial": result.get("partial", False),
            "api_calls": result["api_calls"],
@@ -332,7 +369,9 @@ def _process_batch_worker(args: Tuple) -> Dict[str, Any]:
    
    # Initialize aggregated stats for this batch
    batch_tool_stats = {}
+    batch_reasoning_stats = {"total_assistant_turns": 0, "turns_with_reasoning": 0, "turns_without_reasoning": 0}
    completed_in_batch = []
+    discarded_no_reasoning = 0
    
    # Process each prompt sequentially in this batch
    for prompt_index, prompt_data in prompts_to_process:
@@ -346,6 +385,13 @@ def _process_batch_worker(args: Tuple) -> Dict[str, Any]:
        
        # Save trajectory if successful
        if result["success"] and result["trajectory"]:
+            # Discard samples with zero reasoning across all turns
+            reasoning = result.get("reasoning_stats", {})
+            if not reasoning.get("has_any_reasoning", True):
+                print(f"   🚫 Prompt {prompt_index} discarded (no reasoning in any turn)")
+                discarded_no_reasoning += 1
+                continue
+            
            # Get and normalize tool stats for consistent schema across all entries
            raw_tool_stats = result.get("tool_stats", {})
            tool_stats = _normalize_tool_stats(raw_tool_stats)
@@ -386,6 +432,10 @@ def _process_batch_worker(args: Tuple) -> Dict[str, Any]:
            batch_tool_stats[tool_name]["success"] += stats["success"]
            batch_tool_stats[tool_name]["failure"] += stats["failure"]
        
+        # Aggregate reasoning stats
+        for key in batch_reasoning_stats:
+            batch_reasoning_stats[key] += result.get("reasoning_stats", {}).get(key, 0)
+        
        # Only mark as completed if successfully saved (failed prompts can be retried on resume)
        if result["success"] and result["trajectory"]:
            completed_in_batch.append(prompt_index)
@@ -401,6 +451,8 @@ def _process_batch_worker(args: Tuple) -> Dict[str, Any]:
        "processed": len(prompts_to_process),
        "skipped": len(batch_data) - len(prompts_to_process),
        "tool_stats": batch_tool_stats,
+        "reasoning_stats": batch_reasoning_stats,
+        "discarded_no_reasoning": discarded_no_reasoning,
        "completed_prompts": completed_in_batch
    }

@@ -428,6 +480,10 @@ class BatchRunner:
        providers_ignored: List[str] = None,
        providers_order: List[str] = None,
        provider_sort: str = None,
+        max_tokens: int = None,
+        reasoning_config: Dict[str, Any] = None,
+        prefill_messages: List[Dict[str, Any]] = None,
+        max_samples: int = None,
    ):
        """
        Initialize the batch runner.
@@ -449,6 +505,10 @@ class BatchRunner:
            providers_ignored (List[str]): OpenRouter providers to ignore (optional)
            providers_order (List[str]): OpenRouter providers to try in order (optional)
            provider_sort (str): Sort providers by price/throughput/latency (optional)
+            max_tokens (int): Maximum tokens for model responses (optional, uses model default if not set)
+            reasoning_config (Dict): OpenRouter reasoning config override (e.g. {"effort": "none"} to disable thinking)
+            prefill_messages (List[Dict]): Messages to prepend as prefilled conversation context (few-shot priming)
+            max_samples (int): Only process the first N samples from the dataset (optional, processes all if not set)
        """
        self.dataset_file = Path(dataset_file)
        self.batch_size = batch_size
@@ -466,6 +526,10 @@ class BatchRunner:
        self.providers_ignored = providers_ignored
        self.providers_order = providers_order
        self.provider_sort = provider_sort
+        self.max_tokens = max_tokens
+        self.reasoning_config = reasoning_config
+        self.prefill_messages = prefill_messages
+        self.max_samples = max_samples
        
        # Validate distribution
        if not validate_distribution(distribution):
@@ -481,8 +545,12 @@ class BatchRunner:
        # Statistics file
        self.stats_file = self.output_dir / "statistics.json"
        
-        # Load dataset
+        # Load dataset (and optionally truncate to max_samples)
        self.dataset = self._load_dataset()
+        if self.max_samples and self.max_samples < len(self.dataset):
+            full_count = len(self.dataset)
+            self.dataset = self.dataset[:self.max_samples]
+            print(f"✂️  Truncated dataset from {full_count} to {self.max_samples} samples (--max_samples)")
        
        # Create batches
        self.batches = self._create_batches()
@@ -735,6 +803,9 @@ class BatchRunner:
            "providers_ignored": self.providers_ignored,
            "providers_order": self.providers_order,
            "provider_sort": self.provider_sort,
+            "max_tokens": self.max_tokens,
+            "reasoning_config": self.reasoning_config,
+            "prefill_messages": self.prefill_messages,
        }
        
        # For backward compatibility, still track by index (but this is secondary to content matching)
@@ -797,6 +868,8 @@ class BatchRunner:
        
        # Aggregate all batch statistics and update checkpoint
        all_completed_prompts = list(completed_prompts_set)
+        total_reasoning_stats = {"total_assistant_turns": 0, "turns_with_reasoning": 0, "turns_without_reasoning": 0}
+        
        for batch_result in results:
            # Add newly completed prompts
            all_completed_prompts.extend(batch_result.get("completed_prompts", []))
@@ -813,6 +886,10 @@ class BatchRunner:
                total_tool_stats[tool_name]["count"] += stats["count"]
                total_tool_stats[tool_name]["success"] += stats["success"]
                total_tool_stats[tool_name]["failure"] += stats["failure"]
+            
+            # Aggregate reasoning stats
+            for key in total_reasoning_stats:
+                total_reasoning_stats[key] += batch_result.get("reasoning_stats", {}).get(key, 0)
        
        # Save final checkpoint
        checkpoint_data["completed_prompts"] = all_completed_prompts
@@ -835,15 +912,8 @@ class BatchRunner:
        combined_file = self.output_dir / "trajectories.jsonl"
        print(f"\n📦 Combining ALL batch files into {combined_file.name}...")
        
-        VALID_TOOLS = {'web_search', 'web_extract', 'terminal', 'vision_analyze', 
-                       'image_generate', 'mixture_of_agents',
-                       # Skills tools
-                       'skills_categories', 'skills_list', 'skill_view',
-                       # Browser automation tools
-                       'browser_navigate', 'browser_snapshot', 'browser_click',
-                       'browser_type', 'browser_scroll', 'browser_back',
-                       'browser_press', 'browser_close', 'browser_get_images',
-                       'browser_vision'}
+        # Valid tools auto-derived from model_tools.py — no manual updates needed
+        VALID_TOOLS = ALL_POSSIBLE_TOOLS
        
        total_entries = 0
        filtered_entries = 0
@@ -892,7 +962,8 @@ class BatchRunner:
            "model": self.model,
            "completed_at": datetime.now().isoformat(),
            "duration_seconds": round(time.time() - start_time, 2),
-            "tool_statistics": total_tool_stats
+            "tool_statistics": total_tool_stats,
+            "reasoning_statistics": total_reasoning_stats,
        }
        
        with open(self.stats_file, 'w', encoding='utf-8') as f:
@@ -930,6 +1001,25 @@ class BatchRunner:
        else:
            print("No tool calls were made during this run.")
        
+        # Print reasoning coverage stats
+        total_discarded = sum(r.get("discarded_no_reasoning", 0) for r in results)
+        
+        print("\n🧠 Reasoning Coverage:")
+        print("-" * 70)
+        total_turns = total_reasoning_stats["total_assistant_turns"]
+        with_reasoning = total_reasoning_stats["turns_with_reasoning"]
+        without_reasoning = total_reasoning_stats["turns_without_reasoning"]
+        if total_turns > 0:
+            pct_with = round(with_reasoning / total_turns * 100, 1)
+            pct_without = round(without_reasoning / total_turns * 100, 1)
+            print(f"   Total assistant turns:    {total_turns:,}")
+            print(f"   With reasoning:           {with_reasoning:,} ({pct_with}%)")
+            print(f"   Without reasoning:        {without_reasoning:,} ({pct_without}%)")
+        else:
+            print("   No assistant turns recorded.")
+        if total_discarded > 0:
+            print(f"   🚫 Samples discarded (zero reasoning): {total_discarded:,}")
+        
        print(f"\n💾 Results saved to: {self.output_dir}")
        print(f"   - Trajectories: trajectories.jsonl (combined)")
        print(f"   - Individual batches: batch_*.jsonl (for debugging)")
@@ -956,6 +1046,11 @@ def main(
    providers_ignored: str = None,
    providers_order: str = None,
    provider_sort: str = None,
+    max_tokens: int = None,
+    reasoning_effort: str = None,
+    reasoning_disabled: bool = False,
+    prefill_messages_file: str = None,
+    max_samples: int = None,
 ):
    """
    Run batch processing of agent prompts from a dataset.
@@ -979,6 +1074,11 @@ def main(
        providers_ignored (str): Comma-separated list of OpenRouter providers to ignore (e.g. "together,deepinfra")
        providers_order (str): Comma-separated list of OpenRouter providers to try in order (e.g. "anthropic,openai,google")
        provider_sort (str): Sort providers by "price", "throughput", or "latency" (OpenRouter only)
+        max_tokens (int): Maximum tokens for model responses (optional, uses model default if not set)
+        reasoning_effort (str): OpenRouter reasoning effort level: "xhigh", "high", "medium", "low", "minimal", "none" (default: "xhigh")
+        reasoning_disabled (bool): Completely disable reasoning/thinking tokens (default: False)
+        prefill_messages_file (str): Path to JSON file containing prefill messages (list of {role, content} dicts)
+        max_samples (int): Only process the first N samples from the dataset (optional, processes all if not set)
        
    Examples:
        # Basic usage
@@ -990,9 +1090,13 @@ def main(
        # Use specific distribution
        python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=image_test --distribution=image_gen
        
-        # With ephemeral system prompt (not saved to dataset)
+        # With disabled reasoning and max tokens
        python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=my_run \\
-                               --ephemeral_system_prompt="You are a helpful assistant focused on image generation."
+                               --reasoning_disabled --max_tokens=128000
+        
+        # With prefill messages from file
+        python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=my_run \\
+                               --prefill_messages_file=configs/prefill_opus.json
        
        # List available distributions
        python batch_runner.py --list_distributions
@@ -1031,6 +1135,36 @@ def main(
    providers_ignored_list = [p.strip() for p in providers_ignored.split(",")] if providers_ignored else None
    providers_order_list = [p.strip() for p in providers_order.split(",")] if providers_order else None
    
+    # Build reasoning_config from CLI flags
+    # --reasoning_disabled takes priority, then --reasoning_effort, then default (xhigh)
+    reasoning_config = None
+    if reasoning_disabled:
+        # Completely disable reasoning/thinking tokens
+        reasoning_config = {"effort": "none"}
+        print("🧠 Reasoning: DISABLED (effort=none)")
+    elif reasoning_effort:
+        # Use specified effort level
+        valid_efforts = ["xhigh", "high", "medium", "low", "minimal", "none"]
+        if reasoning_effort not in valid_efforts:
+            print(f"❌ Error: --reasoning_effort must be one of: {', '.join(valid_efforts)}")
+            return
+        reasoning_config = {"enabled": True, "effort": reasoning_effort}
+        print(f"🧠 Reasoning effort: {reasoning_effort}")
+    
+    # Load prefill messages from JSON file if provided
+    prefill_messages = None
+    if prefill_messages_file:
+        try:
+            with open(prefill_messages_file, 'r', encoding='utf-8') as f:
+                prefill_messages = json.load(f)
+            if not isinstance(prefill_messages, list):
+                print("❌ Error: prefill_messages_file must contain a JSON array of messages")
+                return
+            print(f"💬 Loaded {len(prefill_messages)} prefill messages from {prefill_messages_file}")
+        except Exception as e:
+            print(f"❌ Error loading prefill messages: {e}")
+            return
+    
    # Initialize and run batch runner
    try:
        runner = BatchRunner(
@@ -1050,6 +1184,10 @@ def main(
            providers_ignored=providers_ignored_list,
            providers_order=providers_order_list,
            provider_sort=provider_sort,
+            max_tokens=max_tokens,
+            reasoning_config=reasoning_config,
+            prefill_messages=prefill_messages,
+            max_samples=max_samples,
        )

        runner.run(resume=resume)
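The flag precedence in the `main()` hunk above (`--reasoning_disabled` first, then `--reasoning_effort`, otherwise no override) can be sketched as a standalone function. This is an illustration only: `build_reasoning_config` and `VALID_EFFORTS` are hypothetical names that do not appear in the diff, and the sketch raises `ValueError` where the diff prints an error and returns.

```python
from typing import Any, Dict, Optional

# Effort levels copied from the valid_efforts list in the diff.
VALID_EFFORTS = ("xhigh", "high", "medium", "low", "minimal", "none")


def build_reasoning_config(
    reasoning_disabled: bool = False,
    reasoning_effort: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper mirroring the precedence used in main()."""
    if reasoning_disabled:
        # The disable flag wins over any effort level.
        return {"effort": "none"}
    if reasoning_effort:
        if reasoning_effort not in VALID_EFFORTS:
            raise ValueError(
                f"reasoning_effort must be one of: {', '.join(VALID_EFFORTS)}"
            )
        return {"enabled": True, "effort": reasoning_effort}
    # No flag given: leave the config unset so downstream defaults apply.
    return None


print(build_reasoning_config(reasoning_disabled=True, reasoning_effort="high"))
print(build_reasoning_config(reasoning_effort="low"))
print(build_reasoning_config())
```

Returning `None` when neither flag is set keeps the choice of default in whatever code consumes the config, which appears to be the intent of the comment in the hunk.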
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -7,7 +7,7 @@
 # =============================================================================
 model:
  # Default model to use (can be overridden with --model flag)
-  default: "anthropic/claude-sonnet-4"
+  default: "anthropic/claude-opus-4.6"
  
  # API configuration (falls back to OPENROUTER_API_KEY env var)
  # api_key: "your-key-here"  # Uncomment to set here instead of .env
@@ -23,11 +23,15 @@ model:
 # OPTION 1: Local execution (default)
 # Commands run directly on your machine in the current directory
 # -----------------------------------------------------------------------------
+# Working directory behavior:
+#   - CLI (`hermes` command): Uses "." (current directory where you run hermes)
+#   - Messaging (Telegram/Discord): Uses MESSAGING_CWD from .env (default: home)
 terminal:
-  env_type: "local"
-  cwd: "."  # Use "." for current directory, or specify absolute path
+  backend: "local"
+  cwd: "."  # For local backend: "." = current directory. Ignored for remote backends.
  timeout: 180
  lifetime_seconds: 300
+  # sudo_password: ""  # Enable sudo commands (pipes via sudo -S) - SECURITY WARNING: plaintext!

 # -----------------------------------------------------------------------------
 # OPTION 2: SSH remote execution
@@ -35,8 +39,8 @@ terminal:
 # Great for: keeping agent isolated from its own code, using powerful remote hardware
 # -----------------------------------------------------------------------------
 # terminal:
-#   env_type: "ssh"
-#   cwd: "/home/myuser/project"
+#   backend: "ssh"
+#   cwd: "/home/myuser/project"  # Path on the REMOTE server
 #   timeout: 180
 #   lifetime_seconds: 300
 #   ssh_host: "my-server.example.com"
@@ -50,11 +54,11 @@ terminal:
 # Great for: reproducible environments, testing, isolation
 # -----------------------------------------------------------------------------
 # terminal:
-#   env_type: "docker"
-#   cwd: "/workspace"
+#   backend: "docker"
+#   cwd: "/workspace"  # Path INSIDE the container (default: /)
 #   timeout: 180
 #   lifetime_seconds: 300
-#   docker_image: "python:3.11"
+#   docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"

 # -----------------------------------------------------------------------------
 # OPTION 4: Singularity/Apptainer container
@@ -62,11 +66,11 @@ terminal:
 # Great for: HPC clusters, shared compute environments
 # -----------------------------------------------------------------------------
 # terminal:
-#   env_type: "singularity"
-#   cwd: "/workspace"
+#   backend: "singularity"
+#   cwd: "/workspace"  # Path INSIDE the container (default: /root)
 #   timeout: 180
 #   lifetime_seconds: 300
-#   singularity_image: "docker://python:3.11"
+#   singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"

 # -----------------------------------------------------------------------------
 # OPTION 5: Modal cloud execution
@@ -74,18 +78,78 @@ terminal:
 # Great for: GPU access, scalable compute, serverless execution
 # -----------------------------------------------------------------------------
 # terminal:
-#   env_type: "modal"
-#   cwd: "/workspace"
+#   backend: "modal"
+#   cwd: "/workspace"  # Path INSIDE the sandbox (default: /root)
 #   timeout: 180
 #   lifetime_seconds: 300
-#   modal_image: "python:3.11"
+#   modal_image: "nikolaik/python-nodejs:python3.11-nodejs20"
+
+# -----------------------------------------------------------------------------
+# SUDO SUPPORT (works with ALL backends above)
+# -----------------------------------------------------------------------------
+# Add sudo_password to any terminal config above to enable sudo commands.
+# The password is piped via `sudo -S`. Works with local, ssh, docker, etc.
+#
+# SECURITY WARNING: Password stored in plaintext!
+#
+# INTERACTIVE PROMPT: If no sudo_password is set and the CLI is running,
+# you'll be prompted to enter your password when sudo is needed:
+# - 45-second timeout (auto-skips if no input)
+# - Press Enter to skip (command fails gracefully)
+# - Password is hidden while typing
+# - Password is cached for the session
+#
+# ALTERNATIVES:
+# - SSH backend: Configure passwordless sudo on the remote server
+# - Containers: Run as root inside the container (no sudo needed)
+# - Local: Configure /etc/sudoers for specific commands
+#
+# Example (add to your terminal section):
+#   sudo_password: "your-password-here"
+
+# =============================================================================
+# Browser Tool Configuration
+# =============================================================================
+browser:
+  # Inactivity timeout in seconds - browser sessions are automatically closed
+  # after this period of no activity between agent loops (default: 120 = 2 minutes)
+  inactivity_timeout: 120
+
+# =============================================================================
+# Context Compression (Auto-shrinks long conversations)
+# =============================================================================
+# When conversation approaches model's context limit, middle turns are
+# automatically summarized to free up space while preserving important context.
+#
+# HOW IT WORKS:
+# 1. Tracks actual token usage from API responses (not estimates)
+# 2. When prompt_tokens >= threshold * model's context_length, triggers compression
+# 3. Protects first 3 turns (system prompt, initial request, first response)
+# 4. Protects last 4 turns (recent context is most relevant)
+# 5. Summarizes middle turns using a fast/cheap model
+# 6. Inserts summary as a user message, continues conversation seamlessly
+#
+compression:
+  # Enable automatic context compression (default: true)
+  # Set to false if you prefer to manage context manually or want errors on overflow
+  enabled: true
+  
+  # Trigger compression at this fraction of model's context limit (default: 0.85 = 85%)
+  # Lower values = more aggressive compression, higher values = compress later
+  threshold: 0.85
+  
+  # Model to use for generating summaries (fast/cheap recommended)
+  # This model compresses the middle turns into a concise summary
+  summary_model: "google/gemini-3-flash-preview"

 # =============================================================================
 # Agent Behavior
 # =============================================================================
 agent:
-  # Maximum conversation turns before stopping
-  max_turns: 20
+  # Maximum tool-calling iterations per conversation
+  # Higher = more room for complex tasks, but costs more tokens
+  # Recommended: 20-30 for focused tasks, 50-100 for open exploration
+  max_turns: 60
  
  # Enable verbose logging
  verbose: false
@@ -180,6 +244,39 @@ toolsets:
 # toolsets:
 #   - safe

+# =============================================================================
+# Voice Transcription (Speech-to-Text)
+# =============================================================================
+# Automatically transcribe voice messages on messaging platforms.
+# Requires OPENAI_API_KEY in .env (uses OpenAI Whisper API directly).
+stt:
+  enabled: true
+  model: "whisper-1"  # whisper-1 (cheapest) | gpt-4o-mini-transcribe | gpt-4o-transcribe
+
+# =============================================================================
+# Response Pacing (Messaging Platforms)
+# =============================================================================
+# Add human-like delays between message chunks.
+# human_delay:
+#   mode: "off"      # "off" | "natural" | "custom"
+#   min_ms: 800      # Min delay (custom mode only)
+#   max_ms: 2500     # Max delay (custom mode only)
+
+# =============================================================================
+# Session Logging
+# =============================================================================
+# Session trajectories are automatically saved to logs/ directory.
+# Each session creates: logs/session_YYYYMMDD_HHMMSS_UUID.json
+#
+# The session ID is displayed in the welcome banner for easy reference.
+# Logs contain full conversation history in trajectory format:
+# - System prompt, user messages, assistant responses
+# - Tool calls with inputs/outputs
+# - Timestamps for debugging
+#
+# No configuration needed - logging is always enabled.
+# To disable, you would need to modify the source code.
+
 # =============================================================================
 # Display
 # =============================================================================
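The compression rules described in the `compression:` comments above (trigger at a fraction of the context length, protect the first 3 and last 4 turns, summarize the middle) can be sketched as follows. The function names `should_compress` and `split_turns` are hypothetical and are not taken from the diff; the actual implementation is not shown in this section and may differ.

```python
from typing import List, Sequence, Tuple


def should_compress(prompt_tokens: int, context_length: int,
                    threshold: float = 0.85) -> bool:
    """Return True once prompt tokens reach the configured fraction of the context."""
    return prompt_tokens >= threshold * context_length


def split_turns(turns: Sequence, protect_first: int = 3,
                protect_last: int = 4) -> Tuple[List, List, List]:
    """Split turns into (protected head, middle to summarize, protected tail)."""
    if len(turns) <= protect_first + protect_last:
        # Nothing lies between the protected regions, so nothing is summarized.
        return list(turns), [], []
    cut = len(turns) - protect_last
    return list(turns[:protect_first]), list(turns[protect_first:cut]), list(turns[cut:])


head, middle, tail = split_turns(list(range(10)))
print(head, middle, tail)
print(should_compress(90_000, 100_000))
```

Under these assumptions, only the `middle` slice would be sent to the summary model, and the summary would replace it between `head` and `tail`.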
--- a/cli.py
+++ b/cli.py
--- a/cron/__init__.py
+++ b/cron/__init__.py
@@ -0,0 +1,36 @@
+"""
+Cron job scheduling system for Hermes Agent.
+
+This module provides scheduled task execution, allowing the agent to:
+- Run automated tasks on schedules (cron expressions, intervals, one-shot)
+- Self-schedule reminders and follow-up tasks
+- Execute tasks in isolated sessions (no prior context)
+
+Usage:
+    # Run due jobs (for system cron integration)
+    python -c "from cron import tick; tick()"
+    
+    # Or via CLI
+    python cli.py --cron-daemon
+"""
+
+from cron.jobs import (
+    create_job,
+    get_job,
+    list_jobs,
+    remove_job,
+    update_job,
+    JOBS_FILE,
+)
+from cron.scheduler import tick, run_daemon
+
+__all__ = [
+    "create_job",
+    "get_job", 
+    "list_jobs",
+    "remove_job",
+    "update_job",
+    "tick",
+    "run_daemon",
+    "JOBS_FILE",
+]
--- a/cron/jobs.py
+++ b/cron/jobs.py
@@ -0,0 +1,383 @@
+"""
+Cron job storage and management.
+
+Jobs are stored in ~/.hermes/cron/jobs.json
+Output is saved to ~/.hermes/cron/output/{job_id}/{timestamp}.md
+"""
+
+import json
+import os
+import re
+import uuid
+from datetime import datetime, timedelta
+from pathlib import Path
+from typing import Optional, Dict, List, Any
+
+try:
+    from croniter import croniter
+    HAS_CRONITER = True
+except ImportError:
+    HAS_CRONITER = False
+
+# =============================================================================
+# Configuration
+# =============================================================================
+
+HERMES_DIR = Path.home() / ".hermes"
+CRON_DIR = HERMES_DIR / "cron"
+JOBS_FILE = CRON_DIR / "jobs.json"
+OUTPUT_DIR = CRON_DIR / "output"
+
+
+def ensure_dirs():
+    """Ensure cron directories exist."""
+    CRON_DIR.mkdir(parents=True, exist_ok=True)
+    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
+
+
+# =============================================================================
+# Schedule Parsing
+# =============================================================================
+
+def parse_duration(s: str) -> int:
+    """
+    Parse duration string into minutes.
+    
+    Examples:
+        "30m" → 30
+        "2h" → 120
+        "1d" → 1440
+    """
+    s = s.strip().lower()
+    match = re.match(r'^(\d+)\s*(m|min|mins|minute|minutes|h|hr|hrs|hour|hours|d|day|days)$', s)
+    if not match:
+        raise ValueError(f"Invalid duration: '{s}'. Use format like '30m', '2h', or '1d'")
+    
+    value = int(match.group(1))
+    unit = match.group(2)[0]  # First char: m, h, or d
+    
+    multipliers = {'m': 1, 'h': 60, 'd': 1440}
+    return value * multipliers[unit]
+
+
+def parse_schedule(schedule: str) -> Dict[str, Any]:
+    """
+    Parse schedule string into structured format.
+    
+    Returns dict with:
+        - kind: "once" | "interval" | "cron"
+        - For "once": "run_at" (ISO timestamp)
+        - For "interval": "minutes" (int)
+        - For "cron": "expr" (cron expression)
+    
+    Examples:
+        "30m"              → once in 30 minutes
+        "2h"               → once in 2 hours
+        "every 30m"        → recurring every 30 minutes
+        "every 2h"         → recurring every 2 hours
+        "0 9 * * *"        → cron expression
+        "2026-02-03T14:00" → once at timestamp
+    """
+    schedule = schedule.strip()
+    original = schedule
+    schedule_lower = schedule.lower()
+    
+    # "every X" pattern → recurring interval
+    if schedule_lower.startswith("every "):
+        duration_str = schedule[6:].strip()
+        minutes = parse_duration(duration_str)
+        return {
+            "kind": "interval",
+            "minutes": minutes,
+            "display": f"every {minutes}m"
+        }
+    
+    # Check for cron expression (5 or 6 space-separated fields)
+    # Cron fields: minute hour day month weekday [second]
+    parts = schedule.split()
+    if len(parts) >= 5 and all(
+        re.match(r'^[\d\*\-,/]+$', p) for p in parts[:5]
+    ):
+        if not HAS_CRONITER:
+            raise ValueError("Cron expressions require 'croniter' package. Install with: pip install croniter")
+        # Validate cron expression
+        try:
+            croniter(schedule)
+        except Exception as e:
+            raise ValueError(f"Invalid cron expression '{schedule}': {e}")
+        return {
+            "kind": "cron",
+            "expr": schedule,
+            "display": schedule
+        }
+    
+    # ISO timestamp (contains T or looks like date)
+    if 'T' in schedule or re.match(r'^\d{4}-\d{2}-\d{2}', schedule):
+        try:
+            # Parse and validate
+            dt = datetime.fromisoformat(schedule.replace('Z', '+00:00'))
+            return {
+                "kind": "once",
+                "run_at": dt.isoformat(),
+                "display": f"once at {dt.strftime('%Y-%m-%d %H:%M')}"
+            }
+        except ValueError as e:
+            raise ValueError(f"Invalid timestamp '{schedule}': {e}")
+    
+    # Duration like "30m", "2h", "1d" → one-shot from now
+    try:
+        minutes = parse_duration(schedule)
+        run_at = datetime.now() + timedelta(minutes=minutes)
+        return {
+            "kind": "once",
+            "run_at": run_at.isoformat(),
+            "display": f"once in {original}"
+        }
+    except ValueError:
+        pass
+    
+    raise ValueError(
+        f"Invalid schedule '{original}'. Use:\n"
+        f"  - Duration: '30m', '2h', '1d' (one-shot)\n"
+        f"  - Interval: 'every 30m', 'every 2h' (recurring)\n"
+        f"  - Cron: '0 9 * * *' (cron expression)\n"
+        f"  - Timestamp: '2026-02-03T14:00:00' (one-shot at time)"
+    )
+
+
+def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None) -> Optional[str]:
+    """
+    Compute the next run time for a schedule.
+    
+    Returns ISO timestamp string, or None if no more runs.
+    """
+    now = datetime.now()
+    
+    if schedule["kind"] == "once":
+        run_at = datetime.fromisoformat(schedule["run_at"])
+        # If in the future, return it; if in the past, no more runs
+        return schedule["run_at"] if run_at > now else None
+    
+    elif schedule["kind"] == "interval":
+        minutes = schedule["minutes"]
+        if last_run_at:
+            # Next run is last_run + interval
+            last = datetime.fromisoformat(last_run_at)
+            next_run = last + timedelta(minutes=minutes)
+        else:
+            # First run is now + interval
+            next_run = now + timedelta(minutes=minutes)
+        return next_run.isoformat()
+    
+    elif schedule["kind"] == "cron":
+        if not HAS_CRONITER:
+            return None
+        cron = croniter(schedule["expr"], now)
+        next_run = cron.get_next(datetime)
+        return next_run.isoformat()
+    
+    return None
+
+
+# =============================================================================
+# Job CRUD Operations
+# =============================================================================
+
+def load_jobs() -> List[Dict[str, Any]]:
+    """Load all jobs from storage."""
+    ensure_dirs()
+    if not JOBS_FILE.exists():
+        return []
+    
+    try:
+        with open(JOBS_FILE, 'r', encoding='utf-8') as f:
+            data = json.load(f)
+            return data.get("jobs", [])
+    except (json.JSONDecodeError, IOError):
+        return []
+
+
+def save_jobs(jobs: List[Dict[str, Any]]):
+    """Save all jobs to storage."""
+    ensure_dirs()
+    with open(JOBS_FILE, 'w', encoding='utf-8') as f:
+        json.dump({"jobs": jobs, "updated_at": datetime.now().isoformat()}, f, indent=2)
+
+
+def create_job(
+    prompt: str,
+    schedule: str,
+    name: Optional[str] = None,
+    repeat: Optional[int] = None,
+    deliver: Optional[str] = None,
+    origin: Optional[Dict[str, Any]] = None
+) -> Dict[str, Any]:
+    """
+    Create a new cron job.
+    
+    Args:
+        prompt: The prompt to run (must be self-contained)
+        schedule: Schedule string (see parse_schedule)
+        name: Optional friendly name
+        repeat: How many times to run (None = forever, 1 = once)
+        deliver: Where to deliver output ("origin", "local", "telegram", etc.)
+        origin: Source info where job was created (for "origin" delivery)
+    
+    Returns:
+        The created job dict
+    """
+    parsed_schedule = parse_schedule(schedule)
+    
+    # Auto-set repeat=1 for one-shot schedules if not specified
+    if parsed_schedule["kind"] == "once" and repeat is None:
+        repeat = 1
+    
+    # Default delivery to origin if available, otherwise local
+    if deliver is None:
+        deliver = "origin" if origin else "local"
+    
+    job_id = uuid.uuid4().hex[:12]
+    now = datetime.now().isoformat()
+    
+    job = {
+        "id": job_id,
+        "name": name or prompt[:50].strip(),
+        "prompt": prompt,
+        "schedule": parsed_schedule,
+        "schedule_display": parsed_schedule.get("display", schedule),
+        "repeat": {
+            "times": repeat,  # None = forever
+            "completed": 0
+        },
+        "enabled": True,
+        "created_at": now,
+        "next_run_at": compute_next_run(parsed_schedule),
+        "last_run_at": None,
+        "last_status": None,
+        "last_error": None,
+        # Delivery configuration
+        "deliver": deliver,
+        "origin": origin,  # Tracks where job was created for "origin" delivery
+    }
+    
+    jobs = load_jobs()
+    jobs.append(job)
+    save_jobs(jobs)
+    
+    return job
+
+
+def get_job(job_id: str) -> Optional[Dict[str, Any]]:
+    """Get a job by ID."""
+    jobs = load_jobs()
+    for job in jobs:
+        if job["id"] == job_id:
+            return job
+    return None
+
+
+def list_jobs(include_disabled: bool = False) -> List[Dict[str, Any]]:
+    """List all jobs, optionally including disabled ones."""
+    jobs = load_jobs()
+    if not include_disabled:
+        jobs = [j for j in jobs if j.get("enabled", True)]
+    return jobs
+
+
+def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]:
+    """Update a job by ID."""
+    jobs = load_jobs()
+    for i, job in enumerate(jobs):
+        if job["id"] == job_id:
+            jobs[i] = {**job, **updates}
+            save_jobs(jobs)
+            return jobs[i]
+    return None
+
+
+def remove_job(job_id: str) -> bool:
+    """Remove a job by ID."""
+    jobs = load_jobs()
+    original_len = len(jobs)
+    jobs = [j for j in jobs if j["id"] != job_id]
+    if len(jobs) < original_len:
+        save_jobs(jobs)
+        return True
+    return False
+
+
+def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
+    """
+    Mark a job as having been run.
+    
+    Updates last_run_at, last_status, increments completed count,
+    computes next_run_at, and auto-deletes if repeat limit reached.
+    """
+    jobs = load_jobs()
+    for i, job in enumerate(jobs):
+        if job["id"] == job_id:
+            now = datetime.now().isoformat()
+            job["last_run_at"] = now
+            job["last_status"] = "ok" if success else "error"
+            job["last_error"] = error if not success else None
+            
+            # Increment completed count
+            if job.get("repeat"):
+                job["repeat"]["completed"] = job["repeat"].get("completed", 0) + 1
+                
+                # Check if we've hit the repeat limit
+                times = job["repeat"].get("times")
+                completed = job["repeat"]["completed"]
+                if times is not None and completed >= times:
+                    # Remove the job (limit reached)
+                    jobs.pop(i)
+                    save_jobs(jobs)
+                    return
+            
+            # Compute next run
+            job["next_run_at"] = compute_next_run(job["schedule"], now)
+            
+            # If no next run (one-shot completed), disable
+            if job["next_run_at"] is None:
+                job["enabled"] = False
+            
+            save_jobs(jobs)
+            return
+    
+    save_jobs(jobs)
+
+
+def get_due_jobs() -> List[Dict[str, Any]]:
+    """Get all jobs that are due to run now."""
+    now = datetime.now()
+    jobs = load_jobs()
+    due = []
+    
+    for job in jobs:
+        if not job.get("enabled", True):
+            continue
+        
+        next_run = job.get("next_run_at")
+        if not next_run:
+            continue
+        
+        next_run_dt = datetime.fromisoformat(next_run)
+        if next_run_dt <= now:
+            due.append(job)
+    
+    return due
+
+
+def save_job_output(job_id: str, output: str):
+    """Save job output to file."""
+    ensure_dirs()
+    job_output_dir = OUTPUT_DIR / job_id
+    job_output_dir.mkdir(parents=True, exist_ok=True)
+    
+    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
+    output_file = job_output_dir / f"{timestamp}.md"
+    
+    with open(output_file, 'w', encoding='utf-8') as f:
+        f.write(output)
+    
+    return output_file
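How a duration string feeds the interval branch of `compute_next_run` in `cron/jobs.py` can be shown with a small self-contained sketch. The regex and multipliers are copied from `parse_duration` in the diff; `duration_to_minutes` and `next_interval_run` are hypothetical names used only so the example runs without importing the `cron` package.

```python
import re
from datetime import datetime, timedelta

# Same pattern and multipliers as parse_duration() in the diff.
_DURATION_RE = re.compile(
    r'^(\d+)\s*(m|min|mins|minute|minutes|h|hr|hrs|hour|hours|d|day|days)$'
)
_MINUTES_PER_UNIT = {"m": 1, "h": 60, "d": 1440}


def duration_to_minutes(text: str) -> int:
    """Convert strings such as '30m', '2h' or '1d' into minutes."""
    match = _DURATION_RE.match(text.strip().lower())
    if not match:
        raise ValueError(f"Invalid duration: '{text}'")
    # The first character of the unit (m, h or d) selects the multiplier.
    return int(match.group(1)) * _MINUTES_PER_UNIT[match.group(2)[0]]


def next_interval_run(last_run_at: str, minutes: int) -> str:
    """Interval rule from compute_next_run: last run plus the interval."""
    return (datetime.fromisoformat(last_run_at) + timedelta(minutes=minutes)).isoformat()


minutes = duration_to_minutes("2h")
print(minutes)
print(next_interval_run("2026-02-03T14:00:00", minutes))
```

Because the next run is derived from `last_run_at` and not from the wall clock, a job that fires late still advances by exactly one interval from its last recorded run.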
--- a/cron/scheduler.py
+++ b/cron/scheduler.py
@@ -0,0 +1,188 @@
+"""
+Cron job scheduler - executes due jobs.
+
+This module provides:
+- tick(): Run all due jobs once (for system cron integration)
+- run_daemon(): Run continuously, checking every 60 seconds
+"""
+
+import os
+import sys
+import time
+import traceback
+from datetime import datetime
+from pathlib import Path
+from typing import Optional
+
+# Add parent directory to path for imports
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from cron.jobs import get_due_jobs, mark_job_run, save_job_output
+
+
+def run_job(job: dict) -> tuple[bool, str, Optional[str]]:
+    """
+    Execute a single cron job.
+    
+    Returns:
+        Tuple of (success, output, error_message)
+    """
+    from run_agent import AIAgent
+    
+    job_id = job["id"]
+    job_name = job["name"]
+    prompt = job["prompt"]
+    
+    print(f"[cron] Running job '{job_name}' (ID: {job_id})")
+    print(f"[cron] Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}")
+    
+    try:
+        # Create agent with default settings
+        # Jobs run in isolated sessions (no prior context)
+        agent = AIAgent(
+            model=os.getenv("HERMES_MODEL", "anthropic/claude-opus-4.6"),
+            quiet_mode=True,
+            session_id=f"cron_{job_id}_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+        )
+        
+        # Run the conversation
+        result = agent.run_conversation(prompt)
+        
+        # Extract final response
+        final_response = result.get("final_response", "")
+        if not final_response:
+            final_response = "(No response generated)"
+        
+        # Build output document
+        output = f"""# Cron Job: {job_name}
+
+**Job ID:** {job_id}
+**Run Time:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
+**Schedule:** {job.get('schedule_display', 'N/A')}
+
+## Prompt
+
+{prompt}
+
+## Response
+
+{final_response}
+"""
+        
+        print(f"[cron] Job '{job_name}' completed successfully")
+        return True, output, None
+        
+    except Exception as e:
+        error_msg = f"{type(e).__name__}: {str(e)}"
+        print(f"[cron] Job '{job_name}' failed: {error_msg}")
+        
+        # Build error output
+        output = f"""# Cron Job: {job_name} (FAILED)
+
+**Job ID:** {job_id}
+**Run Time:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
+**Schedule:** {job.get('schedule_display', 'N/A')}
+
+## Prompt
+
+{prompt}
+
+## Error
+
+```
+{error_msg}
+
+{traceback.format_exc()}
+```
+"""
+        return False, output, error_msg
+
+
+def tick(verbose: bool = True) -> int:
+    """
+    Check and run all due jobs.
+    
+    This is designed to be called by system cron every minute:
+        */1 * * * * cd ~/hermes-agent && python -c "from cron import tick; tick()"
+    
+    Args:
+        verbose: Whether to print status messages
+    
+    Returns:
+        Number of jobs executed
+    """
+    due_jobs = get_due_jobs()
+    
+    if verbose and not due_jobs:
+        print(f"[cron] {datetime.now().strftime('%H:%M:%S')} - No jobs due")
+        return 0
+    
+    if verbose:
+        print(f"[cron] {datetime.now().strftime('%H:%M:%S')} - {len(due_jobs)} job(s) due")
+    
+    executed = 0
+    for job in due_jobs:
+        try:
+            success, output, error = run_job(job)
+            
+            # Save output to file
+            output_file = save_job_output(job["id"], output)
+            if verbose:
+                print(f"[cron] Output saved to: {output_file}")
+            
+            # Mark job as run (handles repeat counting, next_run computation)
+            mark_job_run(job["id"], success, error)
+            executed += 1
+            
+        except Exception as e:
+            print(f"[cron] Error processing job {job['id']}: {e}")
+            mark_job_run(job["id"], False, str(e))
+    
+    return executed
+
+
+def run_daemon(check_interval: int = 60, verbose: bool = True):
+    """
+    Run the cron daemon continuously.
+    
+    Checks for due jobs every `check_interval` seconds.
+    
+    Args:
+        check_interval: Seconds between checks (default: 60)
+        verbose: Whether to print status messages
+    """
+    print(f"[cron] Starting daemon (checking every {check_interval}s)")
+    print("[cron] Press Ctrl+C to stop")
+    print()
+    
+    try:
+        while True:
+            try:
+                tick(verbose=verbose)
+            except Exception as e:
+                print(f"[cron] Tick error: {e}")
+            
+            time.sleep(check_interval)
+            
+    except KeyboardInterrupt:
+        print("\n[cron] Daemon stopped")
+
+
+if __name__ == "__main__":
+    # Allow running directly: python cron/scheduler.py [daemon|tick]
+    import argparse
+    
+    parser = argparse.ArgumentParser(description="Hermes Cron Scheduler")
+    parser.add_argument("mode", choices=["daemon", "tick"], default="tick", nargs="?",
+                        help="Mode: 'tick' to run once, 'daemon' to run continuously")
+    parser.add_argument("--interval", type=int, default=60,
+                        help="Check interval in seconds for daemon mode")
+    parser.add_argument("--quiet", "-q", action="store_true",
+                        help="Suppress status messages")
+    
+    args = parser.parse_args()
+    
+    if args.mode == "daemon":
+        run_daemon(check_interval=args.interval, verbose=not args.quiet)
+    else:
+        tick(verbose=not args.quiet)
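Review note on `get_due_jobs()`: the selection rule reads the clock and the job store internally, which makes it awkward to unit-test. A minimal sketch of the same rule with both passed in, assuming job dicts carry the `enabled` and `next_run_at` keys used in this diff. `filter_due_jobs` is a hypothetical helper name, not part of this PR.

```python
from datetime import datetime
from typing import Any, Dict, List


def filter_due_jobs(jobs: List[Dict[str, Any]], now: datetime) -> List[Dict[str, Any]]:
    """Return enabled jobs whose next_run_at is at or before `now`.

    Hypothetical helper: mirrors the loop in get_due_jobs(), but takes
    the job list and the current time as arguments.
    """
    due = []
    for job in jobs:
        # Jobs without an explicit flag count as enabled, as in the diff.
        if not job.get("enabled", True):
            continue
        next_run = job.get("next_run_at")
        if not next_run:
            continue
        # next_run_at is stored as an ISO 8601 string.
        if datetime.fromisoformat(next_run) <= now:
            due.append(job)
    return due
```

With a fixed `now`, a test can cover disabled jobs, jobs without a `next_run_at`, and jobs scheduled in the future without touching the jobs file.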
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -117,6 +117,29 @@ terminal:
  modal_image: "python:3.11"
 ```

+### Sudo Support
+
+The CLI supports interactive sudo prompts:
+
+```
+┌──────────────────────────────────────────────────────────┐
+│  🔐 SUDO PASSWORD REQUIRED                               │
+├──────────────────────────────────────────────────────────┤
+│  Enter password below (input is hidden), or:             │
+│    • Press Enter to skip (command fails gracefully)      │
+│    • Wait 45s to auto-skip                               │
+└──────────────────────────────────────────────────────────┘
+
+  Password (hidden): 
+```
+
+**Options:**
+- **Interactive**: Leave `sudo_password` unset - you'll be prompted when needed
+- **Configured**: Set `sudo_password` in `cli-config.yaml` to auto-fill
+- **Environment**: Set `SUDO_PASSWORD` in `.env` for all runs
+
+The password is cached for the rest of the session once entered.
+
 ### Toolsets

 Control which tools are available:
@@ -202,6 +225,62 @@ This allows you to have different terminal configs for CLI vs batch processing.
 - **History**: Command history is saved to `~/.hermes_history`
 - **Conversations**: Use `/save` to export conversations
 - **Reset**: Use `/clear` for full reset, `/reset` to just clear history
+- **Session Logs**: Every session is logged automatically to `logs/session_{session_id}.json`
+
+### Session Logging
+
+Sessions are automatically logged to the `logs/` directory:
+
+```
+logs/
+├── session_20260201_143052_a1b2c3.json
+├── session_20260201_150217_d4e5f6.json
+└── ...
+```
+
+The session ID is displayed in the welcome banner and follows the format `YYYYMMDD_HHMMSS_UUID`, where the UUID part is shortened (for example, `a1b2c3`).
+
+Log files contain:
+- Full conversation history in trajectory format
+- Timestamps for session start and last update
+- Model and message count metadata
+
+This is useful for:
+- Debugging agent behavior
+- Replaying conversations
+- Training data inspection
+
+### Context Compression
+
+Long conversations can exceed model context limits. The CLI automatically compresses context when approaching the limit:
+
+```yaml
+# In cli-config.yaml
+compression:
+  enabled: true                    # Enable auto-compression
+  threshold: 0.85                  # Compress at 85% of context limit  
+  summary_model: "google/gemini-2.0-flash-001"
+```
+
+**How it works:**
+1. Tracks actual token usage from each API response
+2. When tokens reach threshold, middle turns are summarized
+3. First 3 and last 4 turns are always protected
+4. Conversation continues seamlessly after compression
+
+**When compression triggers:**
+```
+📦 Context compression triggered (170,000 tokens ≥ 170,000 threshold)
+   📊 Model context limit: 200,000 tokens (85% = 170,000)
+   🗜️  Summarizing turns 4-15 (12 turns)
+   ✅ Compressed: 20 → 9 messages (~45,000 tokens saved)
+```
+
+To disable compression:
+```yaml
+compression:
+  enabled: false
+```

 ## Quiet Mode

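Review note on the Context Compression section in `docs/cli.md`: the trigger rule and the protected-turn rule can be sketched as below. This is an illustration under assumptions, not the CLI's implementation. The function names are hypothetical, the defaults are taken from the documented values (`threshold: 0.85`, first 3 and last 4 turns protected), and rounding the threshold to a whole token count is an assumption.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def compression_threshold(context_limit: int, ratio: float = 0.85) -> int:
    """Token count at which compression triggers (rounding is an assumption)."""
    return round(context_limit * ratio)


def should_compress(used_tokens: int, context_limit: int, ratio: float = 0.85) -> bool:
    """True once reported token usage reaches the threshold."""
    return used_tokens >= compression_threshold(context_limit, ratio)


def split_for_compression(
    turns: Sequence[T], keep_first: int = 3, keep_last: int = 4
) -> Tuple[List[T], List[T], List[T]]:
    """Split turns into (protected head, middle to summarize, protected tail)."""
    if len(turns) <= keep_first + keep_last:
        # Nothing lies between the protected regions, so nothing is summarized.
        return list(turns), [], []
    cut = len(turns) - keep_last
    return list(turns[:keep_first]), list(turns[keep_first:cut]), list(turns[cut:])
```

Only the middle list would be handed to the summary model. The head and tail pass through unchanged.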
--- a/docs/messaging.md
+++ b/docs/messaging.md
@@ -0,0 +1,547 @@
+# Messaging Platform Integrations (Gateway)
+
+Hermes Agent can connect to messaging platforms like Telegram, Discord, and WhatsApp to serve as a conversational AI assistant.
+
+## Quick Start
+
+```bash
+# 1. Set your bot token(s) in .env file
+echo 'TELEGRAM_BOT_TOKEN="your_telegram_bot_token"' >> .env
+echo 'DISCORD_BOT_TOKEN="your_discord_bot_token"' >> .env
+
+# 2. Test the gateway (foreground)
+./scripts/hermes-gateway run
+
+# 3. Install as a system service (runs in background)
+./scripts/hermes-gateway install
+
+# 4. Manage the service
+./scripts/hermes-gateway start
+./scripts/hermes-gateway stop
+./scripts/hermes-gateway restart
+./scripts/hermes-gateway status
+```
+
+**Quick test (without service install):**
+```bash
+python cli.py --gateway  # Runs in foreground, useful for debugging
+```
+
+## Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                      Hermes Gateway                             │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
+│  │   Telegram   │  │   Discord    │  │   WhatsApp   │          │
+│  │   Adapter    │  │   Adapter    │  │   Adapter    │          │
+│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘          │
+│         │                 │                 │                   │
+│         └─────────────────┼─────────────────┘                   │
+│                           │                                     │
+│                  ┌────────▼────────┐                            │
+│                  │  Session Store  │                            │
+│                  │  (per-chat)     │                            │
+│                  └────────┬────────┘                            │
+│                           │                                     │
+│                  ┌────────▼────────┐                            │
+│                  │   AIAgent       │                            │
+│                  │   (run_agent)   │                            │
+│                  └─────────────────┘                            │
+│                                                                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Session Management
+
+### Session Persistence
+
+Sessions persist across messages until they reset. The agent remembers your conversation context.
+
+### Reset Policies
+
+Sessions reset based on configurable policies:
+
+| Policy | Default | Description |
+|--------|---------|-------------|
+| Daily | 4:00 AM | Reset at a specific hour each day |
+| Idle | 120 min | Reset after N minutes of inactivity |
+| Both | (combined) | Whichever triggers first |
+
+### Manual Reset
+
+Send `/new` or `/reset` as a message to start fresh.
+
+### Per-Platform Overrides
+
+Configure different reset policies per platform:
+
+```json
+{
+  "reset_by_platform": {
+    "telegram": { "mode": "idle", "idle_minutes": 240 },
+    "discord": { "mode": "idle", "idle_minutes": 60 }
+  }
+}
+```
+
+## Platform Setup
+
+### Telegram
+
+1. **Create a bot** via [@BotFather](https://t.me/BotFather)
+2. **Get your token** (looks like `123456789:ABCdefGHIjklMNOpqrsTUVwxyz`)
+3. **Set environment variable:**
+   ```bash
+   export TELEGRAM_BOT_TOKEN="your_token_here"
+   ```
+4. **Optional: Set home channel** for cron job delivery:
+   ```bash
+   export TELEGRAM_HOME_CHANNEL="-1001234567890"
+   export TELEGRAM_HOME_CHANNEL_NAME="My Notes"
+   ```
+
+**Requirements:**
+```bash
+pip install "python-telegram-bot>=20.0"
+```
+
+### Discord
+
+1. **Create an application** at [Discord Developer Portal](https://discord.com/developers/applications)
+2. **Create a bot** under your application
+3. **Get the bot token**
+4. **Enable required intents:**
+   - Message Content Intent
+   - Server Members Intent (optional)
+5. **Invite to your server** using OAuth2 URL generator (scopes: `bot`, `applications.commands`)
+6. **Set environment variable:**
+   ```bash
+   export DISCORD_BOT_TOKEN="your_token_here"
+   ```
+7. **Optional: Set home channel:**
+   ```bash
+   export DISCORD_HOME_CHANNEL="123456789012345678"
+   export DISCORD_HOME_CHANNEL_NAME="#bot-updates"
+   ```
+
+**Requirements:**
+```bash
+pip install "discord.py>=2.0"
+```
+
+### WhatsApp
+
+WhatsApp integration is more complex due to the lack of a simple bot API.
+
+**Options:**
+1. **WhatsApp Business API** (requires Meta verification)
+2. **whatsapp-web.js** via Node.js bridge (for personal accounts)
+
+**Bridge Setup:**
+1. Install Node.js
+2. Set up the bridge script (see `scripts/whatsapp-bridge/` for reference)
+3. Configure in gateway:
+   ```json
+   {
+     "platforms": {
+       "whatsapp": {
+         "enabled": true,
+         "extra": {
+           "bridge_script": "/path/to/bridge.js",
+           "bridge_port": 3000
+         }
+       }
+     }
+   }
+   ```
+
+## Configuration
+
+There are **two ways** to configure the gateway (in order of precedence):
+
+### 1. Environment Variables (`.env` file) - Recommended for Quick Setup
+
+Add to your `~/.hermes/.env` file:
+
+```bash
+# =============================================================================
+# MESSAGING PLATFORM TOKENS
+# =============================================================================
+
+# Telegram - get from @BotFather on Telegram
+TELEGRAM_BOT_TOKEN=your_telegram_bot_token
+TELEGRAM_ALLOWED_USERS=123456789,987654321    # Security: restrict to these user IDs
+
+# Optional: Default channel for cron job delivery
+TELEGRAM_HOME_CHANNEL=-1001234567890
+TELEGRAM_HOME_CHANNEL_NAME="My Notes"
+
+# Discord - get from Discord Developer Portal
+DISCORD_BOT_TOKEN=your_discord_bot_token
+DISCORD_ALLOWED_USERS=123456789012345678      # Security: restrict to these user IDs
+
+# Optional: Default channel for cron job delivery
+DISCORD_HOME_CHANNEL=123456789012345678
+DISCORD_HOME_CHANNEL_NAME="#bot-updates"
+
+# WhatsApp - requires Node.js bridge setup
+WHATSAPP_ENABLED=true
+
+# =============================================================================
+# AGENT SETTINGS
+# =============================================================================
+
+# Max tool-calling iterations per conversation (default: 60)
+HERMES_MAX_ITERATIONS=60
+
+# Working directory for terminal commands (default: home ~)
+MESSAGING_CWD=/home/myuser
+
+# =============================================================================
+# TOOL PROGRESS NOTIFICATIONS
+# =============================================================================
+
+# Show progress messages as agent uses tools
+HERMES_TOOL_PROGRESS=true
+
+# Mode: "new" (only when tool changes) or "all" (every tool call)
+HERMES_TOOL_PROGRESS_MODE=new
+
+# =============================================================================
+# SESSION SETTINGS
+# =============================================================================
+
+# Reset sessions after N minutes of inactivity (default: 120)
+SESSION_IDLE_MINUTES=120
+
+# Daily reset hour in 24h format (default: 4 = 4am)
+SESSION_RESET_HOUR=4
+```
+
+### 2. Gateway Config File (`~/.hermes/gateway.json`) - Full Control
+
+For advanced configuration, create `~/.hermes/gateway.json`:
+
+```json
+{
+  "platforms": {
+    "telegram": {
+      "enabled": true,
+      "token": "your_telegram_token",
+      "home_channel": {
+        "platform": "telegram",
+        "chat_id": "-1001234567890",
+        "name": "My Notes"
+      }
+    },
+    "discord": {
+      "enabled": true,
+      "token": "your_discord_token",
+      "home_channel": {
+        "platform": "discord",
+        "chat_id": "123456789012345678",
+        "name": "#bot-updates"
+      }
+    }
+  },
+  "default_reset_policy": {
+    "mode": "both",
+    "at_hour": 4,
+    "idle_minutes": 120
+  },
+  "reset_by_platform": {
+    "discord": {
+      "mode": "idle",
+      "idle_minutes": 60
+    }
+  },
+  "always_log_local": true
+}
+```
+
+## Platform-Specific Toolsets
+
+Each platform has its own toolset, so tool access can be restricted per platform. By default, every platform gets full access:
+
+| Platform | Toolset | Capabilities |
+|----------|---------|--------------|
+| CLI | `hermes-cli` | Full access (terminal, browser, etc.) |
+| Telegram | `hermes-telegram` | Full tools including terminal |
+| Discord | `hermes-discord` | Full tools including terminal |
+| WhatsApp | `hermes-whatsapp` | Full tools including terminal |
+
+## User Experience Features
+
+### Typing Indicator
+
+The gateway keeps the "typing..." indicator active throughout processing, refreshing every 4 seconds. This lets users know the bot is working even during long tool-calling sequences.
+
+### Tool Progress Notifications
+
+When `HERMES_TOOL_PROGRESS=true`, the bot sends status messages as it works:
+
+```
+💻 `ls -la`...
+🔍 web_search...
+📄 web_extract...
+🎨 image_generate...
+```
+
+Terminal commands show the actual command (truncated to 50 chars). Other tools just show the tool name.
+
+**Modes:**
+- `new`: Only sends message when switching to a different tool (less spam)
+- `all`: Sends message for every single tool call
+
+### Working Directory
+
+- **CLI (`hermes` command)**: Uses current directory where you run the command
+- **Messaging**: Uses `MESSAGING_CWD` (default: home directory `~`)
+
+This is intentional: CLI users are in a terminal and expect the agent to work in their current directory, while messaging users need a consistent starting location.
+
+### Max Iterations
+
+If the agent hits the max iteration limit while working, instead of a generic error, it asks the model to summarize what it found so far. This gives you a useful response even when the task couldn't be fully completed.
+
+## Voice Messages (TTS)
+
+The `text_to_speech` tool generates audio that the gateway delivers as native voice messages on each platform:
+
+| Platform | Delivery | Format |
+|----------|----------|--------|
+| Telegram | Voice bubble (plays inline) | Opus `.ogg` — native from OpenAI/ElevenLabs, converted via ffmpeg for Edge TTS |
+| Discord | Audio file attachment | MP3 |
+| WhatsApp | Audio file attachment | MP3 |
+| CLI | Saved to `~/voice-memos/` | MP3 |
+
+**Providers:**
+- **Edge TTS** (default) — Free, no API key, 322 voices in 74 languages
+- **ElevenLabs** — Premium quality, requires `ELEVENLABS_API_KEY`
+- **OpenAI TTS** — Good quality, requires `OPENAI_API_KEY`
+
+Voice and provider are configured by the user in `~/.hermes/config.yaml` under the `tts:` key. The model only sends text; it does not choose the voice.
+
+The tool returns a `MEDIA:<path>` tag that the gateway send pipeline intercepts and delivers as a native audio message. If `[[audio_as_voice]]` is present (Opus format available), Telegram sends it as a voice bubble instead of an audio file.
+
+**Telegram voice bubbles & ffmpeg:**
+
+Telegram requires Opus/OGG format for native voice bubbles (the round, inline-playable kind). **OpenAI and ElevenLabs** produce Opus natively when on Telegram — no extra setup needed. **Edge TTS** (the default free provider) outputs MP3 and needs `ffmpeg` to convert:
+
+```bash
+sudo apt install ffmpeg    # Ubuntu/Debian
+brew install ffmpeg         # macOS
+sudo dnf install ffmpeg     # Fedora
+```
+
+Without ffmpeg, Edge TTS audio is sent as a regular audio file (still playable, but shows as a rectangular music player instead of a voice bubble).
+
+## Cron Job Delivery
+
+When scheduling cron jobs, you can specify where the output should be delivered:
+
+```
+User: "Remind me to check the server in 30 minutes"
+
+Agent uses: schedule_cronjob(
+  prompt="Check server status...",
+  schedule="30m",
+  deliver="origin"  # Back to this chat
+)
+```
+
+### Delivery Options
+
+| Option | Description |
+|--------|-------------|
+| `"origin"` | Back to where the job was created |
+| `"local"` | Save to local files only |
+| `"telegram"` | Telegram home channel |
+| `"discord"` | Discord home channel |
+| `"telegram:123456"` | Specific Telegram chat |
+
+## Dynamic Context Injection
+
+The agent knows where it is via injected context:
+
+```
+## Current Session Context
+
+**Source:** Telegram (group: Dev Team, ID: -1001234567890)
+**Connected Platforms:** local, telegram, discord
+
+**Home Channels:**
+  - telegram: My Notes (ID: -1001234567890)
+  - discord: #bot-updates (ID: 123456789012345678)
+
+**Delivery options for scheduled tasks:**
+- "origin" → Back to this chat (Dev Team)
+- "local" → Save to local files only
+- "telegram" → Home channel (My Notes)
+- "discord" → Home channel (#bot-updates)
+```
+
+## CLI Commands
+
+| Command | Description |
+|---------|-------------|
+| `/platforms` | Show gateway configuration and status |
+| `--gateway` | Start the gateway (CLI flag) |
+
+## Troubleshooting
+
+### "python-telegram-bot not installed"
+
+```bash
+pip install "python-telegram-bot>=20.0"
+```
+
+### "discord.py not installed"
+
+```bash
+pip install "discord.py>=2.0"
+```
+
+### "No platforms connected"
+
+1. Check that your environment variables are set
+2. Check your tokens are valid
+3. Try `/platforms` to see configuration status
+
+### Session not persisting
+
+1. Check `~/.hermes/sessions/` exists
+2. Check session policies aren't too aggressive
+3. Verify no errors in gateway logs
+
+## Adding a New Platform
+
+To add a new messaging platform:
+
+### 1. Create the adapter
+
+Create `gateway/platforms/your_platform.py`:
+
+```python
+from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
+from gateway.config import Platform, PlatformConfig
+
+class YourPlatformAdapter(BasePlatformAdapter):
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.YOUR_PLATFORM)
+    
+    async def connect(self) -> bool:
+        # Connect to the platform
+        ...
+    
+    async def disconnect(self) -> None:
+        # Disconnect
+        ...
+    
+    async def send(self, chat_id: str, content: str, **kwargs) -> SendResult:
+        # Send a message
+        ...
+    
+    async def get_chat_info(self, chat_id: str) -> dict:
+        # Get chat information
+        ...
+```
+
+### 2. Register the platform
+
+Add to `gateway/config.py`:
+
+```python
+class Platform(Enum):
+    # ... existing ...
+    YOUR_PLATFORM = "your_platform"
+```
+
+### 3. Add to gateway runner
+
+Update `gateway/run.py` `_create_adapter()`:
+
+```python
+elif platform == Platform.YOUR_PLATFORM:
+    from gateway.platforms.your_platform import YourPlatformAdapter
+    return YourPlatformAdapter(config)
+```
+
+### 4. Create a toolset (optional)
+
+Add to `toolsets.py`:
+
+```python
+"hermes-your-platform": {
+    "description": "Your platform toolset",
+    "tools": [...],
+    "includes": []
+}
+```
+
+### 5. Configure
+
+Add environment variables to `.env`:
+
+```bash
+YOUR_PLATFORM_TOKEN=...
+YOUR_PLATFORM_HOME_CHANNEL=...
+```
+
+## Service Management
+
+### Linux (systemd)
+
+```bash
+# Install as user service
+./scripts/hermes-gateway install
+
+# Manage
+systemctl --user start hermes-gateway
+systemctl --user stop hermes-gateway
+systemctl --user restart hermes-gateway
+systemctl --user status hermes-gateway
+
+# View logs
+journalctl --user -u hermes-gateway -f
+
+# Enable lingering (keeps running after logout)
+sudo loginctl enable-linger $USER
+```
+
+### macOS (launchd)
+
+```bash
+# Install
+./scripts/hermes-gateway install
+
+# Manage
+launchctl start ai.hermes.gateway
+launchctl stop ai.hermes.gateway
+
+# View logs
+tail -f ~/.hermes/logs/gateway.log
+```
+
+### Manual (any platform)
+
+```bash
+# Run in foreground (for testing/debugging)
+./scripts/hermes-gateway run
+
+# Or via CLI (also foreground)
+python cli.py --gateway
+```
+
+## Storage Locations
+
+| Path | Purpose |
+|------|---------|
+| `~/.hermes/gateway.json` | Gateway configuration |
+| `~/.hermes/sessions/sessions.json` | Session index |
+| `~/.hermes/sessions/{id}.jsonl` | Conversation transcripts |
+| `~/.hermes/cron/output/` | Cron job outputs |
+| `~/.hermes/logs/gateway.log` | Gateway logs (macOS launchd) |
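Review note on the Reset Policies table in `docs/messaging.md`: the "whichever triggers first" rule for `both` can be sketched as below. This is a hedged illustration, not the gateway's code. `should_reset` is a hypothetical function, the `"daily"` mode string is an assumption (the docs only show `"idle"` and `"both"` in JSON), and the defaults mirror the documented `at_hour: 4` and `idle_minutes: 120`.

```python
from datetime import datetime, timedelta


def should_reset(
    last_activity: datetime,
    now: datetime,
    mode: str = "both",
    at_hour: int = 4,
    idle_minutes: int = 120,
) -> bool:
    """Decide whether a session should be reset before handling a new message."""
    # Idle rule: no activity for at least idle_minutes.
    idle_expired = now - last_activity >= timedelta(minutes=idle_minutes)

    # Daily rule: the most recent at_hour boundary lies after the last activity.
    boundary = now.replace(hour=at_hour, minute=0, second=0, microsecond=0)
    if boundary > now:
        boundary -= timedelta(days=1)
    daily_expired = last_activity < boundary

    if mode == "idle":
        return idle_expired
    if mode == "daily":  # assumed name for the daily-only policy
        return daily_expired
    if mode == "both":
        return idle_expired or daily_expired
    raise ValueError(f"unknown reset mode: {mode!r}")
```

Under `both`, a message at 04:10 following activity at 03:30 resets the session because the 04:00 boundary was crossed, even though only 40 minutes have passed.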
--- a/docs/tools.md
+++ b/docs/tools.md
@@ -40,11 +40,15 @@ async def web_search(query: str) -> dict:
 |----------|--------|-------|
 | **Web** | `web_tools.py` | `web_search`, `web_extract`, `web_crawl` |
 | **Terminal** | `terminal_tool.py` | `terminal` (local/docker/singularity/modal/ssh backends) |
+| **File** | `file_tools.py` | `read_file`, `write_file`, `patch`, `search` |
 | **Browser** | `browser_tool.py` | `browser_navigate`, `browser_click`, `browser_type`, etc. |
 | **Vision** | `vision_tools.py` | `vision_analyze` |
 | **Image Gen** | `image_generation_tool.py` | `image_generate` |
+| **TTS** | `tts_tool.py` | `text_to_speech` (Edge TTS free / ElevenLabs / OpenAI) |
 | **Reasoning** | `mixture_of_agents_tool.py` | `mixture_of_agents` |
-| **Skills** | `skills_tool.py` | `skills_categories`, `skills_list`, `skill_view` |
+| **Skills** | `skills_tool.py` | `skills_list`, `skill_view` |
+| **Cronjob** | `cronjob_tools.py` | `schedule_cronjob`, `list_cronjobs`, `remove_cronjob` |
+| **RL Training** | `rl_training_tool.py` | `rl_list_environments`, `rl_start_training`, `rl_check_status`, etc. |

 ## Tool Registration

--- a/environments/README.md
+++ b/environments/README.md
@@ -0,0 +1,330 @@
+# Hermes-Agent Atropos Environments
+
+This directory contains the integration layer between **hermes-agent's** tool-calling capabilities and the **Atropos** RL training framework. It provides everything needed to run agentic LLMs through multi-turn tool-calling loops, score their output with arbitrary reward functions, and feed results into Atropos for training or evaluation.
+
+## Architecture Overview
+
+```
+                        Atropos Framework
+                    ┌────────────────────────┐
+                    │       BaseEnv          │  (atroposlib)
+                    │  - Server management   │
+                    │  - Worker scheduling   │
+                    │  - Wandb logging       │
+                    │  - CLI (serve/process/ │
+                    │    evaluate)           │
+                    └───────────┬────────────┘
+                                │ inherits
+                    ┌───────────┴────────────┐
+                    │  HermesAgentBaseEnv    │  hermes_base_env.py
+                    │  - Terminal backend    │
+                    │  - Tool resolution     │
+                    │  - Agent loop          │
+                    │  - ToolContext         │
+                    │  - Async patches       │
+                    └───────────┬────────────┘
+                                │ inherits
+              ┌─────────────────┼─────────────────┐
+              │                 │                 │
+     TerminalTestEnv     HermesSweEnv    TerminalBench2EvalEnv
+     (stack testing)     (SWE training)   (TB2 benchmark eval)
+```
+
+### Inheritance Chain
+
+**BaseEnv** (from `atroposlib`) is the Atropos base class. It provides:
+- Server management (OpenAI-compatible API servers, VLLM, SGLang)
+- Worker scheduling for parallel rollouts
+- Wandb integration for metrics and rollout logging
+- CLI interface with three subcommands: `serve`, `process`, `evaluate`
+- `evaluate_log()` for saving eval results to JSON + samples.jsonl
+
+**HermesAgentBaseEnv** (`hermes_base_env.py`) extends BaseEnv with hermes-agent specifics:
+- Sets `os.environ["TERMINAL_ENV"]` to configure the terminal backend (local, docker, modal, ssh, singularity)
+- Resolves hermes-agent toolsets via `_resolve_tools_for_group()` (calls `get_tool_definitions()` from `model_tools.py`)
+- Implements `collect_trajectory()` which runs the full agent loop and computes rewards
+- Supports two-phase operation (Phase 1: OpenAI server, Phase 2: VLLM ManagedServer)
+- Applies monkey patches for async-safe tool operation at import time
+
+Concrete environments inherit from `HermesAgentBaseEnv` and implement:
+- `setup()` -- Load dataset, initialize state
+- `get_next_item()` -- Return the next item for rollout
+- `format_prompt()` -- Convert a dataset item into the user message
+- `compute_reward()` -- Score the rollout using ToolContext
+- `evaluate()` -- Periodic evaluation logic
+
+## Core Components
+
+### Agent Loop (`agent_loop.py`)
+
+`HermesAgentLoop` is the reusable multi-turn agent engine. It runs the same pattern as hermes-agent's `run_agent.py`:
+
+1. Send messages + tools to the API via `server.chat_completion()`
+2. If the response contains `tool_calls`, execute each one via `handle_function_call()` from `model_tools.py`
+3. Append tool results to the conversation and go back to step 1
+4. If the response has no tool_calls, the agent is done
+
+Tool calls are executed in a thread pool (`run_in_executor`) so backends that use `asyncio.run()` internally (Modal, Docker) don't deadlock inside Atropos's event loop.
+
+Returns an `AgentResult` containing the full conversation history, turn count, reasoning content per turn, tool errors, and optional ManagedServer state (for Phase 2).
+
+### Tool Context (`tool_context.py`)
+
+`ToolContext` is a per-rollout handle that gives reward/verification functions direct access to **all** hermes-agent tools, scoped to the rollout's `task_id`. The same `task_id` means the terminal/browser session is the SAME one the model used during its rollout -- all state (files, processes, browser tabs) is preserved.
+
+```python
+async def compute_reward(self, item, result, ctx: ToolContext):
+    # Run tests in the model's terminal sandbox
+    test = ctx.terminal("pytest -v")
+    if test["exit_code"] == 0:
+        return 1.0
+
+    # Check if a file was created
+    content = ctx.read_file("/workspace/solution.py")
+    if content.get("content"):
+        return 0.5
+
+    # Download files locally for verification (binary-safe)
+    ctx.download_file("/remote/output.bin", "/local/output.bin")
+
+    return 0.0
+```
+
+Available methods:
+- **Terminal**: `terminal(command, timeout)` -- run shell commands
+- **Files**: `read_file(path)`, `write_file(path, content)`, `search(query, path)`
+- **Transfers**: `upload_file()`, `upload_dir()`, `download_file()`, `download_dir()` -- binary-safe file transfers between host and sandbox
+- **Web**: `web_search(query)`, `web_extract(urls)`
+- **Browser**: `browser_navigate(url)`, `browser_snapshot()`
+- **Generic**: `call_tool(name, args)` -- call any hermes-agent tool by name
+- **Cleanup**: `cleanup()` -- release all resources (called automatically after `compute_reward`)
+
+### Patches (`patches.py`)
+
+**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., mini-swe-agent's Modal backend via SWE-ReX). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.
+
+**Solution**: `patches.py` monkey-patches `SwerexModalEnvironment` to use a dedicated background thread (`_AsyncWorker`) with its own event loop. The calling code sees the same sync interface, but internally the async work happens on a separate thread that doesn't conflict with Atropos's loop.
+
+What gets patched:
+- `SwerexModalEnvironment.__init__` -- creates Modal deployment on a background thread
+- `SwerexModalEnvironment.execute` -- runs commands on the same background thread
+- `SwerexModalEnvironment.stop` -- stops deployment on the background thread
+
+The patches are:
+- **Idempotent** -- calling `apply_patches()` multiple times is safe
+- **Transparent** -- same interface and behavior, only the internal async execution changes
+- **Universal** -- works identically in normal CLI use (no running event loop)
+
+Applied automatically at import time by `hermes_base_env.py`.
+
+### Tool Call Parsers (`tool_call_parsers/`)
+
+Client-side parsers that extract structured `tool_calls` from raw model output text. Used in **Phase 2** (VLLM server type) where ManagedServer's `/generate` endpoint returns raw text without tool call parsing.
+
+Each parser is a standalone reimplementation of the corresponding VLLM parser's `extract_tool_calls()` logic. No VLLM dependency -- only standard library (`re`, `json`, `uuid`) and `openai` types.
+
+Available parsers:
+- `hermes` -- Hermes/ChatML `<tool_call>` XML format
+- `mistral` -- Mistral `[TOOL_CALLS]` format
+- `llama3_json` -- Llama 3 JSON tool calling
+- `qwen` -- Qwen tool calling format
+- `qwen3_coder` -- Qwen3 Coder format
+- `deepseek_v3` -- DeepSeek V3 format
+- `deepseek_v3_1` -- DeepSeek V3.1 format
+- `kimi_k2` -- Kimi K2 format
+- `longcat` -- Longcat format
+- `glm45` / `glm47` -- GLM model formats
+
+Usage:
+```python
+from environments.tool_call_parsers import get_parser
+
+parser = get_parser("hermes")
+content, tool_calls = parser.parse(raw_model_output)
+```
+
+In Phase 1 (OpenAI server type), these parsers are not needed -- the server handles tool call parsing natively.
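+
+To make the idea of client-side parsing concrete, here is a minimal sketch for the Hermes `<tool_call>` format. It is an illustration only, not the implementation in `hermes_parser.py`: the function name `parse_hermes_tool_calls()` is hypothetical, and it returns plain dicts shaped like OpenAI tool calls rather than `openai` types.
+
+```python
+import json
+import re
+import uuid
+
+TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
+
+
+def parse_hermes_tool_calls(text):
+    """Return (content, tool_calls) extracted from raw model output."""
+    tool_calls = []
+    for match in TOOL_CALL_RE.finditer(text):
+        try:
+            payload = json.loads(match.group(1))
+        except json.JSONDecodeError:
+            continue  # skip blocks that are not valid JSON
+        if not isinstance(payload, dict) or "name" not in payload:
+            continue
+        tool_calls.append({
+            "id": f"call_{uuid.uuid4().hex[:8]}",
+            "type": "function",
+            "function": {
+                "name": payload["name"],
+                # OpenAI-style tool calls carry arguments as a JSON string.
+                "arguments": json.dumps(payload.get("arguments", {})),
+            },
+        })
+    if not tool_calls:
+        return text, None
+    # Text before the first <tool_call> tag is treated as assistant content.
+    content = text[: text.find("<tool_call>")].strip() or None
+    return content, tool_calls
+
+
+raw = (
+    "Checking.\n<tool_call>\n"
+    '{"name": "terminal", "arguments": {"command": "ls"}}\n'
+    "</tool_call>"
+)
+content, tool_calls = parse_hermes_tool_calls(raw)
+```
+
+Returning `None` for `tool_calls` when nothing parses lets the caller treat the output as a plain assistant message.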
+
+## Two-Phase Operation
+
+### Phase 1: OpenAI Server (Evaluation / SFT Data Generation)
+
+Uses `server.chat_completion()` with `tools=` parameter. The server (VLLM, SGLang, OpenRouter, OpenAI) handles tool call parsing natively. Returns `ChatCompletion` objects with structured `tool_calls`.
+
+- Good for: evaluation, SFT data generation, testing
+- Run with: `serve` (with `run-api`), `process`, or `evaluate` subcommands
+- Placeholder tokens are created for the Atropos pipeline, since this path does not return real token IDs or logprobs
+
+### Phase 2: VLLM ManagedServer (Full RL Training)
+
+Uses ManagedServer for exact token IDs + logprobs via `/generate`. Client-side tool call parser (from `tool_call_parsers/`) reconstructs structured `tool_calls` from raw output.
+
+- Good for: full RL training with GRPO/PPO
+- Run with: `serve` subcommand
+- Real tokens, masks, and logprobs flow through the pipeline
+
+## Directory Structure
+
+```
+environments/
+├── README.md                     # This file
+├── __init__.py                   # Package exports
+├── hermes_base_env.py            # Abstract base (HermesAgentBaseEnv)
+├── agent_loop.py                 # Multi-turn agent engine (HermesAgentLoop)
+├── tool_context.py               # Per-rollout tool access for reward functions
+├── patches.py                    # Async-safety patches for Modal backend
+│
+├── tool_call_parsers/            # Phase 2 client-side parsers
+│   ├── __init__.py               # Registry + base class
+│   ├── hermes_parser.py
+│   ├── mistral_parser.py
+│   ├── llama_parser.py
+│   ├── qwen_parser.py
+│   ├── qwen3_coder_parser.py
+│   ├── deepseek_v3_parser.py
+│   ├── deepseek_v3_1_parser.py
+│   ├── kimi_k2_parser.py
+│   ├── longcat_parser.py
+│   ├── glm45_parser.py
+│   └── glm47_parser.py
+│
+├── terminal_test_env/            # Stack validation environment
+│   └── terminal_test_env.py
+│
+├── hermes_swe_env/               # SWE-bench style training environment
+│   └── hermes_swe_env.py
+│
+└── benchmarks/                   # Evaluation benchmarks
+    └── terminalbench_2/
+        └── terminalbench2_env.py
+```
+
+## Concrete Environments
+
+### TerminalTestEnv (`terminal_test_env/`)
+
+A self-contained environment with inline tasks (no external dataset needed) for validating the full stack end-to-end. Each task asks the model to create a file at a known path, and the verifier checks the content matches.
+
+```bash
+# Serve mode (needs run-api)
+run-api
+python environments/terminal_test_env/terminal_test_env.py serve
+
+# Process mode (no run-api, saves to JSONL)
+python environments/terminal_test_env/terminal_test_env.py process \
+    --env.data_path_to_save_groups terminal_test_output.jsonl
+```
+
+### HermesSweEnv (`hermes_swe_env/`)
+
+SWE-bench style training environment. The model gets a coding task, uses terminal + file + web tools to solve it, and the reward function runs tests in the same Modal sandbox.
+
+```bash
+python environments/hermes_swe_env/hermes_swe_env.py serve \
+    --openai.model_name YourModel \
+    --env.dataset_name bigcode/humanevalpack \
+    --env.terminal_backend modal
+```
+
+### TerminalBench2EvalEnv (`benchmarks/terminalbench_2/`)
+
+**Eval-only** environment for the Terminal-Bench 2.0 benchmark (89 tasks). Each task gets a pre-built Docker Hub image, a natural language instruction, and a test suite. The agent uses terminal + file tools to solve the task, then the test suite verifies correctness.
+
+Follows the standard Atropos eval pattern (like GPQA, MMLU, etc.):
+- Run via `evaluate` subcommand (no `run-api` needed)
+- `setup()` loads the dataset, `evaluate()` runs all tasks
+- `rollout_and_score_eval()` handles per-task agent loop + test verification
+- Downloads verifier output locally for reliable reward checking (Harbor pattern)
+
+```bash
+# Run full benchmark
+python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
+    --openai.model_name anthropic/claude-opus-4.6
+
+# Run subset of tasks
+python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
+    --openai.model_name anthropic/claude-opus-4.6 \
+    --env.task_filter fix-git,git-multibranch
+
+# Skip specific tasks
+python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
+    --openai.model_name anthropic/claude-opus-4.6 \
+    --env.skip_tasks heavy-task,slow-task
+```
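+
+The module docstring of `terminalbench2_env.py` describes concurrency-controlled evaluation via `asyncio.Semaphore` with per-category and overall pass rates. A minimal sketch of that pattern, under stated assumptions, might look like the following. `run_task()`, the `solvable` field, the category names, and the `build-*` task names are hypothetical placeholders for the real agent loop and dataset.
+
+```python
+import asyncio
+from collections import defaultdict
+
+
+async def run_task(task):
+    # Stand-in for the per-task agent loop and test suite (hypothetical).
+    await asyncio.sleep(0.01)
+    return {"category": task["category"], "passed": task["solvable"]}
+
+
+async def evaluate(tasks, max_concurrent=2):
+    semaphore = asyncio.Semaphore(max_concurrent)
+
+    async def bounded(task):
+        # At most max_concurrent tasks run at the same time.
+        async with semaphore:
+            return await run_task(task)
+
+    results = await asyncio.gather(*(bounded(t) for t in tasks))
+
+    # Binary reward per task, averaged per category and overall.
+    by_category = defaultdict(list)
+    for r in results:
+        by_category[r["category"]].append(1.0 if r["passed"] else 0.0)
+    return {
+        "overall": sum(1.0 for r in results if r["passed"]) / len(results),
+        "per_category": {c: sum(v) / len(v) for c, v in by_category.items()},
+    }
+
+
+tasks = [
+    {"name": "fix-git", "category": "git", "solvable": True},
+    {"name": "git-multibranch", "category": "git", "solvable": False},
+    {"name": "build-tool", "category": "build", "solvable": True},
+    {"name": "build-lib", "category": "build", "solvable": True},
+]
+summary = asyncio.run(evaluate(tasks))
+```
+
+Bounding concurrency with a semaphore keeps the number of live sandboxes predictable while still letting `asyncio.gather` collect every result in input order.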
+
+## Creating a New Environment
+
+### Training Environment
+
+1. Create a new directory under `environments/`
+2. Create your env file inheriting from `HermesAgentBaseEnv`
+3. Implement the four abstract methods + `evaluate()`
+
+```python
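+from atroposlib.envs.server_handling.server_manager import APIServerConfig
+from datasets import load_dataset  # assumed: Hugging Face `datasets` loader
+
+from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig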
+
+class MyEnvConfig(HermesAgentEnvConfig):
+    pass  # Add custom fields as needed
+
+class MyEnv(HermesAgentBaseEnv):
+    name = "my-env"
+    env_config_cls = MyEnvConfig
+
+    @classmethod
+    def config_init(cls):
+        env_config = MyEnvConfig(
+            enabled_toolsets=["terminal", "file"],
+            terminal_backend="modal",
+            # ... other config
+        )
+        server_configs = [APIServerConfig(...)]
+        return env_config, server_configs
+
+    async def setup(self):
+        self.dataset = load_dataset(...)
+        self.iter = 0
+
+    async def get_next_item(self):
+        item = self.dataset[self.iter % len(self.dataset)]
+        self.iter += 1
+        return item
+
+    def format_prompt(self, item):
+        return item["instruction"]
+
+    async def compute_reward(self, item, result, ctx):
+        # ctx gives you full tool access to the rollout's sandbox
+        test = ctx.terminal("pytest -v")
+        return 1.0 if test["exit_code"] == 0 else 0.0
+
+    async def evaluate(self, *args, **kwargs):
+        # Periodic evaluation logic
+        ...
+
+if __name__ == "__main__":
+    MyEnv.cli()
+```
+
+### Eval-Only Environment (Benchmark)
+
+For eval benchmarks, follow the pattern in `terminalbench2_env.py`:
+1. Create under `environments/benchmarks/your-benchmark/`
+2. Inherit from `HermesAgentBaseEnv`
+3. Set eval-only config: `eval_handling=STOP_TRAIN`, `steps_per_eval=1`, `total_steps=1`
+4. Stub the training methods (`collect_trajectories`, `score`)
+5. Implement `rollout_and_score_eval()` and `evaluate()`
+6. Run with `evaluate` subcommand
+
+## Key Config Fields
+
+| Field | Description | Default |
+|-------|-------------|---------|
+| `enabled_toolsets` | Which hermes toolsets to enable | `None` (all) |
+| `disabled_toolsets` | Toolsets to disable | `None` |
+| `distribution` | Probabilistic toolset distribution name | `None` |
+| `max_agent_turns` | Max LLM calls per rollout | `30` |
+| `agent_temperature` | Sampling temperature | `1.0` |
+| `terminal_backend` | `local`, `docker`, `modal`, `ssh`, `singularity` | `local` |
+| `system_prompt` | System message for the agent | `None` |
+| `tool_call_parser` | Parser name for Phase 2 | `hermes` |
+| `eval_handling` | `STOP_TRAIN`, `LIMIT_TRAIN`, `NONE` | `STOP_TRAIN` |
--- a/environments/__init__.py
+++ b/environments/__init__.py
@@ -0,0 +1,32 @@
+"""
+Hermes-Agent Atropos Environments
+
+Provides a layered integration between hermes-agent's tool-calling capabilities
+and the Atropos RL training framework.
+
+Core layers:
+    - agent_loop: Reusable multi-turn agent loop with standard OpenAI-spec tool calling
+    - tool_context: Per-rollout tool access handle for reward/verification functions
+    - hermes_base_env: Abstract base environment (BaseEnv subclass) for Atropos
+    - tool_call_parsers: Client-side tool call parser registry for Phase 2 (VLLM /generate)
+
+Concrete environments:
+    - terminal_test_env/: Simple file-creation tasks for testing the stack
+    - hermes_swe_env/: SWE-bench style tasks with Modal sandboxes
+    - endless_terminals/: Terminal tasks from HuggingFace dataset with Apptainer containers
+
+Benchmarks (eval-only):
+    - benchmarks/terminalbench_2/: Terminal-Bench 2.0 evaluation
+"""
+
+from environments.agent_loop import AgentResult, HermesAgentLoop
+from environments.tool_context import ToolContext
+from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
+
+__all__ = [
+    "AgentResult",
+    "HermesAgentLoop",
+    "ToolContext",
+    "HermesAgentBaseEnv",
+    "HermesAgentEnvConfig",
+]
--- a/environments/agent_loop.py
+++ b/environments/agent_loop.py
@@ -0,0 +1,421 @@
+"""
+HermesAgentLoop -- Reusable Multi-Turn Agent Engine
+
+Runs the hermes-agent tool-calling loop using standard OpenAI-spec tool calling.
+Works with any server that returns ChatCompletion objects with tool_calls:
+    - Phase 1: OpenAI server type (VLLM, SGLang, OpenRouter, OpenAI API)
+    - Phase 2: ManagedServer with client-side tool call parser
+
+The loop passes tools= and checks response.choices[0].message.tool_calls,
+identical to hermes-agent's run_agent.py. Tool execution is dispatched via
+handle_function_call() from model_tools.py.
+"""
+
+import asyncio
+import concurrent.futures
+import json
+import logging
+import os
+import uuid
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional, Set
+
+from model_tools import handle_function_call
+
+# Thread pool for running sync tool calls that internally use asyncio.run()
+# (e.g., mini-swe-agent's modal/docker backends). Running them in a separate
+# thread gives them a clean event loop so they don't deadlock inside Atropos's loop.
+# Size must be large enough for concurrent eval tasks (e.g., 89 TB2 tasks all
+# making tool calls). Too small = thread pool starvation, tasks queue for minutes.
+# Resized at runtime by HermesAgentBaseEnv.__init__ via resize_tool_pool().
+_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=128)
+
+
+logger = logging.getLogger(__name__)
+
+
+def resize_tool_pool(max_workers: int):
+    """
+    Replace the global tool executor with a new one of the given size.
+
+    Called by HermesAgentBaseEnv.__init__ based on config.tool_pool_size.
+    Safe to call before any tasks are submitted.
+    """
+    global _tool_executor
+    _tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
+    logger.info("Tool thread pool resized to %d workers", max_workers)
+
+
+@dataclass
+class ToolError:
+    """Record of a tool execution error during the agent loop."""
+
+    turn: int                  # Which turn the error occurred on
+    tool_name: str             # Which tool was called
+    arguments: str             # The arguments passed (truncated)
+    error: str                 # The error message
+    tool_result: str           # The raw result returned to the model
+
+
+@dataclass
+class AgentResult:
+    """Result of running the agent loop."""
+
+    # Full conversation history in OpenAI message format
+    messages: List[Dict[str, Any]]
+    # ManagedServer.get_state() if available (Phase 2), None otherwise
+    managed_state: Optional[Dict[str, Any]] = None
+    # How many LLM calls were made
+    turns_used: int = 0
+    # True if model stopped calling tools naturally (vs hitting max_turns)
+    finished_naturally: bool = False
+    # Extracted reasoning content per turn (from PR #297 helpers)
+    reasoning_per_turn: List[Optional[str]] = field(default_factory=list)
+    # Tool errors encountered during the loop
+    tool_errors: List[ToolError] = field(default_factory=list)
+
+
+def _extract_reasoning_from_message(message) -> Optional[str]:
+    """
+    Extract reasoning content from a ChatCompletion message.
+
+    Handles multiple provider formats:
+    1. message.reasoning_content field (some providers)
+    2. message.reasoning field (some providers)
+    3. message.reasoning_details[].text (OpenRouter style)
+
+    Note: <think> block extraction from content is NOT done here. In Phase 1
+    the server has already extracted it; in Phase 2 ManagedServer's patch
+    handles it.
+
+    Args:
+        message: The assistant message from ChatCompletion response
+
+    Returns:
+        Extracted reasoning text, or None if not found
+    """
+    # Check reasoning_content field (common across providers)
+    if hasattr(message, "reasoning_content") and message.reasoning_content:
+        return message.reasoning_content
+
+    # Check reasoning field
+    if hasattr(message, "reasoning") and message.reasoning:
+        return message.reasoning
+
+    # Check reasoning_details (OpenRouter style)
+    if hasattr(message, "reasoning_details") and message.reasoning_details:
+        for detail in message.reasoning_details:
+            if hasattr(detail, "text") and detail.text:
+                return detail.text
+            if isinstance(detail, dict) and detail.get("text"):
+                return detail["text"]
+
+    return None
+
+
+class HermesAgentLoop:
+    """
+    Runs hermes-agent's tool-calling loop using standard OpenAI-spec tool calling.
+
+    Same pattern as run_agent.py:
+    - Pass tools= to the API
+    - Check response.choices[0].message.tool_calls
+    - Dispatch via handle_function_call()
+
+    Works identically with any server type -- OpenAI, VLLM, SGLang, OpenRouter,
+    or ManagedServer with a parser. The server determines how tool_calls get
+    populated on the response.
+    """
+
+    def __init__(
+        self,
+        server,
+        tool_schemas: List[Dict[str, Any]],
+        valid_tool_names: Set[str],
+        max_turns: int = 30,
+        task_id: Optional[str] = None,
+        temperature: float = 1.0,
+        max_tokens: Optional[int] = None,
+        extra_body: Optional[Dict[str, Any]] = None,
+    ):
+        """
+        Initialize the agent loop.
+
+        Args:
+            server: Server object with chat_completion() method (OpenAIServer,
+                    ManagedServer, ServerManager, etc.)
+            tool_schemas: OpenAI-format tool definitions from get_tool_definitions()
+            valid_tool_names: Set of tool names the model is allowed to call
+            max_turns: Maximum number of LLM calls before stopping
+            task_id: Unique ID for terminal/browser session isolation
+            temperature: Sampling temperature for generation
+            max_tokens: Max tokens per generation (None for server default)
+            extra_body: Extra parameters passed to the OpenAI client's create() call.
+                        Used for OpenRouter provider preferences, transforms, etc.
+                        e.g. {"provider": {"ignore": ["DeepInfra"]}}
+        """
+        self.server = server
+        self.tool_schemas = tool_schemas
+        self.valid_tool_names = valid_tool_names
+        self.max_turns = max_turns
+        self.task_id = task_id or str(uuid.uuid4())
+        self.temperature = temperature
+        self.max_tokens = max_tokens
+        self.extra_body = extra_body
+
+    async def run(self, messages: List[Dict[str, Any]]) -> AgentResult:
+        """
+        Execute the full agent loop using standard OpenAI tool calling.
+
+        Args:
+            messages: Initial conversation messages (system + user).
+                      Modified in-place as the conversation progresses.
+
+        Returns:
+            AgentResult with full conversation history, managed state, and metadata
+        """
+        reasoning_per_turn = []
+        tool_errors: List[ToolError] = []
+
+        import time as _time
+
+        for turn in range(self.max_turns):
+            turn_start = _time.monotonic()
+
+            # Build the chat_completion kwargs
+            chat_kwargs = {
+                "messages": messages,
+                "n": 1,
+                "temperature": self.temperature,
+            }
+
+            # Only pass tools if we have them
+            if self.tool_schemas:
+                chat_kwargs["tools"] = self.tool_schemas
+
+            # Only pass max_tokens if explicitly set
+            if self.max_tokens is not None:
+                chat_kwargs["max_tokens"] = self.max_tokens
+
+            # Inject extra_body for provider-specific params (e.g., OpenRouter
+            # provider preferences like banned/preferred providers, transforms)
+            if self.extra_body:
+                chat_kwargs["extra_body"] = self.extra_body
+
+            # Make the API call -- standard OpenAI spec
+            api_start = _time.monotonic()
+            try:
+                response = await self.server.chat_completion(**chat_kwargs)
+            except Exception as e:
+                api_elapsed = _time.monotonic() - api_start
+                logger.error("API call failed on turn %d (%.1fs): %s", turn + 1, api_elapsed, e)
+                return AgentResult(
+                    messages=messages,
+                    managed_state=self._get_managed_state(),
+                    turns_used=turn + 1,
+                    finished_naturally=False,
+                    reasoning_per_turn=reasoning_per_turn,
+                    tool_errors=tool_errors,
+                )
+
+            api_elapsed = _time.monotonic() - api_start
+
+            if not response or not response.choices:
+                logger.warning("Empty response on turn %d (api=%.1fs)", turn + 1, api_elapsed)
+                return AgentResult(
+                    messages=messages,
+                    managed_state=self._get_managed_state(),
+                    turns_used=turn + 1,
+                    finished_naturally=False,
+                    reasoning_per_turn=reasoning_per_turn,
+                    tool_errors=tool_errors,
+                )
+
+            assistant_msg = response.choices[0].message
+
+            # Extract reasoning content from the response (all provider formats)
+            reasoning = _extract_reasoning_from_message(assistant_msg)
+            reasoning_per_turn.append(reasoning)
+
+            # Check for tool calls -- standard OpenAI spec
+            if assistant_msg.tool_calls:
+                # Build the assistant message dict for conversation history
+                msg_dict: Dict[str, Any] = {
+                    "role": "assistant",
+                    "content": assistant_msg.content or "",
+                    "tool_calls": [
+                        {
+                            "id": tc.id,
+                            "type": "function",
+                            "function": {
+                                "name": tc.function.name,
+                                "arguments": tc.function.arguments,
+                            },
+                        }
+                        for tc in assistant_msg.tool_calls
+                    ],
+                }
+
+                # Preserve reasoning_content for multi-turn chat template handling
+                # (e.g., Kimi-K2's template renders <think> blocks differently
+                # for history vs. the latest turn based on this field)
+                if reasoning:
+                    msg_dict["reasoning_content"] = reasoning
+
+                messages.append(msg_dict)
+
+                # Execute each tool call via hermes-agent's dispatch
+                for tc in assistant_msg.tool_calls:
+                    tool_name = tc.function.name
+                    tool_args_raw = tc.function.arguments
+
+                    # Validate tool name
+                    if tool_name not in self.valid_tool_names:
+                        tool_result = json.dumps(
+                            {
+                                "error": f"Unknown tool '{tool_name}'. "
+                                f"Available tools: {sorted(self.valid_tool_names)}"
+                            }
+                        )
+                        tool_errors.append(ToolError(
+                            turn=turn + 1, tool_name=tool_name,
+                            arguments=tool_args_raw[:200],
+                            error=f"Unknown tool '{tool_name}'",
+                            tool_result=tool_result,
+                        ))
+                        logger.warning(
+                            "Model called unknown tool '%s' on turn %d",
+                            tool_name, turn + 1,
+                        )
+                    else:
+                        # Parse arguments and dispatch
+                        try:
+                            args = json.loads(tool_args_raw)
+                        except json.JSONDecodeError:
+                            args = {}
+                            logger.warning(
+                                "Invalid JSON in tool call arguments for '%s': %s",
+                                tool_name, tool_args_raw[:200],
+                            )
+
+                        try:
+                            if tool_name == "terminal":
+                                backend = os.getenv("TERMINAL_ENV", "local")
+                                cmd_preview = args.get("command", "")[:80]
+                                logger.info(
+                                    "[%s] $ %s", self.task_id[:8], cmd_preview,
+                                )
+
+                            # Run tool calls in a thread pool so backends that use
+                            # asyncio.run() internally (modal, docker) get a clean
+                            # event loop instead of deadlocking inside Atropos's loop.
+                            tool_submit_time = _time.monotonic()
+                            loop = asyncio.get_running_loop()
+                            tool_result = await loop.run_in_executor(
+                                _tool_executor,
+                                lambda: handle_function_call(
+                                    tool_name, args, task_id=self.task_id
+                                ),
+                            )
+                            tool_elapsed = _time.monotonic() - tool_submit_time
+
+                            # Log slow tools with the pool's pending-queue depth for debugging
+                            # (_work_queue is a private ThreadPoolExecutor attribute).
+                            pool_queued = _tool_executor._work_queue.qsize()
+                            if tool_elapsed > 30:
+                                logger.warning(
+                                    "[%s] turn %d: %s took %.1fs (pool queue=%d)",
+                                    self.task_id[:8], turn + 1, tool_name,
+                                    tool_elapsed, pool_queued,
+                                )
+                        except Exception as e:
+                            tool_result = json.dumps(
+                                {"error": f"Tool execution failed: {type(e).__name__}: {str(e)}"}
+                            )
+                            tool_errors.append(ToolError(
+                                turn=turn + 1, tool_name=tool_name,
+                                arguments=tool_args_raw[:200],
+                                error=f"{type(e).__name__}: {str(e)}",
+                                tool_result=tool_result,
+                            ))
+                            logger.error(
+                                "Tool '%s' execution failed on turn %d: %s",
+                                tool_name, turn + 1, e,
+                            )
+
+                        # Also check if the tool returned an error in its JSON result
+                        try:
+                            result_data = json.loads(tool_result)
+                            if isinstance(result_data, dict):
+                                err = result_data.get("error")
+                                exit_code = result_data.get("exit_code")
+                                if err and exit_code and exit_code < 0:
+                                    tool_errors.append(ToolError(
+                                        turn=turn + 1, tool_name=tool_name,
+                                        arguments=tool_args_raw[:200],
+                                        error=str(err),
+                                        tool_result=tool_result[:500],
+                                    ))
+                        except (json.JSONDecodeError, TypeError):
+                            pass
+
+                    # Add tool response to conversation
+                    messages.append(
+                        {
+                            "role": "tool",
+                            "tool_call_id": tc.id,
+                            "content": tool_result,
+                        }
+                    )
+
+                turn_elapsed = _time.monotonic() - turn_start
+                logger.info(
+                    "[%s] turn %d: api=%.1fs, %d tools, turn_total=%.1fs",
+                    self.task_id[:8], turn + 1, api_elapsed,
+                    len(assistant_msg.tool_calls), turn_elapsed,
+                )
+
+            else:
+                # No tool calls -- model is done
+                msg_dict = {
+                    "role": "assistant",
+                    "content": assistant_msg.content or "",
+                }
+                if reasoning:
+                    msg_dict["reasoning_content"] = reasoning
+                messages.append(msg_dict)
+
+                turn_elapsed = _time.monotonic() - turn_start
+                logger.info(
+                    "[%s] turn %d: api=%.1fs, no tools (finished), turn_total=%.1fs",
+                    self.task_id[:8], turn + 1, api_elapsed, turn_elapsed,
+                )
+
+                return AgentResult(
+                    messages=messages,
+                    managed_state=self._get_managed_state(),
+                    turns_used=turn + 1,
+                    finished_naturally=True,
+                    reasoning_per_turn=reasoning_per_turn,
+                    tool_errors=tool_errors,
+                )
+
+        # Hit max turns without the model stopping
+        logger.info("Agent hit max_turns (%d) without finishing", self.max_turns)
+        return AgentResult(
+            messages=messages,
+            managed_state=self._get_managed_state(),
+            turns_used=self.max_turns,
+            finished_naturally=False,
+            reasoning_per_turn=reasoning_per_turn,
+            tool_errors=tool_errors,
+        )
+
+    def _get_managed_state(self) -> Optional[Dict[str, Any]]:
+        """
+        Get ManagedServer state if the server supports it.
+
+        Returns state dict with SequenceNodes containing tokens/logprobs/masks,
+        or None if the server doesn't support get_state() (e.g., regular OpenAI server).
+        """
+        if hasattr(self.server, "get_state"):
+            return self.server.get_state()
+        return None
--- a/environments/benchmarks/__init__.py
+++ b/environments/benchmarks/__init__.py
--- a/environments/benchmarks/terminalbench_2/__init__.py
+++ b/environments/benchmarks/terminalbench_2/__init__.py
--- a/environments/benchmarks/terminalbench_2/default.yaml
+++ b/environments/benchmarks/terminalbench_2/default.yaml
@@ -0,0 +1,38 @@
+# Terminal-Bench 2.0 Evaluation -- Default Configuration
+#
+# Eval-only environment for the TB2 benchmark (89 terminal tasks).
+# Uses Modal terminal backend for per-task cloud-isolated sandboxes
+# and OpenRouter for inference.
+#
+# Usage:
+#   python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
+#       --config environments/benchmarks/terminalbench_2/default.yaml
+#
+#   # Override model:
+#   python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
+#       --config environments/benchmarks/terminalbench_2/default.yaml \
+#       --openai.model_name anthropic/claude-sonnet-4
+
+env:
+  enabled_toolsets: ["terminal", "file"]
+  max_agent_turns: 60
+  max_token_length: 32000
+  agent_temperature: 0.8
+  terminal_backend: "modal"
+  terminal_timeout: 300        # 5 min per command (builds, pip install)
+  tool_pool_size: 128          # thread pool for 89 parallel tasks
+  dataset_name: "NousResearch/terminal-bench-2"
+  test_timeout: 600
+  task_timeout: 1800           # 30 min wall-clock per task, auto-FAIL if exceeded
+  tokenizer_name: "NousResearch/Hermes-3-Llama-3.1-8B"
+  use_wandb: true
+  wandb_name: "terminal-bench-2"
+  ensure_scores_are_not_same: false
+  data_dir_to_save_evals: "environments/benchmarks/evals/terminal-bench-2"
+
+openai:
+  base_url: "https://openrouter.ai/api/v1"
+  model_name: "anthropic/claude-opus-4.6"
+  server_type: "openai"
+  health_check: false
+  # api_key loaded from OPENROUTER_API_KEY in .env
--- a/environments/benchmarks/terminalbench_2/run_eval.sh
+++ b/environments/benchmarks/terminalbench_2/run_eval.sh
@@ -0,0 +1,32 @@
+#!/bin/bash
+
+# Terminal-Bench 2.0 Evaluation
+#
+# Run from repo root:
+#   bash environments/benchmarks/terminalbench_2/run_eval.sh
+#
+# Override model:
+#   bash environments/benchmarks/terminalbench_2/run_eval.sh \
+#       --openai.model_name anthropic/claude-sonnet-4
+#
+# Run a subset:
+#   bash environments/benchmarks/terminalbench_2/run_eval.sh \
+#       --env.task_filter fix-git,git-multibranch
+
+mkdir -p logs evals/terminal-bench-2
+LOG_FILE="logs/terminalbench2_$(date +%Y%m%d_%H%M%S).log"
+
+echo "Terminal-Bench 2.0 Evaluation"
+echo "Log: $LOG_FILE"
+echo ""
+
+export TERMINAL_ENV=modal
+export TERMINAL_TIMEOUT=300
+
+python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
+  --config environments/benchmarks/terminalbench_2/default.yaml \
+  "$@" \
+  2>&1 | tee "$LOG_FILE"
+
+echo ""
+echo "Log saved to: $LOG_FILE"
--- a/environments/benchmarks/terminalbench_2/terminalbench2_env.py
+++ b/environments/benchmarks/terminalbench_2/terminalbench2_env.py
@@ -0,0 +1,904 @@
+"""
+TerminalBench2Env -- Terminal-Bench 2.0 Evaluation Environment
+
+Evaluates agentic LLMs on challenging terminal tasks from Terminal-Bench 2.0.
+Each task provides a unique Docker environment (pre-built on Docker Hub), a natural
+language instruction, and a test suite for verification. The agent uses terminal +
+file tools to complete the task, then the test suite runs inside the same sandbox.
+
+This is an eval-only environment (not a training environment). It is designed to
+be run via the `evaluate` subcommand:
+
+    python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \\
+        --env.dataset_name NousResearch/terminal-bench-2
+
+The evaluate flow:
+    1. setup()     -- Loads the TB2 dataset from HuggingFace
+    2. evaluate()  -- Iterates over all tasks, running each through:
+        a. rollout_and_score_eval()  -- Per-task agent loop + test verification
+            - Resolves Docker image (pre-built Hub image or Dockerfile fallback)
+            - Registers per-task Modal sandbox via register_task_env_overrides()
+            - Runs the HermesAgentLoop (terminal + file tools)
+            - Uploads test suite and runs test.sh in the same sandbox
+            - Returns binary pass/fail result
+        b. Aggregates per-task, per-category, and overall pass rates
+        c. Logs results via evaluate_log() and wandb
+
+Key features:
+  - Per-task Modal sandboxes using pre-built Docker Hub images
+  - Binary reward: 1.0 if all tests pass, 0.0 otherwise
+  - Concurrency-controlled parallel evaluation via asyncio.Semaphore
+  - Per-task, per-category, and aggregate pass rate tracking
+"""
+
+import asyncio
+import base64
+import io
+import json
+import logging
+import os
+import shutil
+import sys
+import tarfile
+import tempfile
+import time
+import uuid
+from collections import defaultdict
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, Union
+
+# Ensure repo root is on sys.path for imports
+_repo_root = Path(__file__).resolve().parent.parent.parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+from pydantic import Field
+
+from atroposlib.envs.base import EvalHandlingEnum
+from atroposlib.envs.server_handling.server_manager import APIServerConfig
+
+from environments.agent_loop import AgentResult, HermesAgentLoop
+from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
+from environments.tool_context import ToolContext
+from tools.terminal_tool import (
+    register_task_env_overrides,
+    clear_task_env_overrides,
+    cleanup_vm,
+)
+
+logger = logging.getLogger(__name__)
+
+
+# =============================================================================
+# Configuration
+# =============================================================================
+
+class TerminalBench2EvalConfig(HermesAgentEnvConfig):
+    """
+    Configuration for the Terminal-Bench 2.0 evaluation environment.
+
+    Extends HermesAgentEnvConfig with TB2-specific settings for dataset loading,
+    test execution, task filtering, and eval concurrency.
+    """
+
+    # --- Dataset ---
+    dataset_name: str = Field(
+        default="NousResearch/terminal-bench-2",
+        description="HuggingFace dataset containing TB2 tasks.",
+    )
+
+    # --- Test execution ---
+    test_timeout: int = Field(
+        default=180,
+        description="Timeout in seconds for running the test suite after agent completes.",
+    )
+
+    # --- Image strategy ---
+    force_build: bool = Field(
+        default=False,
+        description="If True, always build from Dockerfile (ignore docker_image). "
+        "Useful for testing custom Dockerfiles.",
+    )
+
+    # --- Task filtering (comma-separated from CLI) ---
+    task_filter: Optional[str] = Field(
+        default=None,
+        description="Comma-separated task names to run (e.g., 'fix-git,git-multibranch'). "
+        "If not set, all tasks are run.",
+    )
+    skip_tasks: Optional[str] = Field(
+        default=None,
+        description="Comma-separated task names to skip on top of the default skip list.",
+    )
+
+    # --- Per-task wall-clock timeout ---
+    task_timeout: int = Field(
+        default=1800,
+        description="Maximum wall-clock seconds per task (agent loop + verification). "
+        "Tasks exceeding this are scored as FAIL. Default 30 minutes.",
+    )
+
+
+# Tasks that cannot run properly on Modal and are excluded from scoring.
+MODAL_INCOMPATIBLE_TASKS = {
+    "qemu-startup",        # Needs KVM/hardware virtualization
+    "qemu-alpine-ssh",     # Needs KVM/hardware virtualization
+    "crack-7z-hash",       # Password brute-force -- too slow for cloud sandbox timeouts
+}
+
+
+# =============================================================================
+# Tar extraction helper
+# =============================================================================
+
+def _extract_base64_tar(b64_data: str, target_dir: Path):
+    """Extract a base64-encoded tar.gz archive into target_dir."""
+    if not b64_data:
+        return
+    raw = base64.b64decode(b64_data)
+    buf = io.BytesIO(raw)
+    with tarfile.open(fileobj=buf, mode="r:gz") as tar:
+        tar.extractall(path=str(target_dir))
+
+
+# =============================================================================
+# Main Environment
+# =============================================================================
+
+class TerminalBench2EvalEnv(HermesAgentBaseEnv):
+    """
+    Terminal-Bench 2.0 evaluation environment (eval-only, no training).
+
+    Inherits from HermesAgentBaseEnv for:
+      - Terminal backend setup (os.environ["TERMINAL_ENV"])
+      - Tool resolution via _resolve_tools_for_group()
+      - Monkey patches for async-safe tool operation
+      - Wandb trajectory formatting
+
+    The evaluate flow (triggered by `terminalbench2_env.py evaluate`):
+      1. setup()    -- Load dataset from HuggingFace
+      2. evaluate() -- Run all tasks through rollout_and_score_eval()
+
+    Each task in rollout_and_score_eval():
+      1. Resolve Docker image (pre-built Hub image or Dockerfile fallback)
+      2. Register per-task Modal sandbox override
+      3. Run HermesAgentLoop with terminal + file tools
+      4. Upload test suite and execute test.sh in the same sandbox
+      5. Check /logs/verifier/reward.txt for pass/fail
+      6. Clean up sandbox, overrides, and temp files
+    """
+
+    name = "terminal-bench-2"
+    env_config_cls = TerminalBench2EvalConfig
+
+    @classmethod
+    def config_init(cls) -> Tuple[TerminalBench2EvalConfig, List[APIServerConfig]]:
+        """
+        Default configuration for Terminal-Bench 2.0 evaluation.
+
+        Uses eval-only settings:
+          - eval_handling=STOP_TRAIN so the eval flow runs cleanly
+          - steps_per_eval=1, total_steps=1 so eval triggers immediately
+          - group_size=1 (one rollout per group, each task is expensive)
+
+        Uses Modal terminal backend (cloud-isolated sandbox per task) and
+        OpenRouter with Claude for inference.
+        """
+        env_config = TerminalBench2EvalConfig(
+            # Terminal + file tools only (the agent interacts via shell commands)
+            enabled_toolsets=["terminal", "file"],
+            disabled_toolsets=None,
+            distribution=None,
+
+            # Agent settings -- TB2 tasks are complex, need many turns
+            max_agent_turns=60,
+            max_token_length=16000,
+            agent_temperature=0.6,
+            system_prompt=None,
+
+            # Modal backend for per-task cloud-isolated sandboxes
+            terminal_backend="modal",
+            terminal_timeout=300,   # 5 min per command (builds, pip install, etc.)
+
+            # Test execution timeout (TB2 test scripts can install deps like pytest)
+            test_timeout=180,
+
+            # 89 tasks run in parallel, each needs a thread for tool calls
+            tool_pool_size=128,
+
+            # --- Eval-only Atropos settings ---
+            # These settings make the env work as an eval-only environment:
+            #   - STOP_TRAIN: pauses training during eval (standard for eval envs)
+            #   - steps_per_eval=1, total_steps=1: eval triggers immediately
+            #   - group_size=1: one rollout per group (each task is expensive)
+            eval_handling=EvalHandlingEnum.STOP_TRAIN,
+            group_size=1,
+            steps_per_eval=1,
+            total_steps=1,
+
+            tokenizer_name="NousResearch/Hermes-3-Llama-3.1-8B",
+            use_wandb=True,
+            wandb_name="terminal-bench-2",
+            ensure_scores_are_not_same=False,  # Binary rewards may all be 0 or 1
+        )
+
+        # OpenRouter with Claude -- API key loaded from .env
+        server_configs = [
+            APIServerConfig(
+                base_url="https://openrouter.ai/api/v1",
+                model_name="anthropic/claude-sonnet-4",
+                server_type="openai",
+                api_key=os.getenv("OPENROUTER_API_KEY", ""),
+                health_check=False,
+            )
+        ]
+
+        return env_config, server_configs
+
+    # =========================================================================
+    # Setup -- load dataset
+    # =========================================================================
+
+    async def setup(self):
+        """Load the Terminal-Bench 2.0 dataset from HuggingFace."""
+        from datasets import load_dataset
+
+        # Auto-set terminal_lifetime to task_timeout + 120s so sandboxes
+        # never get killed during an active task, but still get cleaned up
+        # promptly after the task times out.
+        lifetime = self.config.task_timeout + 120
+        self.config.terminal_lifetime = lifetime
+        os.environ["TERMINAL_LIFETIME_SECONDS"] = str(lifetime)
+        print(f"  Terminal lifetime auto-set to {lifetime}s (task_timeout + 120s)")
+
+        print(f"Loading TB2 dataset from: {self.config.dataset_name}")
+        ds = load_dataset(self.config.dataset_name, split="train")
+
+        # Apply task filters (comma-separated strings from CLI)
+        tasks = list(ds)
+        if self.config.task_filter:
+            allowed = {name.strip() for name in self.config.task_filter.split(",")}
+            tasks = [t for t in tasks if t["task_name"] in allowed]
+            print(f"  Filtered to {len(tasks)} tasks: {sorted(allowed)}")
+
+        # Skip tasks incompatible with the current backend (e.g., QEMU on Modal)
+        # plus any user-specified skip_tasks
+        skip = set(MODAL_INCOMPATIBLE_TASKS) if self.config.terminal_backend == "modal" else set()
+        if self.config.skip_tasks:
+            skip |= {name.strip() for name in self.config.skip_tasks.split(",")}
+        if skip:
+            before = len(tasks)
+            skipped_names = sorted(t["task_name"] for t in tasks if t["task_name"] in skip)
+            tasks = [t for t in tasks if t["task_name"] not in skip]
+            if skipped_names:
+                print(f"  Skipped {before - len(tasks)} tasks (incompatible or user-excluded): {skipped_names}")
+
+        self.all_eval_items = tasks
+        self.iter = 0
+
+        # Build category index for per-category metrics
+        self.category_index: Dict[str, List[int]] = defaultdict(list)
+        for i, task in enumerate(self.all_eval_items):
+            self.category_index[task.get("category", "unknown")].append(i)
+
+        # Reward tracking for wandb logging
+        self.eval_metrics: List[Tuple[str, float]] = []
+
+        # Streaming JSONL writer -- saves each task's full conversation
+        # immediately on completion so data is preserved even on Ctrl+C.
+        # Timestamped filename so each run produces a unique file.
+        import datetime
+        log_dir = os.path.join(os.path.dirname(__file__), "logs")
+        os.makedirs(log_dir, exist_ok=True)
+        run_ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
+        self._streaming_path = os.path.join(log_dir, f"samples_{run_ts}.jsonl")
+        self._streaming_file = open(self._streaming_path, "w", encoding="utf-8")
+        self._streaming_lock = __import__("threading").Lock()
+        print(f"  Streaming results to: {self._streaming_path}")
+
+        print(f"TB2 ready: {len(self.all_eval_items)} tasks across {len(self.category_index)} categories")
+        for cat, indices in sorted(self.category_index.items()):
+            print(f"  {cat}: {len(indices)} tasks")
+
+    def _save_result(self, result: Dict[str, Any]):
+        """Write a single task result to the streaming JSONL file immediately."""
+        if not hasattr(self, "_streaming_file") or self._streaming_file.closed:
+            return
+        with self._streaming_lock:
+            self._streaming_file.write(json.dumps(result, ensure_ascii=False, default=str) + "\n")
+            self._streaming_file.flush()
+
+    # =========================================================================
+    # Training pipeline stubs -- NOT used in eval-only mode
+    # =========================================================================
+    # These satisfy the abstract method requirements from HermesAgentBaseEnv.
+    # The evaluate subcommand calls setup() -> evaluate() directly, bypassing
+    # the training pipeline entirely.
+
+    async def get_next_item(self):
+        """Return next item (stub -- not used in eval-only mode)."""
+        item = self.all_eval_items[self.iter % len(self.all_eval_items)]
+        self.iter += 1
+        return item
+
+    def format_prompt(self, item: Dict[str, Any]) -> str:
+        """Return the task's instruction as the user prompt."""
+        return item["instruction"]
+
+    async def compute_reward(self, item, result, ctx) -> float:
+        """Compute reward (stub -- actual verification is in rollout_and_score_eval)."""
+        return 0.0
+
+    async def collect_trajectories(self, item):
+        """Collect trajectories (stub -- not used in eval-only mode)."""
+        return None, []
+
+    async def score(self, rollout_group_data):
+        """Score rollouts (stub -- not used in eval-only mode)."""
+        return None
+
+    # =========================================================================
+    # Docker image resolution
+    # =========================================================================
+
+    def _resolve_task_image(
+        self, item: Dict[str, Any], task_name: str
+    ) -> Tuple[str, Optional[Path]]:
+        """
+        Resolve the Docker image for a task, with fallback to Dockerfile.
+
+        Strategy (mirrors Harbor's approach):
+        1. If force_build=True, always build from Dockerfile in environment_tar
+        2. If docker_image is available, use the pre-built Docker Hub image (fast)
+        3. Otherwise, extract Dockerfile from environment_tar and build (slow)
+
+        Returns:
+            (modal_image, temp_dir) -- modal_image is a Docker Hub name or a
+            Dockerfile path. temp_dir is set if we extracted files that need
+            cleanup later.
+        """
+        docker_image = item.get("docker_image", "")
+        environment_tar = item.get("environment_tar", "")
+
+        # Fast path: use pre-built Docker Hub image
+        if docker_image and not self.config.force_build:
+            logger.info("Task %s: using pre-built image %s", task_name, docker_image)
+            return docker_image, None
+
+        # Slow path: extract Dockerfile from environment_tar and build
+        if environment_tar:
+            task_dir = Path(tempfile.mkdtemp(prefix=f"tb2-{task_name}-"))
+            _extract_base64_tar(environment_tar, task_dir)
+            dockerfile_path = task_dir / "Dockerfile"
+            if dockerfile_path.exists():
+                logger.info(
+                    "Task %s: building from Dockerfile (force_build=%s, docker_image=%s)",
+                    task_name, self.config.force_build, bool(docker_image),
+                )
+                return str(dockerfile_path), task_dir
+            shutil.rmtree(task_dir, ignore_errors=True)  # No Dockerfile in archive; don't leak temp dir
+        # No usable Dockerfile -- fall back to the Hub image (reachable only when force_build=True)
+        if docker_image:
+            logger.warning(
+                "Task %s: force_build=True but no Dockerfile in environment_tar, "
+                "falling back to docker_image %s", task_name, docker_image,
+            )
+            return docker_image, None
+
+        return "", None
+
+    # =========================================================================
+    # Per-task evaluation -- agent loop + test verification
+    # =========================================================================
+
+    async def rollout_and_score_eval(self, eval_item: Dict[str, Any]) -> Dict:
+        """
+        Evaluate a single TB2 task: run the agent loop, then verify with tests.
+
+        This is the core evaluation method. For each task it:
+        1. Resolves the Docker image and registers the Modal sandbox override
+        2. Runs HermesAgentLoop with terminal + file tools
+        3. Uploads the test suite into the sandbox
+        4. Executes test.sh and checks the result
+        5. Cleans up the sandbox and temp files
+
+        Args:
+            eval_item: A single TB2 task dict from the dataset
+
+        Returns:
+            Dict with 'passed' (bool), 'reward' (float), 'task_name' (str),
+            'category' (str), and optional debug info
+        """
+        task_name = eval_item.get("task_name", "unknown")
+        category = eval_item.get("category", "unknown")
+        task_id = str(uuid.uuid4())
+        task_dir = None  # Set if we extract a Dockerfile (needs cleanup)
+
+        from tqdm import tqdm
+        tqdm.write(f"  [START] {task_name} (task_id={task_id[:8]})")
+        task_start = time.time()
+
+        try:
+            # --- 1. Resolve Docker image ---
+            modal_image, task_dir = self._resolve_task_image(eval_item, task_name)
+            if not modal_image:
+                logger.error("Task %s: no docker_image or environment_tar, skipping", task_name)
+                return {
+                    "passed": False, "reward": 0.0,
+                    "task_name": task_name, "category": category,
+                    "error": "no_image",
+                }
+
+            # --- 2. Register per-task Modal image override ---
+            register_task_env_overrides(task_id, {"modal_image": modal_image})
+            logger.info(
+                "Task %s: registered image override for task_id %s",
+                task_name, task_id[:8],
+            )
+
+            # --- 3. Resolve tools and build messages ---
+            tools, valid_names = self._resolve_tools_for_group()
+
+            messages: List[Dict[str, Any]] = []
+            if self.config.system_prompt:
+                messages.append({"role": "system", "content": self.config.system_prompt})
+            messages.append({"role": "user", "content": self.format_prompt(eval_item)})
+
+            # --- 4. Run agent loop ---
+            agent = HermesAgentLoop(
+                server=self.server,
+                tool_schemas=tools,
+                valid_tool_names=valid_names,
+                max_turns=self.config.max_agent_turns,
+                task_id=task_id,
+                temperature=self.config.agent_temperature,
+                max_tokens=self.config.max_token_length,
+                extra_body=self.config.extra_body,
+            )
+            result = await agent.run(messages)
+
+            # --- 5. Verify -- run test suite in the agent's sandbox ---
+            # Skip verification if the agent produced no meaningful output
+            only_system_and_user = all(
+                msg.get("role") in ("system", "user") for msg in result.messages
+            )
+            if result.turns_used == 0 or only_system_and_user:
+                logger.warning(
+                    "Task %s: agent produced no output (turns=%d). Reward=0.",
+                    task_name, result.turns_used,
+                )
+                reward = 0.0
+            else:
+                # Run tests in a thread so the blocking ctx.terminal() calls
+                # don't freeze the entire event loop (which would stall all
+                # other tasks, tqdm updates, and timeout timers).
+                ctx = ToolContext(task_id)
+                try:
+                    loop = asyncio.get_running_loop()
+                    reward = await loop.run_in_executor(
+                        None,  # default thread pool
+                        self._run_tests, eval_item, ctx, task_name,
+                    )
+                except Exception as e:
+                    logger.error("Task %s: test verification failed: %s", task_name, e)
+                    reward = 0.0
+                finally:
+                    ctx.cleanup()
+
+            passed = reward == 1.0
+            status = "PASS" if passed else "FAIL"
+            elapsed = time.time() - task_start
+            tqdm.write(f"  [{status}] {task_name} (turns={result.turns_used}, {elapsed:.0f}s)")
+            logger.info(
+                "Task %s: reward=%.1f, turns=%d, finished=%s",
+                task_name, reward, result.turns_used, result.finished_naturally,
+            )
+
+            out = {
+                "passed": passed,
+                "reward": reward,
+                "task_name": task_name,
+                "category": category,
+                "turns_used": result.turns_used,
+                "finished_naturally": result.finished_naturally,
+                "messages": result.messages,
+            }
+            self._save_result(out)
+            return out
+
+        except Exception as e:
+            elapsed = time.time() - task_start
+            logger.error("Task %s: rollout failed: %s", task_name, e, exc_info=True)
+            tqdm.write(f"  [ERROR] {task_name}: {e} ({elapsed:.0f}s)")
+            out = {
+                "passed": False, "reward": 0.0,
+                "task_name": task_name, "category": category,
+                "error": str(e),
+            }
+            self._save_result(out)
+            return out
+
+        finally:
+            # --- Cleanup: clear overrides, sandbox, and temp files ---
+            clear_task_env_overrides(task_id)
+            try:
+                cleanup_vm(task_id)
+            except Exception as e:
+                logger.debug("VM cleanup for %s: %s", task_id[:8], e)
+            if task_dir and task_dir.exists():
+                shutil.rmtree(task_dir, ignore_errors=True)
+
+    def _run_tests(
+        self, item: Dict[str, Any], ctx: ToolContext, task_name: str
+    ) -> float:
+        """
+        Upload and execute the test suite in the agent's sandbox, then
+        download the verifier output locally to read the reward.
+
+        Follows Harbor's verification pattern:
+        1. Upload tests/ directory into the sandbox
+        2. Execute test.sh inside the sandbox
+        3. Download /logs/verifier/ directory to a local temp dir
+        4. Read reward.txt locally with native Python I/O
+
+        Downloading locally avoids issues with the file_read tool on
+        the Modal VM and matches how Harbor handles verification.
+
+        TB2 test scripts (test.sh) typically:
+        1. Install pytest via uv/pip
+        2. Run pytest against the test files in /tests/
+        3. Write results to /logs/verifier/reward.txt
+
+        Args:
+            item: The TB2 task dict (contains tests_tar, test_sh)
+            ctx: ToolContext scoped to this task's sandbox
+            task_name: For logging
+
+        Returns:
+            1.0 if tests pass, 0.0 otherwise
+        """
+        tests_tar = item.get("tests_tar", "")
+        test_sh = item.get("test_sh", "")
+
+        if not test_sh:
+            logger.warning("Task %s: no test_sh content, reward=0", task_name)
+            return 0.0
+
+        # Create required directories in the sandbox
+        ctx.terminal("mkdir -p /tests /logs/verifier")
+
+        # Upload test files into the sandbox (binary-safe via base64)
+        if tests_tar:
+            tests_temp = Path(tempfile.mkdtemp(prefix=f"tb2-tests-{task_name}-"))
+            try:
+                _extract_base64_tar(tests_tar, tests_temp)
+                ctx.upload_dir(str(tests_temp), "/tests")
+            except Exception as e:
+                logger.warning("Task %s: failed to upload test files: %s", task_name, e)
+            finally:
+                shutil.rmtree(tests_temp, ignore_errors=True)
+
+        # Write the test runner script (test.sh)
+        ctx.write_file("/tests/test.sh", test_sh)
+        ctx.terminal("chmod +x /tests/test.sh")
+
+        # Execute the test suite
+        logger.info(
+            "Task %s: running test suite (timeout=%ds)",
+            task_name, self.config.test_timeout,
+        )
+        test_result = ctx.terminal(
+            "bash /tests/test.sh",
+            timeout=self.config.test_timeout,
+        )
+
+        exit_code = test_result.get("exit_code", -1)
+        output = test_result.get("output", "")
+
+        # Download the verifier output directory locally, then read reward.txt
+        # with native Python I/O. This avoids issues with file_read on the
+        # Modal VM and matches Harbor's verification pattern.
+        reward = 0.0
+        local_verifier_dir = Path(tempfile.mkdtemp(prefix=f"tb2-verifier-{task_name}-"))
+        try:
+            ctx.download_dir("/logs/verifier", str(local_verifier_dir))
+
+            reward_file = local_verifier_dir / "reward.txt"
+            if reward_file.exists() and reward_file.stat().st_size > 0:
+                content = reward_file.read_text().strip()
+                if content == "1":
+                    reward = 1.0
+                elif content == "0":
+                    reward = 0.0
+                else:
+                    # Unexpected content -- try parsing as float
+                    try:
+                        reward = float(content)
+                    except (ValueError, TypeError):
+                        logger.warning(
+                            "Task %s: reward.txt content unexpected (%r), "
+                            "falling back to exit_code=%d",
+                            task_name, content, exit_code,
+                        )
+                        reward = 1.0 if exit_code == 0 else 0.0
+            else:
+                # reward.txt not written -- fall back to exit code
+                logger.warning(
+                    "Task %s: reward.txt not found after download, "
+                    "falling back to exit_code=%d",
+                    task_name, exit_code,
+                )
+                reward = 1.0 if exit_code == 0 else 0.0
+        except Exception as e:
+            logger.warning(
+                "Task %s: failed to download verifier dir: %s, "
+                "falling back to exit_code=%d",
+                task_name, e, exit_code,
+            )
+            reward = 1.0 if exit_code == 0 else 0.0
+        finally:
+            shutil.rmtree(local_verifier_dir, ignore_errors=True)
+
+        # Log test output for debugging failures
+        if reward == 0.0:
+            output_preview = output[-500:] if output else "(no output)"
+            logger.info(
+                "Task %s: FAIL (exit_code=%d)\n%s",
+                task_name, exit_code, output_preview,
+            )
+
+        return reward
+
+    # =========================================================================
+    # Evaluate -- main entry point for the eval subcommand
+    # =========================================================================
+
+    async def _eval_with_timeout(self, item: Dict[str, Any]) -> Dict:
+        """
+        Wrap rollout_and_score_eval with a per-task wall-clock timeout.
+
+        If the task exceeds task_timeout seconds, it's automatically scored
+        as FAIL. This prevents any single task from hanging indefinitely.
+        """
+        task_name = item.get("task_name", "unknown")
+        category = item.get("category", "unknown")
+        try:
+            return await asyncio.wait_for(
+                self.rollout_and_score_eval(item),
+                timeout=self.config.task_timeout,
+            )
+        except asyncio.TimeoutError:
+            from tqdm import tqdm
+            elapsed = self.config.task_timeout
+            tqdm.write(f"  [TIMEOUT] {task_name} (exceeded {elapsed}s wall-clock limit)")
+            logger.error("Task %s: wall-clock timeout after %ds", task_name, elapsed)
+            out = {
+                "passed": False, "reward": 0.0,
+                "task_name": task_name, "category": category,
+                "error": f"timeout ({elapsed}s)",
+            }
+            self._save_result(out)
+            return out
+
+    async def evaluate(self, *args, **kwargs) -> None:
+        """
+        Run Terminal-Bench 2.0 evaluation over all tasks.
+
+        This is the main entry point when invoked via:
+            python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate
+
+        Runs all tasks through rollout_and_score_eval() concurrently and
+        consumes results via asyncio.as_completed() for live progress. Each
+        task is wrapped with a wall-clock timeout so hung tasks auto-fail.
+
+        Routes logging through tqdm.write() and quiets noisy third-party
+        loggers so the tqdm bar stays visible.
+        """
+        start_time = time.time()
+
+        # Route all logging through tqdm.write() so the progress bar stays
+        # pinned at the bottom while log lines scroll above it.
+        from tqdm import tqdm
+
+        class _TqdmHandler(logging.Handler):
+            def emit(self, record):
+                try:
+                    tqdm.write(self.format(record))
+                except Exception:
+                    self.handleError(record)
+
+        handler = _TqdmHandler()
+        handler.setFormatter(logging.Formatter(
+            "%(asctime)s [%(name)s] %(levelname)s: %(message)s",
+            datefmt="%H:%M:%S",
+        ))
+        root = logging.getLogger()
+        root.handlers = [handler]  # Replace any existing handlers
+        root.setLevel(logging.INFO)
+
+        # Silence noisy third-party loggers that flood the output
+        logging.getLogger("httpx").setLevel(logging.WARNING)      # Every HTTP request
+        logging.getLogger("openai").setLevel(logging.WARNING)     # OpenAI client retries
+        logging.getLogger("rex-deploy").setLevel(logging.WARNING) # Swerex deployment
+        logging.getLogger("rex_image_builder").setLevel(logging.WARNING)  # Image builds
+
+        print(f"\n{'='*60}")
+        print("Starting Terminal-Bench 2.0 Evaluation")
+        print(f"{'='*60}")
+        print(f"  Dataset: {self.config.dataset_name}")
+        print(f"  Total tasks: {len(self.all_eval_items)}")
+        print(f"  Max agent turns: {self.config.max_agent_turns}")
+        print(f"  Task timeout: {self.config.task_timeout}s")
+        print(f"  Terminal backend: {self.config.terminal_backend}")
+        print(f"  Tool thread pool: {self.config.tool_pool_size}")
+        print(f"  Terminal timeout: {self.config.terminal_timeout}s/cmd")
+        print(f"  Terminal lifetime: {self.config.terminal_lifetime}s (auto: task_timeout + 120)")
+        print(f"{'='*60}\n")
+
+        # Fire all tasks with wall-clock timeout, track live accuracy on the bar
+        total_tasks = len(self.all_eval_items)
+        eval_tasks = [
+            asyncio.ensure_future(self._eval_with_timeout(item))
+            for item in self.all_eval_items
+        ]
+
+        results = []
+        passed_count = 0
+        pbar = tqdm(total=total_tasks, desc="Evaluating TB2", dynamic_ncols=True)
+        try:
+            for coro in asyncio.as_completed(eval_tasks):
+                result = await coro
+                results.append(result)
+                if result and result.get("passed"):
+                    passed_count += 1
+                done = len(results)
+                pct = (passed_count / done * 100) if done else 0
+                pbar.set_postfix_str(f"pass={passed_count}/{done} ({pct:.1f}%)")
+                pbar.update(1)
+        except (KeyboardInterrupt, asyncio.CancelledError):
+            pbar.close()
+            print(f"\n\nInterrupted! Cleaning up {len(eval_tasks)} tasks...")
+            # Cancel all pending tasks
+            for task in eval_tasks:
+                task.cancel()
+            # Let cancellations propagate (finally blocks run cleanup_vm)
+            await asyncio.gather(*eval_tasks, return_exceptions=True)
+            # Belt-and-suspenders: clean up any remaining sandboxes
+            from tools.terminal_tool import cleanup_all_environments
+            cleanup_all_environments()
+            print("All sandboxes cleaned up.")
+            return
+        finally:
+            pbar.close()
+
+        end_time = time.time()
+
+        # Filter out None results (shouldn't happen, but be safe)
+        valid_results = [r for r in results if r is not None]
+
+        if not valid_results:
+            print("Warning: No valid evaluation results obtained")
+            return
+
+        # ---- Compute metrics ----
+        total = len(valid_results)
+        passed = sum(1 for r in valid_results if r.get("passed"))
+        overall_pass_rate = passed / total if total > 0 else 0.0
+
+        # Per-category breakdown
+        cat_results: Dict[str, List[Dict]] = defaultdict(list)
+        for r in valid_results:
+            cat_results[r.get("category", "unknown")].append(r)
+
+        # Build metrics dict
+        eval_metrics = {
+            "eval/pass_rate": overall_pass_rate,
+            "eval/total_tasks": total,
+            "eval/passed_tasks": passed,
+            "eval/evaluation_time_seconds": end_time - start_time,
+        }
+
+        # Per-category metrics
+        for category, cat_items in sorted(cat_results.items()):
+            cat_passed = sum(1 for r in cat_items if r.get("passed"))
+            cat_total = len(cat_items)
+            cat_pass_rate = cat_passed / cat_total if cat_total > 0 else 0.0
+            cat_key = category.replace(" ", "_").replace("-", "_").lower()
+            eval_metrics[f"eval/pass_rate_{cat_key}"] = cat_pass_rate
+
+        # Store metrics for wandb_log
+        self.eval_metrics = [(k, v) for k, v in eval_metrics.items()]
+
+        # ---- Print summary ----
+        print(f"\n{'='*60}")
+        print("Terminal-Bench 2.0 Evaluation Results")
+        print(f"{'='*60}")
+        print(f"Overall Pass Rate: {overall_pass_rate:.4f} ({passed}/{total})")
+        print(f"Evaluation Time: {end_time - start_time:.1f} seconds")
+
+        print("\nCategory Breakdown:")
+        for category, cat_items in sorted(cat_results.items()):
+            cat_passed = sum(1 for r in cat_items if r.get("passed"))
+            cat_total = len(cat_items)
+            cat_rate = cat_passed / cat_total if cat_total > 0 else 0.0
+            print(f"  {category}: {cat_rate:.1%} ({cat_passed}/{cat_total})")
+
+        # Print individual task results
+        print("\nTask Results:")
+        for r in sorted(valid_results, key=lambda x: x.get("task_name", "")):
+            status = "PASS" if r.get("passed") else "FAIL"
+            turns = r.get("turns_used", "?")
+            error = r.get("error", "")
+            extra = f" (error: {error})" if error else ""
+            print(f"  [{status}] {r['task_name']} (turns={turns}){extra}")
+
+        print(f"{'='*60}\n")
+
+        # Build sample records for evaluate_log (includes full conversations)
+        samples = [
+            {
+                "task_name": r.get("task_name"),
+                "category": r.get("category"),
+                "passed": r.get("passed"),
+                "reward": r.get("reward"),
+                "turns_used": r.get("turns_used"),
+                "error": r.get("error"),
+                "messages": r.get("messages"),
+            }
+            for r in valid_results
+        ]
+
+        # Log evaluation results
+        try:
+            await self.evaluate_log(
+                metrics=eval_metrics,
+                samples=samples,
+                start_time=start_time,
+                end_time=end_time,
+                generation_parameters={
+                    "temperature": self.config.agent_temperature,
+                    "max_tokens": self.config.max_token_length,
+                    "max_agent_turns": self.config.max_agent_turns,
+                    "terminal_backend": self.config.terminal_backend,
+                },
+            )
+        except Exception as e:
+            print(f"Error logging evaluation results: {e}")
+
+        # Close streaming file
+        if hasattr(self, "_streaming_file") and not self._streaming_file.closed:
+            self._streaming_file.close()
+            print(f"  Live results saved to: {self._streaming_path}")
+
+        # Kill all remaining sandboxes. Timed-out tasks leave orphaned thread
+        # pool workers still executing commands -- cleanup_all stops them.
+        from tools.terminal_tool import cleanup_all_environments
+        print("\nCleaning up all sandboxes...")
+        cleanup_all_environments()
+
+        # Shut down the tool thread pool so orphaned workers from timed-out
+        # tasks are killed immediately instead of retrying against dead
+        # sandboxes and spamming the console with TimeoutError warnings.
+        from environments.agent_loop import _tool_executor
+        _tool_executor.shutdown(wait=False, cancel_futures=True)
+        print("Done.")
+
+    # =========================================================================
+    # Wandb logging
+    # =========================================================================
+
+    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
+        """Log TB2-specific metrics to wandb."""
+        if wandb_metrics is None:
+            wandb_metrics = {}
+
+        # Add stored eval metrics
+        for metric_name, metric_value in self.eval_metrics:
+            wandb_metrics[metric_name] = metric_value
+        self.eval_metrics = []
+
+        await super().wandb_log(wandb_metrics)
+
+
+if __name__ == "__main__":
+    TerminalBench2EvalEnv.cli()
--- a/environments/endless_terminals/__init__.py
+++ b/environments/endless_terminals/__init__.py
@@ -0,0 +1,5 @@
+"""Endless Terminals Environment - Terminal task training from HuggingFace dataset."""
+
+from .endless_terminals_env import EndlessTerminalsEnv, EndlessTerminalsEnvConfig
+
+__all__ = ["EndlessTerminalsEnv", "EndlessTerminalsEnvConfig"]
--- a/environments/endless_terminals/default.yaml
+++ b/environments/endless_terminals/default.yaml
@@ -0,0 +1,69 @@
+# Endless Terminals Environment -- Default Configuration
+#
+# Trains agents on terminal tasks from the Endless Terminals HuggingFace dataset.
+# Uses hermes-agent backends (modal/docker/local) with per-task Docker images.
+# Tests run in the same sandbox the agent used (no separate containers needed).
+#
+# Dataset: https://huggingface.co/datasets/obiwan96/endless-terminals-train
+#
+# Prerequisites:
+#   1. Download dataset: huggingface-cli download obiwan96/endless-terminals-train \
+#        --repo-type dataset --local-dir ~/endless-terminals-data \
+#        --local-dir-use-symlinks False
+#   2. Set TASKS_BASE_DIR environment variable or configure tasks_base_dir below
+#   3. For modal backend: Configure Modal CLI (modal token set)
+#   4. For docker backend: Install Docker
+#
+# Usage:
+#   python environments/endless_terminals/endless_terminals_env.py process \
+#       --config environments/endless_terminals/default.yaml
+
+env:
+  # Toolsets
+  enabled_toolsets: ["terminal", "file"]
+
+  # Agent configuration
+  max_agent_turns: 32
+  max_token_length: 4096
+  agent_temperature: 1.0
+
+  # Terminal backend
+  terminal_backend: "local"  # Change to "modal" or "docker" for cloud isolation
+
+  # Dataset settings
+  use_dataset: true
+  dataset_name: "obiwan96/endless-terminals-train"
+  dataset_split: "train"
+  dataset_cache_dir: "~/.cache/huggingface/datasets"
+  tasks_base_dir: ""  # Set to directory containing task_* folders (e.g., ~/endless-terminals-data)
+
+  # Test execution
+  test_timeout_s: 60
+
+  # Training configuration
+  group_size: 8
+  total_steps: 10000
+  steps_per_eval: 500
+
+  num_eval_tasks: 10
+  eval_split_ratio: 0.1
+
+  # Logging
+  use_wandb: true
+  wandb_name: "endless-terminals"
+
+  # System prompt
+  system_prompt: >
+    You are a skilled Linux system administrator and programmer.
+    You have access to a terminal and file tools to complete system administration
+    and programming tasks. Use the tools effectively to solve the given task,
+    and verify your solution works correctly before finishing.
+
+openai:
+  base_url: "https://openrouter.ai/api/v1"
+  model_name: "anthropic/claude-sonnet-4.5"
+  server_type: "openai"
+  api_key: ""  # Loaded from OPENROUTER_API_KEY env var
+  health_check: false
+  timeout: 30  # 30 second timeout per request
+  max_retries: 2  # Only retry twice
--- a/environments/endless_terminals/endless_terminals_env.py
+++ b/environments/endless_terminals/endless_terminals_env.py
@@ -0,0 +1,921 @@
+"""
+Endless Terminals Environment for Hermes-Agent + Atropos RL.
+
+Loads pre-generated terminal tasks from HuggingFace dataset and scores
+agent performance using test execution in the agent's sandbox.
+
+Uses hermes-agent backends (modal, docker, local) with per-task Docker images
+extracted from container.def files. Tests run in the same sandbox the agent
+used, following the Terminal Bench 2 pattern.
+
+Dataset: https://huggingface.co/datasets/obiwan96/endless-terminals-train
+
+Run:
+  python environments/endless_terminals/endless_terminals_env.py process \
+    --config environments/endless_terminals/default.yaml
+"""
+
+import asyncio
+import logging
+import os
+import random
+import re
+import sys
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple
+
+from pydantic import Field
+
+# Ensure hermes-agent root is on path
+_repo_root = Path(__file__).resolve().parent.parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+from atroposlib.envs.base import ScoredDataGroup, ScoredDataItem
+from atroposlib.type_definitions import Item
+
+from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
+from environments.agent_loop import AgentResult
+from environments.tool_context import ToolContext
+from tools.terminal_tool import (
+    register_task_env_overrides,
+    clear_task_env_overrides,
+    cleanup_vm,
+)
+
+logger = logging.getLogger(__name__)
+
+# Add endless-terminals to path for imports
+ENDLESS_TERMINALS_PATH = os.getenv(
+    "ENDLESS_TERMINALS_PATH",
+    str(Path.home() / "Desktop" / "Projects" / "endless-terminals")
+)
+sys.path.insert(0, ENDLESS_TERMINALS_PATH)
+
+
+class EndlessTerminalsEnvConfig(HermesAgentEnvConfig):
+    """Configuration for Endless Terminals environment."""
+
+    # Dataset settings
+    use_dataset: bool = Field(
+        default=True,
+        description="Load tasks from HuggingFace dataset (recommended). If False, tasks are generated procedurally (not implemented yet)."
+    )
+    dataset_name: str = Field(
+        default="obiwan96/endless-terminals-train",
+        description="HuggingFace dataset name"
+    )
+    dataset_split: str = Field(
+        default="train",
+        description="Dataset split to use"
+    )
+    dataset_cache_dir: str = Field(
+        default="~/.cache/huggingface/datasets",
+        description="HuggingFace datasets cache directory"
+    )
+    tasks_base_dir: str = Field(
+        default="",
+        description="Base directory containing task_* folders. If empty, uses paths from dataset."
+    )
+
+    # Test execution
+    test_timeout_s: int = Field(default=60, description="Test execution timeout (seconds)")
+
+    # Docker image fallback
+    default_docker_image: str = Field(
+        default="ubuntu:22.04",
+        description="Default Docker image if container.def parsing fails"
+    )
+
+    # Agent defaults
+    max_agent_turns: int = Field(default=32, description="Max turns for agent (increased for long traces)")
+
+    # Evaluation settings
+    num_eval_tasks: int = Field(
+        default=10,
+        description="Number of tasks to run during periodic evaluation"
+    )
+    eval_split_ratio: float = Field(
+        default=0.1,
+        description="Fraction of dataset to hold out for evaluation (0.0-1.0)"
+    )
+
+
+class EndlessTerminalsEnv(HermesAgentBaseEnv):
+    """
+    Endless Terminals environment using pre-generated HuggingFace dataset.
+
+    Loads terminal tasks from dataset, runs agent with terminal tools,
+    and scores by executing tests in the agent's sandbox using ToolContext.
+    """
+
+    name = "endless_terminals_env"
+    env_config_cls = EndlessTerminalsEnvConfig
+
+    @classmethod
+    def config_init(cls) -> Tuple[EndlessTerminalsEnvConfig, List["APIServerConfig"]]:
+        """
+        Default configuration for Endless Terminals environment.
+
+        Used when no config file is provided. When --config is passed, the
+        YAML values are loaded through a different path and this may not be called.
+        """
+        from atroposlib.envs.server_handling.server_manager import APIServerConfig
+
+        env_config = EndlessTerminalsEnvConfig(
+            enabled_toolsets=["terminal", "file"],
+            max_agent_turns=32,
+            terminal_backend="local",
+            use_dataset=True,
+            tasks_base_dir="",
+            group_size=1,
+            total_steps=1,
+            use_wandb=False,
+        )
+
+        server_configs = [
+            APIServerConfig(
+                base_url="https://openrouter.ai/api/v1",
+                model_name="anthropic/claude-sonnet-4.5",
+                server_type="openai",
+                api_key=os.getenv("OPENROUTER_API_KEY", ""),
+                health_check=False,
+            )
+        ]
+
+        return env_config, server_configs
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self._dataset = None
+        self._train_dataset = None
+        self._eval_dataset = None
+        self._dataset_indices = []
+        self._current_index = 0
+
+        # Metrics tracking for wandb - single buffer with dicts
+        self._metrics_buffer = []
+
+        # Debug: check server config
+        if hasattr(self, 'server') and hasattr(self.server, 'servers'):
+            for i, srv in enumerate(self.server.servers):
+                logger.debug(f"Server {i}: model_name={getattr(srv.config, 'model_name', 'NONE')}")
+
+    async def setup(self):
+        """Load dataset from HuggingFace or local directory."""
+        if not self.config.use_dataset:
+            logger.info("Using procedural task generation (not implemented yet)")
+            return
+
+        # If tasks_base_dir is set, load from local directory instead of HuggingFace
+        if self.config.tasks_base_dir:
+            tasks_base = Path(os.path.expanduser(self.config.tasks_base_dir))
+
+            # Resolve to absolute path if relative
+            if not tasks_base.is_absolute():
+                tasks_base = Path.cwd() / tasks_base
+
+            tasks_base = tasks_base.resolve()
+
+            if not tasks_base.exists():
+                raise RuntimeError(f"tasks_base_dir not found: {tasks_base}")
+
+            logger.info(f"Loading tasks from local directory: {tasks_base}")
+
+            # Find all task_* directories
+            task_dirs = sorted(tasks_base.glob("task_*"))
+            logger.info(f"Found {len(task_dirs)} task directories")
+
+            if not task_dirs:
+                # Debug: show what's actually in the directory
+                all_items = list(tasks_base.iterdir())
+                logger.warning(f"Directory contains {len(all_items)} items:")
+                for item in all_items[:10]:
+                    logger.warning(f"  - {item.name} ({'dir' if item.is_dir() else 'file'})")
+                raise RuntimeError(f"No task_* directories found in {tasks_base}")
+
+            # Create fake dataset items (just the directory paths)
+            self._dataset = [
+                {
+                    "description": f"Task from {task_dir.name}",
+                    "extra_info": {"task_dir": str(task_dir)},
+                }
+                for task_dir in task_dirs
+            ]
+
+            logger.info(f"Loaded {len(self._dataset)} tasks from local directory")
+
+            self._split_dataset()
+            return
+
+        # Otherwise, load from HuggingFace
+        logger.info(f"Loading dataset from HuggingFace: {self.config.dataset_name}")
+
+        try:
+            from datasets import load_dataset
+
+            self._dataset = await asyncio.get_running_loop().run_in_executor(
+                None,
+                lambda: load_dataset(
+                    self.config.dataset_name,
+                    split=self.config.dataset_split,
+                    cache_dir=os.path.expanduser(self.config.dataset_cache_dir)
+                )
+            )
+
+            logger.info(f"Loaded {len(self._dataset)} tasks from HuggingFace")
+
+            self._split_dataset()
+
+        except Exception as e:
+            logger.error(f"Failed to load dataset: {e}")
+            raise
+
+    def _split_dataset(self):
+        """Split dataset into train and eval sets based on eval_split_ratio."""
+        if self._dataset is None or len(self._dataset) == 0:
+            raise RuntimeError("Cannot split empty dataset")
+
+        total_size = len(self._dataset)
+        eval_size = int(total_size * self.config.eval_split_ratio)
+        train_size = total_size - eval_size
+
+        all_indices = list(range(total_size))
+        random.shuffle(all_indices)
+
+        train_indices = all_indices[:train_size]
+        eval_indices = all_indices[train_size:]
+
+        if isinstance(self._dataset, list):
+            self._train_dataset = [self._dataset[i] for i in train_indices]
+            self._eval_dataset = [self._dataset[i] for i in eval_indices]
+        else:
+            self._train_dataset = self._dataset.select(train_indices)
+            self._eval_dataset = self._dataset.select(eval_indices)
+
+        self._dataset_indices = list(range(len(self._train_dataset)))
+        random.shuffle(self._dataset_indices)
+        self._current_index = 0
+
+        logger.info(
+            f"Split dataset: {len(self._train_dataset)} train, "
+            f"{len(self._eval_dataset)} eval "
+            f"(ratio={self.config.eval_split_ratio:.1%})"
+        )
+
+    async def get_next_item(self) -> Item:
+        """Sample next task from training dataset."""
+        if self._train_dataset is None:
+            raise RuntimeError("Dataset not loaded. Call setup() first.")
+
+        # Get next task (with wraparound)
+        idx = self._dataset_indices[self._current_index]
+        task = self._train_dataset[idx]
+
+        # Advance to next task
+        self._current_index += 1
+        if self._current_index >= len(self._dataset_indices):
+            # Reshuffle for next epoch
+            random.shuffle(self._dataset_indices)
+            self._current_index = 0
+            logger.info("Reshuffled dataset (completed one epoch)")
+
+        # Extract task directory path
+        task_dir = task.get("extra_info", {}).get("task_dir")
+        if not task_dir:
+            task_dir = task.get("reward_spec", {}).get("ground_truth")
+
+        # Resolve task directory path
+        if task_dir:
+            task_dir_path = Path(task_dir)
+            # If tasks_base_dir is configured and path doesn't exist, reconstruct it
+            if self.config.tasks_base_dir and not task_dir_path.exists():
+                # Keep only the task folder name and re-root it under tasks_base_dir
+                task_name = task_dir_path.name
+                task_dir_path = Path(os.path.expanduser(self.config.tasks_base_dir)) / task_name
+        else:
+            logger.error("No task directory path found in dataset item")
+            return await self.get_next_item()
+
+        # Verify directory exists
+        if not task_dir_path.exists():
+            logger.warning(f"Task dir not found: {task_dir_path}")
+            logger.warning("Hint: Set tasks_base_dir to directory containing task_* folders")
+            return await self.get_next_item()  # Try next task
+
+        # Look for test file in tests/ subdirectory first, then at root
+        final_test = task_dir_path / "tests" / "test_final_state.py"
+        if not final_test.exists():
+            final_test = task_dir_path / "test_final_state.py"
+
+        # Verify test file exists
+        if not final_test.exists():
+            logger.warning(f"Missing test file in {task_dir_path} (checked tests/ and root)")
+            return await self.get_next_item()
+
+        # Parse container.def to extract Docker image
+        # Check environment/ subdirectory first, then root
+        container_def = task_dir_path / "environment" / "container.def"
+        if not container_def.exists():
+            container_def = task_dir_path / "container.def"
+        docker_image = self._parse_docker_image_from_def(container_def)
+
+        # Try to load description from instruction.md or task.json
+        description = task.get("description", "")
+
+        # First try instruction.md
+        instruction_md = task_dir_path / "instruction.md"
+        if not description and instruction_md.exists():
+            try:
+                description = instruction_md.read_text().strip()
+            except Exception as e:
+                logger.warning(f"Failed to load instruction.md for {task_dir_path.name}: {e}")
+
+        # Fallback to task.json in environment/
+        if not description:
+            task_json = task_dir_path / "environment" / "task.json"
+            if task_json.exists():
+                try:
+                    import json
+                    task_data = json.loads(task_json.read_text())
+                    description = task_data.get("description", "") or task_data.get("instruction", "")
+                except Exception as e:
+                    logger.warning(f"Failed to load task.json for {task_dir_path.name}: {e}")
+
+        if not description:
+            description = f"Complete the task in {task_dir_path.name}"
+
+        return {
+            "task_id": task_dir_path.name,
+            "task_name": task_dir_path.name,
+            "description": description,
+            "task_dir": str(task_dir_path),
+            "final_test": str(final_test),
+            "docker_image": docker_image,
+            "dataset_index": idx,
+        }
+
+    def format_prompt(self, item: Item) -> str:
+        """Return the task description for the agent."""
+        return str(item.get("description", ""))
+
+    def _parse_docker_image_from_def(self, container_def_path: Path) -> str:
+        """
+        Parse container.def file to extract the Docker base image.
+
+        Apptainer definition files typically look like:
+            Bootstrap: docker
+            From: ubuntu:22.04
+
+        Returns the image from the "From:" line, or falls back to default.
+        """
+        if not container_def_path.exists():
+            logger.warning(f"container.def not found at {container_def_path}, using default image")
+            return self.config.default_docker_image
+
+        try:
+            content = container_def_path.read_text()
+            # Look for "From: <image>" line (case-insensitive)
+            match = re.search(r'^From:\s*(.+)$', content, re.MULTILINE | re.IGNORECASE)
+            if match:
+                image = match.group(1).strip()
+                logger.info(f"Extracted Docker image from container.def: {image}")
+                return image
+        except Exception as e:
+            logger.warning(f"Failed to parse {container_def_path}: {e}")
+
+        logger.warning(f"Could not extract image from {container_def_path}, using default")
+        return self.config.default_docker_image
+
+    async def collect_trajectory(
+        self, item: Item
+    ) -> Tuple[Optional[ScoredDataItem], List[Item]]:
+        """
+        Override to register per-task Docker image before running the agent.
+
+        Follows Terminal Bench 2 pattern: register_task_env_overrides() tells
+        the hermes-agent terminal backend to use a specific Docker image for
+        this task_id.
+
+        This is a copy of HermesAgentBaseEnv.collect_trajectory with Docker
+        image registration added after task_id generation.
+        """
+        import uuid
+        from environments.agent_loop import HermesAgentLoop
+
+        task_id = str(uuid.uuid4())
+        task_name = item.get("task_name", "unknown")
+        docker_image = item.get("docker_image", self.config.default_docker_image)
+
+        logger.debug(f"collect_trajectory START for {task_name}")
+
+        # Register Docker image override for this task_id
+        logger.debug(f"Registering Docker image: {docker_image}")
+        register_task_env_overrides(task_id, {"modal_image": docker_image})
+        logger.info(
+            f"Task {task_name}: registered Docker image {docker_image} for task_id {task_id[:8]}"
+        )
+        logger.debug("Docker image registered")
+
+        try:
+            # Get group-level tools (resolved once in collect_trajectories)
+            logger.debug("Resolving tools...")
+            if self._current_group_tools is None:
+                tools, valid_names = self._resolve_tools_for_group()
+            else:
+                tools, valid_names = self._current_group_tools
+            logger.debug(f"Tools resolved: {len(tools)} tools")
+
+            # Build initial messages
+            logger.debug("Building initial messages...")
+            messages: List[Dict[str, Any]] = []
+            if self.config.system_prompt:
+                messages.append({"role": "system", "content": self.config.system_prompt})
+            messages.append({"role": "user", "content": self.format_prompt(item)})
+            logger.debug("Messages built, starting agent loop...")
+
+            # Run the agent loop
+            result: AgentResult
+            managed_state: Optional[Dict[str, Any]] = None
+
+            if self._use_managed_server():
+                # Phase 2: ManagedServer with parser
+                from environments.tool_call_parsers import get_parser
+                try:
+                    tc_parser = get_parser(self.config.tool_call_parser)
+                except KeyError:
+                    logger.warning(
+                        "Tool call parser '%s' not found, falling back to 'hermes'",
+                        self.config.tool_call_parser,
+                    )
+                    tc_parser = get_parser("hermes")
+
+                try:
+                    async with self.server.managed_server(
+                        tokenizer=self.tokenizer,
+                        tool_call_parser=tc_parser,
+                    ) as managed:
+                        agent = HermesAgentLoop(
+                            server=managed,
+                            tool_schemas=tools,
+                            valid_tool_names=valid_names,
+                            max_turns=self.config.max_agent_turns,
+                            task_id=task_id,
+                            temperature=self.config.agent_temperature,
+                            max_tokens=self.config.max_token_length,
+                            extra_body=self.config.extra_body,
+                        )
+                        result = await agent.run(messages)
+
+                        # Get state directly from managed server while still in context
+                        managed_state = managed.get_state()
+                except NotImplementedError:
+                    # DummyManagedServer not allowed
+                    logger.warning("ManagedServer not available. Falling back to direct server mode.")
+                    agent = HermesAgentLoop(
+                        server=self.server,
+                        tool_schemas=tools,
+                        valid_tool_names=valid_names,
+                        max_turns=self.config.max_agent_turns,
+                        task_id=task_id,
+                        temperature=self.config.agent_temperature,
+                        max_tokens=self.config.max_token_length,
+                        extra_body=self.config.extra_body,
+                    )
+                    result = await agent.run(messages)
+            else:
+                # Phase 1: OpenAI server
+                agent = HermesAgentLoop(
+                    server=self.server,
+                    tool_schemas=tools,
+                    valid_tool_names=valid_names,
+                    max_turns=self.config.max_agent_turns,
+                    task_id=task_id,
+                    temperature=self.config.agent_temperature,
+                    max_tokens=self.config.max_token_length,
+                    extra_body=self.config.extra_body,
+                )
+                result = await agent.run(messages)
+
+            # Skip reward computation if agent produced no output
+            only_system_and_user = all(
+                msg.get("role") in ("system", "user") for msg in result.messages
+            )
+            if result.turns_used == 0 or only_system_and_user:
+                logger.warning(
+                    "Agent loop produced no output (turns=%d). Skipping trajectory.",
+                    result.turns_used,
+                )
+                # Return None to skip this trajectory (likely an API failure)
+                return None, []
+            else:
+                # Compute reward using ToolContext
+                ctx = ToolContext(task_id)
+                try:
+                    reward = await self.compute_reward(item, result, ctx)
+                except Exception as e:
+                    logger.error("compute_reward failed: %s", e)
+                    reward = 0.0
+                finally:
+                    ctx.cleanup()
+
+            # Track metrics for wandb logging
+            task_metrics = {
+                "test_passed": 1.0 if reward > 0.5 else 0.0,
+                "reward": reward,
+                "turns_used": result.turns_used,
+                "finished_naturally": result.finished_naturally,
+                "docker_image": docker_image,
+                "num_tool_errors": len(result.tool_errors),
+            }
+
+            # Include detailed tool errors if any occurred
+            if result.tool_errors:
+                task_metrics["tool_errors"] = [
+                    {
+                        "turn": err.turn,
+                        "tool": err.tool_name,
+                        "error": err.error[:200],
+                    }
+                    for err in result.tool_errors
+                ]
+
+            self._metrics_buffer.append(task_metrics)
+
+            # ============================================================================
+            # Build ScoredDataGroup from ManagedServer state
+            # ============================================================================
+            # Phase 2: Extract pre-computed data from SequenceNodes
+            # A single agent run may produce several SequenceNodes, so iterate
+            # through all nodes and return one training sequence per node.
+            #
+            # Each SequenceNode contains:
+            # - tokens: Full unmasked token sequence [1, 2, 3, ..., N]
+            # - masked_tokens: Training format [-100, -100, ..., -100, actual, actual, ...]
+            # - logprobs: Training format [1.0, 1.0, ..., 1.0, -0.5, -0.3, ...]
+            # - full_text: Complete text (prompt + all completions)
+            #
+            # Phase 1: Create placeholder tokens for OpenAI-style servers
+            # ============================================================================
+            nodes = (managed_state or {}).get("nodes", [])
+
+            # Create ScoredDataGroup with lists for multiple trajectories
+            scored_group = ScoredDataGroup()
+            scored_group["tokens"] = []
+            scored_group["masks"] = []
+            scored_group["scores"] = []
+            scored_group["messages"] = []
+            scored_group["inference_logprobs"] = []
+
+            if nodes:
+                # Phase 2: iterate through all nodes (may have multiple trajectories)
+                for i, node in enumerate(nodes):
+                    scored_group["tokens"].append(node.tokens)
+                    scored_group["masks"].append(node.masked_tokens)
+                    scored_group["scores"].append(reward)
+                    scored_group["messages"].append(result.messages)
+
+                    if hasattr(node, "logprobs") and node.logprobs:
+                        scored_group["inference_logprobs"].append(node.logprobs)
+                    else:
+                        # Placeholder logprobs if not available
+                        scored_group["inference_logprobs"].append([1.0] * len(node.tokens))
+
+                    logger.debug(f"Added trajectory {i+1}/{len(nodes)} with {len(node.tokens)} tokens")
+
+            else:
+                # Phase 1: create placeholder tokens for OpenAI-style servers
+                full_text = "\n".join(
+                    msg.get("content", "") for msg in result.messages if msg.get("content")
+                )
+                if self.tokenizer:
+                    tokens = self.tokenizer.encode(full_text, add_special_tokens=True)
+                else:
+                    tokens = list(range(min(len(full_text) // 4, 128)))
+
+                scored_group["tokens"].append(tokens)
+                scored_group["masks"].append(([-100] + tokens[1:]) if tokens else [])
+                scored_group["scores"].append(reward)
+                scored_group["messages"].append(result.messages)
+                scored_group["inference_logprobs"].append([1.0] * len(tokens))
+
+            # Return None if no trajectories collected
+            if len(scored_group["tokens"]) == 0:
+                return None, []
+
+            logger.debug(f"Returning ScoredDataGroup with {len(scored_group['tokens'])} trajectories")
+            return scored_group, []
+
+        finally:
+            # Clean up task overrides and sandbox
+            clear_task_env_overrides(task_id)
+            try:
+                cleanup_vm(task_id)
+            except Exception as e:
+                logger.debug(f"VM cleanup for {task_id[:8]}: {e}")
+
+    async def compute_reward(
+        self,
+        item: Item,
+        result: AgentResult,
+        ctx: ToolContext
+    ) -> float:
+        """
+        Run final tests in the agent's sandbox and return binary reward.
+
+        Uses ToolContext to execute pytest in the SAME sandbox the agent used,
+        following the Terminal Bench 2 verification pattern. No separate
+        Apptainer execution needed.
+
+        Returns 1.0 if tests pass, 0.0 otherwise.
+        """
+        task_name = item.get("task_name", "unknown")
+        final_test_path = Path(item.get("final_test", ""))
+
+        if not final_test_path.is_file():
+            logger.error(f"Task {task_name}: test file not found at {final_test_path}")
+            return 0.0
+
+        logger.info(f"Task {task_name}: running tests in sandbox...")
+
+        try:
+            # Run tests in a thread to avoid blocking the event loop
+            loop = asyncio.get_running_loop()
+            reward = await loop.run_in_executor(
+                None,
+                self._run_tests_in_sandbox,
+                final_test_path,
+                ctx,
+                task_name,
+            )
+
+            status = "PASS" if reward == 1.0 else "FAIL"
+            logger.info(f"Task {task_name}: {status} (reward={reward})")
+            return reward
+
+        except Exception as e:
+            logger.error(f"Task {task_name}: test execution failed: {e}", exc_info=True)
+            return 0.0
+
+    def _run_tests_in_sandbox(
+        self,
+        test_file_path: Path,
+        ctx: ToolContext,
+        task_name: str,
+    ) -> float:
+        """
+        Upload test file to sandbox and execute pytest.
+
+        Runs in thread pool (via run_in_executor) to avoid blocking the event loop
+        with synchronous ToolContext calls.
+
+        Args:
+            test_file_path: Local path to test_final_state.py
+            ctx: ToolContext scoped to the agent's sandbox
+            task_name: For logging
+
+        Returns:
+            1.0 if tests pass, 0.0 otherwise
+        """
+        try:
+            # Upload test file to sandbox
+            test_content = test_file_path.read_text()
+            ctx.write_file("/workspace/test_final_state.py", test_content)
+            logger.debug(f"Task {task_name}: uploaded test file to /workspace/test_final_state.py")
+
+            # Run pytest in the sandbox
+            result = ctx.terminal(
+                "cd /workspace && python -m pytest -q test_final_state.py",
+                timeout=self.config.test_timeout_s,
+            )
+
+            exit_code = result.get("exit_code", -1)
+            output = result.get("output", "")
+
+            if exit_code == 0:
+                logger.debug(f"Task {task_name}: tests passed")
+                return 1.0
+            else:
+                # Log failure output (last 500 chars for debugging)
+                output_preview = output[-500:] if output else "(no output)"
+                logger.info(
+                    f"Task {task_name}: tests failed (exit_code={exit_code})\n{output_preview}"
+                )
+                return 0.0
+
+        except Exception as e:
+            logger.error(f"Task {task_name}: error running tests: {e}")
+            return 0.0
+
+    async def evaluate(self):
+        """
+        Periodic evaluation on holdout eval set.
+
+        Runs the agent on num_eval_tasks from the held-out eval set
+        (never seen during training). Returns metrics for wandb logging.
+        """
+        if self._eval_dataset is None:
+            logger.warning("Cannot evaluate: eval dataset not loaded")
+            return {}
+
+        if len(self._eval_dataset) == 0:
+            logger.warning("Eval dataset is empty")
+            return {}
+
+        # Use min of num_eval_tasks and actual eval set size
+        num_tasks = min(self.config.num_eval_tasks, len(self._eval_dataset))
+        logger.info(f"Starting evaluation on {num_tasks} held-out tasks...")
+
+        eval_metrics = {
+            "rewards": [],
+            "passes": [],
+            "turns": [],
+            "natural_finishes": [],
+        }
+
+        # Sample from eval set (holdout)
+        import random
+        eval_indices = random.sample(range(len(self._eval_dataset)), num_tasks)
+
+        for idx in eval_indices:
+            task = self._eval_dataset[idx]
+
+            # Build item using same logic as get_next_item
+            task_dir = task.get("extra_info", {}).get("task_dir")
+            if not task_dir:
+                task_dir = task.get("reward_spec", {}).get("ground_truth")
+
+            if not task_dir:
+                continue
+
+            task_dir_path = Path(task_dir)
+            if self.config.tasks_base_dir and not task_dir_path.exists():
+                original_path = Path(task_dir)
+                task_name = original_path.name
+                task_dir_path = Path(os.path.expanduser(self.config.tasks_base_dir)) / task_name
+
+            if not task_dir_path.exists():
+                continue
+
+            # Find test file
+            final_test = task_dir_path / "tests" / "test_final_state.py"
+            if not final_test.exists():
+                final_test = task_dir_path / "test_final_state.py"
+            if not final_test.exists():
+                continue
+
+            # Parse Docker image
+            container_def = task_dir_path / "environment" / "container.def"
+            if not container_def.exists():
+                container_def = task_dir_path / "container.def"
+            docker_image = self._parse_docker_image_from_def(container_def)
+
+            # Load description
+            description = task.get("description", "")
+            instruction_md = task_dir_path / "instruction.md"
+            if not description and instruction_md.exists():
+                try:
+                    description = instruction_md.read_text().strip()
+                except Exception:
+                    pass
+
+            item = {
+                "task_name": task_dir_path.name,  # read by compute_reward() for logging
+                "description": description,
+                "final_test": str(final_test),
+                "docker_image": docker_image,
+            }
+
+            # Run agent on this task
+            try:
+                import uuid
+                task_id = str(uuid.uuid4())
+
+                # Register task environment
+                from model_tools import register_task_env_overrides
+                register_task_env_overrides(task_id, {"modal_image": docker_image})
+
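+                # Build messages (skip the system message when no prompt is configured)
+                messages = []
+                if self.config.system_prompt:
+                    messages.append({"role": "system", "content": self.config.system_prompt})
+                messages.append({"role": "user", "content": description or "Complete the task."})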
+
+                # Get tools
+                from model_tools import get_tool_definitions
+                tools = get_tool_definitions(self.config.enabled_toolsets)
+                valid_names = {t["function"]["name"] for t in tools}
+
+                # Run agent
+                from environments.agent_loop import HermesAgentLoop
+                agent = HermesAgentLoop(
+                    server=self.server,
+                    tool_schemas=tools,
+                    valid_tool_names=valid_names,
+                    max_turns=self.config.max_agent_turns,
+                    task_id=task_id,
+                    temperature=self.config.agent_temperature,
+                    max_tokens=self.config.max_token_length,
+                    extra_body=self.config.extra_body,
+                )
+                result = await agent.run(messages)
+
+                # Compute reward
+                from environments.tool_context import ToolContext
+                ctx = ToolContext(task_id)
+                try:
+                    reward = await self.compute_reward(item, result, ctx)
+                except Exception as e:
+                    logger.warning(f"Eval reward computation failed: {e}")
+                    reward = 0.0
+                finally:
+                    ctx.cleanup()
+
+                # Track metrics
+                eval_metrics["rewards"].append(reward)
+                eval_metrics["passes"].append(1.0 if reward > 0.5 else 0.0)
+                eval_metrics["turns"].append(result.turns_used)
+                eval_metrics["natural_finishes"].append(1.0 if result.finished_naturally else 0.0)
+
+            except Exception as e:
+                logger.error(f"Eval task failed: {e}")
+                continue
+            finally:
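+                # Cleanup (best-effort: a failed VM teardown must not abort the eval loop)
+                from model_tools import clear_task_env_overrides, cleanup_vm
+                clear_task_env_overrides(task_id)
+                try:
+                    cleanup_vm(task_id)
+                except Exception as e:
+                    logger.debug(f"VM cleanup for {task_id[:8]}: {e}")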
+
+        # Aggregate metrics
+        if not eval_metrics["rewards"]:
+            logger.warning("No eval tasks completed successfully")
+            return {}
+
+        aggregated = {
+            "eval/pass_rate": sum(eval_metrics["passes"]) / len(eval_metrics["passes"]),
+            "eval/avg_reward": sum(eval_metrics["rewards"]) / len(eval_metrics["rewards"]),
+            "eval/avg_turns": sum(eval_metrics["turns"]) / len(eval_metrics["turns"]),
+            "eval/natural_finish_rate": sum(eval_metrics["natural_finishes"]) / len(eval_metrics["natural_finishes"]),
+            "eval/num_tasks": len(eval_metrics["rewards"]),
+        }
+
+        logger.info(f"Evaluation complete: pass_rate={aggregated['eval/pass_rate']:.2%}, avg_turns={aggregated['eval/avg_turns']:.1f}")
+        return aggregated
+
+    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
+        """Log Endless Terminals specific metrics to wandb."""
+        if wandb_metrics is None:
+            wandb_metrics = {}
+
+        # Aggregate metrics from buffer
+        if self._metrics_buffer:
+            # Test pass rate
+            test_passes = [m["test_passed"] for m in self._metrics_buffer]
+            wandb_metrics["endless_terminals/test_pass_rate"] = sum(test_passes) / len(test_passes)
+            wandb_metrics["endless_terminals/num_tests_passed"] = sum(test_passes)
+            wandb_metrics["endless_terminals/num_tests_total"] = len(test_passes)
+
+            # Turns used statistics
+            turns = [m["turns_used"] for m in self._metrics_buffer]
+            wandb_metrics["endless_terminals/avg_turns_used"] = sum(turns) / len(turns)
+            wandb_metrics["endless_terminals/max_turns_used"] = max(turns)
+            wandb_metrics["endless_terminals/min_turns_used"] = min(turns)
+
+            # Natural finish rate (did model stop on its own vs hitting max turns)
+            natural_finishes = [1.0 if m["finished_naturally"] else 0.0 for m in self._metrics_buffer]
+            wandb_metrics["endless_terminals/natural_finish_rate"] = sum(natural_finishes) / len(natural_finishes)
+
+            # Tool error statistics
+            total_tool_errors = sum(m["num_tool_errors"] for m in self._metrics_buffer)
+            wandb_metrics["endless_terminals/total_tool_errors"] = total_tool_errors
+            wandb_metrics["endless_terminals/avg_tool_errors_per_task"] = total_tool_errors / len(self._metrics_buffer)
+
+            # Docker image distribution (count unique images used)
+            docker_images = [m["docker_image"] for m in self._metrics_buffer]
+            unique_images = set(docker_images)
+            wandb_metrics["endless_terminals/num_unique_docker_images"] = len(unique_images)
+
+            # Log most common errors if any
+            all_errors = []
+            for m in self._metrics_buffer:
+                if "tool_errors" in m:
+                    all_errors.extend(m["tool_errors"])
+
+            if all_errors:
+                # Count error types
+                error_tools = {}
+                for err in all_errors:
+                    tool = err["tool"]
+                    error_tools[tool] = error_tools.get(tool, 0) + 1
+
+                # Log top 3 error-prone tools
+                for tool, count in sorted(error_tools.items(), key=lambda x: x[1], reverse=True)[:3]:
+                    wandb_metrics[f"endless_terminals/errors_by_tool/{tool}"] = count
+
+            # Clear buffer after logging
+            self._metrics_buffer = []
+
+        await super().wandb_log(wandb_metrics)
+
+
+if __name__ == "__main__":
+    EndlessTerminalsEnv.cli()
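
Review note: the aggregation at the end of evaluate() above can be checked in isolation. The following is a minimal sketch, assuming a hypothetical helper named aggregate_eval_metrics and a pass_threshold parameter, neither of which is part of this PR. It mirrors the eval/* keys and the 0.5 pass threshold used in the diff, including the empty-result guard.

```python
from typing import Dict, List, Union


def aggregate_eval_metrics(
    rewards: List[float],
    turns: List[int],
    natural_finishes: List[bool],
    pass_threshold: float = 0.5,  # assumption: same cutoff as `reward > 0.5` in evaluate()
) -> Dict[str, Union[int, float]]:
    """Collapse per-task eval results into the eval/* scalars logged to wandb."""
    if not rewards:
        # Mirrors evaluate(): if no task completed, there is nothing to log.
        return {}

    n = len(rewards)
    passes = [1.0 if r > pass_threshold else 0.0 for r in rewards]
    finishes = [1.0 if f else 0.0 for f in natural_finishes]
    return {
        "eval/pass_rate": sum(passes) / n,
        "eval/avg_reward": sum(rewards) / n,
        "eval/avg_turns": sum(turns) / len(turns),
        "eval/natural_finish_rate": sum(finishes) / len(finishes),
        "eval/num_tasks": n,
    }


if __name__ == "__main__":
    metrics = aggregate_eval_metrics(
        rewards=[1.0, 0.0, 1.0, 0.0],
        turns=[3, 5, 4, 8],
        natural_finishes=[True, False, True, True],
    )
    for key, value in metrics.items():
        print(f"{key}: {value}")
```

Because the rewards are binary, pass_rate and avg_reward coincide here; they would diverge only if compute_reward() started returning partial credit.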
--- a/environments/hermes_base_env.py
+++ b/environments/hermes_base_env.py
@@ -0,0 +1,672 @@
+"""
+HermesAgentBaseEnv -- Abstract Base Environment for Hermes-Agent + Atropos
+
+Provides the Atropos integration plumbing that all hermes-agent environments share:
+- Two-mode operation (OpenAI server for Phase 1, VLLM ManagedServer for Phase 2)
+- Per-group toolset/distribution resolution
+- Agent loop orchestration via HermesAgentLoop
+- ToolContext creation for reward functions
+- ScoredDataGroup construction from ManagedServer state
+
+Subclasses only need to implement:
+    setup()           -- Load dataset, initialize state
+    get_next_item()   -- Return the next item from the dataset
+    format_prompt()   -- Convert a dataset item into the user message
+    compute_reward()  -- Score the rollout (has full ToolContext access)
+    evaluate()        -- Periodic evaluation
+"""
+
+import asyncio
+import json
+import logging
+import os
+import sys
+import uuid
+from abc import abstractmethod
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Set, Tuple, Union
+
+# Ensure the hermes-agent repo root is on sys.path so that imports like
+# `from model_tools import ...` and `from environments.X import ...` work
+# regardless of where the script is invoked from.
+_repo_root = Path(__file__).resolve().parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+from dotenv import load_dotenv
+from pydantic import Field
+
+# Load API keys from hermes-agent/.env so all environments can access them
+_env_path = _repo_root / ".env"
+if _env_path.exists():
+    load_dotenv(dotenv_path=_env_path)
+
+# Apply monkey patches for async-safe tool operation inside Atropos's event loop.
+# This patches SwerexModalEnvironment to use a background thread instead of
+# asyncio.run(), which would deadlock inside Atropos. Safe for normal CLI too.
+from environments.patches import apply_patches
+apply_patches()
+
+from atroposlib.envs.base import (
+    BaseEnv,
+    BaseEnvConfig,
+    ScoredDataGroup,
+    ScoredDataItem,
+)
+from atroposlib.envs.server_handling.server_manager import (
+    APIServerConfig,
+    ServerBaseline,
+    ServerManager,
+)
+from atroposlib.type_definitions import Item
+
+from environments.agent_loop import AgentResult, HermesAgentLoop
+from environments.tool_context import ToolContext
+
+# Import hermes-agent toolset infrastructure
+from model_tools import get_tool_definitions
+from toolset_distributions import sample_toolsets_from_distribution
+
+logger = logging.getLogger(__name__)
+
+
+class HermesAgentEnvConfig(BaseEnvConfig):
+    """
+    Configuration for hermes-agent Atropos environments.
+
+    Extends BaseEnvConfig with agent-specific settings for toolsets,
+    terminal backend, dataset loading, and tool call parsing.
+    """
+
+    # --- Toolset configuration ---
+    # Mutually exclusive: use either enabled_toolsets OR distribution
+    enabled_toolsets: Optional[List[str]] = Field(
+        default=None,
+        description="Explicit list of hermes toolsets to enable (e.g., ['terminal', 'file', 'web']). "
+        "If None and distribution is also None, all available toolsets are enabled.",
+    )
+    disabled_toolsets: Optional[List[str]] = Field(
+        default=None,
+        description="Toolsets to disable. Applied as a filter on top of enabled_toolsets or distribution.",
+    )
+    distribution: Optional[str] = Field(
+        default=None,
+        description="Name of a toolset distribution from toolset_distributions.py "
+        "(e.g., 'development', 'terminal_tasks'). Sampled once per group. "
+        "Mutually exclusive with enabled_toolsets.",
+    )
+
+    # --- Agent loop configuration ---
+    max_agent_turns: int = Field(
+        default=30,
+        description="Maximum number of LLM calls (tool-calling iterations) per rollout.",
+    )
+    system_prompt: Optional[str] = Field(
+        default=None,
+        description="System prompt for the agent. Tools are handled via the tools= parameter, "
+        "not embedded in the prompt text.",
+    )
+    agent_temperature: float = Field(
+        default=1.0,
+        description="Sampling temperature for agent generation during rollouts.",
+    )
+
+    # --- Terminal backend ---
+    terminal_backend: str = Field(
+        default="local",
+        description="Terminal backend: 'local', 'docker', 'modal', 'ssh', 'singularity'. "
+        "Modal recommended for production RL (cloud isolation per rollout).",
+    )
+    terminal_timeout: int = Field(
+        default=120,
+        description="Per-command timeout in seconds for terminal tool calls. "
+        "Commands exceeding this are killed. Increase for tasks with long-running "
+        "commands (compilation, pip install, etc.).",
+    )
+    terminal_lifetime: int = Field(
+        default=3600,
+        description="Sandbox inactivity lifetime in seconds. The cleanup thread kills "
+        "sandboxes that have been idle longer than this. Must be longer than "
+        "the longest gap between tool calls (e.g., waiting for LLM response).",
+    )
+
+    # --- Dataset ---
+    dataset_name: Optional[str] = Field(
+        default=None,
+        description="HuggingFace dataset name. Optional if tasks are defined inline.",
+    )
+    dataset_split: str = Field(
+        default="train",
+        description="Dataset split to use.",
+    )
+    prompt_field: str = Field(
+        default="prompt",
+        description="Which field in the dataset contains the prompt.",
+    )
+
+    # --- Thread pool ---
+    tool_pool_size: int = Field(
+        default=128,
+        description="Thread pool size for tool execution. Each concurrent task needs a "
+        "thread for tool calls. Must be large enough for parallel evaluation. "
+        "Too small = thread pool starvation.",
+    )
+
+    # --- Phase 2: Tool call parsing ---
+    tool_call_parser: str = Field(
+        default="hermes",
+        description="Tool call parser name for Phase 2 (VLLM server type). "
+        "Ignored in Phase 1 (OpenAI server type, where the server parses tool calls natively). "
+        "Options: hermes, mistral, llama3_json, qwen, deepseek_v3, etc.",
+    )
+
+    # --- Provider-specific parameters ---
+    # Passed as extra_body to the OpenAI client's chat.completions.create() call.
+    # Useful for OpenRouter provider preferences, transforms, route settings, etc.
+    # Example YAML:
+    #   extra_body:
+    #     provider:
+    #       ignore: ["DeepInfra", "Fireworks"]
+    #       order: ["Together"]
+    #     transforms: ["middle-out"]
+    extra_body: Optional[Dict[str, Any]] = Field(
+        default=None,
+        description="Extra body parameters passed to the OpenAI client's "
+        "chat.completions.create(). Used for OpenRouter provider preferences, "
+        "transforms, and other provider-specific settings.",
+    )
+
+
+class HermesAgentBaseEnv(BaseEnv):
+    """
+    Abstract base environment for hermes-agent Atropos integration.
+
+    Handles two modes of operation:
+    - Phase 1 (OpenAI server type): Uses server.chat_completion() directly.
+      The server (VLLM, SGLang, OpenRouter, OpenAI) handles tool call parsing
+      and reasoning extraction natively. DummyManagedServer provides placeholder
+      tokens. Good for SFT data gen, verifier testing, evaluation.
+
+    - Phase 2 (VLLM server type): Uses ManagedServer for exact token IDs + logprobs
+      via /generate. Client-side tool call parser reconstructs structured tool_calls
+      from raw output. Full RL training capability.
+
+    Subclasses must implement:
+        setup()           -- Load dataset, initialize state
+        get_next_item()   -- Return the next item to roll out
+        format_prompt()   -- Convert a dataset item into the user message string
+        compute_reward()  -- Score the rollout using ToolContext
+        evaluate()        -- Periodic evaluation
+    """
+
+    name: Optional[str] = "hermes-agent"
+    env_config_cls = HermesAgentEnvConfig
+
+    def __init__(
+        self,
+        config: HermesAgentEnvConfig,
+        server_configs: Union[ServerBaseline, List[APIServerConfig]],
+        slurm=False,
+        testing=False,
+    ):
+        super().__init__(config, server_configs, slurm, testing)
+
+        # Set terminal environment variables so hermes tools pick them up.
+        # These can all be overridden per-environment via config fields instead
+        # of requiring users to set shell env vars.
+        if config.terminal_backend:
+            os.environ["TERMINAL_ENV"] = config.terminal_backend
+        os.environ["TERMINAL_TIMEOUT"] = str(config.terminal_timeout)
+        os.environ["TERMINAL_LIFETIME_SECONDS"] = str(config.terminal_lifetime)
+        print(
+            f"🖥️  Terminal: backend={config.terminal_backend}, "
+            f"timeout={config.terminal_timeout}s, lifetime={config.terminal_lifetime}s"
+        )
+
+        # Resize the agent loop's thread pool for tool execution.
+        # This must be large enough for the number of concurrent tasks
+        # (e.g., 89 parallel TB2 eval tasks each need a thread for tool calls).
+        from environments.agent_loop import resize_tool_pool
+        resize_tool_pool(config.tool_pool_size)
+
+        # Current group's resolved tools (set in collect_trajectories)
+        self._current_group_tools: Optional[Tuple[List[Dict], Set[str]]] = None
+
+        # Tool error tracking for wandb logging
+        self._tool_error_buffer: List[Dict[str, Any]] = []
+
+    # =========================================================================
+    # Toolset resolution (per-group)
+    # =========================================================================
+
+    def _resolve_tools_for_group(self) -> Tuple[List[Dict[str, Any]], Set[str]]:
+        """
+        Resolve toolsets for a group. Called once in collect_trajectories(),
+        then shared by all collect_trajectory() calls in the group.
+
+        If distribution is set, samples probabilistically.
+        If enabled_toolsets is set, uses that explicit list.
+        disabled_toolsets is applied as a filter on top.
+
+        Returns:
+            (tool_schemas, valid_tool_names) tuple
+        """
+        config = self.config
+
+        if config.distribution:
+            group_toolsets = sample_toolsets_from_distribution(config.distribution)
+            logger.info("Sampled toolsets from '%s': %s", config.distribution, group_toolsets)
+        else:
+            group_toolsets = config.enabled_toolsets  # None means "all available"
+            if group_toolsets is None:
+                logger.warning(
+                    "enabled_toolsets is None -- loading ALL tools including messaging. "
+                    "Set explicit enabled_toolsets for RL training."
+                )
+
+        tools = get_tool_definitions(
+            enabled_toolsets=group_toolsets,
+            disabled_toolsets=config.disabled_toolsets,
+            quiet_mode=True,
+        )
+
+        valid_names = {t["function"]["name"] for t in tools} if tools else set()
+        logger.info("Resolved %d tools for group: %s", len(valid_names), sorted(valid_names))
+        return tools, valid_names
+
+    # =========================================================================
+    # Server mode detection
+    # =========================================================================
+
+    def _use_managed_server(self) -> bool:
+        """
+        Determine if we should use ManagedServer (Phase 2) or direct server (Phase 1).
+
+        Phase 2 (ManagedServer) is used when the server type is 'vllm' or 'sglang',
+        which go through the /generate endpoint for exact token tracking.
+
+        Phase 1 (direct server) is used for 'openai' server type, which uses
+        /v1/chat/completions with native tool call parsing.
+        """
+        if not self.server.servers:
+            return False
+
+        server = self.server.servers[0]
+        # If the server is an OpenAI server (not VLLM/SGLang), use direct mode
+        from atroposlib.envs.server_handling.openai_server import OpenAIServer
+        return not isinstance(server, OpenAIServer)
+
+    # =========================================================================
+    # Core Atropos integration
+    # =========================================================================
+
+    async def collect_trajectories(
+        self, item: Item
+    ) -> Tuple[
+        Union[Optional[ScoredDataGroup], List[Optional[ScoredDataGroup]]],
+        List[Item],
+    ]:
+        """
+        Override collect_trajectories to resolve toolsets once per group,
+        then delegate to the standard group-level collection.
+
+        The default BaseEnv.collect_trajectories() calls collect_trajectory()
+        group_size times in parallel. We resolve tools once here and store
+        them for all those calls to use.
+        """
+        # Resolve toolsets for this group (shared by all rollouts in the group)
+        self._current_group_tools = self._resolve_tools_for_group()
+
+        # Delegate to the default implementation which calls collect_trajectory()
+        # group_size times via asyncio.gather
+        return await super().collect_trajectories(item)
+
+    # =========================================================================
+    # Wandb rollout display -- format trajectories nicely
+    # =========================================================================
+
+    @staticmethod
+    def _format_trajectory_for_display(messages: List[Dict[str, Any]]) -> str:
+        """
+        Format a conversation's messages into a readable trajectory string
+        for wandb rollout tables. Shows tool calls, tool results, and reasoning
+        in a structured way instead of raw token decoding.
+        """
+        parts = []
+        for msg in messages:
+            role = msg.get("role", "unknown")
+            content = msg.get("content") or ""  # content may be None on tool-call turns
+
+            if role == "system":
+                parts.append(f"[SYSTEM]\n{content}")
+
+            elif role == "user":
+                parts.append(f"[USER]\n{content}")
+
+            elif role == "assistant":
+                # Show reasoning if present
+                reasoning = msg.get("reasoning_content", "")
+                if reasoning:
+                    # Truncate long reasoning for display
+                    if len(reasoning) > 300:
+                        reasoning = reasoning[:300] + "..."
+                    parts.append(f"[ASSISTANT thinking]\n{reasoning}")
+
+                # Show content
+                if content:
+                    parts.append(f"[ASSISTANT]\n{content}")
+
+                # Show tool calls
+                tool_calls = msg.get("tool_calls") or []
+                for tc in tool_calls:
+                    func = tc.get("function", {})
+                    name = func.get("name", "?")
+                    args = func.get("arguments", "{}")
+                    # Truncate long arguments for display
+                    if len(args) > 200:
+                        args = args[:200] + "..."
+                    parts.append(f"[TOOL CALL] {name}({args})")
+
+            elif role == "tool":
+                tool_id = msg.get("tool_call_id", "")
+                result = content
+                # Truncate long tool results for display
+                if len(result) > 500:
+                    result = result[:500] + "..."
+                parts.append(f"[TOOL RESULT] {result}")
+
+        return "\n\n".join(parts)
+
+    async def add_rollouts_for_wandb(
+        self,
+        scored_data,
+        item=None,
+    ):
+        """
+        Override to show formatted trajectories with tool calls visible,
+        instead of raw token decoding which loses all structure.
+        """
+        num_keep = self.config.num_rollouts_per_group_for_logging
+        if num_keep == -1:
+            num_keep = self.config.group_size
+
+        group = []
+        for i in range(min(num_keep, len(scored_data.get("scores", [])))):
+            score = scored_data["scores"][i]
+
+            # Use messages if available for rich display
+            messages = None
+            if scored_data.get("messages") and i < len(scored_data["messages"]):
+                messages = scored_data["messages"][i]
+
+            if messages:
+                text = self._format_trajectory_for_display(messages)
+            elif scored_data.get("tokens") and i < len(scored_data["tokens"]):
+                text = self.tokenizer.decode(scored_data["tokens"][i])
+            else:
+                text = "(no data)"
+
+            group.append((text, score))
+
+        self.rollouts_for_wandb.append(group)
+        if len(self.rollouts_for_wandb) > self.config.num_rollouts_to_keep:
+            self.rollouts_for_wandb.pop(0)
+
+    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
+        """Log base metrics including tool errors to wandb."""
+        if wandb_metrics is None:
+            wandb_metrics = {}
+
+        # Log tool error stats
+        if self._tool_error_buffer:
+            wandb_metrics["train/tool_errors_count"] = len(self._tool_error_buffer)
+
+            # Log error details as a summary string (tables can crash wandb on tmp cleanup)
+            error_summaries = []
+            for err in self._tool_error_buffer:
+                error_summaries.append(
+                    f"[turn {err['turn']}] {err['tool']}({err['args'][:80]}) -> {err['error'][:150]}"
+                )
+            wandb_metrics["train/tool_error_details"] = "\n".join(error_summaries)
+
+            # Also print to stdout for immediate visibility
+            for summary in error_summaries:
+                print(f"  Tool Error: {summary}")
+
+            self._tool_error_buffer = []
+        else:
+            wandb_metrics["train/tool_errors_count"] = 0
+
+        await super().wandb_log(wandb_metrics)
+
+    async def collect_trajectory(
+        self, item: Item
+    ) -> Tuple[Optional[Union[ScoredDataItem, Any]], List[Item]]:
+        """
+        Run a single rollout: agent loop + reward computation.
+
+        This is called group_size times in parallel by collect_trajectories().
+        Each call gets its own task_id for terminal/browser session isolation.
+        """
+        task_id = str(uuid.uuid4())
+
+        # Get group-level tools (resolved once in collect_trajectories)
+        if self._current_group_tools is None:
+            # Fallback: resolve per-trajectory if called outside collect_trajectories
+            tools, valid_names = self._resolve_tools_for_group()
+        else:
+            tools, valid_names = self._current_group_tools
+
+        # Build initial messages
+        messages: List[Dict[str, Any]] = []
+        if self.config.system_prompt:
+            messages.append({"role": "system", "content": self.config.system_prompt})
+        messages.append({"role": "user", "content": self.format_prompt(item)})
+
+        # Run the agent loop
+        result: AgentResult
+        if self._use_managed_server():
+            # Phase 2: ManagedServer with parser -- exact tokens + logprobs
+            # Load the tool call parser from registry based on config
+            from environments.tool_call_parsers import get_parser
+            try:
+                tc_parser = get_parser(self.config.tool_call_parser)
+            except KeyError:
+                logger.warning(
+                    "Tool call parser '%s' not found, falling back to 'hermes'",
+                    self.config.tool_call_parser,
+                )
+                tc_parser = get_parser("hermes")
+
+            try:
+                async with self.server.managed_server(
+                    tokenizer=self.tokenizer,
+                    tool_call_parser=tc_parser,
+                ) as managed:
+                    agent = HermesAgentLoop(
+                        server=managed,
+                        tool_schemas=tools,
+                        valid_tool_names=valid_names,
+                        max_turns=self.config.max_agent_turns,
+                        task_id=task_id,
+                        temperature=self.config.agent_temperature,
+                        max_tokens=self.config.max_token_length,
+                        extra_body=self.config.extra_body,
+                    )
+                    result = await agent.run(messages)
+            except NotImplementedError:
+                # DummyManagedServer not allowed -- fall back to Phase 1
+                logger.warning(
+                    "ManagedServer not available (OpenAI server?). "
+                    "Falling back to direct server mode."
+                )
+                agent = HermesAgentLoop(
+                    server=self.server,
+                    tool_schemas=tools,
+                    valid_tool_names=valid_names,
+                    max_turns=self.config.max_agent_turns,
+                    task_id=task_id,
+                    temperature=self.config.agent_temperature,
+                    max_tokens=self.config.max_token_length,
+                    extra_body=self.config.extra_body,
+                )
+                result = await agent.run(messages)
+        else:
+            # Phase 1: OpenAI server -- native tool_calls, placeholder tokens
+            agent = HermesAgentLoop(
+                server=self.server,
+                tool_schemas=tools,
+                valid_tool_names=valid_names,
+                max_turns=self.config.max_agent_turns,
+                task_id=task_id,
+                temperature=self.config.agent_temperature,
+                max_tokens=self.config.max_token_length,
+                extra_body=self.config.extra_body,
+            )
+            result = await agent.run(messages)
+
+        # Skip reward computation if the agent loop produced no meaningful work
+        # (e.g., API call failed on turn 1). No point spinning up a Modal sandbox
+        # just to verify files that were never created.
+        only_system_and_user = all(
+            msg.get("role") in ("system", "user") for msg in result.messages
+        )
+        if result.turns_used == 0 or only_system_and_user:
+            logger.warning(
+                "Agent loop produced no output (turns=%d, msgs=%d). Skipping reward.",
+                result.turns_used, len(result.messages),
+            )
+            reward = 0.0
+        else:
+            # Compute reward using ToolContext (gives verifier full tool access)
+            ctx = ToolContext(task_id)
+            try:
+                reward = await self.compute_reward(item, result, ctx)
+            except Exception as e:
+                logger.error("compute_reward failed: %s", e)
+                reward = 0.0
+            finally:
+                ctx.cleanup()
+
+        # Track tool errors for wandb logging
+        if result.tool_errors:
+            for err in result.tool_errors:
+                self._tool_error_buffer.append({
+                    "turn": err.turn,
+                    "tool": err.tool_name,
+                    "args": err.arguments[:150],
+                    "error": err.error[:300],
+                    "result": err.tool_result[:300],
+                })
+
+        # Build ScoredDataItem from ManagedServer state
+        # Phase 2: real tokens/masks/logprobs from SequenceNodes
+        # Phase 1: placeholder tokens (still need a valid ScoredDataItem for the pipeline)
+        nodes = (result.managed_state or {}).get("nodes", [])
+
+        if nodes:
+            # Phase 2 (or DummyManagedServer): use actual node data
+            node = nodes[-1]  # Final sequence node = full trajectory
+            scored_item: Dict[str, Any] = {
+                "tokens": node.tokens,
+                "masks": node.masked_tokens,
+                "scores": reward,
+            }
+
+            # Include logprobs if available (Phase 2)
+            if hasattr(node, "logprobs") and node.logprobs:
+                scored_item["advantages"] = None  # Computed by trainer
+                scored_item["ref_logprobs"] = None
+        else:
+            # Phase 1 with no managed state: create placeholder tokens
+            # so the data pipeline doesn't break. These are NOT suitable
+            # for training but allow process mode (SFT data gen) to work.
+            # Tokenize the full conversation to get approximate tokens.
+            full_text = "\n".join(
+                msg.get("content", "") for msg in result.messages if msg.get("content")
+            )
+            if self.tokenizer:
+                tokens = self.tokenizer.encode(full_text, add_special_tokens=True)
+            else:
+                tokens = list(range(min(len(full_text) // 4, 128)))
+
+            scored_item = {
+                "tokens": tokens,
+                "masks": ([-100] + tokens[1:]) if tokens else [],  # Mask first token as prompt
+                "scores": reward,
+            }
+
+        # Always include messages for wandb rollout display and data logging
+        scored_item["messages"] = result.messages
+
+        return scored_item, []
+
+    # =========================================================================
+    # Abstract methods -- subclasses must implement
+    # =========================================================================
+
+    @abstractmethod
+    async def setup(self):
+        """
+        Load dataset, initialize state.
+
+        Called once when the environment starts. Typical implementation:
+            self.dataset = load_dataset(self.config.dataset_name, split=self.config.dataset_split)
+            self.iter = 0
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    async def get_next_item(self) -> Item:
+        """
+        Return the next item from the dataset for rollout.
+
+        Called by the base env's main loop to get items for workers.
+        Should cycle through the dataset.
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def format_prompt(self, item: Item) -> str:
+        """
+        Convert a dataset item into the user message for the agent.
+
+        Args:
+            item: Dataset item (dict, tuple, etc.)
+
+        Returns:
+            The prompt string to send to the agent
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    async def compute_reward(
+        self, item: Item, result: AgentResult, ctx: ToolContext
+    ) -> float:
+        """
+        Score the rollout. Has full access to:
+        - item: the original dataset item (ground truth, test commands, etc.)
+        - result: AgentResult with full messages, turn count, reasoning, etc.
+        - ctx: ToolContext -- call ANY hermes-agent tool (terminal, file, web,
+               browser, vision...) scoped to this rollout's sandbox. Nothing
+               is off-limits.
+
+        Args:
+            item: The dataset item that was rolled out
+            result: The agent's rollout result
+            ctx: ToolContext with full tool access for verification
+
+        Returns:
+            Reward float (typically 0.0 to 1.0, but any float is valid)
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    async def evaluate(self, *args, **kwargs):
+        """
+        Periodic evaluation. Called every steps_per_eval steps.
+
+        Typical implementation runs the agent on a held-out eval set
+        and logs metrics via wandb/evaluate_log.
+        """
+        raise NotImplementedError
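
Review note: collect_trajectory() skips reward computation when the agent loop produced nothing beyond the initial prompt. A minimal sketch of that guard as a standalone predicate follows. The helper name `produced_no_work` is hypothetical and is not part of this change; it only mirrors the condition in the diff.

```python
from typing import Any, Dict, List


def produced_no_work(turns_used: int, messages: List[Dict[str, Any]]) -> bool:
    """Return True when a rollout has nothing worth scoring.

    Hypothetical helper: mirrors the check in collect_trajectory(), which
    treats zero turns, or a transcript holding only system/user messages,
    as "no output" and assigns a reward of 0.0 without running a verifier.
    """
    only_system_and_user = all(
        msg.get("role") in ("system", "user") for msg in messages
    )
    return turns_used == 0 or only_system_and_user


prompt_only = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Create ~/greeting.txt"},
]
with_reply = prompt_only + [{"role": "assistant", "content": "Done."}]

skip_prompt_only = produced_no_work(1, prompt_only)
skip_with_reply = produced_no_work(1, with_reply)
skip_zero_turns = produced_no_work(0, with_reply)
```

Keeping the check as a pure function would make it easy to unit test without a server or sandbox.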
--- a/environments/hermes_swe_env/__init__.py
+++ b/environments/hermes_swe_env/__init__.py
--- a/environments/hermes_swe_env/default.yaml
+++ b/environments/hermes_swe_env/default.yaml
@@ -0,0 +1,34 @@
+# SWE Environment -- Default Configuration
+#
+# SWE-bench style tasks with Modal sandboxes for cloud isolation.
+# Uses terminal + file + web toolsets.
+#
+# Usage:
+#   python environments/hermes_swe_env/hermes_swe_env.py serve \
+#       --config environments/hermes_swe_env/default.yaml
+
+env:
+  enabled_toolsets: ["terminal", "file", "web"]
+  max_agent_turns: 30
+  max_token_length: 4096
+  group_size: 4
+  terminal_backend: "modal"
+  tool_call_parser: "hermes"
+  tokenizer_name: "NousResearch/DeepHermes-3-Llama-3-3B-Preview"
+  dataset_name: "bigcode/humanevalpack"
+  dataset_split: "test"
+  prompt_field: "prompt"
+  steps_per_eval: 50
+  total_steps: 500
+  use_wandb: true
+  wandb_name: "hermes-swe"
+  system_prompt: >
+    You are a skilled software engineer. You have access to a terminal,
+    file tools, and web search. Use these tools to complete the coding task.
+    Write clean, working code and verify it runs correctly before finishing.
+
+openai:
+  base_url: "http://localhost:8000/v1"
+  model_name: "NousResearch/DeepHermes-3-Llama-3-3B-Preview"
+  server_type: "openai"
+  api_key: ""
--- a/environments/hermes_swe_env/hermes_swe_env.py
+++ b/environments/hermes_swe_env/hermes_swe_env.py
@@ -0,0 +1,229 @@
+"""
+HermesSweEnv -- SWE-Bench Style Environment with Modal Sandboxes
+
+A concrete environment for software engineering tasks where the model writes code
+and the reward function runs tests to verify correctness. Uses Modal terminal
+backend for cloud-isolated sandboxes per rollout.
+
+The reward function uses ToolContext.terminal() to run test commands in the same
+Modal sandbox the model used during its agentic loop. All filesystem state from
+the model's tool calls is preserved for verification.
+
+Usage:
+    # Phase 1: OpenAI server type
+    vllm serve YourModel --enable-auto-tool-choice --tool-call-parser hermes
+    run-api
+    python environments/hermes_swe_env/hermes_swe_env.py serve \\
+        --openai.base_url http://localhost:8000/v1 \\
+        --openai.model_name YourModel \\
+        --openai.server_type openai \\
+        --env.dataset_name bigcode/humanevalpack \\
+        --env.terminal_backend modal
+
+    # Phase 2: VLLM server type (full RL training)
+    python environments/hermes_swe_env/hermes_swe_env.py serve \\
+        --openai.base_url http://localhost:8000/v1 \\
+        --openai.model_name YourModel \\
+        --openai.server_type vllm \\
+        --env.tool_call_parser hermes \\
+        --env.terminal_backend modal
+"""
+
+import logging
+import sys
+import time
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, Union
+
+# Ensure repo root is on sys.path for imports
+_repo_root = Path(__file__).resolve().parent.parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+from datasets import load_dataset
+
+from atroposlib.envs.base import ScoredDataGroup
+from atroposlib.envs.server_handling.server_manager import APIServerConfig
+from atroposlib.type_definitions import Item
+
+from environments.agent_loop import AgentResult
+from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
+from environments.tool_context import ToolContext
+
+logger = logging.getLogger(__name__)
+
+
+class HermesSweEnvConfig(HermesAgentEnvConfig):
+    """Config with defaults for SWE-bench style tasks."""
+
+    pass  # Inherits all fields, overrides defaults in config_init
+
+
+class HermesSweEnv(HermesAgentBaseEnv):
+    """
+    SWE-bench style environment using Modal terminal backend.
+
+    The model gets a coding task, uses terminal + file + web tools to solve it,
+    and the reward function runs tests in the same Modal sandbox to verify.
+
+    Subclass this for specific SWE datasets (HumanEval, SWE-bench, etc.)
+    and customize format_prompt() and compute_reward() as needed.
+    """
+
+    name = "hermes-swe"
+    env_config_cls = HermesSweEnvConfig
+
+    @classmethod
+    def config_init(cls) -> Tuple[HermesSweEnvConfig, List[APIServerConfig]]:
+        """
+        Default configuration for the SWE environment.
+
+        Uses Modal terminal backend for cloud isolation and terminal + file + web toolsets.
+        """
+        env_config = HermesSweEnvConfig(
+            # Toolsets: terminal for running code, file for reading/writing, web for docs
+            enabled_toolsets=["terminal", "file", "web"],
+            disabled_toolsets=None,
+            distribution=None,
+            # Agent settings -- SWE tasks need more turns
+            max_agent_turns=30,
+            max_token_length=4096,
+            agent_temperature=1.0,
+            system_prompt=(
+                "You are a skilled software engineer. You have access to a terminal, "
+                "file tools, and web search. Use these tools to complete the coding task. "
+                "Write clean, working code and verify it runs correctly before finishing."
+            ),
+            # Modal backend for cloud-isolated sandboxes
+            terminal_backend="modal",
+            # Dataset -- override via CLI for your specific SWE dataset
+            dataset_name="bigcode/humanevalpack",
+            dataset_split="test",
+            prompt_field="prompt",
+            # Atropos settings
+            group_size=4,
+            tokenizer_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
+            tool_call_parser="hermes",
+            steps_per_eval=50,
+            total_steps=500,
+            use_wandb=True,
+            wandb_name="hermes-swe",
+        )
+
+        server_configs = [
+            APIServerConfig(
+                base_url="http://localhost:8000/v1",
+                model_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
+                server_type="openai",  # Phase 1; switch to "vllm" for Phase 2
+                api_key="",
+            )
+        ]
+
+        return env_config, server_configs
+
+    async def setup(self):
+        """Load the SWE dataset."""
+        if self.config.dataset_name:
+            self.dataset = load_dataset(
+                self.config.dataset_name, split=self.config.dataset_split
+            )
+        else:
+            # Placeholder if no dataset specified
+            self.dataset = []
+        self.iter = 0
+        self.reward_buffer: List[float] = []
+
+    async def get_next_item(self) -> Dict[str, Any]:
+        """Cycle through the SWE dataset."""
+        if not self.dataset:
+            raise ValueError("No dataset loaded. Set dataset_name in config.")
+        item = self.dataset[self.iter % len(self.dataset)]
+        self.iter += 1
+        return item
+
+    def format_prompt(self, item: Dict[str, Any]) -> str:
+        """
+        Format the SWE task prompt.
+
+        Override this in subclasses for different dataset formats.
+        Default assumes the dataset has a 'prompt' field and optionally a 'test' field.
+        """
+        prompt = item.get(self.config.prompt_field, "")
+
+        # If the dataset has test information, include it in the prompt
+        test_info = item.get("test", item.get("test_code", item.get("tests", "")))
+        if test_info:
+            prompt += f"\n\nTests to pass:\n{test_info}"
+
+        return prompt
+
+    async def compute_reward(
+        self, item: Dict[str, Any], result: AgentResult, ctx: ToolContext
+    ) -> float:
+        """
+        Score by running tests in the model's Modal sandbox.
+
+        Default implementation:
+        - If the dataset item has a 'test', 'test_code', or 'tests' field, run it
+        - Check exit code: 0 = pass, non-zero = fail
+        - Partial credit for file creation
+
+        Override this in subclasses for more sophisticated reward logic.
+        """
+        # Find the test command from the dataset item
+        test_code = item.get("test", item.get("test_code", item.get("tests", "")))
+
+        if test_code:
+            # Run the test in the model's sandbox (shell-quoted so embedded quotes survive)
+            import shlex
+            cmd = f"cd /workspace && python3 -c {shlex.quote(test_code)}"
+            test_result = ctx.terminal(cmd, timeout=60)
+
+            if test_result["exit_code"] == 0:
+                self.reward_buffer.append(1.0)
+                return 1.0
+
+        # Partial credit: check if the model created any Python files
+        file_check = ctx.terminal("find /workspace -name '*.py' -newer /tmp/.start_marker 2>/dev/null | head -5")
+        if file_check["exit_code"] == 0 and file_check.get("output", "").strip():
+            self.reward_buffer.append(0.1)
+            return 0.1
+
+        self.reward_buffer.append(0.0)
+        return 0.0
+
+    async def evaluate(self, *args, **kwargs):
+        """
+        Run evaluation on a held-out set.
+
+        Override for dataset-specific evaluation logic.
+        """
+        start_time = time.time()
+        end_time = time.time()
+
+        eval_metrics = {"eval/placeholder": 0.0}
+        await self.evaluate_log(
+            metrics=eval_metrics,
+            start_time=start_time,
+            end_time=end_time,
+        )
+
+    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
+        """Log SWE-specific metrics."""
+        if wandb_metrics is None:
+            wandb_metrics = {}
+
+        if self.reward_buffer:
+            wandb_metrics["train/avg_reward"] = sum(self.reward_buffer) / len(
+                self.reward_buffer
+            )
+            wandb_metrics["train/pass_rate"] = sum(
+                1 for r in self.reward_buffer if r == 1.0
+            ) / len(self.reward_buffer)
+            self.reward_buffer = []
+
+        await super().wandb_log(wandb_metrics)
+
+
+if __name__ == "__main__":
+    HermesSweEnv.cli()
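
Review note: compute_reward() hands dataset test code to `python3 -c` through a shell, so any double quote inside the test code can end the argument early. A minimal sketch of building that command with `shlex.quote` follows. The helper name `build_test_command` is hypothetical, and the example runs against the local interpreter on a POSIX shell rather than a Modal sandbox.

```python
import shlex
import subprocess
import sys


def build_test_command(test_code: str, python: str = "python3") -> str:
    """Build a shell command that passes test_code to `python -c` intact.

    Hypothetical helper: shlex.quote wraps the code in single quotes and
    escapes embedded single quotes, so the shell sees exactly one argument.
    """
    return f"{shlex.quote(python)} -c {shlex.quote(test_code)}"


# Test code containing double quotes and a newline, which naive
# f'python3 -c "{test_code}"' interpolation would mangle.
test_code = 'assert "a" + "b" == "ab"\nprint("ok")'

command = build_test_command(test_code, python=sys.executable)
proc = subprocess.run(command, shell=True, capture_output=True, text=True)
```

Writing the test code to a file in the sandbox and running that file would avoid shell quoting entirely; that is a different design from the one in this diff.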
--- a/environments/patches.py
+++ b/environments/patches.py
@@ -0,0 +1,188 @@
+"""
+Monkey patches for making hermes-agent tools work inside async frameworks (Atropos).
+
+Problem:
+    Some tools use asyncio.run() internally (e.g., mini-swe-agent's Modal backend,
+    web_extract). This crashes when called from inside Atropos's event loop because
+    asyncio.run() can't be nested.
+
+Solution:
+    Replace the problematic methods with versions that use a dedicated background
+    thread with its own event loop. The calling code sees the same sync interface --
+    call a function, get a result -- but internally the async work happens on a
+    separate thread that doesn't conflict with Atropos's loop.
+
+    These patches are safe for normal CLI use too: when there's no running event
+    loop, the behavior is identical (the background thread approach works regardless).
+
+What gets patched:
+    - SwerexModalEnvironment.__init__ -- creates Modal deployment on a background thread
+    - SwerexModalEnvironment.execute -- runs commands on the same background thread
+    - SwerexModalEnvironment.stop -- stops deployment on the background thread
+
+Usage:
+    Call apply_patches() once at import time (done automatically by hermes_base_env.py).
+    This is idempotent -- calling it multiple times is safe.
+"""
+
+import asyncio
+import logging
+import threading
+from typing import Any
+
+logger = logging.getLogger(__name__)
+
+_patches_applied = False
+
+
+class _AsyncWorker:
+    """
+    A dedicated background thread with its own event loop.
+
+    Allows sync code to submit async coroutines and block for results,
+    even when called from inside another running event loop. Used to
+    bridge sync tool interfaces with async backends (Modal, SWE-ReX).
+    """
+
+    def __init__(self):
+        self._loop: asyncio.AbstractEventLoop | None = None
+        self._thread: threading.Thread | None = None
+        self._started = threading.Event()
+
+    def start(self):
+        """Start the background event loop thread."""
+        self._thread = threading.Thread(target=self._run_loop, daemon=True)
+        self._thread.start()
+        self._started.wait(timeout=30)
+
+    def _run_loop(self):
+        """Background thread entry point -- runs the event loop forever."""
+        self._loop = asyncio.new_event_loop()
+        asyncio.set_event_loop(self._loop)
+        self._started.set()
+        self._loop.run_forever()
+
+    def run_coroutine(self, coro, timeout=600):
+        """
+        Submit a coroutine to the background loop and block until it completes.
+
+        Safe to call from any thread, including threads that already have
+        a running event loop.
+        """
+        if self._loop is None or self._loop.is_closed():
+            raise RuntimeError("AsyncWorker loop is not running")
+        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
+        return future.result(timeout=timeout)
+
+    def stop(self):
+        """Stop the background event loop and join the thread."""
+        if self._loop and self._loop.is_running():
+            self._loop.call_soon_threadsafe(self._loop.stop)
+        if self._thread:
+            self._thread.join(timeout=10)
+
+
+def _patch_swerex_modal():
+    """
+    Monkey patch SwerexModalEnvironment to use a background thread event loop
+    instead of asyncio.run(). This makes it safe to call from inside Atropos's
+    async event loop.
+
+    The patched methods have the exact same interface and behavior -- the only
+    difference is HOW the async work is executed internally.
+    """
+    try:
+        from minisweagent.environments.extra.swerex_modal import (
+            SwerexModalEnvironment,
+            SwerexModalEnvironmentConfig,
+        )
+        from swerex.deployment.modal import ModalDeployment
+        from swerex.runtime.abstract import Command as RexCommand
+    except ImportError:
+        # mini-swe-agent or swe-rex not installed -- nothing to patch
+        logger.debug("mini-swe-agent Modal backend not available, skipping patch")
+        return
+
+    # Save original methods so we can refer to config handling
+    _original_init = SwerexModalEnvironment.__init__
+
+    def _patched_init(self, **kwargs):
+        """Patched __init__: creates Modal deployment on a background thread."""
+        self.config = SwerexModalEnvironmentConfig(**kwargs)
+
+        # Start a dedicated event loop thread for all Modal async operations
+        self._worker = _AsyncWorker()
+        self._worker.start()
+
+        # Create AND start the deployment entirely on the worker's loop/thread
+        # so all gRPC channels and async state are bound to that loop
+        async def _create_and_start():
+            deployment = ModalDeployment(
+                image=self.config.image,
+                startup_timeout=self.config.startup_timeout,
+                runtime_timeout=self.config.runtime_timeout,
+                deployment_timeout=self.config.deployment_timeout,
+                install_pipx=self.config.install_pipx,
+                modal_sandbox_kwargs=self.config.modal_sandbox_kwargs,
+            )
+            await deployment.start()
+            return deployment
+
+        self.deployment = self._worker.run_coroutine(_create_and_start())
+
+    def _patched_execute(self, command: str, cwd: str = "", *, timeout: int | None = None) -> dict[str, Any]:
+        """Patched execute: runs commands on the background thread's loop."""
+        async def _do_execute():
+            return await self.deployment.runtime.execute(
+                RexCommand(
+                    command=command,
+                    shell=True,
+                    check=False,
+                    cwd=cwd or self.config.cwd,
+                    timeout=timeout or self.config.timeout,
+                    merge_output_streams=True,
+                    env=self.config.env if self.config.env else None,
+                )
+            )
+
+        output = self._worker.run_coroutine(_do_execute())
+        return {
+            "output": output.stdout,
+            "returncode": output.exit_code,
+        }
+
+    def _patched_stop(self):
+        """Patched stop: stops deployment on the background thread, then stops the thread."""
+        try:
+            self._worker.run_coroutine(
+                asyncio.wait_for(self.deployment.stop(), timeout=10),
+                timeout=15,
+            )
+        except Exception:
+            pass
+        finally:
+            self._worker.stop()
+
+    # Apply the patches
+    SwerexModalEnvironment.__init__ = _patched_init
+    SwerexModalEnvironment.execute = _patched_execute
+    SwerexModalEnvironment.stop = _patched_stop
+
+    logger.debug("Patched SwerexModalEnvironment for async-safe operation")
+
+
+def apply_patches():
+    """
+    Apply all monkey patches needed for Atropos compatibility.
+
+    Safe to call multiple times -- patches are only applied once.
+    Safe for normal CLI use -- patched code works identically when
+    there is no running event loop.
+    """
+    global _patches_applied
+    if _patches_applied:
+        return
+
+    _patch_swerex_modal()
+
+    _patches_applied = True
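
Review note: the module docstring describes running async work on a dedicated background thread so that sync callers can be invoked from inside a running event loop. A minimal, self-contained sketch of that pattern follows. `BackgroundLoop`, `add_later`, and `sync_tool` are hypothetical names for illustration; they are not the classes added in this diff.

```python
import asyncio
import threading


class BackgroundLoop:
    """Hypothetical stand-in for a worker thread that owns an event loop."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout=10):
        # Schedule the coroutine on the background loop, then block the
        # calling thread until the result is ready.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout=timeout)

    def stop(self):
        # Ask the loop to stop from outside its thread, then join the thread.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=5)


async def add_later(a, b):
    await asyncio.sleep(0.01)
    return a + b


def sync_tool(worker):
    # Sync interface hiding async work, similar in spirit to a patched execute().
    return worker.run(add_later(2, 3))


async def main():
    worker = BackgroundLoop()
    try:
        # We are inside a running loop here, so calling asyncio.run() at this
        # point would raise RuntimeError; the worker thread avoids that.
        return sync_tool(worker)
    finally:
        worker.stop()


result = asyncio.run(main())
```

Note that the sync call still blocks the caller's event loop while it waits. The pattern removes the nested-loop error but does not make the call concurrent with other tasks on the outer loop.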
--- a/environments/terminal_test_env/__init__.py
+++ b/environments/terminal_test_env/__init__.py
--- a/environments/terminal_test_env/default.yaml
+++ b/environments/terminal_test_env/default.yaml
@@ -0,0 +1,34 @@
+# Terminal Test Environment -- Default Configuration
+#
+# Simple file-creation tasks for validating the full Atropos + hermes-agent stack.
+# Uses Modal terminal backend and OpenRouter (Claude) for inference.
+# API keys loaded from ~/hermes-agent/.env
+#
+# Usage:
+#   run-api
+#   python environments/terminal_test_env/terminal_test_env.py serve \
+#       --config environments/terminal_test_env/default.yaml
+
+env:
+  enabled_toolsets: ["terminal", "file"]
+  max_agent_turns: 10
+  max_token_length: 2048
+  group_size: 3
+  total_steps: 3
+  steps_per_eval: 3
+  terminal_backend: "modal"
+  tool_call_parser: "hermes"
+  tokenizer_name: "NousResearch/DeepHermes-3-Llama-3-3B-Preview"
+  ensure_scores_are_not_same: false
+  use_wandb: false
+  system_prompt: >
+    You are a helpful assistant with access to a terminal and file tools.
+    Complete the user's request by using the available tools.
+    Be precise and follow instructions exactly.
+
+openai:
+  base_url: "https://openrouter.ai/api/v1"
+  model_name: "anthropic/claude-opus-4.6"
+  server_type: "openai"
+  health_check: false
+  # api_key loaded from OPENROUTER_API_KEY in .env
--- a/environments/terminal_test_env/terminal_test_env.py
+++ b/environments/terminal_test_env/terminal_test_env.py
@@ -0,0 +1,292 @@
+"""
+TerminalTestEnv -- Simple Test Environment for Validating the Stack
+
+A self-contained environment with inline tasks (no external dataset needed).
+Each task asks the model to create a file at a known path with specific content.
+The reward verifier cats the file and checks if the content matches.
+
+Enables only terminal + file toolsets. Uses Modal terminal backend with
+OpenRouter (Claude) by default.
+
+Training tasks (3):
+    1. Create ~/greeting.txt with "Hello from Hermes Agent"
+    2. Create ~/count.txt with numbers 1-5, one per line
+    3. Create ~/answer.txt with the result of 123 + 456
+
+Eval task (1):
+    1. Create ~/result.txt with the result of 6 * 7
+
+Usage:
+    # Start Atropos API server
+    run-api
+
+    # Run environment (uses OpenRouter + Modal by default)
+    python environments/terminal_test_env/terminal_test_env.py serve
+
+    # Process mode (no run-api needed, saves to JSONL)
+    python environments/terminal_test_env/terminal_test_env.py process \\
+        --env.data_path_to_save_groups terminal_test_output.jsonl
+"""
+
+import logging
+import os
+import sys
+import time
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, Union
+
+# Ensure repo root is on sys.path for imports
+_repo_root = Path(__file__).resolve().parent.parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+from atroposlib.envs.base import ScoredDataGroup
+from atroposlib.envs.server_handling.server_manager import APIServerConfig
+from atroposlib.type_definitions import Item
+
+from environments.agent_loop import AgentResult
+from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
+from environments.tool_context import ToolContext
+
+logger = logging.getLogger(__name__)
+
+
+# =============================================================================
+# Inline task definitions -- no external dataset needed
+# =============================================================================
+
+TRAIN_TASKS = [
+    {
+        "prompt": "Create a file at ~/greeting.txt containing exactly the text: Hello from Hermes Agent",
+        "verify_path": "~/greeting.txt",
+        "expected_content": "Hello from Hermes Agent",
+    },
+    {
+        "prompt": "Create a file at ~/count.txt containing the numbers 1 through 5, one per line",
+        "verify_path": "~/count.txt",
+        "expected_content": "1\n2\n3\n4\n5",
+    },
+    {
+        "prompt": "Create a file at ~/answer.txt containing the result of 123 + 456",
+        "verify_path": "~/answer.txt",
+        "expected_content": "579",
+    },
+]
+
+EVAL_TASKS = [
+    {
+        "prompt": "Create a file at ~/result.txt containing the result of 6 * 7",
+        "verify_path": "~/result.txt",
+        "expected_content": "42",
+    },
+]
+
+
+class TerminalTestEnvConfig(HermesAgentEnvConfig):
+    """Config with defaults suitable for terminal testing."""
+
+    pass  # Inherits all fields, overrides defaults in config_init
+
+
+class TerminalTestEnv(HermesAgentBaseEnv):
+    """
+    Simple test environment with inline file-creation tasks.
+
+    All tasks follow the same pattern: "create a file at ~/X.txt with content Y".
+    The verifier runs `cat ~/X.txt` in the rollout's terminal and checks the output
+    against the expected string. Same verifier logic for all tasks.
+
+    This environment is designed to validate the full stack end-to-end:
+    - Agent loop executes tool calls (terminal/file)
+    - ToolContext provides terminal access to the reward function
+    - Reward function verifies file content via cat
+    - Scored data flows through the Atropos pipeline
+    """
+
+    name = "terminal-test"
+    env_config_cls = TerminalTestEnvConfig
+
+    @classmethod
+    def config_init(cls) -> Tuple[TerminalTestEnvConfig, List[APIServerConfig]]:
+        """
+        Default configuration for the terminal test environment.
+
+        Uses Modal terminal backend for cloud isolation and OpenRouter with
+        Claude for inference. API keys loaded from ~/hermes-agent/.env.
+        """
+        env_config = TerminalTestEnvConfig(
+            # Terminal + file tools only
+            enabled_toolsets=["terminal", "file"],
+            disabled_toolsets=None,
+            distribution=None,
+            # Agent settings
+            max_agent_turns=10,  # Simple tasks, don't need many turns
+            max_token_length=16000,
+            agent_temperature=1.0,
+            system_prompt=(
+                "You are a helpful assistant with access to a terminal and file tools. "
+                "Complete the user's request by using the available tools. "
+                "Be precise and follow instructions exactly."
+            ),
+            # Modal terminal backend for cloud-isolated sandboxes per rollout
+            terminal_backend="modal",
+            # Atropos settings
+            group_size=3,              # 3 rollouts per group
+            tokenizer_name="NousResearch/q-30b-t-h45-e1",
+            tool_call_parser="hermes",
+            steps_per_eval=3,          # Eval after all 3 steps
+            total_steps=3,             # 3 groups total (1 group per step)
+            use_wandb=True,
+            wandb_name="terminal-test",
+            ensure_scores_are_not_same=False,  # Allow all-same scores for simple tasks
+            # No external dataset
+            dataset_name=None,
+        )
+
+        # OpenRouter with Claude -- API key loaded from .env (OPENROUTER_API_KEY)
+        server_configs = [
+            APIServerConfig(
+                base_url="https://openrouter.ai/api/v1",
+                model_name="anthropic/claude-opus-4.6",
+                server_type="openai",
+                api_key=os.getenv("OPENROUTER_API_KEY", ""),
+                health_check=False,  # OpenRouter doesn't have a /health endpoint
+            )
+        ]
+
+        return env_config, server_configs
+
+    async def setup(self):
+        """Initialize inline task lists."""
+        self.train_tasks = list(TRAIN_TASKS)
+        self.eval_tasks = list(EVAL_TASKS)
+        self.iter = 0
+        # Track reward stats for wandb logging
+        self.reward_buffer: List[float] = []
+
+    async def get_next_item(self) -> Dict[str, str]:
+        """Cycle through training tasks."""
+        item = self.train_tasks[self.iter % len(self.train_tasks)]
+        self.iter += 1
+        return item
+
+    def format_prompt(self, item: Dict[str, str]) -> str:
+        """The prompt is directly in the task item."""
+        return item["prompt"]
+
+    async def compute_reward(
+        self, item: Dict[str, str], result: AgentResult, ctx: ToolContext
+    ) -> float:
+        """
+        Verify by cat-ing the expected file path and checking content matches.
+        Same verifier for all tasks -- they all write a file at a known path.
+
+        Scoring:
+            1.0 = exact match
+            0.5 = expected content is present but has extra stuff
+            0.0 = file doesn't exist or content doesn't match
+        """
+        verify_result = ctx.terminal(f"cat {item['verify_path']}")
+
+        # File doesn't exist or can't be read
+        if verify_result["exit_code"] != 0:
+            self.reward_buffer.append(0.0)
+            return 0.0
+
+        actual = verify_result.get("output", "").strip()
+        expected = item["expected_content"].strip()
+
+        # Exact match
+        if actual == expected:
+            self.reward_buffer.append(1.0)
+            return 1.0
+
+        # Partial credit: expected content is present but has extra stuff
+        if expected in actual:
+            self.reward_buffer.append(0.5)
+            return 0.5
+
+        self.reward_buffer.append(0.0)
+        return 0.0
+
+    async def evaluate(self, *args, **kwargs):
+        """
+        Run eval tasks as single-turn completions (no agent loop, no file check).
+        Logs the sample count and raw responses; accuracy is not computed here.
+        """
+        start_time = time.time()
+        correct = 0
+        total = len(self.eval_tasks)
+        samples = []
+
+        for eval_item in self.eval_tasks:
+            try:
+                # For eval, we do a simple single-turn completion (not full agent loop)
+                # to keep eval fast. The agent loop is tested via training.
+                completion = await self.server.chat_completion(
+                    messages=[
+                        {"role": "system", "content": self.config.system_prompt or ""},
+                        {"role": "user", "content": eval_item["prompt"]},
+                    ],
+                    n=1,
+                    max_tokens=self.config.max_token_length,
+                    temperature=0.0,
+                    split="eval",
+                )
+
+                response_content = (
+                    completion.choices[0].message.content if completion.choices else ""
+                )
+
+                samples.append(
+                    {
+                        "prompt": eval_item["prompt"],
+                        "response": response_content,
+                        "expected": eval_item["expected_content"],
+                    }
+                )
+
+            except Exception as e:
+                logger.error("Eval failed for item: %s", e)
+                samples.append(
+                    {
+                        "prompt": eval_item["prompt"],
+                        "response": f"ERROR: {e}",
+                        "expected": eval_item["expected_content"],
+                    }
+                )
+
+        end_time = time.time()
+
+        eval_metrics = {
+            "eval/num_samples": total,
+        }
+
+        await self.evaluate_log(
+            metrics=eval_metrics,
+            samples=samples,
+            start_time=start_time,
+            end_time=end_time,
+        )
+
+    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
+        """Log training metrics including reward stats and accuracy."""
+        if wandb_metrics is None:
+            wandb_metrics = {}
+
+        if self.reward_buffer:
+            total = len(self.reward_buffer)
+            correct = sum(1 for r in self.reward_buffer if r == 1.0)
+            partial = sum(1 for r in self.reward_buffer if r == 0.5)
+
+            wandb_metrics["train/avg_reward"] = sum(self.reward_buffer) / total
+            wandb_metrics["train/accuracy"] = correct / total
+            wandb_metrics["train/partial_match_rate"] = partial / total
+            wandb_metrics["train/total_rollouts"] = total
+            self.reward_buffer = []
+
+        await super().wandb_log(wandb_metrics)
+
+
+if __name__ == "__main__":
+    TerminalTestEnv.cli()
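The three reward tiers in `compute_reward` above can be sketched as a pure function, which makes the edge cases easier to see. This is a minimal sketch, not the environment's code: `score_file_content` is a hypothetical helper name, and it takes the exit code and output directly instead of calling `ctx.terminal`.

```python
def score_file_content(exit_code: int, output: str, expected: str) -> float:
    """Hypothetical helper mirroring the three reward tiers of compute_reward."""
    # A non-zero exit code means `cat` could not read the file
    if exit_code != 0:
        return 0.0

    actual = output.strip()
    target = expected.strip()

    # Exact match after stripping surrounding whitespace
    if actual == target:
        return 1.0

    # Partial credit: the expected text appears somewhere in the output
    if target in actual:
        return 0.5

    return 0.0
```

Because the partial-credit tier is a plain substring test, an output of `1579` would also earn 0.5 against an expected `579`; whether that is acceptable depends on how strict the test tasks need to be.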
--- a/environments/tool_call_parsers/__init__.py
+++ b/environments/tool_call_parsers/__init__.py
@@ -0,0 +1,120 @@
+"""
+Tool Call Parser Registry
+
+Client-side parsers that extract structured tool_calls from raw model output text.
+Used in Phase 2 (VLLM server type) where ManagedServer's /generate endpoint returns
+raw text without tool call parsing.
+
+Each parser is a standalone reimplementation of the corresponding VLLM parser's
+non-streaming extract_tool_calls() logic. No VLLM dependency -- only standard library
+(re, json, uuid) and openai types.
+
+Usage:
+    from environments.tool_call_parsers import get_parser
+
+    parser = get_parser("hermes")
+    content, tool_calls = parser.parse(raw_model_output)
+    # content = text with tool call markup stripped
+    # tool_calls = list of ChatCompletionMessageToolCall objects, or None
+"""
+
+import logging
+from abc import ABC, abstractmethod
+from typing import Dict, List, Optional, Tuple, Type
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+)
+
+logger = logging.getLogger(__name__)
+
+# Type alias for parser return value
+ParseResult = Tuple[Optional[str], Optional[List[ChatCompletionMessageToolCall]]]
+
+
+class ToolCallParser(ABC):
+    """
+    Base class for tool call parsers.
+
+    Each parser knows how to extract structured tool_calls from a specific
+    model family's raw output text format.
+    """
+
+    @abstractmethod
+    def parse(self, text: str) -> ParseResult:
+        """
+        Parse raw model output text for tool calls.
+
+        Args:
+            text: Raw decoded text from the model's completion
+
+        Returns:
+            Tuple of (content, tool_calls) where:
+            - content: text with tool call markup stripped (the message 'content' field),
+                       or None if the entire output was tool calls
+            - tool_calls: list of ChatCompletionMessageToolCall objects,
+                          or None if no tool calls were found
+        """
+        raise NotImplementedError
+
+
+# Global parser registry: name -> parser class
+PARSER_REGISTRY: Dict[str, Type[ToolCallParser]] = {}
+
+
+def register_parser(name: str):
+    """
+    Decorator to register a parser class under a given name.
+
+    Usage:
+        @register_parser("hermes")
+        class HermesToolCallParser(ToolCallParser):
+            ...
+    """
+
+    def decorator(cls: Type[ToolCallParser]) -> Type[ToolCallParser]:
+        PARSER_REGISTRY[name] = cls
+        return cls
+
+    return decorator
+
+
+def get_parser(name: str) -> ToolCallParser:
+    """
+    Get a parser instance by name.
+
+    Args:
+        name: Parser name (e.g., "hermes", "mistral", "llama3_json")
+
+    Returns:
+        Instantiated parser
+
+    Raises:
+        KeyError: If parser name is not found in registry
+    """
+    if name not in PARSER_REGISTRY:
+        available = sorted(PARSER_REGISTRY.keys())
+        raise KeyError(
+            f"Tool call parser '{name}' not found. Available parsers: {available}"
+        )
+    return PARSER_REGISTRY[name]()
+
+
+def list_parsers() -> List[str]:
+    """Return sorted list of registered parser names."""
+    return sorted(PARSER_REGISTRY.keys())
+
+
+# Import all parser modules to trigger registration via @register_parser decorators
+# Each module registers itself when imported
+from environments.tool_call_parsers.hermes_parser import HermesToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.longcat_parser import LongcatToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.mistral_parser import MistralToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.llama_parser import LlamaToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.qwen_parser import QwenToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.deepseek_v3_parser import DeepSeekV3ToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.deepseek_v3_1_parser import DeepSeekV31ToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.kimi_k2_parser import KimiK2ToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.glm45_parser import Glm45ToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.glm47_parser import Glm47ToolCallParser  # noqa: E402, F401
+from environments.tool_call_parsers.qwen3_coder_parser import Qwen3CoderToolCallParser  # noqa: E402, F401
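The registry pattern above (a decorator that stores classes by name, plus a lookup that instantiates them) can be illustrated in isolation. This is a minimal sketch under assumed names: `REGISTRY`, `register`, `get` and `EchoParser` are hypothetical stand-ins, not the module's API, and the parser returns plain tuples instead of OpenAI types.

```python
from typing import Dict, Optional, Tuple, Type

# Hypothetical stand-in for PARSER_REGISTRY: name -> class
REGISTRY: Dict[str, Type] = {}


def register(name: str):
    """Decorator factory that stores the decorated class under `name`."""

    def decorator(cls):
        REGISTRY[name] = cls
        return cls  # returning cls unchanged allows stacking decorators

    return decorator


@register("alias_a")
@register("alias_b")
class EchoParser:
    """Toy parser that never finds tool calls."""

    def parse(self, text: str) -> Tuple[Optional[str], None]:
        return text, None


def get(name: str):
    """Look up a class by name and return a fresh instance."""
    if name not in REGISTRY:
        raise KeyError(f"'{name}' not found. Available: {sorted(REGISTRY)}")
    return REGISTRY[name]()
```

Stacking two decorators, as the DeepSeek V3.1 and Llama parsers do, registers the same class under both names, since each decorator returns the class unchanged.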
--- a/environments/tool_call_parsers/deepseek_v3_1_parser.py
+++ b/environments/tool_call_parsers/deepseek_v3_1_parser.py
@@ -0,0 +1,71 @@
+"""
+DeepSeek V3.1 tool call parser.
+
+Similar to V3 but with a slightly different format:
+    <｜tool▁call▁begin｜>function_name<｜tool▁sep｜>arguments<｜tool▁call▁end｜>
+
+Note: V3 has type+name before the separator, V3.1 has name before and args after.
+
+Based on VLLM's DeepSeekV31ToolParser.extract_tool_calls()
+"""
+
+import re
+import uuid
+from typing import List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+@register_parser("deepseek_v3_1")
+@register_parser("deepseek_v31")
+class DeepSeekV31ToolCallParser(ToolCallParser):
+    """
+    Parser for DeepSeek V3.1 tool calls.
+
+    Slightly different regex than V3: function_name comes before the separator,
+    arguments come after (no type field, no json code block wrapper).
+    """
+
+    START_TOKEN = "<｜tool▁calls▁begin｜>"
+
+    # Regex captures: function_name, function_arguments
+    PATTERN = re.compile(
+        r"<｜tool▁call▁begin｜>(?P<function_name>.*?)<｜tool▁sep｜>(?P<function_arguments>.*?)<｜tool▁call▁end｜>"
+    )
+
+    def parse(self, text: str) -> ParseResult:
+        if self.START_TOKEN not in text:
+            return text, None
+
+        try:
+            matches = self.PATTERN.findall(text)
+            if not matches:
+                return text, None
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            for match in matches:
+                func_name, func_args = match
+                tool_calls.append(
+                    ChatCompletionMessageToolCall(
+                        id=f"call_{uuid.uuid4().hex[:8]}",
+                        type="function",
+                        function=Function(
+                            name=func_name.strip(),
+                            arguments=func_args.strip(),
+                        ),
+                    )
+                )
+
+            if not tool_calls:
+                return text, None
+
+            content = text[: text.find(self.START_TOKEN)].strip()
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
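To make the V3.1 format concrete, the sketch below applies the same regex to a hand-written sample. It is an illustration only: `extract` and the sample strings are hypothetical, and the function returns plain `(name, arguments)` tuples rather than `ChatCompletionMessageToolCall` objects.

```python
import re

START_TOKEN = "<｜tool▁calls▁begin｜>"
PATTERN = re.compile(
    r"<｜tool▁call▁begin｜>(?P<function_name>.*?)<｜tool▁sep｜>"
    r"(?P<function_arguments>.*?)<｜tool▁call▁end｜>"
)


def extract(text):
    """Return (content, [(name, arguments), ...]) for a V3.1-style completion."""
    if START_TOKEN not in text:
        return text, []
    calls = [(name.strip(), args.strip()) for name, args in PATTERN.findall(text)]
    if not calls:
        return text, []
    # Content is whatever precedes the tool calls section
    content = text[: text.find(START_TOKEN)].strip()
    return content or None, calls


# Hand-written sample, not real model output
sample = (
    "Let me check."
    "<｜tool▁calls▁begin｜>"
    '<｜tool▁call▁begin｜>get_weather<｜tool▁sep｜>{"city": "Paris"}<｜tool▁call▁end｜>'
    "<｜tool▁calls▁end｜>"
)

# Same call, but with the JSON arguments spread over several lines
multiline = sample.replace('{"city": "Paris"}', '{\n"city": "Paris"\n}')
```

The pattern is compiled without `re.DOTALL`, so `.` does not cross newlines: `extract(multiline)` finds no calls and returns the text unchanged. That is only a concern if the model emits pretty-printed arguments.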
--- a/environments/tool_call_parsers/deepseek_v3_parser.py
+++ b/environments/tool_call_parsers/deepseek_v3_parser.py
@@ -0,0 +1,75 @@
+"""
+DeepSeek V3 tool call parser.
+
+Format uses special unicode tokens:
+    <｜tool▁calls▁begin｜>
+    <｜tool▁call▁begin｜>type<｜tool▁sep｜>function_name
+    ```json
+    {"arg": "value"}
+    ```<｜tool▁call▁end｜>
+    <｜tool▁calls▁end｜>
+    Note: the closing fence is immediately followed by <｜tool▁call▁end｜> (no newline).
+
+Based on VLLM's DeepSeekV3ToolParser.extract_tool_calls()
+"""
+
+import re
+import uuid
+from typing import List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+@register_parser("deepseek_v3")
+class DeepSeekV3ToolCallParser(ToolCallParser):
+    """
+    Parser for DeepSeek V3 tool calls.
+
+    Uses special Unicode tokens built from fullwidth vertical bars (｜) and lower-block characters (▁).
+    Extracts type, function name, and JSON arguments from the structured format.
+    """
+
+    START_TOKEN = "<｜tool▁calls▁begin｜>"
+
+    # Regex captures: type, function_name, function_arguments
+    PATTERN = re.compile(
+        r"<｜tool▁call▁begin｜>(?P<type>.*)<｜tool▁sep｜>(?P<function_name>.*)\n```json\n(?P<function_arguments>.*)\n```<｜tool▁call▁end｜>"
+    )
+
+    def parse(self, text: str) -> ParseResult:
+        if self.START_TOKEN not in text:
+            return text, None
+
+        try:
+            matches = self.PATTERN.findall(text)
+            if not matches:
+                return text, None
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            for match in matches:
+                tc_type, func_name, func_args = match
+                tool_calls.append(
+                    ChatCompletionMessageToolCall(
+                        id=f"call_{uuid.uuid4().hex[:8]}",
+                        type="function",
+                        function=Function(
+                            name=func_name.strip(),
+                            arguments=func_args.strip(),
+                        ),
+                    )
+                )
+
+            if not tool_calls:
+                return text, None
+
+            # Content is everything before the tool calls section
+            content = text[: text.find(self.START_TOKEN)].strip()
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
--- a/environments/tool_call_parsers/glm45_parser.py
+++ b/environments/tool_call_parsers/glm45_parser.py
@@ -0,0 +1,109 @@
+"""
+GLM 4.5 (GLM-4-MoE) tool call parser.
+
+Format uses custom arg_key/arg_value tags rather than standard JSON:
+    <tool_call>function_name
+    <arg_key>param1</arg_key><arg_value>value1</arg_value>
+    <arg_key>param2</arg_key><arg_value>value2</arg_value>
+    </tool_call>
+
+Values are deserialized using json.loads -> ast.literal_eval -> raw string fallback.
+
+Based on VLLM's Glm4MoeModelToolParser.extract_tool_calls()
+"""
+
+import ast
+import json
+import re
+import uuid
+from typing import Any, Dict, List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+def _deserialize_value(value: str) -> Any:
+    """
+    Try to deserialize a string value to its native Python type.
+    Attempts json.loads, then ast.literal_eval, then returns raw string.
+    """
+    try:
+        return json.loads(value)
+    except (json.JSONDecodeError, TypeError):
+        pass
+
+    try:
+        return ast.literal_eval(value)
+    except (ValueError, SyntaxError, TypeError):
+        pass
+
+    return value
+
+
+@register_parser("glm45")
+class Glm45ToolCallParser(ToolCallParser):
+    """
+    Parser for GLM 4.5 (GLM-4-MoE) tool calls.
+
+    Uses <tool_call>...</tool_call> tags with <arg_key>/<arg_value> pairs
+    instead of standard JSON arguments.
+    """
+
+    FUNC_CALL_REGEX = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)
+    FUNC_DETAIL_REGEX = re.compile(r"<tool_call>([^\n]*)\n(.*)</tool_call>", re.DOTALL)
+    FUNC_ARG_REGEX = re.compile(
+        r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
+    )
+
+    START_TOKEN = "<tool_call>"
+
+    def parse(self, text: str) -> ParseResult:
+        if self.START_TOKEN not in text:
+            return text, None
+
+        try:
+            matched_calls = self.FUNC_CALL_REGEX.findall(text)
+            if not matched_calls:
+                return text, None
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+
+            for match in matched_calls:
+                detail = self.FUNC_DETAIL_REGEX.search(match)
+                if not detail:
+                    continue
+
+                func_name = detail.group(1).strip()
+                func_args_raw = detail.group(2)
+
+                # Parse arg_key/arg_value pairs
+                pairs = self.FUNC_ARG_REGEX.findall(func_args_raw) if func_args_raw else []
+                arg_dict: Dict[str, Any] = {}
+                for key, value in pairs:
+                    arg_key = key.strip()
+                    arg_val = _deserialize_value(value.strip())
+                    arg_dict[arg_key] = arg_val
+
+                tool_calls.append(
+                    ChatCompletionMessageToolCall(
+                        id=f"call_{uuid.uuid4().hex[:8]}",
+                        type="function",
+                        function=Function(
+                            name=func_name,
+                            arguments=json.dumps(arg_dict, ensure_ascii=False),
+                        ),
+                    )
+                )
+
+            if not tool_calls:
+                return text, None
+
+            content = text[: text.find(self.START_TOKEN)].strip()
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
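The value fallback chain (`json.loads`, then `ast.literal_eval`, then the raw string) is easiest to follow on a few inputs. The sketch below restates it as a standalone function and applies the GLM 4.5 arg regex to a hand-written body; `deserialize_value`, `body` and `args` are illustrative names, not part of the module.

```python
import ast
import json
import re
from typing import Any


def deserialize_value(value: str) -> Any:
    """Try JSON first, then a Python literal, then fall back to the raw string."""
    try:
        return json.loads(value)
    except (json.JSONDecodeError, TypeError):
        pass
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        pass
    return value


FUNC_ARG_REGEX = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)

# Hand-written argument body, not real model output
body = (
    "<arg_key>city</arg_key><arg_value>Paris</arg_value>\n"
    "<arg_key>days</arg_key><arg_value>3</arg_value>"
)
args = {k.strip(): deserialize_value(v.strip()) for k, v in FUNC_ARG_REGEX.findall(body)}
```

One consequence of this design: a value such as `3` always becomes an integer, even when the tool schema expects the string `"3"`, because the parser has no access to the schema.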
--- a/environments/tool_call_parsers/glm47_parser.py
+++ b/environments/tool_call_parsers/glm47_parser.py
@@ -0,0 +1,35 @@
+"""
+GLM 4.7 tool call parser.
+
+Same as GLM 4.5 but with two regex changes: the detail regex no longer
+requires a newline after the function name, and the arg regex also accepts
+escaped newlines (a backslash followed by "n") between key and value tags.
+
+Based on VLLM's Glm47MoeModelToolParser (extends Glm4MoeModelToolParser).
+"""
+
+import re
+
+from environments.tool_call_parsers import ParseResult, register_parser
+from environments.tool_call_parsers.glm45_parser import Glm45ToolCallParser
+
+
+@register_parser("glm47")
+class Glm47ToolCallParser(Glm45ToolCallParser):
+    """
+    Parser for GLM 4.7 tool calls.
+    Extends GLM 4.5 with updated regex patterns.
+    """
+
+    def __init__(self):
+        super().__init__()
+        # GLM 4.7 uses a slightly different detail regex that includes
+        # the <tool_call> wrapper and optional arg_key content
+        self.FUNC_DETAIL_REGEX = re.compile(
+            r"<tool_call>(.*?)(<arg_key>.*?)?</tool_call>", re.DOTALL
+        )
+        # GLM 4.7 also accepts escaped newlines (backslash + n) between arg_key and arg_value tags
+        self.FUNC_ARG_REGEX = re.compile(
+            r"<arg_key>(.*?)</arg_key>(?:\\n|\s)*<arg_value>(.*?)</arg_value>",
+            re.DOTALL,
+        )
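The practical difference between the GLM 4.5 and 4.7 patterns shows up on inputs that lack a newline after the function name, or that carry an escaped newline between the tags. The sketch below compares the two sets of regexes on hand-written samples; the constant names and sample strings are illustrative only.

```python
import re

# GLM 4.5 patterns (require a newline after the function name)
GLM45_DETAIL = re.compile(r"<tool_call>([^\n]*)\n(.*)</tool_call>", re.DOTALL)
GLM45_ARG = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)

# GLM 4.7 patterns (name may run straight into the first <arg_key>)
GLM47_DETAIL = re.compile(r"<tool_call>(.*?)(<arg_key>.*?)?</tool_call>", re.DOTALL)
GLM47_ARG = re.compile(
    r"<arg_key>(.*?)</arg_key>(?:\\n|\s)*<arg_value>(.*?)</arg_value>", re.DOTALL
)

# No newline between the function name and the first argument
compact = "<tool_call>get_weather<arg_key>city</arg_key><arg_value>Paris</arg_value></tool_call>"

# A literal backslash followed by "n" (two characters) between the tags
escaped = "<arg_key>city</arg_key>\\n<arg_value>Paris</arg_value>"

detail_45 = GLM45_DETAIL.search(compact)
detail_47 = GLM47_DETAIL.search(compact)
```

On `compact`, the 4.5 detail regex finds nothing while the 4.7 one captures `get_weather` and the argument block. On `escaped`, only the 4.7 arg regex yields the `city`/`Paris` pair.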
--- a/environments/tool_call_parsers/hermes_parser.py
+++ b/environments/tool_call_parsers/hermes_parser.py
@@ -0,0 +1,73 @@
+"""
+Hermes tool call parser.
+
+Format: <tool_call>{"name": "func", "arguments": {...}}</tool_call>
+Based on VLLM's Hermes2ProToolParser.extract_tool_calls()
+"""
+
+import json
+import re
+import uuid
+from typing import List, Optional, Tuple
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+@register_parser("hermes")
+class HermesToolCallParser(ToolCallParser):
+    """
+    Parser for Hermes-format tool calls.
+
+    Matches <tool_call>...</tool_call> tags containing JSON with "name" and "arguments".
+    Also handles an unclosed <tool_call> at end-of-string (truncated generation), provided its JSON is complete.
+    """
+
+    # Matches both closed and unclosed tool_call tags
+    PATTERN = re.compile(
+        r"<tool_call>\s*(.*?)\s*</tool_call>|<tool_call>\s*(.*)", re.DOTALL
+    )
+
+    def parse(self, text: str) -> ParseResult:
+        if "<tool_call>" not in text:
+            return text, None
+
+        try:
+            matches = self.PATTERN.findall(text)
+            if not matches:
+                return text, None
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            for match in matches:
+                # match is a tuple: (closed_content, unclosed_content)
+                raw_json = match[0] if match[0] else match[1]
+                if not raw_json.strip():
+                    continue
+
+                tc_data = json.loads(raw_json)
+                tool_calls.append(
+                    ChatCompletionMessageToolCall(
+                        id=f"call_{uuid.uuid4().hex[:8]}",
+                        type="function",
+                        function=Function(
+                            name=tc_data["name"],
+                            arguments=json.dumps(
+                                tc_data.get("arguments", {}), ensure_ascii=False
+                            ),
+                        ),
+                    )
+                )
+
+            if not tool_calls:
+                return text, None
+
+            # Content is everything before the first <tool_call> tag
+            content = text[: text.find("<tool_call>")].strip()
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
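The behaviour of the two-branch Hermes pattern on closed, unclosed and truncated tags can be checked with a small standalone sketch. `extract_calls` is a hypothetical helper that returns plain `(name, arguments_json)` tuples, or `None` when nothing can be parsed; it is not the parser class itself.

```python
import json
import re
from typing import List, Optional, Tuple

PATTERN = re.compile(
    r"<tool_call>\s*(.*?)\s*</tool_call>|<tool_call>\s*(.*)", re.DOTALL
)


def extract_calls(text: str) -> Optional[List[Tuple[str, str]]]:
    """Return [(name, arguments_json), ...] or None if no call can be parsed."""
    if "<tool_call>" not in text:
        return None
    calls: List[Tuple[str, str]] = []
    try:
        for closed, unclosed in PATTERN.findall(text):
            # Exactly one of the two groups is non-empty per match
            raw = closed or unclosed
            if not raw.strip():
                continue
            data = json.loads(raw)
            calls.append(
                (data["name"], json.dumps(data.get("arguments", {}), ensure_ascii=False))
            )
    except Exception:
        # Any malformed payload discards every call, as in the parser above
        return None
    return calls or None


# Hand-written samples, not real model output
closed = 'Checking.\n<tool_call>\n{"name": "terminal", "arguments": {"command": "ls"}}\n</tool_call>'
unclosed = '<tool_call>{"name": "terminal", "arguments": {}}'
truncated = '<tool_call>{"name": "termi'
```

The unclosed-tag branch only helps when generation stops after the JSON is complete but before `</tool_call>`; a payload cut off mid-JSON fails in `json.loads` and the whole output is treated as having no tool calls.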
--- a/environments/tool_call_parsers/kimi_k2_parser.py
+++ b/environments/tool_call_parsers/kimi_k2_parser.py
@@ -0,0 +1,93 @@
+"""
+Kimi K2 tool call parser.
+
+Format:
+    <|tool_calls_section_begin|>
+    <|tool_call_begin|>function_id:0<|tool_call_argument_begin|>{"arg": "val"}<|tool_call_end|>
+    <|tool_calls_section_end|>
+
+The function_id format is typically "functions.func_name:index" or "func_name:index".
+
+Based on VLLM's KimiK2ToolParser.extract_tool_calls()
+"""
+
+import re
+import uuid
+from typing import List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+@register_parser("kimi_k2")
+class KimiK2ToolCallParser(ToolCallParser):
+    """
+    Parser for Kimi K2 tool calls.
+
+    Uses section begin/end tokens wrapping individual tool call begin/end tokens.
+    The tool_call_id contains the function name (after last dot, before colon).
+    """
+
+    # Support both singular and plural variants
+    START_TOKENS = [
+        "<|tool_calls_section_begin|>",
+        "<|tool_call_section_begin|>",
+    ]
+
+    # Regex captures: tool_call_id (e.g., "functions.get_weather:0"), function_arguments
+    PATTERN = re.compile(
+        r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[^<]+:\d+)\s*"
+        r"<\|tool_call_argument_begin\|>\s*"
+        r"(?P<function_arguments>(?:(?!<\|tool_call_begin\|>).)*?)\s*"
+        r"<\|tool_call_end\|>",
+        re.DOTALL,
+    )
+
+    def parse(self, text: str) -> ParseResult:
+        # Check for any variant of the start token
+        has_start = any(token in text for token in self.START_TOKENS)
+        if not has_start:
+            return text, None
+
+        try:
+            matches = self.PATTERN.findall(text)
+            if not matches:
+                return text, None
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            for match in matches:
+                function_id, function_args = match
+
+                # Extract function name from ID format: "functions.get_weather:0" -> "get_weather"
+                function_name = function_id.split(":")[0].split(".")[-1]
+
+                tool_calls.append(
+                    ChatCompletionMessageToolCall(
+                        id=function_id,  # Preserve the original ID format
+                        type="function",
+                        function=Function(
+                            name=function_name,
+                            arguments=function_args.strip(),
+                        ),
+                    )
+                )
+
+            if not tool_calls:
+                return text, None
+
+            # Content is everything before the tool calls section
+            earliest_start = len(text)
+            for token in self.START_TOKENS:
+                idx = text.find(token)
+                if idx >= 0 and idx < earliest_start:
+                    earliest_start = idx
+
+            content = text[:earliest_start].strip()
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
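The ID-to-name rule (take the text before the colon, then the part after the last dot) can be seen on a hand-written sample with two calls. `name_from_id`, `sample` and `calls` are illustrative names; the regex is the one used by the parser above.

```python
import re

PATTERN = re.compile(
    r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[^<]+:\d+)\s*"
    r"<\|tool_call_argument_begin\|>\s*"
    r"(?P<function_arguments>(?:(?!<\|tool_call_begin\|>).)*?)\s*"
    r"<\|tool_call_end\|>",
    re.DOTALL,
)


def name_from_id(tool_call_id: str) -> str:
    """'functions.get_weather:0' -> 'get_weather'; 'list_files:1' -> 'list_files'."""
    return tool_call_id.split(":")[0].split(".")[-1]


# Hand-written sample, not real model output
sample = (
    "<|tool_calls_section_begin|>"
    '<|tool_call_begin|>functions.get_weather:0<|tool_call_argument_begin|>{"city": "Paris"}<|tool_call_end|>'
    "<|tool_call_begin|>list_files:1<|tool_call_argument_begin|>{}<|tool_call_end|>"
    "<|tool_calls_section_end|>"
)

calls = [
    (tc_id, name_from_id(tc_id), args.strip())
    for tc_id, args in PATTERN.findall(sample)
]
```

Since only the segment after the last dot is kept, a tool whose own name contains a dot would lose its prefix; this sketch assumes tool names are dot-free.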
--- a/environments/tool_call_parsers/llama_parser.py
+++ b/environments/tool_call_parsers/llama_parser.py
@@ -0,0 +1,96 @@
+"""
+Llama 3.x / 4 tool call parser.
+
+Format: The model outputs JSON objects with "name" and "arguments" (or "parameters") keys.
+May be preceded by <|python_tag|> token. Supports multiple JSON objects separated
+by content or semicolons.
+
+Based on VLLM's Llama3JsonToolParser.extract_tool_calls()
+"""
+
+import json
+import re
+import uuid
+from typing import List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+@register_parser("llama3_json")
+@register_parser("llama4_json")
+class LlamaToolCallParser(ToolCallParser):
+    """
+    Parser for Llama 3.x and 4 JSON-format tool calls.
+
+    Finds JSON objects containing "name" + ("arguments" or "parameters") keys.
+    Uses Python's json.JSONDecoder.raw_decode for robust extraction of
+    JSON objects from mixed text.
+    """
+
+    BOT_TOKEN = "<|python_tag|>"
+
+    # Regex to find the start of potential JSON objects
+    JSON_START = re.compile(r"\{")
+
+    def parse(self, text: str) -> ParseResult:
+        # Quick check: need either the bot token or a JSON brace
+        if self.BOT_TOKEN not in text and "{" not in text:
+            return text, None
+
+        try:
+            decoder = json.JSONDecoder()
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            end_index = -1  # Track where the last parsed JSON ended
+
+            for match in self.JSON_START.finditer(text):
+                start = match.start()
+                # Skip if this brace is inside a previously parsed JSON object
+                if start < end_index:  # end_index is one past the last parsed character
+                    continue
+
+                try:
+                    obj, json_end = decoder.raw_decode(text[start:])
+                    end_index = start + json_end
+
+                    # Must have "name" and either "arguments" or "parameters"
+                    name = obj.get("name")
+                    args = obj.get("arguments", obj.get("parameters"))
+
+                    if not name or args is None:
+                        continue
+
+                    # Normalize arguments to JSON string
+                    if isinstance(args, dict):
+                        args = json.dumps(args, ensure_ascii=False)
+                    elif not isinstance(args, str):
+                        args = json.dumps(args, ensure_ascii=False)
+
+                    tool_calls.append(
+                        ChatCompletionMessageToolCall(
+                            id=f"call_{uuid.uuid4().hex[:8]}",
+                            type="function",
+                            function=Function(name=name, arguments=args),
+                        )
+                    )
+                except (json.JSONDecodeError, KeyError, ValueError):
+                    continue
+
+            if not tool_calls:
+                return text, None
+
+            # Content is everything before the bot token if present, otherwise
+            # before the first "{" in the text (which may precede the first actual tool call)
+            first_tc_start = text.find("{")
+            if self.BOT_TOKEN in text:
+                first_tc_start = text.find(self.BOT_TOKEN)
+            content = (text[:first_tc_start].strip() or None) if first_tc_start > 0 else None
+
+            return content, tool_calls
+
+        except Exception:
+            return text, None
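The brace-scanning approach built on `json.JSONDecoder.raw_decode` can be exercised on its own. The sketch below is a simplified illustration, not the parser class: `extract_json_calls` is a hypothetical name, it returns plain `(name, arguments_json)` tuples, and it assumes a strict `<` comparison so that an object starting exactly where the previous one ended is still parsed.

```python
import json
import re
from typing import List, Tuple


def extract_json_calls(text: str) -> List[Tuple[str, str]]:
    """Scan for JSON objects that look like tool calls and return (name, args_json)."""
    decoder = json.JSONDecoder()
    calls: List[Tuple[str, str]] = []
    end_index = -1  # one past the end of the last successfully decoded object

    for match in re.finditer(r"\{", text):
        start = match.start()
        # Skip braces nested inside an object that was already decoded
        if start < end_index:
            continue
        try:
            # raw_decode parses one JSON value and reports how many chars it consumed
            obj, length = decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            continue
        end_index = start + length

        name = obj.get("name")
        args = obj.get("arguments", obj.get("parameters"))
        if not name or args is None:
            continue
        if not isinstance(args, str):
            args = json.dumps(args, ensure_ascii=False)
        calls.append((name, args))

    return calls


# Hand-written samples, not real model output
separated = '<|python_tag|>{"name": "a", "parameters": {"x": 1}}; {"name": "b", "arguments": {"y": {"z": 2}}}'
adjacent = '{"name": "a", "arguments": {}}{"name": "b", "arguments": {}}'
```

`raw_decode` tolerates trailing text after the object, which is what lets the scan pick tool calls out of mixed prose without a hand-written brace matcher.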
--- a/environments/tool_call_parsers/longcat_parser.py
+++ b/environments/tool_call_parsers/longcat_parser.py
@@ -0,0 +1,69 @@
+"""
+Longcat Flash Chat tool call parser.
+
+Same as Hermes but uses <longcat_tool_call> tags instead of <tool_call>.
+Based on VLLM's LongcatFlashToolParser (extends Hermes2ProToolParser).
+"""
+
+import json
+import re
+import uuid
+from typing import List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+@register_parser("longcat")
+class LongcatToolCallParser(ToolCallParser):
+    """
+    Parser for Longcat Flash Chat tool calls.
+    Identical logic to Hermes, just different tag names.
+    """
+
+    PATTERN = re.compile(
+        r"<longcat_tool_call>\s*(.*?)\s*</longcat_tool_call>|<longcat_tool_call>\s*(.*)",
+        re.DOTALL,
+    )
+
+    def parse(self, text: str) -> ParseResult:
+        if "<longcat_tool_call>" not in text:
+            return text, None
+
+        try:
+            matches = self.PATTERN.findall(text)
+            if not matches:
+                return text, None
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            for match in matches:
+                raw_json = match[0] if match[0] else match[1]
+                if not raw_json.strip():
+                    continue
+
+                tc_data = json.loads(raw_json)
+                tool_calls.append(
+                    ChatCompletionMessageToolCall(
+                        id=f"call_{uuid.uuid4().hex[:8]}",
+                        type="function",
+                        function=Function(
+                            name=tc_data["name"],
+                            arguments=json.dumps(
+                                tc_data.get("arguments", {}), ensure_ascii=False
+                            ),
+                        ),
+                    )
+                )
+
+            if not tool_calls:
+                return text, None
+
+            content = text[: text.find("<longcat_tool_call>")].strip()
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
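Review note: the two-alternative `PATTERN` above is what lets the parser accept a block the model never closed. The sketch below reproduces only that matching step with the standard library. `extract_calls` is a hypothetical helper introduced for illustration; it returns plain dicts rather than `ChatCompletionMessageToolCall` objects and is not part of this change.

```python
import json
import re

# Same two-alternative shape as the parser's PATTERN: a closed block, or an
# unclosed block that runs to the end of the text.
PATTERN = re.compile(
    r"<longcat_tool_call>\s*(.*?)\s*</longcat_tool_call>|<longcat_tool_call>\s*(.*)",
    re.DOTALL,
)


def extract_calls(text):
    """Hypothetical helper: return (content, calls) with calls as plain dicts."""
    if "<longcat_tool_call>" not in text:
        return text, []
    calls = []
    # findall yields one (closed, unclosed) tuple per match; exactly one is non-empty.
    for closed, unclosed in PATTERN.findall(text):
        raw = closed or unclosed
        if raw.strip():
            calls.append(json.loads(raw))
    # Everything before the first tag is ordinary assistant content.
    content = text[: text.find("<longcat_tool_call>")].strip() or None
    return content, calls


closed_sample = (
    "Checking the weather. "
    '<longcat_tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}'
    "</longcat_tool_call>"
)
unclosed_sample = '<longcat_tool_call>{"name": "ping"}'

closed_result = extract_calls(closed_sample)
unclosed_result = extract_calls(unclosed_sample)
```

Both samples yield one call each; the unclosed one only matches through the second alternative, which is why the pattern keeps it.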
--- a/environments/tool_call_parsers/mistral_parser.py
+++ b/environments/tool_call_parsers/mistral_parser.py
@@ -0,0 +1,130 @@
+"""
+Mistral tool call parser.
+
+Supports two formats depending on tokenizer version:
+- Pre-v11: content[TOOL_CALLS] [{"name": ..., "arguments": {...}}, ...]
+- v11+:    content[TOOL_CALLS]tool_name1{"arg": "val"}[TOOL_CALLS]tool_name2{"arg": "val"}
+
+Based on vLLM's MistralToolParser.extract_tool_calls().
+The [TOOL_CALLS] token is the bot_token used by Mistral models.
+"""
+
+import json
+import re
+import uuid
+from typing import List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+def _generate_mistral_id() -> str:
+    """Mistral tool call IDs are 9-char alphanumeric strings."""
+    import random
+    import string
+
+    return "".join(random.choices(string.ascii_letters + string.digits, k=9))
+
+
+@register_parser("mistral")
+class MistralToolCallParser(ToolCallParser):
+    """
+    Parser for Mistral-format tool calls.
+
+    Detects format by checking if the content after [TOOL_CALLS] starts with '['
+    or '{' (pre-v11 JSON) or with a tool name (v11+ format).
+    """
+
+    # The [TOOL_CALLS] token -- may appear as different strings depending on tokenizer
+    BOT_TOKEN = "[TOOL_CALLS]"
+
+    # Fallback regex for pre-v11 format when JSON parsing fails
+    TOOL_CALL_REGEX = re.compile(r"\[?\s*(\{.*?\})\s*\]?", re.DOTALL)
+
+    def parse(self, text: str) -> ParseResult:
+        if self.BOT_TOKEN not in text:
+            return text, None
+
+        try:
+            parts = text.split(self.BOT_TOKEN)
+            content = parts[0].strip()
+            raw_tool_calls = parts[1:]
+
+            # Detect format: if the first raw part starts with '[' or '{', it's pre-v11
+            first_raw = raw_tool_calls[0].strip() if raw_tool_calls else ""
+            is_pre_v11 = first_raw.startswith("[") or first_raw.startswith("{")
+
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+
+            if not is_pre_v11:
+                # v11+ format: [TOOL_CALLS]tool_name{args}[TOOL_CALLS]tool_name2{args2}
+                for raw in raw_tool_calls:
+                    raw = raw.strip()
+                    if not raw or "{" not in raw:
+                        continue
+
+                    brace_idx = raw.find("{")
+                    tool_name = raw[:brace_idx].strip()
+                    args_str = raw[brace_idx:]
+
+                    tool_calls.append(
+                        ChatCompletionMessageToolCall(
+                            id=_generate_mistral_id(),
+                            type="function",
+                            function=Function(name=tool_name, arguments=args_str),
+                        )
+                    )
+            else:
+                # Pre-v11 format: [TOOL_CALLS] [{"name": ..., "arguments": {...}}]
+                try:
+                    parsed = json.loads(first_raw)
+                    if isinstance(parsed, dict):
+                        parsed = [parsed]
+
+                    for tc in parsed:
+                        args = tc.get("arguments", {})
+                        if isinstance(args, dict):
+                            args = json.dumps(args, ensure_ascii=False)
+
+                        tool_calls.append(
+                            ChatCompletionMessageToolCall(
+                                id=_generate_mistral_id(),
+                                type="function",
+                                function=Function(
+                                    name=tc["name"], arguments=args
+                                ),
+                            )
+                        )
+                except json.JSONDecodeError:
+                    # Fallback regex extraction
+                    match = self.TOOL_CALL_REGEX.findall(first_raw)
+                    if match:
+                        for raw_json in match:
+                            try:
+                                tc = json.loads(raw_json)
+                                args = tc.get("arguments", {})
+                                if isinstance(args, dict):
+                                    args = json.dumps(args, ensure_ascii=False)
+                                tool_calls.append(
+                                    ChatCompletionMessageToolCall(
+                                        id=_generate_mistral_id(),
+                                        type="function",
+                                        function=Function(
+                                            name=tc["name"], arguments=args
+                                        ),
+                                    )
+                                )
+                            except (json.JSONDecodeError, KeyError):
+                                continue
+
+            if not tool_calls:
+                return text, None
+
+            return content if content else None, tool_calls
+
+        except Exception:
+            return text, None
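Review note: the format detection in `MistralToolCallParser.parse()` can be exercised without the OpenAI types. The following is a minimal sketch under that simplification. `split_mistral` is a hypothetical name, it returns `(name, arguments_json)` tuples, and it omits the regex fallback used when the pre-v11 JSON is malformed.

```python
import json

BOT_TOKEN = "[TOOL_CALLS]"


def split_mistral(text):
    """Hypothetical helper: return (content, [(name, arguments_json), ...])."""
    if BOT_TOKEN not in text:
        return text, []
    parts = text.split(BOT_TOKEN)
    content = parts[0].strip() or None
    first = parts[1].strip()
    calls = []
    if first.startswith(("[", "{")):
        # Pre-v11: a JSON array (or a single object) follows the token.
        parsed = json.loads(first)
        if isinstance(parsed, dict):
            parsed = [parsed]
        for tc in parsed:
            calls.append((tc["name"], json.dumps(tc.get("arguments", {}))))
    else:
        # v11+: each segment is a tool name immediately followed by a JSON object.
        for raw in parts[1:]:
            raw = raw.strip()
            brace = raw.find("{")
            if brace > 0:
                calls.append((raw[:brace].strip(), raw[brace:]))
    return content, calls


old_style = 'Sure.[TOOL_CALLS] [{"name": "add", "arguments": {"a": 1, "b": 2}}]'
new_style = '[TOOL_CALLS]add{"a": 1, "b": 2}[TOOL_CALLS]mul{"a": 3, "b": 4}'

old_result = split_mistral(old_style)
new_result = split_mistral(new_style)
```

In the v11+ branch the argument string is passed through unvalidated, as in the parser above, so a truncated JSON object would reach the caller as-is.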
--- a/environments/tool_call_parsers/qwen3_coder_parser.py
+++ b/environments/tool_call_parsers/qwen3_coder_parser.py
@@ -0,0 +1,163 @@
+"""
+Qwen3-Coder tool call parser.
+
+Format uses XML-style nested tags:
+    <tool_call>
+    <function=function_name>
+    <parameter=param_name>value</parameter>
+    <parameter=param_name2>value2</parameter>
+    </function>
+    </tool_call>
+
+Parameters are extracted from <parameter=name>value</parameter> tags and
+converted heuristically (JSON first, then Python literals), else kept as strings.
+
+Based on vLLM's Qwen3CoderToolParser.extract_tool_calls().
+"""
+
+import ast
+import json
+import re
+import uuid
+from typing import Any, Dict, List, Optional
+
+from openai.types.chat.chat_completion_message_tool_call import (
+    ChatCompletionMessageToolCall,
+    Function,
+)
+
+from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
+
+
+def _try_convert_value(value: str) -> Any:
+    """
+    Try to convert a parameter value string to a native Python type.
+    Handles null, numbers, booleans, JSON objects/arrays, and falls back to string.
+    """
+    stripped = value.strip()
+
+    # Handle null
+    if stripped.lower() == "null":
+        return None
+
+    # Try JSON first (handles objects, arrays, strings, numbers, booleans)
+    try:
+        return json.loads(stripped)
+    except (json.JSONDecodeError, TypeError):
+        pass
+
+    # Try Python literal eval (handles tuples, etc.)
+    try:
+        return ast.literal_eval(stripped)
+    except (ValueError, SyntaxError, TypeError):
+        pass
+
+    # Return as string
+    return stripped
+
+
+@register_parser("qwen3_coder")
+class Qwen3CoderToolCallParser(ToolCallParser):
+    """
+    Parser for Qwen3-Coder XML-format tool calls.
+
+    Uses nested XML tags: <tool_call><function=name><parameter=key>val</parameter></function></tool_call>
+    """
+
+    START_TOKEN = "<tool_call>"
+    FUNCTION_PREFIX = "<function="
+
+    # Find complete tool_call blocks (or unclosed at end)
+    TOOL_CALL_REGEX = re.compile(
+        r"<tool_call>(.*?)</tool_call>|<tool_call>(.*?)$", re.DOTALL
+    )
+
+    # Find function blocks within a tool_call
+    FUNCTION_REGEX = re.compile(
+        r"<function=(.*?)</function>|<function=(.*)$", re.DOTALL
+    )
+
+    # Find parameter blocks within a function
+    PARAMETER_REGEX = re.compile(
+        r"<parameter=(.*?)(?:</parameter>|(?=<parameter=)|(?=</function>)|$)",
+        re.DOTALL,
+    )
+
+    def _parse_function_call(self, function_str: str) -> Optional[ChatCompletionMessageToolCall]:
+        """Parse a single <function=name>...</function> block into a ToolCall."""
+        try:
+            # Extract function name: everything before the first '>'
+            gt_idx = function_str.index(">")
+            func_name = function_str[:gt_idx].strip()
+            params_str = function_str[gt_idx + 1:]
+
+            # Extract parameters
+            param_dict: Dict[str, Any] = {}
+            for match_text in self.PARAMETER_REGEX.findall(params_str):
+                if ">" not in match_text:
+                    continue
+                name_end = match_text.index(">")
+                param_name = match_text[:name_end].strip()
+                param_value = match_text[name_end + 1:]
+
+                # Clean up whitespace
+                if param_value.startswith("\n"):
+                    param_value = param_value[1:]
+                if param_value.endswith("\n"):
+                    param_value = param_value[:-1]
+
+                param_dict[param_name] = _try_convert_value(param_value)
+
+            return ChatCompletionMessageToolCall(
+                id=f"call_{uuid.uuid4().hex[:24]}",
+                type="function",
+                function=Function(
+                    name=func_name,
+                    arguments=json.dumps(param_dict, ensure_ascii=False),
+                ),
+            )
+        except (ValueError, IndexError):
+            return None
+
+    def parse(self, text: str) -> ParseResult:
+        if self.FUNCTION_PREFIX not in text:
+            return text, None
+
+        try:
+            # Find all tool_call blocks
+            tc_matches = self.TOOL_CALL_REGEX.findall(text)
+            raw_blocks = [m[0] if m[0] else m[1] for m in tc_matches]
+
+            # Fallback: if no tool_call tags, try the whole text
+            if not raw_blocks:
+                raw_blocks = [text]
+
+            # Find function blocks within each tool_call
+            function_strs: List[str] = []
+            for block in raw_blocks:
+                func_matches = self.FUNCTION_REGEX.findall(block)
+                function_strs.extend(m[0] if m[0] else m[1] for m in func_matches)
+
+            if not function_strs:
+                return text, None
+
+            # Parse each function call
+            tool_calls: List[ChatCompletionMessageToolCall] = []
+            for func_str in function_strs:
+                tc = self._parse_function_call(func_str)
+                if tc is not None:
+                    tool_calls.append(tc)
+
+            if not tool_calls:
+                return text, None
+
+            # Content before tool calls
+            first_tc = text.find(self.START_TOKEN)
+            if first_tc < 0:
+                first_tc = text.find(self.FUNCTION_PREFIX)
+            content = (text[:first_tc].strip() or None) if first_tc > 0 else None
+
+            return content, tool_calls
+
+        except Exception:
+            return text, None
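Review note: because no tool schema is consulted, parameter typing in the Qwen3-Coder parser rests entirely on the JSON-then-literal fallback chain. The sketch below restates that chain as a standalone function (`convert` is a hypothetical name for illustration) to show which inputs change type.

```python
import ast
import json


def convert(value):
    """Hypothetical restatement of the fallback chain: JSON, Python literal, string."""
    stripped = value.strip()
    if stripped.lower() == "null":
        return None
    try:
        # Covers JSON objects, arrays, numbers, quoted strings, true/false.
        return json.loads(stripped)
    except (json.JSONDecodeError, TypeError):
        pass
    try:
        # Covers Python-only literals such as True/False, tuples, single-quoted strings.
        return ast.literal_eval(stripped)
    except (ValueError, SyntaxError, TypeError):
        pass
    # Anything else (shell commands, free text) stays a string.
    return stripped


samples = ["42", "true", "True", "[1, 2]", "(1, 2)", "ls -la", "NULL"]
converted = [convert(s) for s in samples]
```

One consequence worth keeping in mind: a parameter that is meant to be the string `"42"` comes back as the integer `42`, since nothing distinguishes the two without a schema.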
--- a/environments/tool_call_parsers/qwen_parser.py
+++ b/environments/tool_call_parsers/qwen_parser.py
@@ -0,0 +1,19 @@
+"""
+Qwen 2.5 tool call parser.
+
+Uses the same <tool_call> format as Hermes.
+Registered as a separate parser name for clarity when using --tool-parser=qwen.
+"""
+
+from environments.tool_call_parsers import register_parser
+from environments.tool_call_parsers.hermes_parser import HermesToolCallParser
+
+
+@register_parser("qwen")
+class QwenToolCallParser(HermesToolCallParser):
+    """
+    Parser for Qwen 2.5 tool calls.
+    Same <tool_call>{"name": ..., "arguments": ...}</tool_call> format as Hermes.
+    """
+
+    pass  # Identical format -- inherits everything from Hermes
--- a/environments/tool_context.py
+++ b/environments/tool_context.py
@@ -0,0 +1,473 @@
+"""
+ToolContext -- Unrestricted Tool Access for Reward Functions
+
+A per-rollout handle that gives reward/verification functions direct access to
+ALL hermes-agent tools, scoped to the rollout's task_id. The same task_id means
+the terminal/browser session is the SAME one the model used during its rollout --
+all state (files, processes, browser tabs) is preserved.
+
+The verifier author decides which tools to use. Nothing is hardcoded or gated.
+
+Example usage in a compute_reward():
+    async def compute_reward(self, item, result, ctx):
+        # Run tests in the model's terminal sandbox
+        test = ctx.terminal("pytest -v")
+        if test["exit_code"] == 0:
+            return 1.0
+
+        # Check if a file was created
+        content = ctx.read_file("/workspace/solution.py")
+        if content.get("content"):
+            return 0.5
+
+        return 0.0
+"""
+
+import json
+import logging
+import os
+from typing import Any, Dict, List, Optional
+
+import asyncio
+import concurrent.futures
+
+from model_tools import handle_function_call
+from tools.terminal_tool import cleanup_vm
+from tools.browser_tool import cleanup_browser
+
+logger = logging.getLogger(__name__)
+
+# Thread pool for running sync tool calls that internally use asyncio.run()
+_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
+
+
+def _run_tool_in_thread(tool_name: str, arguments: Dict[str, Any], task_id: str) -> str:
+    """
+    Run a tool call in a thread pool executor so backends that use asyncio.run()
+    internally (modal, docker) get a clean event loop.
+
+    If we're already in an async context, submits the call to the shared worker
+    pool and blocks until it finishes. If not (e.g., called from sync code), runs directly.
+    """
+    try:
+        asyncio.get_running_loop()
+    except RuntimeError:
+        # No running event loop -- safe to call directly
+        return handle_function_call(tool_name, arguments, task_id)
+
+    # We're in an async context -- run in a worker thread so the tool gets a
+    # thread without a running loop. A RuntimeError raised by the tool itself
+    # now propagates instead of silently triggering a second, direct call.
+    future = _tool_executor.submit(
+        handle_function_call, tool_name, arguments, task_id
+    )
+    return future.result(timeout=300)
+
+
+class ToolContext:
+    """
+    Open-ended access to all hermes-agent tools for a specific rollout.
+
+    Passed to compute_reward() so verifiers can use any tool they need:
+    terminal commands, file reads/writes, web searches, browser automation, etc.
+    All calls share the rollout's task_id for session isolation.
+    """
+
+    def __init__(self, task_id: str):
+        self.task_id = task_id
+
+    # -------------------------------------------------------------------------
+    # Terminal tools
+    # -------------------------------------------------------------------------
+
+    def terminal(self, command: str, timeout: int = 180) -> Dict[str, Any]:
+        """
+        Run a command in the rollout's terminal session.
+
+        Args:
+            command: Shell command to execute
+            timeout: Command timeout in seconds
+
+        Returns:
+            Dict with 'exit_code' (int) and 'output' (str)
+        """
+        import os
+        backend = os.getenv("TERMINAL_ENV", "local")
+        logger.debug("ToolContext.terminal [%s backend] task=%s: %s", backend, self.task_id[:8], command[:100])
+
+        # Run in thread pool so modal/docker backends' asyncio.run() doesn't deadlock
+        result = _run_tool_in_thread(
+            "terminal",
+            {"command": command, "timeout": timeout},
+            self.task_id,
+        )
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"exit_code": -1, "output": result}
+
+    # -------------------------------------------------------------------------
+    # File tools
+    # -------------------------------------------------------------------------
+
+    def read_file(self, path: str) -> Dict[str, Any]:
+        """
+        Read a file from the rollout's filesystem.
+
+        Args:
+            path: File path to read
+
+        Returns:
+            Dict with file content or error
+        """
+        result = handle_function_call(
+            "read_file", {"path": path}, task_id=self.task_id
+        )
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    def write_file(self, path: str, content: str) -> Dict[str, Any]:
+        """
+        Write a TEXT file in the rollout's filesystem.
+
+        Uses a shell heredoc under the hood, so this is only safe for text content.
+        For binary files (images, compiled artifacts, etc.), use upload_file() instead.
+
+        Args:
+            path: File path to write
+            content: Text content to write
+
+        Returns:
+            Dict with success status or error
+        """
+        result = handle_function_call(
+            "write_file", {"path": path, "content": content}, task_id=self.task_id
+        )
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    def upload_file(self, local_path: str, remote_path: str) -> Dict[str, Any]:
+        """
+        Upload a local file to the rollout's sandbox (binary-safe).
+
+        Unlike write_file() which passes content through a shell heredoc (text-only),
+        this method base64-encodes the file and decodes it inside the sandbox.
+        Safe for any file type: binaries, images, archives, etc.
+
+        For larger files (base64 payload over ~60KB), the content is split into
+        chunks to avoid hitting shell command-length limits.
+
+        Args:
+            local_path: Path to a local file on the host
+            remote_path: Destination path inside the sandbox
+
+        Returns:
+            Dict with 'exit_code' and 'output'
+        """
+        import base64
+        import shlex
+        from pathlib import Path as _Path
+
+        local = _Path(local_path)
+        if not local.is_file():
+            return {"exit_code": -1, "output": f"Local file not found: {local_path}"}
+
+        b64 = base64.b64encode(local.read_bytes()).decode("ascii")
+
+        # Ensure parent directory exists in the sandbox (paths are shell-quoted)
+        parent = str(_Path(remote_path).parent)
+        if parent not in (".", "/"):
+            self.terminal(f"mkdir -p {shlex.quote(parent)}", timeout=10)
+
+        # For small files, single command is fine
+        chunk_size = 60_000  # ~60KB per chunk (well within shell limits)
+        if len(b64) <= chunk_size:
+            result = self.terminal(
+                f"printf '%s' '{b64}' | base64 -d > {shlex.quote(remote_path)}",
+                timeout=30,
+            )
+        else:
+            # For larger files, write base64 in chunks then decode
+            tmp_b64 = "/tmp/_hermes_upload.b64"
+            self.terminal(f": > {tmp_b64}", timeout=5)  # truncate
+            for i in range(0, len(b64), chunk_size):
+                chunk = b64[i : i + chunk_size]
+                self.terminal(f"printf '%s' '{chunk}' >> {tmp_b64}", timeout=15)
+            result = self.terminal(
+                f"base64 -d {tmp_b64} > {shlex.quote(remote_path)} && rm -f {tmp_b64}",
+                timeout=30,
+            )
+
+        return result
+
+    def upload_dir(self, local_dir: str, remote_dir: str) -> List[Dict[str, Any]]:
+        """
+        Upload an entire local directory to the rollout's sandbox (binary-safe).
+
+        Recursively uploads all files, preserving directory structure.
+
+        Args:
+            local_dir: Path to a local directory on the host
+            remote_dir: Destination directory inside the sandbox
+
+        Returns:
+            List of results, one per file uploaded
+        """
+        from pathlib import Path as _Path
+
+        local = _Path(local_dir)
+        if not local.exists() or not local.is_dir():
+            return [{"exit_code": -1, "output": f"Local directory not found: {local_dir}"}]
+
+        results = []
+        for file_path in sorted(local.rglob("*")):
+            if file_path.is_file():
+                relative = file_path.relative_to(local)
+                target = f"{remote_dir}/{relative}"
+                results.append(self.upload_file(str(file_path), target))
+        return results
+
+    def download_file(self, remote_path: str, local_path: str) -> Dict[str, Any]:
+        """
+        Download a file from the rollout's sandbox to the host (binary-safe).
+
+        The inverse of upload_file(). Base64-encodes the file inside the sandbox,
+        reads the encoded data through the terminal, and decodes it locally.
+        Safe for any file type.
+
+        Args:
+            remote_path: Path to the file inside the sandbox
+            local_path: Destination path on the host
+
+        Returns:
+            Dict with 'success' (bool) and 'bytes' (int) or 'error' (str)
+        """
+        import base64
+        import shlex
+        from pathlib import Path as _Path
+
+        # Base64-encode the file inside the sandbox and capture output
+        result = self.terminal(
+            f"base64 {shlex.quote(remote_path)} 2>/dev/null", timeout=30
+        )
+
+        if result.get("exit_code", -1) != 0:
+            return {
+                "success": False,
+                "error": f"Failed to read remote file: {result.get('output', '')}",
+            }
+
+        b64_data = result.get("output", "").strip()
+        if not b64_data:
+            return {"success": False, "error": f"Remote file is empty or missing: {remote_path}"}
+
+        try:
+            raw = base64.b64decode(b64_data)
+        except Exception as e:
+            return {"success": False, "error": f"Base64 decode failed: {e}"}
+
+        # Write to local host filesystem
+        local = _Path(local_path)
+        local.parent.mkdir(parents=True, exist_ok=True)
+        local.write_bytes(raw)
+
+        return {"success": True, "bytes": len(raw)}
+
+    def download_dir(self, remote_dir: str, local_dir: str) -> List[Dict[str, Any]]:
+        """
+        Download a directory from the rollout's sandbox to the host (binary-safe).
+
+        Lists all files in the remote directory, then downloads each one.
+        Preserves directory structure.
+
+        Args:
+            remote_dir: Path to the directory inside the sandbox
+            local_dir: Destination directory on the host
+
+        Returns:
+            List of results, one per file downloaded
+        """
+        import shlex
+        from pathlib import Path as _Path
+
+        # List files in the remote directory (path is shell-quoted)
+        ls_result = self.terminal(
+            f"find {shlex.quote(remote_dir)} -type f 2>/dev/null", timeout=15
+        )
+
+        if ls_result.get("exit_code", -1) != 0:
+            return [{"success": False, "error": f"Failed to list remote dir: {remote_dir}"}]
+
+        file_list = ls_result.get("output", "").strip()
+        if not file_list:
+            return [{"success": False, "error": f"Remote directory is empty or missing: {remote_dir}"}]
+
+        results = []
+        for remote_file in file_list.splitlines():
+            remote_file = remote_file.strip()
+            if not remote_file:
+                continue
+            # Compute the relative path to preserve directory structure
+            if remote_file.startswith(remote_dir):
+                relative = remote_file[len(remote_dir):].lstrip("/")
+            else:
+                relative = _Path(remote_file).name
+            local_file = str(_Path(local_dir) / relative)
+            results.append(self.download_file(remote_file, local_file))
+
+        return results
+
+    def search(self, query: str, path: str = ".") -> Dict[str, Any]:
+        """
+        Search for text in the rollout's filesystem.
+
+        Args:
+            query: Search query
+            path: Directory to search in
+
+        Returns:
+            Dict with search results
+        """
+        result = handle_function_call(
+            "search", {"query": query, "path": path}, task_id=self.task_id
+        )
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    # -------------------------------------------------------------------------
+    # Web tools
+    # -------------------------------------------------------------------------
+
+    def web_search(self, query: str) -> Dict[str, Any]:
+        """
+        Search the web.
+
+        Args:
+            query: Search query
+
+        Returns:
+            Dict with search results
+        """
+        result = handle_function_call("web_search", {"query": query})
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    def web_extract(self, urls: List[str]) -> Dict[str, Any]:
+        """
+        Extract content from URLs.
+
+        Args:
+            urls: List of URLs to extract content from
+
+        Returns:
+            Dict with extracted content
+        """
+        result = handle_function_call("web_extract", {"urls": urls})
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    # -------------------------------------------------------------------------
+    # Browser tools
+    # -------------------------------------------------------------------------
+
+    def browser_navigate(self, url: str) -> Dict[str, Any]:
+        """
+        Navigate the rollout's browser session to a URL.
+
+        Args:
+            url: URL to navigate to
+
+        Returns:
+            Dict with page snapshot or error
+        """
+        result = handle_function_call(
+            "browser_navigate", {"url": url}, task_id=self.task_id
+        )
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    def browser_snapshot(self) -> Dict[str, Any]:
+        """
+        Take a snapshot of the current browser page.
+
+        Returns:
+            Dict with page content/accessibility snapshot
+        """
+        result = handle_function_call(
+            "browser_snapshot", {}, task_id=self.task_id
+        )
+        try:
+            return json.loads(result)
+        except json.JSONDecodeError:
+            return {"error": result}
+
+    # -------------------------------------------------------------------------
+    # Generic tool access
+    # -------------------------------------------------------------------------
+
+    def call_tool(self, tool_name: str, arguments: Dict[str, Any]) -> str:
+        """
+        Call any hermes-agent tool by name.
+
+        This is the generic escape hatch -- if a tool doesn't have a convenience
+        wrapper above, you can call it directly here.
+
+        Args:
+            tool_name: Name of the tool (e.g., "vision_analyze", "skills_list")
+            arguments: Dict of arguments for the tool
+
+        Returns:
+            Raw JSON string result from the tool
+        """
+        return _run_tool_in_thread(tool_name, arguments, self.task_id)
+
+    # -------------------------------------------------------------------------
+    # Cleanup
+    # -------------------------------------------------------------------------
+
+    def cleanup(self):
+        """
+        Release all resources (terminal VMs, browser sessions, background processes)
+        for this rollout.
+
+        Called automatically by the base environment via try/finally after
+        compute_reward() completes. You generally don't need to call this yourself.
+        """
+        # Kill any background processes from this rollout (safety net)
+        try:
+            from tools.process_registry import process_registry
+            killed = process_registry.kill_all(task_id=self.task_id)
+            if killed:
+                logger.debug("Process cleanup for task %s: killed %d process(es)", self.task_id, killed)
+        except Exception as e:
+            logger.debug("Process cleanup for task %s: %s", self.task_id, e)
+
+        try:
+            cleanup_vm(self.task_id)
+        except Exception as e:
+            logger.debug("VM cleanup for task %s: %s", self.task_id, e)
+
+        # Suppress browser_tool's noisy debug prints during cleanup.
+        # The cleanup still runs (safe), it just doesn't spam the console.
+        _prev_quiet = os.environ.get("HERMES_QUIET")
+        os.environ["HERMES_QUIET"] = "1"
+        try:
+            cleanup_browser(self.task_id)
+        except Exception as e:
+            logger.debug("Browser cleanup for task %s: %s", self.task_id, e)
+        finally:
+            if _prev_quiet is None:
+                os.environ.pop("HERMES_QUIET", None)
+            else:
+                os.environ["HERMES_QUIET"] = _prev_quiet
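Review note: the chunked upload in `ToolContext.upload_file()` is easier to reason about when the command construction is separated from execution. The sketch below is one way to do that, assuming the same `printf`/`base64 -d` approach. `build_upload_commands` and the staging path `/tmp/_upload.b64` are hypothetical names for illustration, and nothing here touches a sandbox.

```python
import base64
import shlex


def build_upload_commands(data, remote_path, chunk_size=60_000, tmp="/tmp/_upload.b64"):
    """Hypothetical helper: shell commands that would recreate `data` at remote_path."""
    b64 = base64.b64encode(data).decode("ascii")
    dest = shlex.quote(remote_path)  # keeps spaces and metacharacters literal
    if len(b64) <= chunk_size:
        # Small payload: one command decodes straight into the destination.
        return [f"printf '%s' '{b64}' | base64 -d > {dest}"]
    commands = [f": > {tmp}"]  # truncate the staging file
    for i in range(0, len(b64), chunk_size):
        # The base64 alphabet has no single quotes, so plain quoting is safe here.
        commands.append(f"printf '%s' '{b64[i:i + chunk_size]}' >> {tmp}")
    commands.append(f"base64 -d {tmp} > {dest} && rm -f {tmp}")
    return commands


small = build_upload_commands(b"hi", "/workspace/my file.bin")
large = build_upload_commands(bytes(100), "/workspace/blob.bin", chunk_size=40)
```

With `chunk_size=40`, 100 bytes encode to 136 base64 characters, so `large` holds one truncate command, four append commands, and one decode command.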
--- a/example-skill/SKILL.md
+++ b/example-skill/SKILL.md
@@ -1,70 +0,0 @@
----
-name: example-skill
-description: An example skill demonstrating the skill file format and structure
----
-
-# Example Skill
-
-This is an example skill file that demonstrates how to create skills for the Hermes Agent.
-
-## Skill File Format
-
-Skills are markdown files with YAML frontmatter at the top:
-
-```yaml
----
-name: your-skill-name
-description: A brief one-line description of what this skill does
----
-```
-
-The frontmatter fields:
-- **name**: The identifier used to reference this skill (lowercase, hyphens for spaces)
-- **description**: A brief description shown when listing skills (keep under 200 chars)
-
-## Writing Effective Skills
-
-### 1. Be Specific and Actionable
-
-Good skills provide clear, actionable instructions:
-
-```
-When reviewing code:
-1. Check for security vulnerabilities first
-2. Verify error handling is comprehensive
-3. Ensure tests cover edge cases
-```
-
-### 2. Include Examples
-
-Show concrete examples of what you want:
-
-```python
-# Good: Descriptive variable names
-user_authentication_token = get_token()
-
-# Bad: Cryptic abbreviations  
-uat = gt()
-```
-
-### 3. Define When to Use
-
-Help the agent understand when this skill applies:
-
-> Use this skill when: reviewing pull requests, auditing security, or checking code quality.
-
-## Skill Categories
-
-Consider organizing skills by purpose:
-
-- **Conventions**: Coding standards, API patterns, naming rules
-- **Workflows**: Step-by-step processes for deployments, reviews, releases
-- **Knowledge**: Domain-specific information, system architecture, gotchas
-- **Templates**: Boilerplate for common tasks, response formats
-
-## Tips
-
-1. Keep the description concise - it's shown in the skills list
-2. Use headers to organize longer skills
-3. Include code examples where helpful
-4. Reference other skills if they're related
--- a/gateway/__init__.py
+++ b/gateway/__init__.py
@@ -0,0 +1,35 @@
+"""
+Hermes Gateway - Multi-platform messaging integration.
+
+This module provides a unified gateway for connecting the Hermes agent
+to various messaging platforms (Telegram, Discord, WhatsApp) with:
+- Session management (persistent conversations with reset policies)
+- Dynamic context injection (agent knows where messages come from)
+- Delivery routing (cron job outputs to appropriate channels)
+- Platform-specific toolsets (different capabilities per platform)
+"""
+
+from .config import GatewayConfig, PlatformConfig, HomeChannel, load_gateway_config
+from .session import (
+    SessionContext,
+    SessionStore,
+    SessionResetPolicy,
+    build_session_context_prompt,
+)
+from .delivery import DeliveryRouter, DeliveryTarget
+
+__all__ = [
+    # Config
+    "GatewayConfig",
+    "PlatformConfig", 
+    "HomeChannel",
+    "load_gateway_config",
+    # Session
+    "SessionContext",
+    "SessionStore",
+    "SessionResetPolicy",
+    "build_session_context_prompt",
+    # Delivery
+    "DeliveryRouter",
+    "DeliveryTarget",
+]
--- a/gateway/config.py
+++ b/gateway/config.py
@@ -0,0 +1,350 @@
+"""
+Gateway configuration management.
+
+Handles loading and validating configuration for:
+- Connected platforms (Telegram, Discord, WhatsApp)
+- Home channels for each platform
+- Session reset policies
+- Delivery preferences
+"""
+
+import os
+import json
+from pathlib import Path
+from dataclasses import dataclass, field
+from typing import Dict, List, Optional, Any
+from enum import Enum
+
+
+class Platform(Enum):
+    """Supported messaging platforms."""
+    LOCAL = "local"
+    TELEGRAM = "telegram"
+    DISCORD = "discord"
+    WHATSAPP = "whatsapp"
+    SLACK = "slack"
+
+
+@dataclass
+class HomeChannel:
+    """
+    Default destination for a platform.
+    
+    When a cron job specifies deliver="telegram" without a specific chat ID,
+    messages are sent to this home channel.
+    """
+    platform: Platform
+    chat_id: str
+    name: str  # Human-readable name for display
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "platform": self.platform.value,
+            "chat_id": self.chat_id,
+            "name": self.name,
+        }
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "HomeChannel":
+        return cls(
+            platform=Platform(data["platform"]),
+            chat_id=str(data["chat_id"]),
+            name=data.get("name", "Home"),
+        )
+
+
+@dataclass
+class SessionResetPolicy:
+    """
+    Controls when sessions reset (lose context).
+    
+    Modes:
+    - "daily": Reset at a specific hour each day
+    - "idle": Reset after N minutes of inactivity
+    - "both": Whichever triggers first (daily boundary OR idle timeout)
+    """
+    mode: str = "both"  # "daily", "idle", or "both"
+    at_hour: int = 4  # Hour for daily reset (0-23, local time)
+    idle_minutes: int = 1440  # Minutes of inactivity before reset (24 hours)
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "mode": self.mode,
+            "at_hour": self.at_hour,
+            "idle_minutes": self.idle_minutes,
+        }
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "SessionResetPolicy":
+        return cls(
+            mode=data.get("mode", "both"),
+            at_hour=data.get("at_hour", 4),
+            idle_minutes=data.get("idle_minutes", 1440),
+        )
+
+
+@dataclass
+class PlatformConfig:
+    """Configuration for a single messaging platform."""
+    enabled: bool = False
+    token: Optional[str] = None  # Bot token (Telegram, Discord)
+    api_key: Optional[str] = None  # API key if different from token
+    home_channel: Optional[HomeChannel] = None
+    
+    # Platform-specific settings
+    extra: Dict[str, Any] = field(default_factory=dict)
+    
+    def to_dict(self) -> Dict[str, Any]:
+        result = {
+            "enabled": self.enabled,
+            "extra": self.extra,
+        }
+        if self.token:
+            result["token"] = self.token
+        if self.api_key:
+            result["api_key"] = self.api_key
+        if self.home_channel:
+            result["home_channel"] = self.home_channel.to_dict()
+        return result
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "PlatformConfig":
+        home_channel = None
+        if "home_channel" in data:
+            home_channel = HomeChannel.from_dict(data["home_channel"])
+        
+        return cls(
+            enabled=data.get("enabled", False),
+            token=data.get("token"),
+            api_key=data.get("api_key"),
+            home_channel=home_channel,
+            extra=data.get("extra", {}),
+        )
+
+
+@dataclass
+class GatewayConfig:
+    """
+    Main gateway configuration.
+    
+    Manages all platform connections, session policies, and delivery settings.
+    """
+    # Platform configurations
+    platforms: Dict[Platform, PlatformConfig] = field(default_factory=dict)
+    
+    # Session reset policies by type
+    default_reset_policy: SessionResetPolicy = field(default_factory=SessionResetPolicy)
+    reset_by_type: Dict[str, SessionResetPolicy] = field(default_factory=dict)
+    reset_by_platform: Dict[Platform, SessionResetPolicy] = field(default_factory=dict)
+    
+    # Reset trigger commands
+    reset_triggers: List[str] = field(default_factory=lambda: ["/new", "/reset"])
+    
+    # Storage paths
+    sessions_dir: Path = field(default_factory=lambda: Path.home() / ".hermes" / "sessions")
+    
+    # Delivery settings
+    always_log_local: bool = True  # Always save cron outputs to local files
+    
+    def get_connected_platforms(self) -> List[Platform]:
+        """Return enabled platforms with credentials (WhatsApp needs only the enabled flag)."""
+        connected = []
+        for platform, config in self.platforms.items():
+            if config.enabled and (config.token or config.api_key or platform is Platform.WHATSAPP):
+                connected.append(platform)
+        return connected
+    
+    def get_home_channel(self, platform: Platform) -> Optional[HomeChannel]:
+        """Get the home channel for a platform."""
+        config = self.platforms.get(platform)
+        if config:
+            return config.home_channel
+        return None
+    
+    def get_reset_policy(
+        self, 
+        platform: Optional[Platform] = None,
+        session_type: Optional[str] = None
+    ) -> SessionResetPolicy:
+        """
+        Get the appropriate reset policy for a session.
+        
+        Priority: platform override > type override > default
+        """
+        # Platform-specific override takes precedence
+        if platform and platform in self.reset_by_platform:
+            return self.reset_by_platform[platform]
+        
+        # Type-specific override (dm, group, thread)
+        if session_type and session_type in self.reset_by_type:
+            return self.reset_by_type[session_type]
+        
+        return self.default_reset_policy
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "platforms": {
+                p.value: c.to_dict() for p, c in self.platforms.items()
+            },
+            "default_reset_policy": self.default_reset_policy.to_dict(),
+            "reset_by_type": {
+                k: v.to_dict() for k, v in self.reset_by_type.items()
+            },
+            "reset_by_platform": {
+                p.value: v.to_dict() for p, v in self.reset_by_platform.items()
+            },
+            "reset_triggers": self.reset_triggers,
+            "sessions_dir": str(self.sessions_dir),
+            "always_log_local": self.always_log_local,
+        }
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "GatewayConfig":
+        platforms = {}
+        for platform_name, platform_data in data.get("platforms", {}).items():
+            try:
+                platform = Platform(platform_name)
+                platforms[platform] = PlatformConfig.from_dict(platform_data)
+            except ValueError:
+                pass  # Skip unknown platforms
+        
+        reset_by_type = {}
+        for type_name, policy_data in data.get("reset_by_type", {}).items():
+            reset_by_type[type_name] = SessionResetPolicy.from_dict(policy_data)
+        
+        reset_by_platform = {}
+        for platform_name, policy_data in data.get("reset_by_platform", {}).items():
+            try:
+                platform = Platform(platform_name)
+                reset_by_platform[platform] = SessionResetPolicy.from_dict(policy_data)
+            except ValueError:
+                pass
+        
+        default_policy = SessionResetPolicy()
+        if "default_reset_policy" in data:
+            default_policy = SessionResetPolicy.from_dict(data["default_reset_policy"])
+        
+        sessions_dir = Path.home() / ".hermes" / "sessions"
+        if "sessions_dir" in data:
+            sessions_dir = Path(data["sessions_dir"])
+        
+        return cls(
+            platforms=platforms,
+            default_reset_policy=default_policy,
+            reset_by_type=reset_by_type,
+            reset_by_platform=reset_by_platform,
+            reset_triggers=data.get("reset_triggers", ["/new", "/reset"]),
+            sessions_dir=sessions_dir,
+            always_log_local=data.get("always_log_local", True),
+        )
+
+
+def load_gateway_config() -> GatewayConfig:
+    """
+    Load gateway configuration from multiple sources.
+    
+    Priority (highest to lowest):
+    1. Environment variables
+    2. ~/.hermes/gateway.json
+    3. cli-config.yaml gateway section (not loaded by this function yet)
+    4. Defaults
+    """
+    config = GatewayConfig()
+    
+    # Try loading from ~/.hermes/gateway.json
+    gateway_config_path = Path.home() / ".hermes" / "gateway.json"
+    if gateway_config_path.exists():
+        try:
+            with open(gateway_config_path, "r") as f:
+                data = json.load(f)
+                config = GatewayConfig.from_dict(data)
+        except Exception as e:
+            print(f"[gateway] Warning: Failed to load {gateway_config_path}: {e}")
+    
+    # Override with environment variables
+    _apply_env_overrides(config)
+    
+    return config
+
+
+def _apply_env_overrides(config: GatewayConfig) -> None:
+    """Apply environment variable overrides to config."""
+    
+    # Telegram
+    telegram_token = os.getenv("TELEGRAM_BOT_TOKEN")
+    if telegram_token:
+        if Platform.TELEGRAM not in config.platforms:
+            config.platforms[Platform.TELEGRAM] = PlatformConfig()
+        config.platforms[Platform.TELEGRAM].enabled = True
+        config.platforms[Platform.TELEGRAM].token = telegram_token
+    
+    telegram_home = os.getenv("TELEGRAM_HOME_CHANNEL")
+    if telegram_home and Platform.TELEGRAM in config.platforms:
+        config.platforms[Platform.TELEGRAM].home_channel = HomeChannel(
+            platform=Platform.TELEGRAM,
+            chat_id=telegram_home,
+            name=os.getenv("TELEGRAM_HOME_CHANNEL_NAME", "Home"),
+        )
+    
+    # Discord
+    discord_token = os.getenv("DISCORD_BOT_TOKEN")
+    if discord_token:
+        if Platform.DISCORD not in config.platforms:
+            config.platforms[Platform.DISCORD] = PlatformConfig()
+        config.platforms[Platform.DISCORD].enabled = True
+        config.platforms[Platform.DISCORD].token = discord_token
+    
+    discord_home = os.getenv("DISCORD_HOME_CHANNEL")
+    if discord_home and Platform.DISCORD in config.platforms:
+        config.platforms[Platform.DISCORD].home_channel = HomeChannel(
+            platform=Platform.DISCORD,
+            chat_id=discord_home,
+            name=os.getenv("DISCORD_HOME_CHANNEL_NAME", "Home"),
+        )
+    
+    # WhatsApp (typically uses different auth mechanism)
+    whatsapp_enabled = os.getenv("WHATSAPP_ENABLED", "").lower() in ("true", "1", "yes")
+    if whatsapp_enabled:
+        if Platform.WHATSAPP not in config.platforms:
+            config.platforms[Platform.WHATSAPP] = PlatformConfig()
+        config.platforms[Platform.WHATSAPP].enabled = True
+    
+    # Slack
+    slack_token = os.getenv("SLACK_BOT_TOKEN")
+    if slack_token:
+        if Platform.SLACK not in config.platforms:
+            config.platforms[Platform.SLACK] = PlatformConfig()
+        config.platforms[Platform.SLACK].enabled = True
+        config.platforms[Platform.SLACK].token = slack_token
+        # Home channel
+        slack_home = os.getenv("SLACK_HOME_CHANNEL")
+        if slack_home:
+            config.platforms[Platform.SLACK].home_channel = HomeChannel(
+                platform=Platform.SLACK,
+                chat_id=slack_home,
+                name=os.getenv("SLACK_HOME_CHANNEL_NAME", ""),
+            )
+    
+    # Session settings
+    idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
+    if idle_minutes:
+        try:
+            config.default_reset_policy.idle_minutes = int(idle_minutes)
+        except ValueError:
+            pass
+    
+    reset_hour = os.getenv("SESSION_RESET_HOUR")
+    if reset_hour:
+        try:
+            config.default_reset_policy.at_hour = int(reset_hour)
+        except ValueError:
+            pass
+
+
+def save_gateway_config(config: GatewayConfig) -> None:
+    """Save gateway configuration to ~/.hermes/gateway.json."""
+    gateway_config_path = Path.home() / ".hermes" / "gateway.json"
+    gateway_config_path.parent.mkdir(parents=True, exist_ok=True)
+    
+    with open(gateway_config_path, "w") as f:
+        json.dump(config.to_dict(), f, indent=2)
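Note on `gateway/config.py`: the precedence documented in `GatewayConfig.get_reset_policy` (platform override, then session-type override, then default) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the module's code: `Policy` and `pick_policy` are hypothetical stand-ins for `SessionResetPolicy` and the method, and platforms are plain strings here rather than `Platform` members.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for SessionResetPolicy (same fields and defaults)."""
    mode: str = "both"
    at_hour: int = 4
    idle_minutes: int = 1440


def pick_policy(
    default: Policy,
    by_platform: Dict[str, Policy],
    by_type: Dict[str, Policy],
    platform: Optional[str] = None,
    session_type: Optional[str] = None,
) -> Policy:
    """Platform override wins, then session-type override, then the default."""
    if platform and platform in by_platform:
        return by_platform[platform]
    if session_type and session_type in by_type:
        return by_type[session_type]
    return default


default = Policy()
by_platform = {"telegram": Policy(mode="idle", idle_minutes=60)}
by_type = {"group": Policy(mode="daily", at_hour=6)}

# Both overrides match: the platform one is returned.
telegram_group = pick_policy(default, by_platform, by_type, "telegram", "group")
# Only the session-type override matches.
discord_group = pick_policy(default, by_platform, by_type, "discord", "group")
# Nothing matches: fall back to the default policy.
discord_dm = pick_policy(default, by_platform, by_type, "discord", "dm")
```

One consequence of this ordering is that a per-platform policy silently masks any per-type policy on that platform, so a "group" override has no effect on a platform that has its own entry.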
--- a/gateway/delivery.py
+++ b/gateway/delivery.py
@@ -0,0 +1,318 @@
+"""
+Delivery routing for cron job outputs and agent responses.
+
+Routes messages to the appropriate destination based on:
+- Explicit targets (e.g., "telegram:123456789")
+- Platform home channels (e.g., "telegram" → home channel)
+- Origin (back to where the job was created)
+- Local (always saved to files)
+"""
+
+import json
+from pathlib import Path
+from datetime import datetime
+from dataclasses import dataclass
+from typing import Dict, List, Optional, Any, Union
+from enum import Enum
+
+from .config import Platform, GatewayConfig, HomeChannel
+from .session import SessionSource
+
+
+@dataclass
+class DeliveryTarget:
+    """
+    A single delivery target.
+    
+    Represents where a message should be sent:
+    - "origin" → back to source
+    - "local" → save to local files
+    - "telegram" → Telegram home channel
+    - "telegram:123456" → specific Telegram chat
+    """
+    platform: Platform
+    chat_id: Optional[str] = None  # None means use home channel
+    is_origin: bool = False
+    is_explicit: bool = False  # True if chat_id was explicitly specified
+    
+    @classmethod
+    def parse(cls, target: str, origin: Optional[SessionSource] = None) -> "DeliveryTarget":
+        """
+        Parse a delivery target string.
+        
+        Formats:
+        - "origin" → back to source
+        - "local" → local files only
+        - "telegram" → Telegram home channel
+        - "telegram:123456" → specific Telegram chat
+        """
+        # Normalize keywords and platform names only; chat IDs may be case-sensitive.
+        target = target.strip()
+        if target.lower() == "origin":
+            if origin:
+                return cls(
+                    platform=origin.platform,
+                    chat_id=origin.chat_id,
+                    is_origin=True,
+                )
+            else:
+                # Fallback to local if no origin
+                return cls(platform=Platform.LOCAL, is_origin=True)
+        
+        if target.lower() == "local":
+            return cls(platform=Platform.LOCAL)
+        
+        # Check for platform:chat_id format
+        if ":" in target:
+            platform_str, chat_id = target.split(":", 1)
+            try:
+                platform = Platform(platform_str.strip().lower())
+                return cls(platform=platform, chat_id=chat_id, is_explicit=True)
+            except ValueError:
+                # Unknown platform, treat as local
+                return cls(platform=Platform.LOCAL)
+        
+        # Just a platform name (use home channel)
+        try:
+            platform = Platform(target.lower())
+            return cls(platform=platform)
+        except ValueError:
+            # Unknown platform, treat as local
+            return cls(platform=Platform.LOCAL)
+    
+    def to_string(self) -> str:
+        """Convert back to string format."""
+        if self.is_origin:
+            return "origin"
+        if self.platform == Platform.LOCAL:
+            return "local"
+        if self.chat_id:
+            return f"{self.platform.value}:{self.chat_id}"
+        return self.platform.value
+
+
+class DeliveryRouter:
+    """
+    Routes messages to appropriate destinations.
+    
+    Handles the logic of resolving delivery targets and dispatching
+    messages to the right platform adapters.
+    """
+    
+    def __init__(self, config: GatewayConfig, adapters: Optional[Dict[Platform, Any]] = None):
+        """
+        Initialize the delivery router.
+        
+        Args:
+            config: Gateway configuration
+            adapters: Dict mapping platforms to their adapter instances
+        """
+        self.config = config
+        self.adapters = adapters or {}
+        self.output_dir = Path.home() / ".hermes" / "cron" / "output"
+    
+    def resolve_targets(
+        self,
+        deliver: Union[str, List[str]],
+        origin: Optional[SessionSource] = None
+    ) -> List[DeliveryTarget]:
+        """
+        Resolve delivery specification to concrete targets.
+        
+        Args:
+            deliver: Delivery spec - "origin", "telegram", ["local", "discord"], etc.
+            origin: The source where the request originated (for "origin" target)
+        
+        Returns:
+            List of resolved delivery targets
+        """
+        if isinstance(deliver, str):
+            deliver = [deliver]
+        
+        targets = []
+        seen_platforms = set()
+        
+        for target_str in deliver:
+            target = DeliveryTarget.parse(target_str, origin)
+            
+            # Resolve home channel if needed
+            if target.chat_id is None and target.platform != Platform.LOCAL:
+                home = self.config.get_home_channel(target.platform)
+                if home:
+                    target.chat_id = home.chat_id
+                else:
+                    # No home channel configured, skip this platform
+                    continue
+            
+            # Deduplicate
+            key = (target.platform, target.chat_id)
+            if key not in seen_platforms:
+                seen_platforms.add(key)
+                targets.append(target)
+        
+        # Always include local if configured
+        if self.config.always_log_local:
+            local_key = (Platform.LOCAL, None)
+            if local_key not in seen_platforms:
+                targets.append(DeliveryTarget(platform=Platform.LOCAL))
+        
+        return targets
+    
+    async def deliver(
+        self,
+        content: str,
+        targets: List[DeliveryTarget],
+        job_id: Optional[str] = None,
+        job_name: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> Dict[str, Any]:
+        """
+        Deliver content to all specified targets.
+        
+        Args:
+            content: The message/output to deliver
+            targets: List of delivery targets
+            job_id: Optional job ID (for cron jobs)
+            job_name: Optional job name
+            metadata: Additional metadata to include
+        
+        Returns:
+            Dict with delivery results per target
+        """
+        results = {}
+        
+        for target in targets:
+            try:
+                if target.platform == Platform.LOCAL:
+                    result = self._deliver_local(content, job_id, job_name, metadata)
+                else:
+                    result = await self._deliver_to_platform(target, content, metadata)
+                
+                results[target.to_string()] = {
+                    "success": True,
+                    "result": result
+                }
+            except Exception as e:
+                results[target.to_string()] = {
+                    "success": False,
+                    "error": str(e)
+                }
+        
+        return results
+    
+    def _deliver_local(
+        self,
+        content: str,
+        job_id: Optional[str],
+        job_name: Optional[str],
+        metadata: Optional[Dict[str, Any]]
+    ) -> Dict[str, Any]:
+        """Save content to local files."""
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        
+        if job_id:
+            output_path = self.output_dir / job_id / f"{timestamp}.md"
+        else:
+            output_path = self.output_dir / "misc" / f"{timestamp}.md"
+        
+        output_path.parent.mkdir(parents=True, exist_ok=True)
+        
+        # Build the output document
+        lines = []
+        if job_name:
+            lines.append(f"# {job_name}")
+        else:
+            lines.append("# Delivery Output")
+        
+        lines.append("")
+        lines.append(f"**Timestamp:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+        
+        if job_id:
+            lines.append(f"**Job ID:** {job_id}")
+        
+        if metadata:
+            for key, value in metadata.items():
+                lines.append(f"**{key}:** {value}")
+        
+        lines.append("")
+        lines.append("---")
+        lines.append("")
+        lines.append(content)
+        
+        output_path.write_text("\n".join(lines), encoding="utf-8")
+        
+        return {
+            "path": str(output_path),
+            "timestamp": timestamp
+        }
+    
+    async def _deliver_to_platform(
+        self,
+        target: DeliveryTarget,
+        content: str,
+        metadata: Optional[Dict[str, Any]]
+    ) -> Dict[str, Any]:
+        """Deliver content to a messaging platform."""
+        adapter = self.adapters.get(target.platform)
+        
+        if not adapter:
+            raise ValueError(f"No adapter configured for {target.platform.value}")
+        
+        if not target.chat_id:
+            raise ValueError(f"No chat ID for {target.platform.value} delivery")
+        
+        # Call the adapter's send method
+        # Adapters should implement: async def send(chat_id: str, content: str, metadata=None) -> Dict
+        return await adapter.send(target.chat_id, content, metadata=metadata)
+
+
+def parse_deliver_spec(
+    deliver: Optional[Union[str, List[str]]],
+    origin: Optional[SessionSource] = None,
+    default: str = "origin"
+) -> Union[str, List[str]]:
+    """
+    Normalize a delivery specification.
+    
+    If None or empty, returns the default.
+    """
+    if not deliver:
+        return default
+    return deliver
+
+
+def build_delivery_context_for_tool(
+    config: GatewayConfig,
+    origin: Optional[SessionSource] = None
+) -> Dict[str, Any]:
+    """
+    Build context for the schedule_cronjob tool to understand delivery options.
+    
+    This is passed to the tool so it can validate and explain delivery targets.
+    """
+    connected = config.get_connected_platforms()
+    
+    options = {
+        "origin": {
+            "description": "Back to where this job was created",
+            "available": origin is not None,
+        },
+        "local": {
+            "description": "Save to local files only",
+            "available": True,
+        }
+    }
+    
+    for platform in connected:
+        home = config.get_home_channel(platform)
+        options[platform.value] = {
+            "description": f"{platform.value.title()} home channel",
+            "available": True,
+            "home_channel": home.to_dict() if home else None,
+        }
+    
+    return {
+        "origin": origin.to_dict() if origin else None,
+        "options": options,
+        "always_log_local": config.always_log_local,
+    }
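Note on `gateway/delivery.py`: the target-string formats listed in `DeliveryTarget.parse` ("origin", "local", "telegram", "telegram:123456") can be illustrated with a small standalone splitter. This is a sketch, not the class itself: `split_target` and `KNOWN_PLATFORMS` are hypothetical names, tuples replace `DeliveryTarget`, and keeping the chat ID verbatim while lowercasing only the platform name is a choice made for this example.

```python
from typing import Optional, Tuple

# Assumed to mirror the values of the Platform enum in gateway/config.py.
KNOWN_PLATFORMS = {"local", "telegram", "discord", "whatsapp", "slack"}


def split_target(target: str) -> Tuple[str, Optional[str]]:
    """Return (platform, chat_id); unknown platforms fall back to "local"."""
    target = target.strip()
    if target.lower() == "origin":
        # Resolving "origin" needs the session source, which is out of scope here.
        return ("origin", None)
    platform, sep, chat_id = target.partition(":")
    platform = platform.strip().lower()
    if platform not in KNOWN_PLATFORMS:
        return ("local", None)
    # A missing chat ID means "use the platform's home channel".
    return (platform, chat_id if sep and chat_id else None)


home = split_target("telegram")
explicit = split_target("Telegram:123456")
slack_channel = split_target("slack:C0ABC")
unknown = split_target("matrix:room")
origin = split_target(" origin ")
```

A `None` chat ID is the signal that `resolve_targets` would then need to look up the home channel for that platform.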
--- a/gateway/hooks.py
+++ b/gateway/hooks.py
@@ -0,0 +1,150 @@
+"""
+Event Hook System
+
+A lightweight event-driven system that fires handlers at key lifecycle points.
+Hooks are discovered from ~/.hermes/hooks/ directories, each containing:
+  - HOOK.yaml  (metadata: name, description, events list)
+  - handler.py (Python module exposing handle(event_type, context), sync or async)
+
+Events:
+  - gateway:startup     -- Gateway process starts
+  - session:start       -- New session created
+  - session:reset       -- User ran /new or /reset
+  - agent:start         -- Agent begins processing a message
+  - agent:step          -- Each turn in the tool-calling loop
+  - agent:end           -- Agent finishes processing
+  - command:*           -- Any slash command executed (wildcard match)
+
+Errors in hooks are caught and logged but never block the main pipeline.
+"""
+
+import asyncio
+import importlib.util
+import os
+from pathlib import Path
+from typing import Any, Callable, Dict, List, Optional
+
+import yaml
+
+
+HOOKS_DIR = Path(os.path.expanduser("~/.hermes/hooks"))
+
+
+class HookRegistry:
+    """
+    Discovers, loads, and fires event hooks.
+
+    Usage:
+        registry = HookRegistry()
+        registry.discover_and_load()
+        await registry.emit("agent:start", {"platform": "telegram", ...})
+    """
+
+    def __init__(self):
+        # event_type -> [handler_fn, ...]
+        self._handlers: Dict[str, List[Callable]] = {}
+        self._loaded_hooks: List[dict] = []  # metadata for listing
+
+    @property
+    def loaded_hooks(self) -> List[dict]:
+        """Return metadata about all loaded hooks."""
+        return list(self._loaded_hooks)
+
+    def discover_and_load(self) -> None:
+        """
+        Scan the hooks directory for hook directories and load their handlers.
+
+        Each hook directory must contain:
+          - HOOK.yaml with at least 'name' and 'events' keys
+          - handler.py with a top-level 'handle' function (sync or async)
+        """
+        if not HOOKS_DIR.exists():
+            return
+
+        for hook_dir in sorted(HOOKS_DIR.iterdir()):
+            if not hook_dir.is_dir():
+                continue
+
+            manifest_path = hook_dir / "HOOK.yaml"
+            handler_path = hook_dir / "handler.py"
+
+            if not manifest_path.exists() or not handler_path.exists():
+                continue
+
+            try:
+                manifest = yaml.safe_load(manifest_path.read_text(encoding="utf-8"))
+                if not manifest or not isinstance(manifest, dict):
+                    print(f"[hooks] Skipping {hook_dir.name}: invalid HOOK.yaml", flush=True)
+                    continue
+
+                hook_name = manifest.get("name", hook_dir.name)
+                events = manifest.get("events", [])
+                if not events:
+                    print(f"[hooks] Skipping {hook_name}: no events declared", flush=True)
+                    continue
+
+                # Dynamically load the handler module
+                spec = importlib.util.spec_from_file_location(
+                    f"hermes_hook_{hook_name}", handler_path
+                )
+                if spec is None or spec.loader is None:
+                    print(f"[hooks] Skipping {hook_name}: could not load handler.py", flush=True)
+                    continue
+
+                module = importlib.util.module_from_spec(spec)
+                spec.loader.exec_module(module)
+
+                handle_fn = getattr(module, "handle", None)
+                if handle_fn is None:
+                    print(f"[hooks] Skipping {hook_name}: no 'handle' function found", flush=True)
+                    continue
+
+                # Register the handler for each declared event
+                for event in events:
+                    self._handlers.setdefault(event, []).append(handle_fn)
+
+                self._loaded_hooks.append({
+                    "name": hook_name,
+                    "description": manifest.get("description", ""),
+                    "events": events,
+                    "path": str(hook_dir),
+                })
+
+                print(f"[hooks] Loaded hook '{hook_name}' for events: {events}", flush=True)
+
+            except Exception as e:
+                print(f"[hooks] Error loading hook {hook_dir.name}: {e}", flush=True)
+
+    async def emit(self, event_type: str, context: Optional[Dict[str, Any]] = None) -> None:
+        """
+        Fire all handlers registered for an event.
+
+        Supports wildcard matching: handlers registered for "command:*" will
+        fire for any "command:..." event. Handlers registered for a base type
+        like "agent" won't fire for "agent:start" -- only exact matches and
+        explicit wildcards.
+
+        Args:
+            event_type: The event identifier (e.g. "agent:start").
+            context:    Optional dict with event-specific data.
+        """
+        if context is None:
+            context = {}
+
+        # Collect handlers: exact match + wildcard match
+        handlers = list(self._handlers.get(event_type, []))
+
+        # Check for wildcard patterns (e.g., "command:*" matches "command:reset")
+        if ":" in event_type:
+            base = event_type.split(":")[0]
+            wildcard_key = f"{base}:*"
+            handlers.extend(self._handlers.get(wildcard_key, []))
+
+        for fn in handlers:
+            try:
+                result = fn(event_type, context)
+                # Support both sync and async handlers
+                if asyncio.iscoroutine(result):
+                    await result
+            except Exception as e:
+                print(f"[hooks] Error in handler for '{event_type}': {e}", flush=True)
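Note on `gateway/hooks.py`: the matching rules described in `HookRegistry.emit` (exact match plus an explicit `base:*` wildcard, sync or async handlers, errors swallowed) can be exercised without any files on disk. The sketch below is a simplified, self-contained rendering of that logic; the module-level `handlers` dict, the `on` helper, and the `calls` log are hypothetical names used only for illustration.

```python
import asyncio
from typing import Any, Callable, Dict, List

handlers: Dict[str, List[Callable]] = {}
calls: List[str] = []  # records what ran, in order


def on(event: str, fn: Callable) -> None:
    """Register a handler for an exact event name or a 'base:*' wildcard."""
    handlers.setdefault(event, []).append(fn)


async def emit(event_type: str, context: Dict[str, Any]) -> None:
    """Run exact-match handlers first, then wildcard handlers for the base type."""
    matched = list(handlers.get(event_type, []))
    if ":" in event_type:
        base = event_type.split(":")[0]
        matched.extend(handlers.get(f"{base}:*", []))
    for fn in matched:
        try:
            result = fn(event_type, context)
            if asyncio.iscoroutine(result):
                await result
        except Exception as exc:
            # Handler failures are recorded but never propagate to the caller.
            calls.append(f"error:{exc}")


def sync_handler(event_type: str, context: Dict[str, Any]) -> None:
    calls.append(f"sync:{event_type}")


async def async_handler(event_type: str, context: Dict[str, Any]) -> None:
    calls.append(f"async:{event_type}")


def broken_handler(event_type: str, context: Dict[str, Any]) -> None:
    raise RuntimeError("boom")


on("command:reset", sync_handler)
on("command:*", async_handler)
on("command:*", broken_handler)
on("agent", sync_handler)  # bare base type: does not match "agent:start"

asyncio.run(emit("command:reset", {}))
asyncio.run(emit("agent:start", {}))
```

After both calls, `calls` holds the exact-match handler, the wildcard handler, and the recorded error, in that order; the `agent:start` event triggers nothing because only `agent` was registered.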
--- a/gateway/pairing.py
+++ b/gateway/pairing.py
@@ -0,0 +1,282 @@
+"""
+DM Pairing System
+
+Code-based approval flow for authorizing new users on messaging platforms.
+Instead of static allowlists with user IDs, unknown users receive a one-time
+pairing code that the bot owner approves via the CLI.
+
+Security features (based on OWASP + NIST SP 800-63-4 guidance):
+  - 8-char codes from 32-char unambiguous alphabet (no 0/O/1/I)
+  - Cryptographic randomness via secrets.choice()
+  - 1-hour code expiry
+  - Max 3 pending codes per platform
+  - Rate limiting: 1 request per user per 10 minutes
+  - Lockout after 5 failed approval attempts (1 hour)
+  - File permissions: chmod 0600 on all data files
+  - Codes are never logged to stdout
+
+Storage: ~/.hermes/pairing/
+"""
+
+import json
+import os
+import secrets
+import time
+from pathlib import Path
+from typing import Optional
+
+
+# Unambiguous alphabet -- excludes 0/O, 1/I to prevent confusion
+ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
+CODE_LENGTH = 8
+
+# Timing constants
+CODE_TTL_SECONDS = 3600             # Codes expire after 1 hour
+RATE_LIMIT_SECONDS = 600            # 1 request per user per 10 minutes
+LOCKOUT_SECONDS = 3600              # Lockout duration after too many failures
+
+# Limits
+MAX_PENDING_PER_PLATFORM = 3        # Max pending codes per platform
+MAX_FAILED_ATTEMPTS = 5             # Failed approvals before lockout
+
+PAIRING_DIR = Path(os.path.expanduser("~/.hermes/pairing"))
+
+
+def _secure_write(path: Path, data: str) -> None:
+    """Write data to file with restrictive permissions (owner read/write only)."""
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text(data, encoding="utf-8")
+    try:
+        os.chmod(path, 0o600)
+    except OSError:
+        pass  # Windows doesn't support chmod the same way
+
+
+class PairingStore:
+    """
+    Manages pairing codes and approved user lists.
+
+    Data files per platform:
+      - {platform}-pending.json   : pending pairing requests
+      - {platform}-approved.json  : approved (paired) users
+      - _rate_limits.json         : rate limit tracking
+    """
+
+    def __init__(self):
+        PAIRING_DIR.mkdir(parents=True, exist_ok=True)
+
+    def _pending_path(self, platform: str) -> Path:
+        return PAIRING_DIR / f"{platform}-pending.json"
+
+    def _approved_path(self, platform: str) -> Path:
+        return PAIRING_DIR / f"{platform}-approved.json"
+
+    def _rate_limit_path(self) -> Path:
+        return PAIRING_DIR / "_rate_limits.json"
+
+    def _load_json(self, path: Path) -> dict:
+        if path.exists():
+            try:
+                return json.loads(path.read_text(encoding="utf-8"))
+            except (json.JSONDecodeError, OSError):
+                return {}
+        return {}
+
+    def _save_json(self, path: Path, data: dict) -> None:
+        _secure_write(path, json.dumps(data, indent=2, ensure_ascii=False))
+
+    # ----- Approved users -----
+
+    def is_approved(self, platform: str, user_id: str) -> bool:
+        """Check if a user is approved (paired) on a platform."""
+        approved = self._load_json(self._approved_path(platform))
+        return user_id in approved
+
+    def list_approved(self, platform: Optional[str] = None) -> list:
+        """List approved users, optionally filtered by platform."""
+        results = []
+        platforms = [platform] if platform else self._all_platforms("approved")
+        for p in platforms:
+            approved = self._load_json(self._approved_path(p))
+            for uid, info in approved.items():
+                results.append({"platform": p, "user_id": uid, **info})
+        return results
+
+    def _approve_user(self, platform: str, user_id: str, user_name: str = "") -> None:
+        """Add a user to the approved list."""
+        approved = self._load_json(self._approved_path(platform))
+        approved[user_id] = {
+            "user_name": user_name,
+            "approved_at": time.time(),
+        }
+        self._save_json(self._approved_path(platform), approved)
+
+    def revoke(self, platform: str, user_id: str) -> bool:
+        """Remove a user from the approved list. Returns True if found."""
+        path = self._approved_path(platform)
+        approved = self._load_json(path)
+        if user_id in approved:
+            del approved[user_id]
+            self._save_json(path, approved)
+            return True
+        return False
+
+    # ----- Pending codes -----
+
+    def generate_code(
+        self, platform: str, user_id: str, user_name: str = ""
+    ) -> Optional[str]:
+        """
+        Generate a pairing code for a new user.
+
+        Returns the code string, or None if:
+          - User is rate-limited (too recent request)
+          - Max pending codes reached for this platform
+          - Platform is in lockout due to failed approval attempts
+        """
+        self._cleanup_expired(platform)
+
+        # Check lockout
+        if self._is_locked_out(platform):
+            return None
+
+        # Check rate limit for this specific user
+        if self._is_rate_limited(platform, user_id):
+            return None
+
+        # Check max pending
+        pending = self._load_json(self._pending_path(platform))
+        if len(pending) >= MAX_PENDING_PER_PLATFORM:
+            return None
+
+        # Generate cryptographically random code
+        code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
+
+        # Store pending request
+        pending[code] = {
+            "user_id": user_id,
+            "user_name": user_name,
+            "created_at": time.time(),
+        }
+        self._save_json(self._pending_path(platform), pending)
+
+        # Record rate limit
+        self._record_rate_limit(platform, user_id)
+
+        return code
+
+    def approve_code(self, platform: str, code: str) -> Optional[dict]:
+        """
+        Approve a pairing code. Adds the user to the approved list.
+
+        Returns {user_id, user_name} on success, None if code is invalid/expired.
+        """
+        self._cleanup_expired(platform)
+        code = code.upper().strip()
+
+        pending = self._load_json(self._pending_path(platform))
+        if code not in pending:
+            self._record_failed_attempt(platform)
+            return None
+
+        entry = pending.pop(code)
+        self._save_json(self._pending_path(platform), pending)
+
+        # Add to approved list
+        self._approve_user(platform, entry["user_id"], entry.get("user_name", ""))
+
+        return {
+            "user_id": entry["user_id"],
+            "user_name": entry.get("user_name", ""),
+        }
+
+    def list_pending(self, platform: Optional[str] = None) -> list:
+        """List pending pairing requests, optionally filtered by platform."""
+        results = []
+        platforms = [platform] if platform else self._all_platforms("pending")
+        for p in platforms:
+            self._cleanup_expired(p)
+            pending = self._load_json(self._pending_path(p))
+            for code, info in pending.items():
+                age_min = int((time.time() - info["created_at"]) / 60)
+                results.append({
+                    "platform": p,
+                    "code": code,
+                    "user_id": info["user_id"],
+                    "user_name": info.get("user_name", ""),
+                    "age_minutes": age_min,
+                })
+        return results
+
+    def clear_pending(self, platform: Optional[str] = None) -> int:
+        """Clear all pending requests. Returns count removed."""
+        count = 0
+        platforms = [platform] if platform else self._all_platforms("pending")
+        for p in platforms:
+            pending = self._load_json(self._pending_path(p))
+            count += len(pending)
+            self._save_json(self._pending_path(p), {})
+        return count
+
+    # ----- Rate limiting and lockout -----
+
+    def _is_rate_limited(self, platform: str, user_id: str) -> bool:
+        """Check if a user has requested a code too recently."""
+        limits = self._load_json(self._rate_limit_path())
+        key = f"{platform}:{user_id}"
+        last_request = limits.get(key, 0)
+        return (time.time() - last_request) < RATE_LIMIT_SECONDS
+
+    def _record_rate_limit(self, platform: str, user_id: str) -> None:
+        """Record the time of a pairing request for rate limiting."""
+        limits = self._load_json(self._rate_limit_path())
+        key = f"{platform}:{user_id}"
+        limits[key] = time.time()
+        self._save_json(self._rate_limit_path(), limits)
+
+    def _is_locked_out(self, platform: str) -> bool:
+        """Check if a platform is in lockout due to failed approval attempts."""
+        limits = self._load_json(self._rate_limit_path())
+        lockout_key = f"_lockout:{platform}"
+        lockout_until = limits.get(lockout_key, 0)
+        return time.time() < lockout_until
+
+    def _record_failed_attempt(self, platform: str) -> None:
+        """Record a failed approval attempt. Triggers lockout after MAX_FAILED_ATTEMPTS."""
+        limits = self._load_json(self._rate_limit_path())
+        fail_key = f"_failures:{platform}"
+        fails = limits.get(fail_key, 0) + 1
+        limits[fail_key] = fails
+        if fails >= MAX_FAILED_ATTEMPTS:
+            lockout_key = f"_lockout:{platform}"
+            limits[lockout_key] = time.time() + LOCKOUT_SECONDS
+            limits[fail_key] = 0  # Reset counter
+            print(f"[pairing] Platform {platform} locked out for {LOCKOUT_SECONDS}s "
+                  f"after {MAX_FAILED_ATTEMPTS} failed attempts", flush=True)
+        self._save_json(self._rate_limit_path(), limits)
+
+    # ----- Cleanup -----
+
+    def _cleanup_expired(self, platform: str) -> None:
+        """Remove expired pending codes."""
+        path = self._pending_path(platform)
+        pending = self._load_json(path)
+        now = time.time()
+        expired = [
+            code for code, info in pending.items()
+            if (now - info["created_at"]) > CODE_TTL_SECONDS
+        ]
+        if expired:
+            for code in expired:
+                del pending[code]
+            self._save_json(path, pending)
+
+    def _all_platforms(self, suffix: str) -> list:
+        """List all platforms that have data files of a given suffix."""
+        platforms = []
+        for f in PAIRING_DIR.iterdir():
+            if f.name.endswith(f"-{suffix}.json"):
+                platform = f.name.replace(f"-{suffix}.json", "")
+                if not platform.startswith("_"):
+                    platforms.append(platform)
+        return platforms
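Review note on `_secure_write` above: the file is first created with default permissions and only afterwards restricted with `chmod`, so its contents may be readable by other local users for a short window. A minimal sketch of one alternative follows, assuming POSIX semantics and a temp file on the same filesystem. `secure_write_atomic` is a hypothetical name and is not part of this diff.

```python
import os
import tempfile
from pathlib import Path


def secure_write_atomic(path: Path, data: str) -> None:
    """Hypothetical helper: write data without ever exposing wider permissions."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp creates the temp file readable and writable only by the owner.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(data)
        # os.replace swaps the finished file into place in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # Do not leave a stray temp file behind if anything failed.
        try:
            os.unlink(tmp_name)
        except OSError:
            pass
        raise
```

Writing to a temp file and renaming it also means a reader never observes a half-written JSON file, which `_load_json` would otherwise silently treat as empty.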
--- a/gateway/platforms/__init__.py
+++ b/gateway/platforms/__init__.py
@@ -0,0 +1,17 @@
+"""
+Platform adapters for messaging integrations.
+
+Each adapter handles:
+- Receiving messages from a platform
+- Sending messages/responses back
+- Platform-specific authentication
+- Message formatting and media handling
+"""
+
+from .base import BasePlatformAdapter, MessageEvent, SendResult
+
+__all__ = [
+    "BasePlatformAdapter",
+    "MessageEvent",
+    "SendResult",
+]
--- a/gateway/platforms/base.py
+++ b/gateway/platforms/base.py
@@ -0,0 +1,691 @@
+"""
+Base platform adapter interface.
+
+All platform adapters (Telegram, Discord, WhatsApp) inherit from this
+and implement the required methods.
+"""
+
+import asyncio
+import os
+import re
+import uuid
+from abc import ABC, abstractmethod
+from dataclasses import dataclass, field
+from datetime import datetime
+from pathlib import Path
+from typing import Dict, List, Optional, Any, Callable, Awaitable, Tuple
+from enum import Enum
+
+import sys
+sys.path.insert(0, str(__file__).rsplit("/", 3)[0])
+
+from gateway.config import Platform, PlatformConfig
+from gateway.session import SessionSource
+
+
+# ---------------------------------------------------------------------------
+# Image cache utilities
+#
+# When users send images on messaging platforms, we download them to a local
+# cache directory so they can be analyzed by the vision tool (which accepts
+# local file paths). This avoids issues with ephemeral platform URLs
+# (e.g. Telegram file URLs expire after ~1 hour).
+# ---------------------------------------------------------------------------
+
+# Default location: ~/.hermes/image_cache/
+IMAGE_CACHE_DIR = Path(os.path.expanduser("~/.hermes/image_cache"))
+
+
+def get_image_cache_dir() -> Path:
+    """Return the image cache directory, creating it if it doesn't exist."""
+    IMAGE_CACHE_DIR.mkdir(parents=True, exist_ok=True)
+    return IMAGE_CACHE_DIR
+
+
+def cache_image_from_bytes(data: bytes, ext: str = ".jpg") -> str:
+    """
+    Save raw image bytes to the cache and return the absolute file path.
+
+    Args:
+        data: Raw image bytes.
+        ext:  File extension including the dot (e.g. ".jpg", ".png").
+
+    Returns:
+        Absolute path to the cached image file as a string.
+    """
+    cache_dir = get_image_cache_dir()
+    filename = f"img_{uuid.uuid4().hex[:12]}{ext}"
+    filepath = cache_dir / filename
+    filepath.write_bytes(data)
+    return str(filepath)
+
+
+async def cache_image_from_url(url: str, ext: str = ".jpg") -> str:
+    """
+    Download an image from a URL and save it to the local cache.
+
+    Uses httpx for async download with a reasonable timeout.
+
+    Args:
+        url: The HTTP/HTTPS URL to download from.
+        ext: File extension including the dot (e.g. ".jpg", ".png").
+
+    Returns:
+        Absolute path to the cached image file as a string.
+    """
+    import httpx
+
+    async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
+        response = await client.get(
+            url,
+            headers={
+                "User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
+                "Accept": "image/*,*/*;q=0.8",
+            },
+        )
+        response.raise_for_status()
+        return cache_image_from_bytes(response.content, ext)
+
+
+def cleanup_image_cache(max_age_hours: int = 24) -> int:
+    """
+    Delete cached images older than *max_age_hours*.
+
+    Returns the number of files removed.
+    """
+    import time
+
+    cache_dir = get_image_cache_dir()
+    cutoff = time.time() - (max_age_hours * 3600)
+    removed = 0
+    for f in cache_dir.iterdir():
+        if f.is_file() and f.stat().st_mtime < cutoff:
+            try:
+                f.unlink()
+                removed += 1
+            except OSError:
+                pass
+    return removed
+
+
+# ---------------------------------------------------------------------------
+# Audio cache utilities
+#
+# Same pattern as image cache -- voice messages from platforms are downloaded
+# here so the STT tool (OpenAI Whisper) can transcribe them from local files.
+# ---------------------------------------------------------------------------
+
+AUDIO_CACHE_DIR = Path(os.path.expanduser("~/.hermes/audio_cache"))
+
+
+def get_audio_cache_dir() -> Path:
+    """Return the audio cache directory, creating it if it doesn't exist."""
+    AUDIO_CACHE_DIR.mkdir(parents=True, exist_ok=True)
+    return AUDIO_CACHE_DIR
+
+
+def cache_audio_from_bytes(data: bytes, ext: str = ".ogg") -> str:
+    """
+    Save raw audio bytes to the cache and return the absolute file path.
+
+    Args:
+        data: Raw audio bytes.
+        ext:  File extension including the dot (e.g. ".ogg", ".mp3").
+
+    Returns:
+        Absolute path to the cached audio file as a string.
+    """
+    cache_dir = get_audio_cache_dir()
+    filename = f"audio_{uuid.uuid4().hex[:12]}{ext}"
+    filepath = cache_dir / filename
+    filepath.write_bytes(data)
+    return str(filepath)
+
+
+async def cache_audio_from_url(url: str, ext: str = ".ogg") -> str:
+    """
+    Download an audio file from a URL and save it to the local cache.
+
+    Args:
+        url: The HTTP/HTTPS URL to download from.
+        ext: File extension including the dot (e.g. ".ogg", ".mp3").
+
+    Returns:
+        Absolute path to the cached audio file as a string.
+    """
+    import httpx
+
+    async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
+        response = await client.get(
+            url,
+            headers={
+                "User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
+                "Accept": "audio/*,*/*;q=0.8",
+            },
+        )
+        response.raise_for_status()
+        return cache_audio_from_bytes(response.content, ext)
+
+
+class MessageType(Enum):
+    """Types of incoming messages."""
+    TEXT = "text"
+    PHOTO = "photo"
+    VIDEO = "video"
+    AUDIO = "audio"
+    VOICE = "voice"
+    DOCUMENT = "document"
+    STICKER = "sticker"
+    COMMAND = "command"  # /command style
+
+
+@dataclass
+class MessageEvent:
+    """
+    Incoming message from a platform.
+    
+    Normalized representation that all adapters produce.
+    """
+    # Message content
+    text: str
+    message_type: MessageType = MessageType.TEXT
+    
+    # Source information
+    source: Optional[SessionSource] = None
+    
+    # Original platform data
+    raw_message: Any = None
+    message_id: Optional[str] = None
+    
+    # Media attachments
+    media_urls: List[str] = field(default_factory=list)
+    media_types: List[str] = field(default_factory=list)
+    
+    # Reply context
+    reply_to_message_id: Optional[str] = None
+    
+    # Timestamps
+    timestamp: datetime = field(default_factory=datetime.now)
+    
+    def is_command(self) -> bool:
+        """Check if this is a command message (e.g., /new, /reset)."""
+        return self.text.startswith("/")
+    
+    def get_command(self) -> Optional[str]:
+        """Extract command name if this is a command message."""
+        if not self.is_command():
+            return None
+        # Split on space and get first word, strip the /
+        parts = self.text.split(maxsplit=1)
+        return parts[0][1:].lower() if parts else None
+    
+    def get_command_args(self) -> str:
+        """Get the arguments after a command."""
+        if not self.is_command():
+            return self.text
+        parts = self.text.split(maxsplit=1)
+        return parts[1] if len(parts) > 1 else ""
+
+
+@dataclass 
+class SendResult:
+    """Result of sending a message."""
+    success: bool
+    message_id: Optional[str] = None
+    error: Optional[str] = None
+    raw_response: Any = None
+
+
+# Type for message handlers
+MessageHandler = Callable[[MessageEvent], Awaitable[Optional[str]]]
+
+
+class BasePlatformAdapter(ABC):
+    """
+    Base class for platform adapters.
+    
+    Subclasses implement platform-specific logic for:
+    - Connecting and authenticating
+    - Receiving messages
+    - Sending messages/responses
+    - Handling media
+    """
+    
+    def __init__(self, config: PlatformConfig, platform: Platform):
+        self.config = config
+        self.platform = platform
+        self._message_handler: Optional[MessageHandler] = None
+        self._running = False
+        
+        # Track active message handlers per session for interrupt support
+        # Key: session_key (e.g., chat_id), Value: asyncio.Event that is set when an interrupt arrives
+        self._active_sessions: Dict[str, asyncio.Event] = {}
+        self._pending_messages: Dict[str, MessageEvent] = {}
+    
+    @property
+    def name(self) -> str:
+        """Human-readable name for this adapter."""
+        return self.platform.value.title()
+    
+    @property
+    def is_connected(self) -> bool:
+        """Check if adapter is currently connected."""
+        return self._running
+    
+    def set_message_handler(self, handler: MessageHandler) -> None:
+        """
+        Set the handler for incoming messages.
+        
+        The handler receives a MessageEvent and should return
+        an optional response string.
+        """
+        self._message_handler = handler
+    
+    @abstractmethod
+    async def connect(self) -> bool:
+        """
+        Connect to the platform and start receiving messages.
+        
+        Returns True if connection was successful.
+        """
+        pass
+    
+    @abstractmethod
+    async def disconnect(self) -> None:
+        """Disconnect from the platform."""
+        pass
+    
+    @abstractmethod
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> SendResult:
+        """
+        Send a message to a chat.
+        
+        Args:
+            chat_id: The chat/channel ID to send to
+            content: Message content (may be markdown)
+            reply_to: Optional message ID to reply to
+            metadata: Additional platform-specific options
+        
+        Returns:
+            SendResult with success status and message ID
+        """
+        pass
+    
+    async def send_typing(self, chat_id: str) -> None:
+        """
+        Send a typing indicator.
+        
+        Override in subclasses if the platform supports it.
+        """
+        pass
+    
+    async def send_image(
+        self,
+        chat_id: str,
+        image_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """
+        Send an image natively via the platform API.
+        
+        Override in subclasses to send images as proper attachments
+        instead of plain-text URLs. Default falls back to sending the
+        URL as a text message.
+        """
+        # Fallback: send URL as text (subclasses override for native images)
+        text = f"{caption}\n{image_url}" if caption else image_url
+        return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
+    
+    @staticmethod
+    def extract_images(content: str) -> Tuple[List[Tuple[str, str]], str]:
+        """
+        Extract image URLs from markdown and HTML image tags in a response.
+        
+        Finds patterns like:
+        - ![alt text](https://example.com/image.png)
+        - <img src="https://example.com/image.png">
+        - <img src="https://example.com/image.png"></img>
+        
+        Args:
+            content: The response text to scan.
+        
+        Returns:
+            Tuple of (list of (url, alt_text) pairs, cleaned content with image tags removed).
+        """
+        images = []
+        cleaned = content
+        
+        # Match markdown images: ![alt](url)
+        md_pattern = r'!\[([^\]]*)\]\((https?://[^\s\)]+)\)'
+        for match in re.finditer(md_pattern, content):
+            alt_text = match.group(1)
+            url = match.group(2)
+            # Only extract URLs that look like actual images
+            if any(url.lower().endswith(ext) or ext in url.lower() for ext in
+                   ['.png', '.jpg', '.jpeg', '.gif', '.webp', 'fal.media', 'fal-cdn', 'replicate.delivery']):
+                images.append((url, alt_text))
+        
+        # Match HTML img tags: <img src="url"> or <img src="url"></img> or <img src="url"/>
+        html_pattern = r'<img\s+src=["\']?(https?://[^\s"\'<>]+)["\']?\s*/?>\s*(?:</img>)?'
+        for match in re.finditer(html_pattern, content):
+            url = match.group(1)
+            images.append((url, ""))
+        
+        # Remove matched image tags from content if we found images
+        if images:
+            cleaned = re.sub(md_pattern, '', cleaned)
+            cleaned = re.sub(html_pattern, '', cleaned)
+            # Clean up leftover blank lines
+            cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
+        
+        return images, cleaned
+    
+    async def send_voice(
+        self,
+        chat_id: str,
+        audio_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """
+        Send an audio file as a native voice message via the platform API.
+        
+        Override in subclasses to send audio as voice bubbles (Telegram)
+        or file attachments (Discord). Default falls back to sending the
+        file path as text.
+        """
+        text = f"🔊 Audio: {audio_path}"
+        if caption:
+            text = f"{caption}\n{text}"
+        return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
+    
+    @staticmethod
+    def extract_media(content: str) -> Tuple[List[Tuple[str, bool]], str]:
+        """
+        Extract MEDIA:<path> tags and [[audio_as_voice]] directives from response text.
+        
+        The TTS tool returns responses like:
+            [[audio_as_voice]]
+            MEDIA:/path/to/audio.ogg
+        
+        Args:
+            content: The response text to scan.
+        
+        Returns:
+            Tuple of (list of (path, is_voice) pairs, cleaned content with tags removed).
+        """
+        media = []
+        cleaned = content
+        
+        # Check for [[audio_as_voice]] directive
+        has_voice_tag = "[[audio_as_voice]]" in content
+        cleaned = cleaned.replace("[[audio_as_voice]]", "")
+        
+        # Extract MEDIA:<path> tags (the path ends at the first whitespace character)
+        media_pattern = r'MEDIA:(\S+)'
+        for match in re.finditer(media_pattern, content):
+            path = match.group(1).strip()
+            if path:
+                media.append((path, has_voice_tag))
+        
+        # Remove MEDIA tags from content
+        if media:
+            cleaned = re.sub(media_pattern, '', cleaned)
+            cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
+        
+        return media, cleaned
+    
+    async def _keep_typing(self, chat_id: str, interval: float = 2.0) -> None:
+        """
+        Continuously send typing indicator until cancelled.
+        
+        Telegram/Discord typing status expires after ~5 seconds, so we refresh every 2
+        to recover quickly after progress messages interrupt it.
+        """
+        try:
+            while True:
+                await self.send_typing(chat_id)
+                await asyncio.sleep(interval)
+        except asyncio.CancelledError:
+            pass  # Normal cancellation when handler completes
+    
+    async def handle_message(self, event: MessageEvent) -> None:
+        """
+        Process an incoming message.
+        
+        This method returns quickly by spawning background tasks.
+        This allows new messages to be processed even while an agent is running,
+        enabling interruption support.
+        """
+        if not self._message_handler:
+            return
+        
+        session_key = event.source.chat_id
+        
+        # Check if there's already an active handler for this session
+        if session_key in self._active_sessions:
+            # Store this as a pending message - it will interrupt the running agent
+            print(f"[{self.name}] ⚡ New message while session {session_key} is active - triggering interrupt")
+            self._pending_messages[session_key] = event
+            # Signal the interrupt (the processing task checks this)
+            self._active_sessions[session_key].set()
+            return  # Don't process now - will be handled after current task finishes
+        
+        # Spawn background task to process this message
+        asyncio.create_task(self._process_message_background(event, session_key))
+    
+    @staticmethod
+    def _get_human_delay() -> float:
+        """
+        Return a random delay in seconds for human-like response pacing.
+
+        Reads from env vars:
+          HERMES_HUMAN_DELAY_MODE: "off" (default) | "natural" | "custom"
+          HERMES_HUMAN_DELAY_MIN_MS: minimum delay in ms (default 800, custom mode)
+          HERMES_HUMAN_DELAY_MAX_MS: maximum delay in ms (default 2500, custom mode)
+        """
+        import random
+
+        mode = os.getenv("HERMES_HUMAN_DELAY_MODE", "off").lower()
+        if mode == "off":
+            return 0.0
+        min_ms = int(os.getenv("HERMES_HUMAN_DELAY_MIN_MS", "800"))
+        max_ms = int(os.getenv("HERMES_HUMAN_DELAY_MAX_MS", "2500"))
+        if mode == "natural":
+            min_ms, max_ms = 800, 2500
+        return random.uniform(min_ms / 1000.0, max_ms / 1000.0)
+
+    async def _process_message_background(self, event: MessageEvent, session_key: str) -> None:
+        """Background task that actually processes the message."""
+        # Create interrupt event for this session
+        interrupt_event = asyncio.Event()
+        self._active_sessions[session_key] = interrupt_event
+        
+        # Start continuous typing indicator (refreshes every 2 seconds)
+        typing_task = asyncio.create_task(self._keep_typing(event.source.chat_id))
+        
+        try:
+            # Call the handler (this can take a while with tool calls)
+            response = await self._message_handler(event)
+            
+            # Send response if any
+            if response:
+                # Extract MEDIA:<path> tags (from TTS tool) before other processing
+                media_files, response = self.extract_media(response)
+                
+                # Extract image URLs and send them as native platform attachments
+                images, text_content = self.extract_images(response)
+                
+                # Send the text portion first (if any remains after extractions)
+                if text_content:
+                    result = await self.send(
+                        chat_id=event.source.chat_id,
+                        content=text_content,
+                        reply_to=event.message_id
+                    )
+                    
+                    # Log send failures (don't raise - user already saw tool progress)
+                    if not result.success:
+                        print(f"[{self.name}] Failed to send response: {result.error}")
+                        # Retry once with a truncated copy of the text as a fallback
+                        fallback_result = await self.send(
+                            chat_id=event.source.chat_id,
+                            content=f"(Response formatting failed, plain text:)\n\n{text_content[:3500]}",
+                            reply_to=event.message_id
+                        )
+                        if not fallback_result.success:
+                            print(f"[{self.name}] Fallback send also failed: {fallback_result.error}")
+                
+                # Human-like pacing delay between text and media
+                human_delay = self._get_human_delay()
+                
+                # Send extracted images as native attachments
+                for image_url, alt_text in images:
+                    if human_delay > 0:
+                        await asyncio.sleep(human_delay)
+                    try:
+                        img_result = await self.send_image(
+                            chat_id=event.source.chat_id,
+                            image_url=image_url,
+                            caption=alt_text if alt_text else None,
+                        )
+                        if not img_result.success:
+                            print(f"[{self.name}] Failed to send image: {img_result.error}")
+                    except Exception as img_err:
+                        print(f"[{self.name}] Error sending image: {img_err}")
+                
+                # Send extracted audio/voice files as native attachments
+                for audio_path, is_voice in media_files:
+                    if human_delay > 0:
+                        await asyncio.sleep(human_delay)
+                    try:
+                        voice_result = await self.send_voice(
+                            chat_id=event.source.chat_id,
+                            audio_path=audio_path,
+                        )
+                        if not voice_result.success:
+                            print(f"[{self.name}] Failed to send voice: {voice_result.error}")
+                    except Exception as voice_err:
+                        print(f"[{self.name}] Error sending voice: {voice_err}")
+            
+            # Check if there's a pending message that was queued during our processing
+            if session_key in self._pending_messages:
+                pending_event = self._pending_messages.pop(session_key)
+                print(f"[{self.name}] 📨 Processing queued message from interrupt")
+                # Clean up current session before processing pending
+                if session_key in self._active_sessions:
+                    del self._active_sessions[session_key]
+                typing_task.cancel()
+                try:
+                    await typing_task
+                except asyncio.CancelledError:
+                    pass
+                # Process the pending message inline via a recursive call (not a new task)
+                await self._process_message_background(pending_event, session_key)
+                return  # Already cleaned up
+                
+        except Exception as e:
+            print(f"[{self.name}] Error handling message: {e}")
+            import traceback
+            traceback.print_exc()
+        finally:
+            # Stop typing indicator
+            typing_task.cancel()
+            try:
+                await typing_task
+            except asyncio.CancelledError:
+                pass
+            # Clean up session tracking
+            if session_key in self._active_sessions:
+                del self._active_sessions[session_key]
+    
+    def has_pending_interrupt(self, session_key: str) -> bool:
+        """Check if there's a pending interrupt for a session."""
+        return session_key in self._active_sessions and self._active_sessions[session_key].is_set()
+    
+    def get_pending_message(self, session_key: str) -> Optional[MessageEvent]:
+        """Get and clear any pending message for a session."""
+        return self._pending_messages.pop(session_key, None)
+    
+    def build_source(
+        self,
+        chat_id: str,
+        chat_name: Optional[str] = None,
+        chat_type: str = "dm",
+        user_id: Optional[str] = None,
+        user_name: Optional[str] = None,
+        thread_id: Optional[str] = None
+    ) -> SessionSource:
+        """Helper to build a SessionSource for this platform."""
+        return SessionSource(
+            platform=self.platform,
+            chat_id=str(chat_id),
+            chat_name=chat_name,
+            chat_type=chat_type,
+            user_id=str(user_id) if user_id else None,
+            user_name=user_name,
+            thread_id=str(thread_id) if thread_id else None,
+        )
+    
+    @abstractmethod
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """
+        Get information about a chat/channel.
+        
+        Returns dict with at least:
+        - name: Chat name
+        - type: "dm", "group", "channel"
+        """
+        pass
+    
+    def format_message(self, content: str) -> str:
+        """
+        Format a message for this platform.
+        
+        Override in subclasses to handle platform-specific formatting
+        (e.g., Telegram MarkdownV2, Discord markdown).
+        
+        Default implementation returns content as-is.
+        """
+        return content
+    
+    def truncate_message(self, content: str, max_length: int = 4096) -> List[str]:
+        """
+        Split a long message into chunks.
+        
+        Args:
+            content: The full message content
+            max_length: Maximum length per chunk (platform-specific)
+        
+        Returns:
+            List of message chunks
+        """
+        if len(content) <= max_length:
+            return [content]
+        
+        chunks = []
+        while content:
+            if len(content) <= max_length:
+                chunks.append(content)
+                break
+            
+            # Try to split at a newline
+            split_idx = content.rfind("\n", 0, max_length)
+            if split_idx <= 0:
+                # No usable newline (index 0 would yield an empty chunk), split at space
+                split_idx = content.rfind(" ", 0, max_length)
+            if split_idx <= 0:
+                # No usable space either, hard split
+                split_idx = max_length
+            
+            chunks.append(content[:split_idx])
+            content = content[split_idx:].lstrip()
+        
+        return chunks
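Review note: the chunking strategy in `truncate_message` above (prefer a newline boundary, then a space, then a hard cut) can be exercised in isolation. The following is a minimal sketch, assuming the logic is lifted into a free function; the name `split_message` is hypothetical and not part of this change. It treats a boundary found at index 0 as "not found" so that no empty chunk is produced, since platforms such as Discord reject empty messages.

```python
from typing import List


def split_message(content: str, max_length: int = 4096) -> List[str]:
    """Hypothetical free-function version of truncate_message, for illustration."""
    chunks: List[str] = []
    while content:
        if len(content) <= max_length:
            chunks.append(content)
            break
        # Prefer a newline boundary, then a space, then a hard cut.
        split_idx = content.rfind("\n", 0, max_length)
        if split_idx <= 0:
            split_idx = content.rfind(" ", 0, max_length)
        if split_idx <= 0:
            split_idx = max_length
        chunks.append(content[:split_idx])
        # Drop the whitespace that served as the split boundary.
        content = content[split_idx:].lstrip()
    return chunks


parts = split_message("alpha beta\ngamma delta epsilon", max_length=12)
```

With `max_length=12`, the first chunk ends at the newline, the second at the last space that fits, and the remainder is emitted as is. Input with no whitespace at all falls through to the hard cut.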
--- a/gateway/platforms/discord.py
+++ b/gateway/platforms/discord.py
@@ -0,0 +1,679 @@
+"""
+Discord platform adapter.
+
+Uses discord.py library for:
+- Receiving messages from servers and DMs
+- Sending responses back
+- Handling threads and channels
+"""
+
+import asyncio
+import os
+from typing import Dict, List, Optional, Any
+
+try:
+    import discord
+    from discord import Message as DiscordMessage, Intents
+    from discord.ext import commands
+    DISCORD_AVAILABLE = True
+except ImportError:
+    DISCORD_AVAILABLE = False
+    discord = None
+    DiscordMessage = Any
+    Intents = Any
+    commands = None
+
+import sys
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+    cache_image_from_url,
+    cache_audio_from_url,
+)
+
+
+def check_discord_requirements() -> bool:
+    """Check if Discord dependencies are available."""
+    return DISCORD_AVAILABLE
+
+
+class DiscordAdapter(BasePlatformAdapter):
+    """
+    Discord bot adapter.
+    
+    Handles:
+    - Receiving messages from servers and DMs
+    - Sending responses with Discord markdown
+    - Thread support
+    - Native slash commands (/ask, /reset, /status, /stop)
+    - Button-based exec approvals
+    - Auto-threading for long conversations
+    - Reaction-based feedback
+    """
+    
+    # Discord message limits
+    MAX_MESSAGE_LENGTH = 2000
+    
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.DISCORD)
+        self._client: Optional[commands.Bot] = None
+        self._ready_event = asyncio.Event()
+        self._allowed_user_ids: set = set()  # For button approval authorization
+    
+    async def connect(self) -> bool:
+        """Connect to Discord and start receiving events."""
+        if not DISCORD_AVAILABLE:
+            print(f"[{self.name}] discord.py not installed. Run: pip install discord.py")
+            return False
+        
+        if not self.config.token:
+            print(f"[{self.name}] No bot token configured")
+            return False
+        
+        try:
+            # Set up intents
+            intents = Intents.default()
+            intents.message_content = True
+            intents.dm_messages = True
+            intents.guild_messages = True
+            
+            # Create bot
+            self._client = commands.Bot(
+                command_prefix="!",  # Not really used, we handle raw messages
+                intents=intents,
+            )
+            
+            # Parse allowed user IDs for button authorization
+            allowed_env = os.getenv("DISCORD_ALLOWED_USERS", "")
+            if allowed_env:
+                self._allowed_user_ids = {
+                    uid.strip() for uid in allowed_env.split(",") if uid.strip()
+                }
+            
+            # Register event handlers
+            @self._client.event
+            async def on_ready():
+                print(f"[{self.name}] Connected as {self._client.user}")
+                # Sync slash commands with Discord
+                try:
+                    synced = await self._client.tree.sync()
+                    print(f"[{self.name}] Synced {len(synced)} slash command(s)")
+                except Exception as e:
+                    print(f"[{self.name}] Slash command sync failed: {e}")
+                self._ready_event.set()
+            
+            @self._client.event
+            async def on_message(message: DiscordMessage):
+                # Ignore bot's own messages
+                if message.author == self._client.user:
+                    return
+                await self._handle_message(message)
+            
+            # Register slash commands
+            self._register_slash_commands()
+            
+            # Start the bot in background (keep a reference so the task is not garbage-collected)
+            self._start_task = asyncio.create_task(self._client.start(self.config.token))
+            
+            # Wait for ready
+            await asyncio.wait_for(self._ready_event.wait(), timeout=30)
+            
+            self._running = True
+            return True
+            
+        except asyncio.TimeoutError:
+            print(f"[{self.name}] Timeout waiting for connection")
+            return False
+        except Exception as e:
+            print(f"[{self.name}] Failed to connect: {e}")
+            return False
+    
+    async def disconnect(self) -> None:
+        """Disconnect from Discord."""
+        if self._client:
+            try:
+                await self._client.close()
+            except Exception as e:
+                print(f"[{self.name}] Error during disconnect: {e}")
+        
+        self._running = False
+        self._client = None
+        self._ready_event.clear()
+        print(f"[{self.name}] Disconnected")
+    
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> SendResult:
+        """Send a message to a Discord channel."""
+        if not self._client:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            # Get the channel
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            
+            if not channel:
+                return SendResult(success=False, error=f"Channel {chat_id} not found")
+            
+            # Format and split message if needed
+            formatted = self.format_message(content)
+            chunks = self.truncate_message(formatted, self.MAX_MESSAGE_LENGTH)
+            
+            message_ids = []
+            reference = None
+            
+            if reply_to:
+                try:
+                    ref_msg = await channel.fetch_message(int(reply_to))
+                    reference = ref_msg
+                except Exception:
+                    pass  # Ignore if we can't find the referenced message
+            
+            for i, chunk in enumerate(chunks):
+                msg = await channel.send(
+                    content=chunk,
+                    reference=reference if i == 0 else None,
+                )
+                message_ids.append(str(msg.id))
+            
+            return SendResult(
+                success=True,
+                message_id=message_ids[0] if message_ids else None,
+                raw_response={"message_ids": message_ids}
+            )
+            
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+    
+    async def send_voice(
+        self,
+        chat_id: str,
+        audio_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send audio as a Discord file attachment."""
+        if not self._client:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import io
+            
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            if not channel:
+                return SendResult(success=False, error=f"Channel {chat_id} not found")
+            
+            if not os.path.exists(audio_path):
+                return SendResult(success=False, error=f"Audio file not found: {audio_path}")
+            
+            # Determine filename from path
+            filename = os.path.basename(audio_path)
+            
+            with open(audio_path, "rb") as f:
+                file = discord.File(io.BytesIO(f.read()), filename=filename)
+                msg = await channel.send(
+                    content=caption if caption else None,
+                    file=file,
+                )
+                return SendResult(success=True, message_id=str(msg.id))
+        
+        except Exception as e:
+            print(f"[{self.name}] Failed to send audio: {e}")
+            return await super().send_voice(chat_id, audio_path, caption, reply_to)
+    
+    async def send_image(
+        self,
+        chat_id: str,
+        image_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send an image natively as a Discord file attachment."""
+        if not self._client:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import aiohttp
+            
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            if not channel:
+                return SendResult(success=False, error=f"Channel {chat_id} not found")
+            
+            # Download the image and send as a Discord file attachment
+            # (Discord renders attachments inline, unlike plain URLs)
+            async with aiohttp.ClientSession() as session:
+                async with session.get(image_url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
+                    if resp.status != 200:
+                        raise Exception(f"Failed to download image: HTTP {resp.status}")
+                    
+                    image_data = await resp.read()
+                    
+                    # Determine filename from URL or content type
+                    content_type = resp.headers.get("content-type", "image/png")
+                    ext = "png"
+                    if "jpeg" in content_type or "jpg" in content_type:
+                        ext = "jpg"
+                    elif "gif" in content_type:
+                        ext = "gif"
+                    elif "webp" in content_type:
+                        ext = "webp"
+                    
+                    import io
+                    file = discord.File(io.BytesIO(image_data), filename=f"image.{ext}")
+                    
+                    msg = await channel.send(
+                        content=caption if caption else None,
+                        file=file,
+                    )
+                    return SendResult(success=True, message_id=str(msg.id))
+        
+        except ImportError:
+            print(f"[{self.name}] aiohttp not installed, falling back to URL. Run: pip install aiohttp")
+            return await super().send_image(chat_id, image_url, caption, reply_to)
+        except Exception as e:
+            print(f"[{self.name}] Failed to send image attachment, falling back to URL: {e}")
+            return await super().send_image(chat_id, image_url, caption, reply_to)
+    
+    async def send_typing(self, chat_id: str) -> None:
+        """Send typing indicator."""
+        if self._client:
+            try:
+                channel = self._client.get_channel(int(chat_id))
+                if channel:
+                    await channel.typing()
+            except Exception:
+                pass  # Ignore typing indicator failures
+    
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Get information about a Discord channel."""
+        if not self._client:
+            return {"name": "Unknown", "type": "dm"}
+        
+        try:
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            
+            if not channel:
+                return {"name": str(chat_id), "type": "dm"}
+            
+            # Determine channel type
+            if isinstance(channel, discord.DMChannel):
+                chat_type = "dm"
+                name = channel.recipient.name if channel.recipient else str(chat_id)
+            elif isinstance(channel, discord.Thread):
+                chat_type = "thread"
+                name = channel.name
+            elif isinstance(channel, discord.TextChannel):
+                chat_type = "channel"
+                name = f"#{channel.name}"
+                if channel.guild:
+                    name = f"{channel.guild.name} / {name}"
+            else:
+                chat_type = "channel"
+                name = getattr(channel, "name", str(chat_id))
+            
+            return {
+                "name": name,
+                "type": chat_type,
+                "guild_id": str(channel.guild.id) if hasattr(channel, "guild") and channel.guild else None,
+                "guild_name": channel.guild.name if hasattr(channel, "guild") and channel.guild else None,
+            }
+        except Exception as e:
+            return {"name": str(chat_id), "type": "dm", "error": str(e)}
+    
+    def format_message(self, content: str) -> str:
+        """
+        Format message for Discord.
+        
+        Discord uses its own markdown variant.
+        """
+        # Discord markdown is fairly standard, no special escaping needed
+        return content
+    
+    def _register_slash_commands(self) -> None:
+        """Register Discord slash commands on the command tree."""
+        if not self._client:
+            return
+
+        tree = self._client.tree
+
+        @tree.command(name="ask", description="Ask Hermes a question")
+        @discord.app_commands.describe(question="Your question for Hermes")
+        async def slash_ask(interaction: discord.Interaction, question: str):
+            await interaction.response.defer()
+            event = self._build_slash_event(interaction, question)
+            await self.handle_message(event)
+            # The response is sent via the normal send() flow
+            # Send a followup to close the interaction if needed
+            try:
+                await interaction.followup.send("Processing complete~", ephemeral=True)
+            except Exception:
+                pass
+
+        @tree.command(name="reset", description="Reset your Hermes session")
+        async def slash_reset(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/reset")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Session reset~", ephemeral=True)
+            except Exception:
+                pass
+
+        @tree.command(name="status", description="Show Hermes session status")
+        async def slash_status(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/status")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Status sent~", ephemeral=True)
+            except Exception:
+                pass
+
+        @tree.command(name="stop", description="Stop the running Hermes agent")
+        async def slash_stop(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/stop")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Stop requested~", ephemeral=True)
+            except Exception:
+                pass
+
+    def _build_slash_event(self, interaction: "discord.Interaction", text: str) -> MessageEvent:
+        """Build a MessageEvent from a Discord slash command interaction."""
+        is_dm = isinstance(interaction.channel, discord.DMChannel)
+        chat_type = "dm" if is_dm else "group"
+        chat_name = ""
+        if not is_dm and hasattr(interaction.channel, "name"):
+            chat_name = interaction.channel.name
+            if hasattr(interaction.channel, "guild") and interaction.channel.guild:
+                chat_name = f"{interaction.channel.guild.name} / #{chat_name}"
+
+        source = self.build_source(
+            chat_id=str(interaction.channel_id),
+            chat_name=chat_name,
+            chat_type=chat_type,
+            user_id=str(interaction.user.id),
+            user_name=interaction.user.display_name,
+        )
+
+        msg_type = MessageType.COMMAND if text.startswith("/") else MessageType.TEXT
+        return MessageEvent(
+            text=text,
+            message_type=msg_type,
+            source=source,
+            raw_message=interaction,
+        )
+
+    async def send_exec_approval(
+        self, chat_id: str, command: str, approval_id: str
+    ) -> SendResult:
+        """
+        Send a button-based exec approval prompt for a dangerous command.
+
+        Returns SendResult. The approval is resolved when a user clicks a button.
+        """
+        if not self._client or not DISCORD_AVAILABLE:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+
+            embed = discord.Embed(
+                title="Command Approval Required",
+                description=f"```\n{command[:500]}\n```",
+                color=discord.Color.orange(),
+            )
+            embed.set_footer(text=f"Approval ID: {approval_id}")
+
+            view = ExecApprovalView(
+                approval_id=approval_id,
+                allowed_user_ids=self._allowed_user_ids,
+            )
+
+            msg = await channel.send(embed=embed, view=view)
+            return SendResult(success=True, message_id=str(msg.id))
+
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+
+    async def _handle_message(self, message: DiscordMessage) -> None:
+        """Handle incoming Discord messages."""
+        # In server channels (not DMs), require the bot to be @mentioned
+        # UNLESS the channel is in the free-response list.
+        #
+        # Config:
+        #   DISCORD_FREE_RESPONSE_CHANNELS: Comma-separated channel IDs where the
+        #       bot responds to every message without needing a mention.
+        #   DISCORD_REQUIRE_MENTION: Set to "false" to disable mention requirement
+        #       globally (all channels become free-response). Default: "true".
+        
+        if not isinstance(message.channel, discord.DMChannel):
+            # Check if this channel is in the free-response list
+            free_channels_raw = os.getenv("DISCORD_FREE_RESPONSE_CHANNELS", "")
+            free_channels = {ch.strip() for ch in free_channels_raw.split(",") if ch.strip()}
+            channel_id = str(message.channel.id)
+            
+            # Global override: if DISCORD_REQUIRE_MENTION=false, all channels are free
+            require_mention = os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no")
+            
+            is_free_channel = channel_id in free_channels
+            
+            if require_mention and not is_free_channel:
+                # Must be @mentioned to respond
+                if self._client.user not in message.mentions:
+                    return  # Silently ignore messages that don't mention the bot
+            
+            # Strip the bot mention from the message text so the agent sees clean input
+            if self._client.user and self._client.user in message.mentions:
+                message.content = message.content.replace(f"<@{self._client.user.id}>", "").strip()
+                message.content = message.content.replace(f"<@!{self._client.user.id}>", "").strip()
+        
+        # Determine message type
+        msg_type = MessageType.TEXT
+        if message.content.startswith("/"):
+            msg_type = MessageType.COMMAND
+        elif message.attachments:
+            # Check attachment types
+            for att in message.attachments:
+                if att.content_type:
+                    if att.content_type.startswith("image/"):
+                        msg_type = MessageType.PHOTO
+                    elif att.content_type.startswith("video/"):
+                        msg_type = MessageType.VIDEO
+                    elif att.content_type.startswith("audio/"):
+                        msg_type = MessageType.AUDIO
+                    else:
+                        msg_type = MessageType.DOCUMENT
+                    break
+        
+        # Determine chat type
+        if isinstance(message.channel, discord.DMChannel):
+            chat_type = "dm"
+            chat_name = message.author.name
+        elif isinstance(message.channel, discord.Thread):
+            chat_type = "thread"
+            chat_name = message.channel.name
+        else:
+            chat_type = "group"  # Treat server channels as groups
+            chat_name = getattr(message.channel, "name", str(message.channel.id))
+            if hasattr(message.channel, "guild") and message.channel.guild:
+                chat_name = f"{message.channel.guild.name} / #{chat_name}"
+        
+        # Get thread ID if in a thread
+        thread_id = None
+        if isinstance(message.channel, discord.Thread):
+            thread_id = str(message.channel.id)
+        
+        # Build source
+        source = self.build_source(
+            chat_id=str(message.channel.id),
+            chat_name=chat_name,
+            chat_type=chat_type,
+            user_id=str(message.author.id),
+            user_name=message.author.display_name,
+            thread_id=thread_id,
+        )
+        
+        # Build media URLs -- download image attachments to local cache so the
+        # vision tool can access them reliably (Discord CDN URLs can expire).
+        media_urls = []
+        media_types = []
+        for att in message.attachments:
+            content_type = att.content_type or "unknown"
+            if content_type.startswith("image/"):
+                try:
+                    # Determine extension from content type (image/png -> .png)
+                    ext = "." + content_type.split("/")[-1].split(";")[0]
+                    if ext not in (".jpg", ".jpeg", ".png", ".gif", ".webp"):
+                        ext = ".jpg"
+                    cached_path = await cache_image_from_url(att.url, ext=ext)
+                    media_urls.append(cached_path)
+                    media_types.append(content_type)
+                    print(f"[Discord] Cached user image: {cached_path}", flush=True)
+                except Exception as e:
+                    print(f"[Discord] Failed to cache image attachment: {e}", flush=True)
+                    # Fall back to the CDN URL if caching fails
+                    media_urls.append(att.url)
+                    media_types.append(content_type)
+            elif content_type.startswith("audio/"):
+                try:
+                    ext = "." + content_type.split("/")[-1].split(";")[0]
+                    if ext not in (".ogg", ".mp3", ".wav", ".webm", ".m4a"):
+                        ext = ".ogg"
+                    cached_path = await cache_audio_from_url(att.url, ext=ext)
+                    media_urls.append(cached_path)
+                    media_types.append(content_type)
+                    print(f"[Discord] Cached user audio: {cached_path}", flush=True)
+                except Exception as e:
+                    print(f"[Discord] Failed to cache audio attachment: {e}", flush=True)
+                    media_urls.append(att.url)
+                    media_types.append(content_type)
+            else:
+                # Other attachments: keep the original URL
+                media_urls.append(att.url)
+                media_types.append(content_type)
+        
+        event = MessageEvent(
+            text=message.content,
+            message_type=msg_type,
+            source=source,
+            raw_message=message,
+            message_id=str(message.id),
+            media_urls=media_urls,
+            media_types=media_types,
+            reply_to_message_id=str(message.reference.message_id) if message.reference else None,
+            timestamp=message.created_at,
+        )
+        
+        await self.handle_message(event)
+
+
+# ---------------------------------------------------------------------------
+# Discord UI Components (outside the adapter class)
+# ---------------------------------------------------------------------------
+
+if DISCORD_AVAILABLE:
+
+    class ExecApprovalView(discord.ui.View):
+        """
+        Interactive button view for exec approval of dangerous commands.
+
+        Shows three buttons: Allow Once (green), Always Allow (blue), Deny (red).
+        Only users in the allowed list can click. The view times out after 5 minutes.
+        """
+
+        def __init__(self, approval_id: str, allowed_user_ids: set):
+            super().__init__(timeout=300)  # 5-minute timeout
+            self.approval_id = approval_id
+            self.allowed_user_ids = allowed_user_ids
+            self.resolved = False
+
+        def _check_auth(self, interaction: discord.Interaction) -> bool:
+            """Verify the user clicking is authorized."""
+            if not self.allowed_user_ids:
+                return True  # No allowlist = anyone can approve
+            return str(interaction.user.id) in self.allowed_user_ids
+
+        async def _resolve(
+            self, interaction: discord.Interaction, action: str, color: discord.Color
+        ):
+            """Resolve the approval and update the message."""
+            if self.resolved:
+                await interaction.response.send_message(
+                    "This approval has already been resolved~", ephemeral=True
+                )
+                return
+
+            if not self._check_auth(interaction):
+                await interaction.response.send_message(
+                    "You're not authorized to approve commands~", ephemeral=True
+                )
+                return
+
+            self.resolved = True
+
+            # Update the embed with the decision
+            embed = interaction.message.embeds[0] if interaction.message.embeds else None
+            if embed:
+                embed.color = color
+                embed.set_footer(text=f"{action} by {interaction.user.display_name}")
+
+            # Disable all buttons
+            for child in self.children:
+                child.disabled = True
+
+            await interaction.response.edit_message(embed=embed, view=self)
+
+            # Store the approval decision for the gateway to pick up
+            try:
+                from tools.terminal_tool import _session_approved_patterns
+                if action == "allow_once":
+                    pass  # One-time approval handled by gateway
+                elif action == "allow_always":
+                    _session_approved_patterns.add(self.approval_id)
+            except ImportError:
+                pass
+
+        @discord.ui.button(label="Allow Once", style=discord.ButtonStyle.green)
+        async def allow_once(
+            self, interaction: discord.Interaction, button: discord.ui.Button
+        ):
+            await self._resolve(interaction, "allow_once", discord.Color.green())
+
+        @discord.ui.button(label="Always Allow", style=discord.ButtonStyle.blurple)
+        async def allow_always(
+            self, interaction: discord.Interaction, button: discord.ui.Button
+        ):
+            await self._resolve(interaction, "allow_always", discord.Color.blue())
+
+        @discord.ui.button(label="Deny", style=discord.ButtonStyle.red)
+        async def deny(
+            self, interaction: discord.Interaction, button: discord.ui.Button
+        ):
+            await self._resolve(interaction, "deny", discord.Color.red())
+
+        async def on_timeout(self):
+            """Handle view timeout -- mark as resolved and disable buttons on the view (the sent message is not edited)."""
+            self.resolved = True
+            for child in self.children:
+                child.disabled = True
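Review note: the mention-gating rules in `DiscordAdapter._handle_message` above depend on two environment variables, `DISCORD_FREE_RESPONSE_CHANNELS` and `DISCORD_REQUIRE_MENTION`. The sketch below restates that decision as a pure function so it can be checked without a Discord connection. The helper names `parse_id_list` and `should_respond` are hypothetical, and the environment is passed in as a mapping rather than read from `os.environ`; this is an illustration of the rules, not the adapter's actual code path.

```python
from typing import Mapping, Set


def parse_id_list(raw: str) -> Set[str]:
    """Parse a comma-separated list of IDs, ignoring blanks and whitespace."""
    return {item.strip() for item in raw.split(",") if item.strip()}


def should_respond(
    channel_id: str,
    is_dm: bool,
    bot_mentioned: bool,
    env: Mapping[str, str],
) -> bool:
    """Hypothetical restatement of the mention gate used for server channels."""
    if is_dm:
        return True  # DMs are never mention-gated
    free_channels = parse_id_list(env.get("DISCORD_FREE_RESPONSE_CHANNELS", ""))
    require_mention = env.get("DISCORD_REQUIRE_MENTION", "true").lower() not in (
        "false",
        "0",
        "no",
    )
    if require_mention and channel_id not in free_channels:
        return bot_mentioned
    return True
```

For example, `should_respond("123", False, False, {"DISCORD_FREE_RESPONSE_CHANNELS": "123, 456"})` is true because channel `123` is in the free-response list, while the same call with an empty mapping is false because a mention is required by default.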
--- a/gateway/platforms/slack.py
+++ b/gateway/platforms/slack.py
@@ -0,0 +1,374 @@
+"""
+Slack platform adapter.
+
+Uses slack-bolt (Python) with Socket Mode for:
+- Receiving messages from channels and DMs
+- Sending responses back
+- Handling slash commands
+- Thread support
+"""
+
+import asyncio
+import os
+from typing import Dict, List, Optional, Any
+
+try:
+    from slack_bolt.async_app import AsyncApp
+    from slack_bolt.adapter.socket_mode.async_handler import AsyncSocketModeHandler
+    from slack_sdk.web.async_client import AsyncWebClient
+    SLACK_AVAILABLE = True
+except ImportError:
+    SLACK_AVAILABLE = False
+    AsyncApp = Any
+    AsyncSocketModeHandler = Any
+    AsyncWebClient = Any
+
+import sys
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+    cache_image_from_url,
+    cache_audio_from_url,
+)
+
+
+def check_slack_requirements() -> bool:
+    """Check if Slack dependencies are available."""
+    return SLACK_AVAILABLE
+
+
+class SlackAdapter(BasePlatformAdapter):
+    """
+    Slack bot adapter using Socket Mode.
+
+    Requires two tokens:
+      - SLACK_BOT_TOKEN (xoxb-...) for API calls
+      - SLACK_APP_TOKEN (xapp-...) for Socket Mode connection
+
+    Features:
+      - DMs and channel messages (mention-gated in channels)
+      - Thread support
+      - File/image/audio attachments
+      - Slash commands (/hermes)
+      - Typing indicators (not natively supported by Slack bots)
+    """
+
+    MAX_MESSAGE_LENGTH = 4000  # Slack's limit is higher but mrkdwn can inflate
+
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.SLACK)
+        self._app: Optional[AsyncApp] = None
+        self._handler: Optional[AsyncSocketModeHandler] = None
+        self._bot_user_id: Optional[str] = None
+
+    async def connect(self) -> bool:
+        """Connect to Slack via Socket Mode."""
+        if not SLACK_AVAILABLE:
+            print("[Slack] slack-bolt not installed. Run: pip install slack-bolt")
+            return False
+
+        bot_token = self.config.token
+        app_token = os.getenv("SLACK_APP_TOKEN")
+
+        if not bot_token:
+            print("[Slack] SLACK_BOT_TOKEN not set")
+            return False
+        if not app_token:
+            print("[Slack] SLACK_APP_TOKEN not set")
+            return False
+
+        try:
+            self._app = AsyncApp(token=bot_token)
+
+            # Get our own bot user ID for mention detection
+            auth_response = await self._app.client.auth_test()
+            self._bot_user_id = auth_response.get("user_id")
+            bot_name = auth_response.get("user", "unknown")
+
+            # Register message event handler
+            @self._app.event("message")
+            async def handle_message_event(event, say):
+                await self._handle_slack_message(event)
+
+            # Register slash command handler
+            @self._app.command("/hermes")
+            async def handle_hermes_command(ack, command):
+                await ack()
+                await self._handle_slash_command(command)
+
+            # Start Socket Mode in background; keep a task reference so it is not GC'd
+            self._handler = AsyncSocketModeHandler(self._app, app_token)
+            self._handler_task = asyncio.create_task(self._handler.start_async())
+
+            self._running = True
+            print(f"[Slack] Connected as @{bot_name} (Socket Mode)")
+            return True
+
+        except Exception as e:
+            print(f"[Slack] Connection failed: {e}")
+            return False
+
+    async def disconnect(self) -> None:
+        """Disconnect from Slack."""
+        if self._handler:
+            await self._handler.close_async()
+        self._running = False
+        print("[Slack] Disconnected")
+
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None,
+    ) -> SendResult:
+        """Send a message to a Slack channel or DM."""
+        if not self._app:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            kwargs = {
+                "channel": chat_id,
+                "text": content,
+            }
+
+            # Reply in thread if thread_ts is available
+            if reply_to:
+                kwargs["thread_ts"] = reply_to
+            elif metadata and metadata.get("thread_ts"):
+                kwargs["thread_ts"] = metadata["thread_ts"]
+
+            result = await self._app.client.chat_postMessage(**kwargs)
+
+            return SendResult(
+                success=True,
+                message_id=result.get("ts"),
+                raw_response=result,
+            )
+
+        except Exception as e:
+            print(f"[Slack] Send error: {e}")
+            return SendResult(success=False, error=str(e))
+
+    async def send_typing(self, chat_id: str) -> None:
+        """Slack doesn't have a direct typing indicator API for bots."""
+        pass
+
+    async def send_image(
+        self,
+        chat_id: str,
+        image_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send an image to Slack by uploading the URL as a file."""
+        if not self._app:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            import httpx
+
+            # Download the image first
+            async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
+                response = await client.get(image_url)
+                response.raise_for_status()
+
+            result = await self._app.client.files_upload_v2(
+                channel=chat_id,
+                content=response.content,
+                filename="image.png",
+                initial_comment=caption or "",
+                thread_ts=reply_to,
+            )
+
+            return SendResult(success=True, raw_response=result)
+
+        except Exception as e:
+            print(f"[Slack] Image upload failed, sending URL as text instead: {e}")
+            text = f"{caption}\n{image_url}" if caption else image_url
+            return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
+
+    async def send_voice(
+        self,
+        chat_id: str,
+        audio_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send an audio file to Slack."""
+        if not self._app:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            result = await self._app.client.files_upload_v2(
+                channel=chat_id,
+                file=audio_path,
+                filename=os.path.basename(audio_path),
+                initial_comment=caption or "",
+                thread_ts=reply_to,
+            )
+            return SendResult(success=True, raw_response=result)
+
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Get information about a Slack channel."""
+        if not self._app:
+            return {"name": chat_id, "type": "unknown"}
+
+        try:
+            result = await self._app.client.conversations_info(channel=chat_id)
+            channel = result.get("channel", {})
+            is_dm = channel.get("is_im", False)
+            return {
+                "name": channel.get("name", chat_id),
+                "type": "dm" if is_dm else "group",
+            }
+        except Exception:
+            return {"name": chat_id, "type": "unknown"}
+
+    # ----- Internal handlers -----
+
+    async def _handle_slack_message(self, event: dict) -> None:
+        """Handle an incoming Slack message event."""
+        # Ignore bot messages (including our own)
+        subtype = event.get("subtype")
+        if event.get("bot_id") or subtype == "bot_message":
+            return
+
+        # Ignore message edits and deletions
+        if subtype in ("message_changed", "message_deleted"):
+            return
+
+        text = event.get("text", "")
+        user_id = event.get("user", "")
+        channel_id = event.get("channel", "")
+        thread_ts = event.get("thread_ts") or event.get("ts")
+        ts = event.get("ts", "")
+
+        # Determine if this is a DM or channel message
+        channel_type = event.get("channel_type", "")
+        is_dm = channel_type == "im"
+
+        # In channels, only respond if bot is mentioned
+        if not is_dm and self._bot_user_id:
+            if f"<@{self._bot_user_id}>" not in text:
+                return
+            # Strip the bot mention from the text
+            text = text.replace(f"<@{self._bot_user_id}>", "").strip()
+
+        # Determine message type
+        msg_type = MessageType.TEXT
+        if text.startswith("/"):
+            msg_type = MessageType.COMMAND
+
+        # Handle file attachments
+        media_urls = []
+        media_types = []
+        files = event.get("files", [])
+        for f in files:
+            mimetype = f.get("mimetype", "unknown")
+            url = f.get("url_private_download") or f.get("url_private", "")
+            if mimetype.startswith("image/") and url:
+                try:
+                    ext = "." + mimetype.split("/")[-1].split(";")[0]
+                    if ext not in (".jpg", ".jpeg", ".png", ".gif", ".webp"):
+                        ext = ".jpg"
+                    # Slack private URLs require the bot token as auth header
+                    cached = await self._download_slack_file(url, ext)
+                    media_urls.append(cached)
+                    media_types.append(mimetype)
+                    msg_type = MessageType.PHOTO
+                except Exception as e:
+                    print(f"[Slack] Failed to cache image: {e}", flush=True)
+            elif mimetype.startswith("audio/") and url:
+                try:
+                    ext = "." + mimetype.split("/")[-1].split(";")[0]
+                    if ext not in (".ogg", ".mp3", ".wav", ".webm", ".m4a"):
+                        ext = ".ogg"
+                    cached = await self._download_slack_file(url, ext, audio=True)
+                    media_urls.append(cached)
+                    media_types.append(mimetype)
+                    msg_type = MessageType.VOICE
+                except Exception as e:
+                    print(f"[Slack] Failed to cache audio: {e}", flush=True)
+
+        # Build source
+        source = self.build_source(
+            chat_id=channel_id,
+            chat_name=channel_id,  # Will be resolved later if needed
+            chat_type="dm" if is_dm else "group",
+            user_id=user_id,
+            thread_id=thread_ts,
+        )
+
+        msg_event = MessageEvent(
+            text=text,
+            message_type=msg_type,
+            source=source,
+            raw_message=event,
+            message_id=ts,
+            media_urls=media_urls,
+            media_types=media_types,
+            reply_to_message_id=thread_ts if thread_ts != ts else None,
+        )
+
+        await self.handle_message(msg_event)
+
+    async def _handle_slash_command(self, command: dict) -> None:
+        """Handle /hermes slash command."""
+        text = command.get("text", "").strip()
+        user_id = command.get("user_id", "")
+        channel_id = command.get("channel_id", "")
+
+        # Map common slash subcommands to gateway commands
+        if text in ("new", "reset"):
+            text = "/reset"
+        elif text == "status":
+            text = "/status"
+        elif text == "stop":
+            text = "/stop"
+        elif text:
+            pass  # Treat as a regular question
+        else:
+            text = "/help"
+
+        source = self.build_source(
+            chat_id=channel_id,
+            chat_type="dm",  # Slash commands are always in DM-like context
+            user_id=user_id,
+        )
+
+        event = MessageEvent(
+            text=text,
+            message_type=MessageType.COMMAND if text.startswith("/") else MessageType.TEXT,
+            source=source,
+            raw_message=command,
+        )
+
+        await self.handle_message(event)
+
+    async def _download_slack_file(self, url: str, ext: str, audio: bool = False) -> str:
+        """Download a Slack file using the bot token for auth."""
+        import httpx
+
+        bot_token = self.config.token
+        async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
+            response = await client.get(
+                url,
+                headers={"Authorization": f"Bearer {bot_token}"},
+            )
+            response.raise_for_status()
+
+        if audio:
+            from gateway.platforms.base import cache_audio_from_bytes
+            return cache_audio_from_bytes(response.content, ext)
+        else:
+            from gateway.platforms.base import cache_image_from_bytes
+            return cache_image_from_bytes(response.content, ext)
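
Review note: the channel mention gate in `_handle_slack_message` and the subcommand mapping in `_handle_slash_command` can be exercised without a Slack connection if they are factored into pure functions. The following is a minimal sketch of that idea, not part of this change; `strip_bot_mention` and `map_slash_subcommand` are hypothetical names chosen for illustration, and the behavior is intended to mirror the adapter code above.

```python
from typing import Optional


def strip_bot_mention(text: str, bot_user_id: Optional[str], is_dm: bool) -> Optional[str]:
    """Return the text to process, or None if the message should be ignored.

    Hypothetical helper: DMs pass through untouched; channel messages are
    only accepted when they mention the bot, and the mention is removed.
    """
    if is_dm or not bot_user_id:
        # Same as the adapter: without a known bot ID nothing is gated.
        return text
    mention = f"<@{bot_user_id}>"
    if mention not in text:
        return None
    return text.replace(mention, "").strip()


def map_slash_subcommand(text: str) -> str:
    """Map the argument of /hermes to a gateway command (hypothetical helper)."""
    text = text.strip()
    if text in ("new", "reset"):
        return "/reset"
    if text in ("status", "stop"):
        return f"/{text}"
    # Any other non-empty text is treated as a regular question.
    return text or "/help"
```

With helpers like these, the handlers would only need to build the `MessageEvent`, and the gating rules could be covered by plain unit tests.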
--- a/gateway/platforms/telegram.py
+++ b/gateway/platforms/telegram.py
@@ -0,0 +1,484 @@
+"""
+Telegram platform adapter.
+
+Uses python-telegram-bot library for:
+- Receiving messages from users/groups
+- Sending responses back
+- Handling media and commands
+"""
+
+import asyncio
+from typing import Dict, List, Optional, Any
+
+try:
+    from telegram import Update, Bot, Message
+    from telegram.ext import (
+        Application,
+        CommandHandler,
+        MessageHandler as TelegramMessageHandler,
+        ContextTypes,
+        filters,
+    )
+    from telegram.constants import ParseMode, ChatType
+    TELEGRAM_AVAILABLE = True
+except ImportError:
+    TELEGRAM_AVAILABLE = False
+    Update = Any
+    Bot = Any
+    Message = Any
+    Application = Any
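+    ContextTypes = type("ContextTypes", (), {"DEFAULT_TYPE": Any})  # stub: handler annotations use ContextTypes.DEFAULT_TYPE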
+
+import sys
+sys.path.insert(0, str(__file__).rsplit("/", 3)[0])
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+    cache_image_from_bytes,
+    cache_audio_from_bytes,
+)
+
+
+def check_telegram_requirements() -> bool:
+    """Check if Telegram dependencies are available."""
+    return TELEGRAM_AVAILABLE
+
+
+class TelegramAdapter(BasePlatformAdapter):
+    """
+    Telegram bot adapter.
+    
+    Handles:
+    - Receiving messages from users and groups
+    - Sending responses with Telegram markdown
+    - Forum topics (thread_id support)
+    - Media messages
+    """
+    
+    # Telegram message limits
+    MAX_MESSAGE_LENGTH = 4096
+    
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.TELEGRAM)
+        self._app: Optional[Application] = None
+        self._bot: Optional[Bot] = None
+    
+    async def connect(self) -> bool:
+        """Connect to Telegram and start polling for updates."""
+        if not TELEGRAM_AVAILABLE:
+            print(f"[{self.name}] python-telegram-bot not installed. Run: pip install python-telegram-bot")
+            return False
+        
+        if not self.config.token:
+            print(f"[{self.name}] No bot token configured")
+            return False
+        
+        try:
+            # Build the application
+            self._app = Application.builder().token(self.config.token).build()
+            self._bot = self._app.bot
+            
+            # Register handlers
+            self._app.add_handler(TelegramMessageHandler(
+                filters.TEXT & ~filters.COMMAND,
+                self._handle_text_message
+            ))
+            self._app.add_handler(TelegramMessageHandler(
+                filters.COMMAND,
+                self._handle_command
+            ))
+            self._app.add_handler(TelegramMessageHandler(
+                filters.PHOTO | filters.VIDEO | filters.AUDIO | filters.VOICE | filters.Document.ALL | filters.Sticker.ALL,
+                self._handle_media_message
+            ))
+            
+            # Start polling in background
+            await self._app.initialize()
+            await self._app.start()
+            await self._app.updater.start_polling(allowed_updates=Update.ALL_TYPES)
+            
+            self._running = True
+            print(f"[{self.name}] Connected and polling for updates")
+            return True
+            
+        except Exception as e:
+            print(f"[{self.name}] Failed to connect: {e}")
+            return False
+    
+    async def disconnect(self) -> None:
+        """Stop polling and disconnect."""
+        if self._app:
+            try:
+                await self._app.updater.stop()
+                await self._app.stop()
+                await self._app.shutdown()
+            except Exception as e:
+                print(f"[{self.name}] Error during disconnect: {e}")
+        
+        self._running = False
+        self._app = None
+        self._bot = None
+        print(f"[{self.name}] Disconnected")
+    
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> SendResult:
+        """Send a message to a Telegram chat."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            # Format and split message if needed
+            formatted = self.format_message(content)
+            chunks = self.truncate_message(formatted, self.MAX_MESSAGE_LENGTH)
+            
+            message_ids = []
+            thread_id = metadata.get("thread_id") if metadata else None
+            
+            for i, chunk in enumerate(chunks):
+                # Try Markdown first, fall back to plain text if it fails
+                try:
+                    msg = await self._bot.send_message(
+                        chat_id=int(chat_id),
+                        text=chunk,
+                        parse_mode=ParseMode.MARKDOWN,
+                        reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
+                        message_thread_id=int(thread_id) if thread_id else None,
+                    )
+                except Exception as md_error:
+                    # Markdown parsing failed, try plain text
+                    if "parse" in str(md_error).lower() or "markdown" in str(md_error).lower():
+                        msg = await self._bot.send_message(
+                            chat_id=int(chat_id),
+                            text=chunk,
+                            parse_mode=None,  # Plain text
+                            reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
+                            message_thread_id=int(thread_id) if thread_id else None,
+                        )
+                    else:
+                        raise  # Re-raise if not a parse error
+                message_ids.append(str(msg.message_id))
+            
+            return SendResult(
+                success=True,
+                message_id=message_ids[0] if message_ids else None,
+                raw_response={"message_ids": message_ids}
+            )
+            
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+    
+    async def send_voice(
+        self,
+        chat_id: str,
+        audio_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send audio as a native Telegram voice message or audio file."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import os
+            if not os.path.exists(audio_path):
+                return SendResult(success=False, error=f"Audio file not found: {audio_path}")
+            
+            with open(audio_path, "rb") as audio_file:
+                # .ogg/.opus files -> send as voice (round playable bubble)
+                if audio_path.lower().endswith((".ogg", ".opus")):
+                    msg = await self._bot.send_voice(
+                        chat_id=int(chat_id),
+                        voice=audio_file,
+                        caption=caption[:1024] if caption else None,
+                        reply_to_message_id=int(reply_to) if reply_to else None,
+                    )
+                else:
+                    # .mp3 and others -> send as audio file
+                    msg = await self._bot.send_audio(
+                        chat_id=int(chat_id),
+                        audio=audio_file,
+                        caption=caption[:1024] if caption else None,
+                        reply_to_message_id=int(reply_to) if reply_to else None,
+                    )
+            return SendResult(success=True, message_id=str(msg.message_id))
+        except Exception as e:
+            print(f"[{self.name}] Failed to send voice/audio: {e}")
+            return await super().send_voice(chat_id, audio_path, caption, reply_to)
+    
+    async def send_image(
+        self,
+        chat_id: str,
+        image_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send an image natively as a Telegram photo."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            # Telegram can send photos directly from URLs
+            msg = await self._bot.send_photo(
+                chat_id=int(chat_id),
+                photo=image_url,
+                caption=caption[:1024] if caption else None,  # Telegram caption limit
+                reply_to_message_id=int(reply_to) if reply_to else None,
+            )
+            return SendResult(success=True, message_id=str(msg.message_id))
+        except Exception as e:
+            print(f"[{self.name}] Failed to send photo, falling back to URL: {e}")
+            # Fallback: send as text link
+            return await super().send_image(chat_id, image_url, caption, reply_to)
+    
+    async def send_typing(self, chat_id: str) -> None:
+        """Send typing indicator."""
+        if self._bot:
+            try:
+                await self._bot.send_chat_action(
+                    chat_id=int(chat_id),
+                    action="typing"
+                )
+            except Exception:
+                pass  # Ignore typing indicator failures
+    
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Get information about a Telegram chat."""
+        if not self._bot:
+            return {"name": "Unknown", "type": "dm"}
+        
+        try:
+            chat = await self._bot.get_chat(int(chat_id))
+            
+            chat_type = "dm"
+            if chat.type == ChatType.GROUP:
+                chat_type = "group"
+            elif chat.type == ChatType.SUPERGROUP:
+                chat_type = "group"
+                if chat.is_forum:
+                    chat_type = "forum"
+            elif chat.type == ChatType.CHANNEL:
+                chat_type = "channel"
+            
+            return {
+                "name": chat.title or chat.full_name or str(chat_id),
+                "type": chat_type,
+                "username": chat.username,
+                "is_forum": getattr(chat, "is_forum", False),
+            }
+        except Exception as e:
+            return {"name": str(chat_id), "type": "dm", "error": str(e)}
+    
+    def format_message(self, content: str) -> str:
+        """
+        Format message for Telegram.
+        
+        Telegram uses a subset of markdown. We'll use the simpler
+        Markdown mode (not MarkdownV2) for compatibility.
+        """
+        # No escaping is applied yet: content is passed through unchanged, and
+        # send() retries as plain text if Telegram reports a Markdown parse error.
+        return content
+    
+    async def _handle_text_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+        """Handle incoming text messages."""
+        if not update.message or not update.message.text:
+            return
+        
+        event = self._build_message_event(update.message, MessageType.TEXT)
+        await self.handle_message(event)
+    
+    async def _handle_command(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+        """Handle incoming command messages."""
+        if not update.message or not update.message.text:
+            return
+        
+        event = self._build_message_event(update.message, MessageType.COMMAND)
+        await self.handle_message(event)
+    
+    async def _handle_media_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+        """Handle incoming media messages, caching photos and audio locally."""
+        if not update.message:
+            return
+        
+        msg = update.message
+        
+        # Determine media type
+        if msg.sticker:
+            msg_type = MessageType.STICKER
+        elif msg.photo:
+            msg_type = MessageType.PHOTO
+        elif msg.video:
+            msg_type = MessageType.VIDEO
+        elif msg.audio:
+            msg_type = MessageType.AUDIO
+        elif msg.voice:
+            msg_type = MessageType.VOICE
+        else:
+            msg_type = MessageType.DOCUMENT
+        
+        event = self._build_message_event(msg, msg_type)
+        
+        # Add caption as text
+        if msg.caption:
+            event.text = msg.caption
+        
+        # Handle stickers: describe via vision tool with caching
+        if msg.sticker:
+            await self._handle_sticker(msg, event)
+            await self.handle_message(event)
+            return
+        
+        # Download photo to local image cache so the vision tool can access it
+        # even after Telegram's ephemeral file URLs expire (~1 hour).
+        if msg.photo:
+            try:
+                # msg.photo is a list of PhotoSize sorted by size; take the largest
+                photo = msg.photo[-1]
+                file_obj = await photo.get_file()
+                # Download the image bytes directly into memory
+                image_bytes = await file_obj.download_as_bytearray()
+                # Determine extension from the file path if available
+                ext = ".jpg"
+                if file_obj.file_path:
+                    for candidate in [".png", ".webp", ".gif", ".jpeg", ".jpg"]:
+                        if file_obj.file_path.lower().endswith(candidate):
+                            ext = candidate
+                            break
+                # Save to cache and populate media_urls with the local path
+                cached_path = cache_image_from_bytes(bytes(image_bytes), ext=ext)
+                event.media_urls = [cached_path]
+                event.media_types = [f"image/{'jpeg' if ext in ('.jpg', '.jpeg') else ext.lstrip('.')}"]
+                print(f"[Telegram] Cached user photo: {cached_path}", flush=True)
+            except Exception as e:
+                print(f"[Telegram] Failed to cache photo: {e}", flush=True)
+        
+        # Download voice/audio messages to cache for STT transcription
+        if msg.voice:
+            try:
+                file_obj = await msg.voice.get_file()
+                audio_bytes = await file_obj.download_as_bytearray()
+                cached_path = cache_audio_from_bytes(bytes(audio_bytes), ext=".ogg")
+                event.media_urls = [cached_path]
+                event.media_types = ["audio/ogg"]
+                print(f"[Telegram] Cached user voice: {cached_path}", flush=True)
+            except Exception as e:
+                print(f"[Telegram] Failed to cache voice: {e}", flush=True)
+        elif msg.audio:
+            try:
+                file_obj = await msg.audio.get_file()
+                audio_bytes = await file_obj.download_as_bytearray()
+                cached_path = cache_audio_from_bytes(bytes(audio_bytes), ext=".mp3")
+                event.media_urls = [cached_path]
+                event.media_types = ["audio/mpeg"]
+                print(f"[Telegram] Cached user audio: {cached_path}", flush=True)
+            except Exception as e:
+                print(f"[Telegram] Failed to cache audio: {e}", flush=True)
+        
+        await self.handle_message(event)
+    
+    async def _handle_sticker(self, msg: Message, event: "MessageEvent") -> None:
+        """
+        Describe a Telegram sticker via vision analysis, with caching.
+
+        For static stickers (WEBP), we download, analyze with vision, and cache
+        the description by file_unique_id. For animated/video stickers, we inject
+        a placeholder noting the emoji.
+        """
+        from gateway.sticker_cache import (
+            get_cached_description,
+            cache_sticker_description,
+            build_sticker_injection,
+            build_animated_sticker_injection,
+            STICKER_VISION_PROMPT,
+        )
+
+        sticker = msg.sticker
+        emoji = sticker.emoji or ""
+        set_name = sticker.set_name or ""
+
+        # Animated and video stickers can't be analyzed as static images
+        if sticker.is_animated or sticker.is_video:
+            event.text = build_animated_sticker_injection(emoji)
+            return
+
+        # Check the cache first
+        cached = get_cached_description(sticker.file_unique_id)
+        if cached:
+            event.text = build_sticker_injection(
+                cached["description"], cached.get("emoji", emoji), cached.get("set_name", set_name)
+            )
+            print(f"[Telegram] Sticker cache hit: {sticker.file_unique_id}", flush=True)
+            return
+
+        # Cache miss -- download and analyze
+        try:
+            file_obj = await sticker.get_file()
+            image_bytes = await file_obj.download_as_bytearray()
+            cached_path = cache_image_from_bytes(bytes(image_bytes), ext=".webp")
+            print(f"[Telegram] Analyzing sticker: {cached_path}", flush=True)
+
+            from tools.vision_tools import vision_analyze_tool
+            import json as _json
+
+            result_json = await vision_analyze_tool(
+                image_url=cached_path,
+                user_prompt=STICKER_VISION_PROMPT,
+            )
+            result = _json.loads(result_json)
+
+            if result.get("success"):
+                description = result.get("analysis", "a sticker")
+                cache_sticker_description(sticker.file_unique_id, description, emoji, set_name)
+                event.text = build_sticker_injection(description, emoji, set_name)
+            else:
+                # Vision failed -- use emoji as fallback
+                event.text = build_sticker_injection(
+                    f"a sticker with emoji {emoji}" if emoji else "a sticker",
+                    emoji, set_name,
+                )
+        except Exception as e:
+            print(f"[Telegram] Sticker analysis error: {e}", flush=True)
+            event.text = build_sticker_injection(
+                f"a sticker with emoji {emoji}" if emoji else "a sticker",
+                emoji, set_name,
+            )
+
+    def _build_message_event(self, message: Message, msg_type: MessageType) -> MessageEvent:
+        """Build a MessageEvent from a Telegram message."""
+        chat = message.chat
+        user = message.from_user
+        
+        # Determine chat type
+        chat_type = "dm"
+        if chat.type in (ChatType.GROUP, ChatType.SUPERGROUP):
+            chat_type = "group"
+        elif chat.type == ChatType.CHANNEL:
+            chat_type = "channel"
+        
+        # Build source
+        source = self.build_source(
+            chat_id=str(chat.id),
+            chat_name=chat.title or (chat.full_name if hasattr(chat, "full_name") else None),
+            chat_type=chat_type,
+            user_id=str(user.id) if user else None,
+            user_name=user.full_name if user else None,
+            thread_id=str(message.message_thread_id) if message.message_thread_id else None,
+        )
+        
+        return MessageEvent(
+            text=message.text or "",
+            message_type=msg_type,
+            source=source,
+            raw_message=message,
+            message_id=str(message.message_id),
+            timestamp=message.date,
+        )
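
Review note: the photo and audio handlers in `TelegramAdapter` derive `media_types` from the extension of the cached file. One way to keep those values on registered MIME names is a small lookup table. This is a sketch only; `mime_for_extension` and the table are hypothetical and do not exist in this change.

```python
# Hypothetical lookup from cached-file extension to a registered MIME type.
_EXT_TO_MIME = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".ogg": "audio/ogg",
    ".mp3": "audio/mpeg",
}


def mime_for_extension(ext: str, default: str = "application/octet-stream") -> str:
    """Return the MIME type for an extension such as ".jpg" (case-insensitive)."""
    return _EXT_TO_MIME.get(ext.lower(), default)
```

An explicit table is used here rather than the standard `mimetypes` module because the set of extensions the adapter produces is small and fixed.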
--- a/gateway/platforms/whatsapp.py
+++ b/gateway/platforms/whatsapp.py
@@ -0,0 +1,360 @@
+"""
+WhatsApp platform adapter.
+
+WhatsApp integration is more complex than Telegram/Discord because:
+- No official bot API for personal accounts
+- Business API requires Meta Business verification
+- Most solutions use web-based automation
+
+This adapter supports multiple backends:
+1. WhatsApp Business API (requires Meta verification)
+2. whatsapp-web.js (via Node.js subprocess) - for personal accounts
+3. Baileys (via Node.js subprocess) - alternative for personal accounts
+
+For simplicity, we'll implement a generic interface that can work
+with different backends via a bridge pattern.
+"""
+
+import asyncio
+import json
+import subprocess
+from pathlib import Path
+from typing import Dict, List, Optional, Any
+
+import sys
+sys.path.insert(0, str(__file__).rsplit("/", 3)[0])
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+    cache_image_from_url,
+    cache_audio_from_url,
+)
+
+
+def check_whatsapp_requirements() -> bool:
+    """
+    Check if WhatsApp dependencies are available.
+    
+    WhatsApp requires a Node.js bridge for most implementations.
+    """
+    # Check for Node.js
+    try:
+        result = subprocess.run(
+            ["node", "--version"],
+            capture_output=True,
+            text=True,
+            timeout=5
+        )
+        return result.returncode == 0
+    except Exception:
+        return False
+
+
+class WhatsAppAdapter(BasePlatformAdapter):
+    """
+    WhatsApp adapter.
+    
+    This implementation uses a simple HTTP bridge pattern where:
+    1. A Node.js process runs the WhatsApp Web client
+    2. Messages are forwarded via HTTP/IPC to this Python adapter
+    3. Responses are sent back through the bridge
+    
+    The actual Node.js bridge implementation can vary:
+    - whatsapp-web.js based
+    - Baileys based
+    - Business API based
+    
+    Configuration:
+    - bridge_script: Path to the Node.js bridge script
+    - bridge_port: Port for HTTP communication (default: 3000)
+    - session_path: Path to store WhatsApp session data
+    """
+    
+    # WhatsApp message limits
+    MAX_MESSAGE_LENGTH = 65536  # WhatsApp allows far longer messages than Telegram or Discord
+    
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.WHATSAPP)
+        self._bridge_process: Optional[subprocess.Popen] = None
+        self._bridge_port: int = config.extra.get("bridge_port", 3000)
+        self._bridge_script: Optional[str] = config.extra.get("bridge_script")
+        self._session_path: Path = Path(config.extra.get(
+            "session_path",
+            Path.home() / ".hermes" / "whatsapp" / "session"
+        ))
+        self._message_queue: asyncio.Queue = asyncio.Queue()
+    
+    async def connect(self) -> bool:
+        """
+        Start the WhatsApp bridge.
+        
+        This launches the Node.js bridge process and waits for it to be ready.
+        """
+        if not check_whatsapp_requirements():
+            print(f"[{self.name}] Node.js not found. WhatsApp requires Node.js.")
+            return False
+        
+        if not self._bridge_script:
+            print(f"[{self.name}] No bridge script configured.")
+            print(f"[{self.name}] Set 'bridge_script' in whatsapp.extra config.")
+            print(f"[{self.name}] See docs/messaging.md for WhatsApp setup instructions.")
+            return False
+        
+        bridge_path = Path(self._bridge_script)
+        if not bridge_path.exists():
+            print(f"[{self.name}] Bridge script not found: {bridge_path}")
+            return False
+        
+        try:
+            # Ensure session directory exists
+            self._session_path.mkdir(parents=True, exist_ok=True)
+            
+            # Start the bridge process
+            self._bridge_process = subprocess.Popen(
+                [
+                    "node",
+                    str(bridge_path),
+                    "--port", str(self._bridge_port),
+                    "--session", str(self._session_path),
+                ],
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                text=True,
+            )
+            
+            # Wait for bridge to be ready (look for ready signal)
+            # This is a simplified version - real implementation would
+            # wait for an HTTP health check or specific stdout message
+            await asyncio.sleep(5)
+            
+            if self._bridge_process.poll() is not None:
+                stderr = self._bridge_process.stderr.read() if self._bridge_process.stderr else ""
+                print(f"[{self.name}] Bridge process died: {stderr}")
+                return False
+            
+            # Start message polling task; keep a reference so it is not garbage-collected
+            self._poll_task = asyncio.create_task(self._poll_messages())
+            
+            self._running = True
+            print(f"[{self.name}] Bridge started on port {self._bridge_port}")
+            print(f"[{self.name}] Scan QR code if prompted (check bridge output)")
+            return True
+            
+        except Exception as e:
+            print(f"[{self.name}] Failed to start bridge: {e}")
+            return False
+    
+    async def disconnect(self) -> None:
+        """Stop the WhatsApp bridge."""
+        if self._bridge_process:
+            try:
+                self._bridge_process.terminate()
+                await asyncio.sleep(1)
+                if self._bridge_process.poll() is None:
+                    self._bridge_process.kill()
+            except Exception as e:
+                print(f"[{self.name}] Error stopping bridge: {e}")
+        
+        self._running = False
+        self._bridge_process = None
+        print(f"[{self.name}] Disconnected")
+    
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> SendResult:
+        """Send a message via the WhatsApp bridge."""
+        if not self._running:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import aiohttp
+            
+            async with aiohttp.ClientSession() as session:
+                payload = {
+                    "chatId": chat_id,
+                    "message": content,
+                }
+                if reply_to:
+                    payload["replyTo"] = reply_to
+                
+                async with session.post(
+                    f"http://localhost:{self._bridge_port}/send",
+                    json=payload,
+                    timeout=aiohttp.ClientTimeout(total=30)
+                ) as resp:
+                    if resp.status == 200:
+                        data = await resp.json()
+                        return SendResult(
+                            success=True,
+                            message_id=data.get("messageId"),
+                            raw_response=data
+                        )
+                    else:
+                        error = await resp.text()
+                        return SendResult(success=False, error=error)
+                        
+        except ImportError:
+            return SendResult(
+                success=False, 
+                error="aiohttp not installed. Run: pip install aiohttp"
+            )
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+    
+    async def send_typing(self, chat_id: str) -> None:
+        """Send typing indicator via bridge."""
+        if not self._running:
+            return
+        
+        try:
+            import aiohttp
+            
+            async with aiohttp.ClientSession() as session:
+                await session.post(
+                    f"http://localhost:{self._bridge_port}/typing",
+                    json={"chatId": chat_id},
+                    timeout=aiohttp.ClientTimeout(total=5)
+                )
+        except Exception:
+            pass  # Ignore typing indicator failures
+    
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Get information about a WhatsApp chat."""
+        if not self._running:
+            return {"name": "Unknown", "type": "dm"}
+        
+        try:
+            import aiohttp
+            
+            async with aiohttp.ClientSession() as session:
+                async with session.get(
+                    f"http://localhost:{self._bridge_port}/chat/{chat_id}",
+                    timeout=aiohttp.ClientTimeout(total=10)
+                ) as resp:
+                    if resp.status == 200:
+                        data = await resp.json()
+                        return {
+                            "name": data.get("name", chat_id),
+                            "type": "group" if data.get("isGroup") else "dm",
+                            "participants": data.get("participants", []),
+                        }
+        except Exception:
+            pass
+        
+        return {"name": chat_id, "type": "dm"}
+    
+    async def _poll_messages(self) -> None:
+        """Poll the bridge for incoming messages."""
+        try:
+            import aiohttp
+        except ImportError:
+            print(f"[{self.name}] aiohttp not installed, message polling disabled")
+            return
+        
+        while self._running:
+            try:
+                async with aiohttp.ClientSession() as session:
+                    async with session.get(
+                        f"http://localhost:{self._bridge_port}/messages",
+                        timeout=aiohttp.ClientTimeout(total=30)
+                    ) as resp:
+                        if resp.status == 200:
+                            messages = await resp.json()
+                            for msg_data in messages:
+                                event = await self._build_message_event(msg_data)
+                                if event:
+                                    await self.handle_message(event)
+            except asyncio.CancelledError:
+                break
+            except Exception as e:
+                print(f"[{self.name}] Poll error: {e}")
+                await asyncio.sleep(5)
+            
+            await asyncio.sleep(1)  # Poll interval
+    
+    async def _build_message_event(self, data: Dict[str, Any]) -> Optional[MessageEvent]:
+        """Build a MessageEvent from bridge message data, caching image and voice media locally."""
+        try:
+            # Determine message type
+            msg_type = MessageType.TEXT
+            if data.get("hasMedia"):
+                media_type = data.get("mediaType", "")
+                if "image" in media_type:
+                    msg_type = MessageType.PHOTO
+                elif "video" in media_type:
+                    msg_type = MessageType.VIDEO
+                elif "audio" in media_type or "ptt" in media_type:  # ptt = voice note
+                    msg_type = MessageType.VOICE
+                else:
+                    msg_type = MessageType.DOCUMENT
+            
+            # Determine chat type
+            is_group = data.get("isGroup", False)
+            chat_type = "group" if is_group else "dm"
+            
+            # Build source
+            source = self.build_source(
+                chat_id=data.get("chatId", ""),
+                chat_name=data.get("chatName"),
+                chat_type=chat_type,
+                user_id=data.get("senderId"),
+                user_name=data.get("senderName"),
+            )
+            
+            # Download image and voice media URLs to the local cache so downstream
+            # tools (e.g. vision) can access them reliably regardless of URL expiration.
+            raw_urls = data.get("mediaUrls", [])
+            cached_urls = []
+            media_types = []
+            for url in raw_urls:
+                if msg_type == MessageType.PHOTO and url.startswith(("http://", "https://")):
+                    try:
+                        cached_path = await cache_image_from_url(url, ext=".jpg")
+                        cached_urls.append(cached_path)
+                        media_types.append("image/jpeg")
+                        print(f"[{self.name}] Cached user image: {cached_path}", flush=True)
+                    except Exception as e:
+                        print(f"[{self.name}] Failed to cache image: {e}", flush=True)
+                        cached_urls.append(url)
+                        media_types.append("image/jpeg")
+                elif msg_type == MessageType.VOICE and url.startswith(("http://", "https://")):
+                    try:
+                        cached_path = await cache_audio_from_url(url, ext=".ogg")
+                        cached_urls.append(cached_path)
+                        media_types.append("audio/ogg")
+                        print(f"[{self.name}] Cached user voice: {cached_path}", flush=True)
+                    except Exception as e:
+                        print(f"[{self.name}] Failed to cache voice: {e}", flush=True)
+                        cached_urls.append(url)
+                        media_types.append("audio/ogg")
+                else:
+                    cached_urls.append(url)
+                    media_types.append("unknown")
+            
+            return MessageEvent(
+                text=data.get("body", ""),
+                message_type=msg_type,
+                source=source,
+                raw_message=data,
+                message_id=data.get("messageId"),
+                media_urls=cached_urls,
+                media_types=media_types,
+            )
+        except Exception as e:
+            print(f"[{self.name}] Error building event: {e}")
+            return None
+
+
+# Note: A reference Node.js bridge script would be provided in scripts/whatsapp-bridge/
+# It would use whatsapp-web.js or Baileys to:
+# 1. Handle WhatsApp Web authentication (QR code)
+# 2. Listen for incoming messages
+# 3. Expose HTTP endpoints for send/receive/status
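
The comment in `connect()` notes that a real implementation would wait for a health check rather than sleep for a fixed five seconds. A minimal sketch of that idea follows, assuming only that the bridge starts accepting TCP connections on `bridge_port` once it is up. `wait_for_port` is a hypothetical helper, not part of this commit, and a listening port does not prove that WhatsApp authentication has finished.

```python
import asyncio


async def wait_for_port(host: str, port: int, timeout: float = 30.0,
                        interval: float = 0.25) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        try:
            _reader, writer = await asyncio.open_connection(host, port)
        except OSError:
            # Nothing is listening yet; retry after a short pause.
            await asyncio.sleep(interval)
            continue
        writer.close()
        try:
            await writer.wait_closed()
        except OSError:
            pass  # The peer may reset the connection; the port was still reachable.
        return True
    return False
```

In `connect()`, a call such as `await wait_for_port("localhost", self._bridge_port)` could stand in for `await asyncio.sleep(5)`, with a `False` result treated as a startup failure.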
--- a/gateway/run.py
+++ b/gateway/run.py
--- a/gateway/session.py
+++ b/gateway/session.py
@@ -0,0 +1,533 @@
+"""
+Session management for the gateway.
+
+Handles:
+- Session context tracking (where messages come from)
+- Session storage (conversations persisted to disk)
+- Reset policy evaluation (when to start fresh)
+- Dynamic system prompt injection (agent knows its context)
+"""
+
+import os
+import json
+import uuid
+from pathlib import Path
+from datetime import datetime, timedelta
+from dataclasses import dataclass, field
+from typing import Dict, List, Optional, Any
+
+from .config import (
+    Platform,
+    GatewayConfig,
+    SessionResetPolicy,
+    HomeChannel,
+)
+
+
+@dataclass
+class SessionSource:
+    """
+    Describes where a message originated from.
+    
+    This information is used to:
+    1. Route responses back to the right place
+    2. Inject context into the system prompt
+    3. Track origin for cron job delivery
+    """
+    platform: Platform
+    chat_id: str
+    chat_name: Optional[str] = None
+    chat_type: str = "dm"  # "dm", "group", "channel", "thread"
+    user_id: Optional[str] = None
+    user_name: Optional[str] = None
+    thread_id: Optional[str] = None  # For forum topics, Discord threads, etc.
+    
+    @property
+    def description(self) -> str:
+        """Human-readable description of the source."""
+        if self.platform == Platform.LOCAL:
+            return "CLI terminal"
+        
+        parts = []
+        if self.chat_type == "dm":
+            parts.append(f"DM with {self.user_name or self.user_id or 'user'}")
+        elif self.chat_type == "group":
+            parts.append(f"group: {self.chat_name or self.chat_id}")
+        elif self.chat_type == "channel":
+            parts.append(f"channel: {self.chat_name or self.chat_id}")
+        else:
+            parts.append(self.chat_name or self.chat_id)
+        
+        if self.thread_id:
+            parts.append(f"thread: {self.thread_id}")
+        
+        return ", ".join(parts)
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "platform": self.platform.value,
+            "chat_id": self.chat_id,
+            "chat_name": self.chat_name,
+            "chat_type": self.chat_type,
+            "user_id": self.user_id,
+            "user_name": self.user_name,
+            "thread_id": self.thread_id,
+        }
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "SessionSource":
+        return cls(
+            platform=Platform(data["platform"]),
+            chat_id=str(data["chat_id"]),
+            chat_name=data.get("chat_name"),
+            chat_type=data.get("chat_type", "dm"),
+            user_id=data.get("user_id"),
+            user_name=data.get("user_name"),
+            thread_id=data.get("thread_id"),
+        )
+    
+    @classmethod
+    def local_cli(cls) -> "SessionSource":
+        """Create a source representing the local CLI."""
+        return cls(
+            platform=Platform.LOCAL,
+            chat_id="cli",
+            chat_name="CLI terminal",
+            chat_type="dm",
+        )
+
+
+@dataclass
+class SessionContext:
+    """
+    Full context for a session, used for dynamic system prompt injection.
+    
+    The agent receives this information to understand:
+    - Where messages are coming from
+    - What platforms are available
+    - Where it can deliver scheduled task outputs
+    """
+    source: SessionSource
+    connected_platforms: List[Platform]
+    home_channels: Dict[Platform, HomeChannel]
+    
+    # Session metadata
+    session_key: str = ""
+    session_id: str = ""
+    created_at: Optional[datetime] = None
+    updated_at: Optional[datetime] = None
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "source": self.source.to_dict(),
+            "connected_platforms": [p.value for p in self.connected_platforms],
+            "home_channels": {
+                p.value: hc.to_dict() for p, hc in self.home_channels.items()
+            },
+            "session_key": self.session_key,
+            "session_id": self.session_id,
+            "created_at": self.created_at.isoformat() if self.created_at else None,
+            "updated_at": self.updated_at.isoformat() if self.updated_at else None,
+        }
+
+
+def build_session_context_prompt(context: SessionContext) -> str:
+    """
+    Build the dynamic system prompt section that tells the agent about its context.
+    
+    This is injected into the system prompt so the agent knows:
+    - Where messages are coming from
+    - What platforms are connected
+    - Where it can deliver scheduled task outputs
+    """
+    lines = [
+        "## Current Session Context",
+        "",
+    ]
+    
+    # Source info
+    platform_name = context.source.platform.value.title()
+    if context.source.platform == Platform.LOCAL:
+        lines.append(f"**Source:** {platform_name} (the machine running this agent)")
+    else:
+        lines.append(f"**Source:** {platform_name} ({context.source.description})")
+    
+    # Connected platforms
+    platforms_list = ["local (files on this machine)"]
+    for p in context.connected_platforms:
+        if p != Platform.LOCAL:
+            platforms_list.append(f"{p.value}: Connected ✓")
+    
+    lines.append(f"**Connected Platforms:** {', '.join(platforms_list)}")
+    
+    # Home channels
+    if context.home_channels:
+        lines.append("")
+        lines.append("**Home Channels (default destinations):**")
+        for platform, home in context.home_channels.items():
+            lines.append(f"  - {platform.value}: {home.name} (ID: {home.chat_id})")
+    
+    # Delivery options for scheduled tasks
+    lines.append("")
+    lines.append("**Delivery options for scheduled tasks:**")
+    
+    # Origin delivery
+    if context.source.platform == Platform.LOCAL:
+        lines.append("- `\"origin\"` → Local output (saved to files)")
+    else:
+        lines.append(f"- `\"origin\"` → Back to this chat ({context.source.chat_name or context.source.chat_id})")
+    
+    # Local always available
+    lines.append("- `\"local\"` → Save to local files only (~/.hermes/cron/output/)")
+    
+    # Platform home channels
+    for platform, home in context.home_channels.items():
+        lines.append(f"- `\"{platform.value}\"` → Home channel ({home.name})")
+    
+    # Note about explicit targeting
+    lines.append("")
+    lines.append("*For explicit targeting, use `\"platform:chat_id\"` format if the user provides a specific chat ID.*")
+    
+    return "\n".join(lines)
+
+
+@dataclass
+class SessionEntry:
+    """
+    Entry in the session store.
+    
+    Maps a session key to its current session ID and metadata.
+    """
+    session_key: str
+    session_id: str
+    created_at: datetime
+    updated_at: datetime
+    
+    # Origin metadata for delivery routing
+    origin: Optional[SessionSource] = None
+    
+    # Display metadata
+    display_name: Optional[str] = None
+    platform: Optional[Platform] = None
+    chat_type: str = "dm"
+    
+    # Token tracking
+    input_tokens: int = 0
+    output_tokens: int = 0
+    total_tokens: int = 0
+    
+    def to_dict(self) -> Dict[str, Any]:
+        result = {
+            "session_key": self.session_key,
+            "session_id": self.session_id,
+            "created_at": self.created_at.isoformat(),
+            "updated_at": self.updated_at.isoformat(),
+            "display_name": self.display_name,
+            "platform": self.platform.value if self.platform else None,
+            "chat_type": self.chat_type,
+            "input_tokens": self.input_tokens,
+            "output_tokens": self.output_tokens,
+            "total_tokens": self.total_tokens,
+        }
+        if self.origin:
+            result["origin"] = self.origin.to_dict()
+        return result
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "SessionEntry":
+        origin = None
+        if "origin" in data and data["origin"]:
+            origin = SessionSource.from_dict(data["origin"])
+        
+        platform = None
+        if data.get("platform"):
+            try:
+                platform = Platform(data["platform"])
+            except ValueError:
+                pass
+        
+        return cls(
+            session_key=data["session_key"],
+            session_id=data["session_id"],
+            created_at=datetime.fromisoformat(data["created_at"]),
+            updated_at=datetime.fromisoformat(data["updated_at"]),
+            origin=origin,
+            display_name=data.get("display_name"),
+            platform=platform,
+            chat_type=data.get("chat_type", "dm"),
+            input_tokens=data.get("input_tokens", 0),
+            output_tokens=data.get("output_tokens", 0),
+            total_tokens=data.get("total_tokens", 0),
+        )
+
+
+class SessionStore:
+    """
+    Manages session storage and retrieval.
+    
+    Sessions are stored in:
+    - sessions.json: Index mapping session keys to session IDs
+    - {session_id}.jsonl: Conversation transcripts
+    """
+    
+    def __init__(self, sessions_dir: Path, config: GatewayConfig,
+                 has_active_processes_fn=None):
+        self.sessions_dir = sessions_dir
+        self.config = config
+        self._entries: Dict[str, SessionEntry] = {}
+        self._loaded = False
+        # Optional callback to check if a session has active background processes.
+        # When set, sessions with running processes are exempt from reset.
+        self._has_active_processes_fn = has_active_processes_fn
+    
+    def _ensure_loaded(self) -> None:
+        """Load sessions from disk if not already loaded."""
+        if self._loaded:
+            return
+        
+        self.sessions_dir.mkdir(parents=True, exist_ok=True)
+        sessions_file = self.sessions_dir / "sessions.json"
+        
+        if sessions_file.exists():
+            try:
+                with open(sessions_file, "r") as f:
+                    data = json.load(f)
+                    for key, entry_data in data.items():
+                        self._entries[key] = SessionEntry.from_dict(entry_data)
+            except Exception as e:
+                print(f"[gateway] Warning: Failed to load sessions: {e}")
+        
+        self._loaded = True
+    
+    def _save(self) -> None:
+        """Save sessions index to disk."""
+        self.sessions_dir.mkdir(parents=True, exist_ok=True)
+        sessions_file = self.sessions_dir / "sessions.json"
+        
+        data = {key: entry.to_dict() for key, entry in self._entries.items()}
+        with open(sessions_file, "w") as f:
+            json.dump(data, f, indent=2)
+    
+    def _generate_session_key(self, source: SessionSource) -> str:
+        """Generate a session key from a source."""
+        platform = source.platform.value
+        
+        if source.chat_type == "dm":
+            # DMs share the main session per platform
+            return f"agent:main:{platform}:dm"
+        else:
+            # Groups/channels get their own keys
+            return f"agent:main:{platform}:{source.chat_type}:{source.chat_id}"
+    
+    def _should_reset(self, entry: SessionEntry, source: SessionSource) -> bool:
+        """
+        Check if a session should be reset based on policy.
+        
+        Returns True if the session is stale and should start fresh.
+        Sessions with active background processes are never reset.
+        """
+        # Don't reset sessions that have active background processes
+        if self._has_active_processes_fn:
+            session_key = self._generate_session_key(source)
+            if self._has_active_processes_fn(session_key):
+                return False
+
+        policy = self.config.get_reset_policy(
+            platform=source.platform,
+            session_type=source.chat_type
+        )
+        
+        now = datetime.now()
+        
+        # Check idle timeout
+        if policy.mode in ("idle", "both"):
+            idle_deadline = entry.updated_at + timedelta(minutes=policy.idle_minutes)
+            if now > idle_deadline:
+                return True
+        
+        # Check daily reset
+        if policy.mode in ("daily", "both"):
+            # Find the most recent reset boundary
+            today_reset = now.replace(
+                hour=policy.at_hour, 
+                minute=0, 
+                second=0, 
+                microsecond=0
+            )
+            if now.hour < policy.at_hour:
+                # Reset boundary was yesterday
+                today_reset -= timedelta(days=1)
+            
+            if entry.updated_at < today_reset:
+                return True
+        
+        return False
+    
+    def get_or_create_session(
+        self, 
+        source: SessionSource,
+        force_new: bool = False
+    ) -> SessionEntry:
+        """
+        Get an existing session or create a new one.
+        
+        Evaluates reset policy to determine if the existing session is stale.
+        """
+        self._ensure_loaded()
+        
+        session_key = self._generate_session_key(source)
+        now = datetime.now()
+        
+        # Check for existing session
+        if session_key in self._entries and not force_new:
+            entry = self._entries[session_key]
+            
+            # Check if session should be reset
+            if not self._should_reset(entry, source):
+                # Update timestamp and return existing
+                entry.updated_at = now
+                self._save()
+                return entry
+        
+        # Create new session
+        session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
+        
+        entry = SessionEntry(
+            session_key=session_key,
+            session_id=session_id,
+            created_at=now,
+            updated_at=now,
+            origin=source,
+            display_name=source.chat_name,
+            platform=source.platform,
+            chat_type=source.chat_type,
+        )
+        
+        self._entries[session_key] = entry
+        self._save()
+        
+        return entry
+    
+    def update_session(
+        self, 
+        session_key: str,
+        input_tokens: int = 0,
+        output_tokens: int = 0
+    ) -> None:
+        """Update a session's metadata after an interaction."""
+        self._ensure_loaded()
+        
+        if session_key in self._entries:
+            entry = self._entries[session_key]
+            entry.updated_at = datetime.now()
+            entry.input_tokens += input_tokens
+            entry.output_tokens += output_tokens
+            entry.total_tokens = entry.input_tokens + entry.output_tokens
+            self._save()
+    
+    def reset_session(self, session_key: str) -> Optional[SessionEntry]:
+        """Force reset a session, creating a new session ID."""
+        self._ensure_loaded()
+        
+        if session_key not in self._entries:
+            return None
+        
+        old_entry = self._entries[session_key]
+        now = datetime.now()
+        session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
+        
+        new_entry = SessionEntry(
+            session_key=session_key,
+            session_id=session_id,
+            created_at=now,
+            updated_at=now,
+            origin=old_entry.origin,
+            display_name=old_entry.display_name,
+            platform=old_entry.platform,
+            chat_type=old_entry.chat_type,
+        )
+        
+        self._entries[session_key] = new_entry
+        self._save()
+        
+        return new_entry
+    
+    def list_sessions(self, active_minutes: Optional[int] = None) -> List[SessionEntry]:
+        """
+        List all sessions, optionally filtered by activity.
+        
+        Args:
+            active_minutes: If provided, only return sessions updated within this many minutes
+        """
+        self._ensure_loaded()
+        
+        entries = list(self._entries.values())
+        
+        if active_minutes is not None:
+            cutoff = datetime.now() - timedelta(minutes=active_minutes)
+            entries = [e for e in entries if e.updated_at >= cutoff]
+        
+        # Sort by most recently updated
+        entries.sort(key=lambda e: e.updated_at, reverse=True)
+        
+        return entries
+    
+    def get_transcript_path(self, session_id: str) -> Path:
+        """Get the path to a session's transcript file."""
+        return self.sessions_dir / f"{session_id}.jsonl"
+    
+    def append_to_transcript(self, session_id: str, message: Dict[str, Any]) -> None:
+        """Append a message to a session's transcript."""
+        transcript_path = self.get_transcript_path(session_id)
+        
+        with open(transcript_path, "a", encoding="utf-8") as f:
+            f.write(json.dumps(message, ensure_ascii=False) + "\n")
+    
+    def load_transcript(self, session_id: str) -> List[Dict[str, Any]]:
+        """Load all messages from a session's transcript."""
+        transcript_path = self.get_transcript_path(session_id)
+        
+        if not transcript_path.exists():
+            return []
+        
+        messages = []
+        with open(transcript_path, "r", encoding="utf-8") as f:
+            for line in f:
+                line = line.strip()
+                if line:
+                    messages.append(json.loads(line))
+        
+        return messages
+
+
+def build_session_context(
+    source: SessionSource,
+    config: GatewayConfig,
+    session_entry: Optional[SessionEntry] = None
+) -> SessionContext:
+    """
+    Build a full session context from a source and config.
+    
+    This is used to inject context into the agent's system prompt.
+    """
+    connected = config.get_connected_platforms()
+    
+    home_channels = {}
+    for platform in connected:
+        home = config.get_home_channel(platform)
+        if home:
+            home_channels[platform] = home
+    
+    context = SessionContext(
+        source=source,
+        connected_platforms=connected,
+        home_channels=home_channels,
+    )
+    
+    if session_entry:
+        context.session_key = session_entry.session_key
+        context.session_id = session_entry.session_id
+        context.created_at = session_entry.created_at
+        context.updated_at = session_entry.updated_at
+    
+    return context
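
The reset policy in `SessionStore._should_reset` is easier to check with concrete times. The sketch below restates the same idle and daily rules as standalone functions. The names `last_reset_boundary` and `is_stale` are illustrative only, and the policy fields are passed as plain arguments instead of a `SessionResetPolicy` object.

```python
from datetime import datetime, timedelta


def last_reset_boundary(now: datetime, at_hour: int) -> datetime:
    """Most recent daily reset instant at or before `now`."""
    boundary = now.replace(hour=at_hour, minute=0, second=0, microsecond=0)
    if now.hour < at_hour:
        # Today's boundary has not happened yet, so use yesterday's.
        boundary -= timedelta(days=1)
    return boundary


def is_stale(updated_at: datetime, now: datetime, mode: str,
             idle_minutes: int, at_hour: int) -> bool:
    """Apply the idle and/or daily rule, depending on `mode`."""
    if mode in ("idle", "both"):
        if now > updated_at + timedelta(minutes=idle_minutes):
            return True
    if mode in ("daily", "both"):
        if updated_at < last_reset_boundary(now, at_hour):
            return True
    return False


now = datetime(2026, 3, 3, 9, 0)
# Last activity at 03:30 predates the 04:00 boundary, so the daily rule fires.
print(is_stale(datetime(2026, 3, 3, 3, 30), now, "daily", 120, 4))  # True
# Last activity at 08:30 is after the boundary and within the idle window.
print(is_stale(datetime(2026, 3, 3, 8, 30), now, "both", 120, 4))   # False
```

Note that before `at_hour` the boundary falls on the previous day, so a session last touched late the previous evening survives an early-morning message.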
--- a/gateway/sticker_cache.py
+++ b/gateway/sticker_cache.py
@@ -0,0 +1,111 @@
+"""
+Sticker description cache for Telegram.
+
+When users send stickers, we describe them via the vision tool and cache
+the descriptions keyed by file_unique_id so we don't re-analyze the same
+sticker image on every send. Descriptions are concise (1-2 sentences).
+
+Cache location: ~/.hermes/sticker_cache.json
+"""
+
+import json
+import os
+import time
+from pathlib import Path
+from typing import Optional
+
+
+CACHE_PATH = Path(os.path.expanduser("~/.hermes/sticker_cache.json"))
+
+# Vision prompt for describing stickers -- kept concise to save tokens
+STICKER_VISION_PROMPT = (
+    "Describe this sticker in 1-2 sentences. Focus on what it depicts -- "
+    "character, action, emotion. Be concise and objective."
+)
+
+
+def _load_cache() -> dict:
+    """Load the sticker cache from disk."""
+    if CACHE_PATH.exists():
+        try:
+            return json.loads(CACHE_PATH.read_text(encoding="utf-8"))
+        except (json.JSONDecodeError, OSError):
+            return {}
+    return {}
+
+
+def _save_cache(cache: dict) -> None:
+    """Save the sticker cache to disk."""
+    CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
+    CACHE_PATH.write_text(
+        json.dumps(cache, indent=2, ensure_ascii=False),
+        encoding="utf-8",
+    )
+
+
+def get_cached_description(file_unique_id: str) -> Optional[dict]:
+    """
+    Look up a cached sticker description.
+
+    Returns:
+        dict with keys {description, emoji, set_name, cached_at} or None.
+    """
+    cache = _load_cache()
+    return cache.get(file_unique_id)
+
+
+def cache_sticker_description(
+    file_unique_id: str,
+    description: str,
+    emoji: str = "",
+    set_name: str = "",
+) -> None:
+    """
+    Store a sticker description in the cache.
+
+    Args:
+        file_unique_id: Telegram's stable sticker identifier.
+        description:    Vision-generated description text.
+        emoji:          Associated emoji (e.g. "😀").
+        set_name:       Sticker set name if available.
+    """
+    cache = _load_cache()
+    cache[file_unique_id] = {
+        "description": description,
+        "emoji": emoji,
+        "set_name": set_name,
+        "cached_at": time.time(),
+    }
+    _save_cache(cache)
+
+
+def build_sticker_injection(
+    description: str,
+    emoji: str = "",
+    set_name: str = "",
+) -> str:
+    """
+    Build the warm-style injection text for a sticker description.
+
+    Returns a string like:
+      [The user sent a sticker 😀 from "MyPack"~ It shows: "A cat waving" (=^.w.^=)]
+    """
+    context = ""
+    if set_name and emoji:
+        context = f" {emoji} from \"{set_name}\""
+    elif emoji:
+        context = f" {emoji}"
+
+    return f"[The user sent a sticker{context}~ It shows: \"{description}\" (=^.w.^=)]"
+
+
+def build_animated_sticker_injection(emoji: str = "") -> str:
+    """
+    Build injection text for animated/video stickers we can't analyze.
+    """
+    if emoji:
+        return (
+            f"[The user sent an animated sticker {emoji}~ "
+            f"I can't see animated ones yet, but the emoji suggests: {emoji}]"
+        )
+    return "[The user sent an animated sticker~ I can't see animated ones yet]"
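
Review note on gateway/sticker_cache.py: the module docstring describes a look-up-then-describe flow keyed by file_unique_id. A minimal sketch of that flow follows, under stated assumptions: `get_or_describe` and `fake_vision` are hypothetical names that do not appear in this diff, the cache path is a temporary directory rather than ~/.hermes, and the vision call is replaced by a stub. It only illustrates the caching idea and is not the gateway's actual call site.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable

# Assumption: a throwaway location so the sketch never touches ~/.hermes.
CACHE_PATH = Path(tempfile.mkdtemp()) / "sticker_cache.json"


def _load_cache() -> dict:
    """Return the cache contents, or {} if the file is missing or unreadable."""
    try:
        return json.loads(CACHE_PATH.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return {}


def get_or_describe(file_unique_id: str, describe: Callable[[], str], emoji: str = "") -> str:
    """Hypothetical helper: return the cached description, calling describe() only on a miss."""
    cache = _load_cache()
    entry = cache.get(file_unique_id)
    if entry is not None:
        return entry["description"]

    # Cache miss: run the (expensive) description step once and persist the result.
    description = describe()
    cache[file_unique_id] = {
        "description": description,
        "emoji": emoji,
        "set_name": "",
        "cached_at": time.time(),
    }
    CACHE_PATH.write_text(
        json.dumps(cache, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return description


calls = []


def fake_vision() -> str:
    """Stand-in for the real vision tool; records each invocation."""
    calls.append(1)
    return "A cat waving"


first = get_or_describe("abc123", fake_vision, emoji="😀")
second = get_or_describe("abc123", fake_vision, emoji="😀")
```

The second call is served from the JSON file, so the stub runs only once. Each lookup re-reads the whole file, which matches the simple read-modify-write pattern in the diff and is fine for small caches.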
--- a/hermes_cli/__init__.py
+++ b/hermes_cli/__init__.py
@@ -0,0 +1,14 @@
+"""
+Hermes CLI - Unified command-line interface for Hermes Agent.
+
+Provides subcommands for:
+- hermes chat          - Interactive chat (same as ./hermes)
+- hermes gateway       - Run gateway in foreground
+- hermes gateway start - Start gateway service
+- hermes gateway stop  - Stop gateway service  
+- hermes setup         - Interactive setup wizard
+- hermes status        - Show status of all components
+- hermes cron          - Manage cron jobs
+"""
+
+__version__ = "0.1.0"
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@@ -0,0 +1,897 @@
+"""
+Configuration management for Hermes Agent.
+
+Config files are stored in ~/.hermes/ for easy access:
+- ~/.hermes/config.yaml  - All settings (model, toolsets, terminal, etc.)
+- ~/.hermes/.env         - API keys and secrets
+
+This module provides:
+- hermes config          - Show current configuration
+- hermes config edit     - Open config in editor
+- hermes config set      - Set a specific value
+- hermes config wizard   - Re-run setup wizard
+"""
+
+import os
+import sys
+import subprocess
+from pathlib import Path
+from typing import Dict, Any, Optional, List, Tuple
+
+import yaml
+
+# ANSI colors
+class Colors:
+    RESET = "\033[0m"
+    BOLD = "\033[1m"
+    DIM = "\033[2m"
+    RED = "\033[31m"
+    GREEN = "\033[32m"
+    YELLOW = "\033[33m"
+    BLUE = "\033[34m"
+    MAGENTA = "\033[35m"
+    CYAN = "\033[36m"
+
+def color(text: str, *codes) -> str:
+    if not sys.stdout.isatty():
+        return text
+    return "".join(codes) + text + Colors.RESET
+
+
+# =============================================================================
+# Config paths
+# =============================================================================
+
+def get_hermes_home() -> Path:
+    """Get the Hermes home directory ($HERMES_HOME if set, otherwise ~/.hermes)."""
+    return Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
+
+def get_config_path() -> Path:
+    """Get the main config file path."""
+    return get_hermes_home() / "config.yaml"
+
+def get_env_path() -> Path:
+    """Get the .env file path (for API keys)."""
+    return get_hermes_home() / ".env"
+
+def get_project_root() -> Path:
+    """Get the project installation directory."""
+    return Path(__file__).parent.parent.resolve()
+
+def ensure_hermes_home():
+    """Ensure ~/.hermes directory structure exists."""
+    home = get_hermes_home()
+    (home / "cron").mkdir(parents=True, exist_ok=True)
+    (home / "sessions").mkdir(parents=True, exist_ok=True)
+    (home / "logs").mkdir(parents=True, exist_ok=True)
+
+
+# =============================================================================
+# Config loading/saving
+# =============================================================================
+
+DEFAULT_CONFIG = {
+    "model": "anthropic/claude-opus-4.6",
+    "toolsets": ["hermes-cli"],
+    "max_turns": 100,
+    
+    "terminal": {
+        "backend": "local",
+        "cwd": ".",  # Use current directory
+        "timeout": 180,
+        "docker_image": "nikolaik/python-nodejs:python3.11-nodejs20",
+        "singularity_image": "docker://nikolaik/python-nodejs:python3.11-nodejs20",
+        "modal_image": "nikolaik/python-nodejs:python3.11-nodejs20",
+    },
+    
+    "browser": {
+        "inactivity_timeout": 120,
+    },
+    
+    "compression": {
+        "enabled": True,
+        "threshold": 0.85,
+        "summary_model": "google/gemini-3-flash-preview",
+    },
+    
+    "display": {
+        "compact": False,
+        "personality": "kawaii",
+    },
+    
+    # Text-to-speech configuration
+    "tts": {
+        "provider": "edge",  # "edge" (free) | "elevenlabs" (premium) | "openai"
+        "edge": {
+            "voice": "en-US-AriaNeural",
+            # Popular: AriaNeural, JennyNeural, AndrewNeural, BrianNeural, SoniaNeural
+        },
+        "elevenlabs": {
+            "voice_id": "pNInz6obpgDQGcFmaJgB",  # Adam
+            "model_id": "eleven_multilingual_v2",
+        },
+        "openai": {
+            "model": "gpt-4o-mini-tts",
+            "voice": "alloy",
+            # Voices: alloy, echo, fable, onyx, nova, shimmer
+        },
+    },
+    
+    "stt": {
+        "enabled": True,
+        "model": "whisper-1",
+    },
+    
+    "human_delay": {
+        "mode": "off",
+        "min_ms": 800,
+        "max_ms": 2500,
+    },
+    
+    # Permanently allowed dangerous command patterns (added via "always" approval)
+    "command_allowlist": [],
+    
+    # Config schema version - bump this when adding new required fields
+    "_config_version": 2,
+}
+
+# =============================================================================
+# Config Migration System
+# =============================================================================
+
+# Required environment variables with metadata for migration prompts
+REQUIRED_ENV_VARS = {
+    "OPENROUTER_API_KEY": {
+        "description": "OpenRouter API key (required for vision, web scraping, and tools)",
+        "prompt": "OpenRouter API key",
+        "url": "https://openrouter.ai/keys",
+        "required": True,
+        "password": True,
+    },
+}
+
+# Optional environment variables that enhance functionality
+OPTIONAL_ENV_VARS = {
+    "FIRECRAWL_API_KEY": {
+        "description": "Firecrawl API key for web search and scraping",
+        "prompt": "Firecrawl API key",
+        "url": "https://firecrawl.dev/",
+        "tools": ["web_search", "web_extract"],
+        "password": True,
+    },
+    "BROWSERBASE_API_KEY": {
+        "description": "Browserbase API key for browser automation",
+        "prompt": "Browserbase API key", 
+        "url": "https://browserbase.com/",
+        "tools": ["browser_navigate", "browser_click", "etc."],
+        "password": True,
+    },
+    "BROWSERBASE_PROJECT_ID": {
+        "description": "Browserbase project ID",
+        "prompt": "Browserbase project ID",
+        "url": "https://browserbase.com/",
+        "tools": ["browser_navigate", "browser_click", "etc."],
+        "password": False,
+    },
+    "FAL_KEY": {
+        "description": "FAL API key for image generation",
+        "prompt": "FAL API key",
+        "url": "https://fal.ai/",
+        "tools": ["image_generate"],
+        "password": True,
+    },
+    "TINKER_API_KEY": {
+        "description": "Tinker API key for RL training",
+        "prompt": "Tinker API key",
+        "url": "https://tinker-console.thinkingmachines.ai/keys",
+        "tools": ["rl_start_training", "rl_check_status", "rl_stop_training"],
+        "password": True,
+    },
+    "WANDB_API_KEY": {
+        "description": "Weights & Biases API key for experiment tracking",
+        "prompt": "WandB API key",
+        "url": "https://wandb.ai/authorize",
+        "tools": ["rl_get_results", "rl_check_status"],
+        "password": True,
+    },
+    "OPENAI_BASE_URL": {
+        "description": "Custom OpenAI-compatible API endpoint (for VLLM/SGLang/etc.)",
+        "prompt": "OpenAI-compatible base URL (only if running your own endpoint)",
+        "url": None,
+        "password": False,
+        "advanced": True,  # Hide from standard migrate flow
+    },
+    "HERMES_OPENAI_API_KEY": {
+        "description": "OpenAI API key for voice transcription (Whisper) and OpenAI TTS",
+        "prompt": "OpenAI API Key (for Whisper STT + TTS)",
+        "url": "https://platform.openai.com/api-keys",
+        "tools": ["voice_transcription", "openai_tts"],
+        "password": True,
+    },
+    "SLACK_BOT_TOKEN": {
+        "description": "Slack bot integration",
+        "prompt": "Slack Bot Token (xoxb-...)",
+        "url": "https://api.slack.com/apps",
+        "tools": ["slack"],
+        "password": True,
+    },
+    "SLACK_APP_TOKEN": {
+        "description": "Slack Socket Mode connection",
+        "prompt": "Slack App Token (xapp-...)",
+        "url": "https://api.slack.com/apps",
+        "tools": ["slack"],
+        "password": True,
+    },
+    # Messaging platform tokens
+    "TELEGRAM_BOT_TOKEN": {
+        "description": "Telegram bot token from @BotFather",
+        "prompt": "Telegram bot token",
+        "url": "https://t.me/BotFather",
+        "password": True,
+    },
+    "TELEGRAM_ALLOWED_USERS": {
+        "description": "Comma-separated Telegram user IDs allowed to use the bot (get ID from @userinfobot)",
+        "prompt": "Allowed Telegram user IDs (comma-separated)",
+        "url": "https://t.me/userinfobot",
+        "password": False,
+    },
+    "DISCORD_BOT_TOKEN": {
+        "description": "Discord bot token from Developer Portal",
+        "prompt": "Discord bot token",
+        "url": "https://discord.com/developers/applications",
+        "password": True,
+    },
+    "DISCORD_ALLOWED_USERS": {
+        "description": "Comma-separated Discord user IDs allowed to use the bot",
+        "prompt": "Allowed Discord user IDs (comma-separated)",
+        "url": None,
+        "password": False,
+    },
+    # Text-to-speech (premium providers)
+    "ELEVENLABS_API_KEY": {
+        "description": "ElevenLabs API key for premium text-to-speech voices",
+        "prompt": "ElevenLabs API key",
+        "url": "https://elevenlabs.io/",
+        "password": True,
+    },
+    # Terminal configuration
+    "MESSAGING_CWD": {
+        "description": "Working directory for terminal commands via messaging (Telegram/Discord/etc). CLI always uses current directory.",
+        "prompt": "Messaging working directory (default: home)",
+        "url": None,
+        "password": False,
+    },
+    "SUDO_PASSWORD": {
+        "description": "Sudo password for terminal commands requiring root access",
+        "prompt": "Sudo password",
+        "url": None,
+        "password": True,
+    },
+    # Agent configuration
+    "HERMES_MAX_ITERATIONS": {
+        "description": "Maximum tool-calling iterations per conversation (default: 60)",
+        "prompt": "Max iterations",
+        "url": None,
+        "password": False,
+    },
+    "HERMES_TOOL_PROGRESS": {
+        "description": "Send tool progress messages in messaging channels (true/false)",
+        "prompt": "Enable tool progress messages",
+        "url": None,
+        "password": False,
+    },
+    "HERMES_TOOL_PROGRESS_MODE": {
+        "description": "Progress mode: 'all' (every tool) or 'new' (only when tool changes)",
+        "prompt": "Progress mode (all/new)",
+        "url": None,
+        "password": False,
+    },
+}
+
+
+def get_missing_env_vars(required_only: bool = False) -> List[Dict[str, Any]]:
+    """
+    Check which environment variables are missing.
+    
+    Returns list of dicts with var info for missing variables.
+    """
+    missing = []
+    
+    # Check required vars
+    for var_name, info in REQUIRED_ENV_VARS.items():
+        if not get_env_value(var_name):
+            missing.append({"name": var_name, **info, "is_required": True})
+    
+    # Check optional vars (if not required_only)
+    if not required_only:
+        for var_name, info in OPTIONAL_ENV_VARS.items():
+            if not get_env_value(var_name):
+                missing.append({"name": var_name, **info, "is_required": False})
+    
+    return missing
+
+
+def get_missing_config_fields() -> List[Dict[str, Any]]:
+    """
+    Check which config fields are missing or outdated.
+    
+    Returns list of missing/outdated fields.
+    """
+    config = load_config()
+    missing = []
+    
+    # Check for new top-level keys in DEFAULT_CONFIG
+    for key, default_value in DEFAULT_CONFIG.items():
+        if key.startswith('_'):
+            continue  # Skip internal keys
+        if key not in config:
+            missing.append({
+                "key": key,
+                "default": default_value,
+                "description": f"New config section: {key}",
+            })
+        elif isinstance(default_value, dict):
+            # Check nested keys
+            for subkey, subvalue in default_value.items():
+                if subkey not in config.get(key, {}):
+                    missing.append({
+                        "key": f"{key}.{subkey}",
+                        "default": subvalue,
+                        "description": f"New config option: {key}.{subkey}",
+                    })
+    
+    return missing
+
+
+def check_config_version() -> Tuple[int, int]:
+    """
+    Check config version.
+    
+    Returns (current_version, latest_version).
+    """
+    config = load_config()
+    current = config.get("_config_version", 0)
+    latest = DEFAULT_CONFIG.get("_config_version", 1)
+    return current, latest
+
+
+def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, Any]:
+    """
+    Migrate config to latest version, prompting for new required fields.
+    
+    Args:
+        interactive: If True, prompt user for missing values
+        quiet: If True, suppress output
+        
+    Returns:
+        Dict with migration results: {"env_added": [...], "config_added": [...], "warnings": [...]}
+    """
+    results = {"env_added": [], "config_added": [], "warnings": []}
+    
+    # Check config version
+    current_ver, latest_ver = check_config_version()
+    
+    if current_ver < latest_ver and not quiet:
+        print(f"Config version: {current_ver} → {latest_ver}")
+    
+    # Check for missing required env vars
+    missing_env = get_missing_env_vars(required_only=True)
+    
+    if missing_env and not quiet:
+        print("\n⚠️  Missing required environment variables:")
+        for var in missing_env:
+            print(f"   • {var['name']}: {var['description']}")
+    
+    if interactive and missing_env:
+        print("\nLet's configure them now:\n")
+        for var in missing_env:
+            if var.get("url"):
+                print(f"  Get your key at: {var['url']}")
+            
+            if var.get("password"):
+                import getpass
+                value = getpass.getpass(f"  {var['prompt']}: ")
+            else:
+                value = input(f"  {var['prompt']}: ").strip()
+            
+            if value:
+                save_env_value(var["name"], value)
+                results["env_added"].append(var["name"])
+                print(f"  ✓ Saved {var['name']}")
+            else:
+                results["warnings"].append(f"Skipped {var['name']} - some features may not work")
+            print()
+    
+    # Check for missing optional env vars and offer to configure interactively
+    # Skip "advanced" vars (like OPENAI_BASE_URL) -- those are for power users
+    missing_optional = get_missing_env_vars(required_only=False)
+    required_names = {v["name"] for v in missing_env} if missing_env else set()
+    missing_optional = [
+        v for v in missing_optional
+        if v["name"] not in required_names and not v.get("advanced")
+    ]
+    
+    if interactive and missing_optional:
+        print("  Would you like to configure any optional keys now?")
+        try:
+            answer = input("  Configure optional keys? [y/N]: ").strip().lower()
+        except (EOFError, KeyboardInterrupt):
+            answer = "n"
+        
+        if answer in ("y", "yes"):
+            print()
+            for var in missing_optional:
+                desc = var.get("description", "")
+                if var.get("url"):
+                    print(f"  {desc}")
+                    print(f"  Get your key at: {var['url']}")
+                else:
+                    print(f"  {desc}")
+                
+                if var.get("password"):
+                    import getpass
+                    value = getpass.getpass(f"  {var['prompt']} (Enter to skip): ")
+                else:
+                    value = input(f"  {var['prompt']} (Enter to skip): ").strip()
+                
+                if value:
+                    save_env_value(var["name"], value)
+                    results["env_added"].append(var["name"])
+                    print(f"  ✓ Saved {var['name']}")
+                print()
+    
+    # Check for missing config fields
+    missing_config = get_missing_config_fields()
+    
+    if missing_config:
+        config = load_config()
+        
+        for field in missing_config:
+            key = field["key"]
+            default = field["default"]
+            
+            # Add with default value
+            if "." in key:
+                # Nested key
+                parent, child = key.split(".", 1)
+                if parent not in config:
+                    config[parent] = {}
+                config[parent][child] = default
+            else:
+                config[key] = default
+            
+            results["config_added"].append(key)
+            if not quiet:
+                print(f"  ✓ Added {key} = {default}")
+        
+        # Update version and save
+        config["_config_version"] = latest_ver
+        save_config(config)
+    elif current_ver < latest_ver:
+        # Just update version
+        config = load_config()
+        config["_config_version"] = latest_ver
+        save_config(config)
+    
+    return results
+
+
+def load_config() -> Dict[str, Any]:
+    """Load configuration from ~/.hermes/config.yaml."""
+    import copy
+    config_path = get_config_path()
+    
+    # Deep copy to avoid mutating DEFAULT_CONFIG
+    config = copy.deepcopy(DEFAULT_CONFIG)
+    
+    if config_path.exists():
+        try:
+            with open(config_path) as f:
+                user_config = yaml.safe_load(f) or {}
+            
+            # Merge user values over defaults (one level deep; dicts nested further are replaced whole)
+            for key, value in user_config.items():
+                if isinstance(value, dict) and key in config and isinstance(config[key], dict):
+                    config[key].update(value)
+                else:
+                    config[key] = value
+        except Exception as e:
+            print(f"Warning: Failed to load config: {e}")
+    
+    return config
+
+
+def save_config(config: Dict[str, Any]):
+    """Save configuration to ~/.hermes/config.yaml."""
+    ensure_hermes_home()
+    config_path = get_config_path()
+    
+    with open(config_path, 'w') as f:
+        yaml.dump(config, f, default_flow_style=False, sort_keys=False)
+
+
+def load_env() -> Dict[str, str]:
+    """Load environment variables from ~/.hermes/.env."""
+    env_path = get_env_path()
+    env_vars = {}
+    
+    if env_path.exists():
+        with open(env_path) as f:
+            for line in f:
+                line = line.strip()
+                if line and not line.startswith('#') and '=' in line:
+                    key, _, value = line.partition('=')
+                    env_vars[key.strip()] = value.strip().strip('"\'')
+    
+    return env_vars
+
+
+def save_env_value(key: str, value: str):
+    """Save or update a value in ~/.hermes/.env."""
+    ensure_hermes_home()
+    env_path = get_env_path()
+    
+    # Load existing
+    lines = []
+    if env_path.exists():
+        with open(env_path) as f:
+            lines = f.readlines()
+    
+    # Find and update or append
+    found = False
+    for i, line in enumerate(lines):
+        if line.strip().startswith(f"{key}="):
+            lines[i] = f"{key}={value}\n"
+            found = True
+            break
+    
+    if not found:
+        # Ensure there's a newline at the end of the file before appending
+        if lines and not lines[-1].endswith("\n"):
+            lines[-1] += "\n"
+        lines.append(f"{key}={value}\n")
+    
+    with open(env_path, 'w') as f:
+        f.writelines(lines)
+
+
+def get_env_value(key: str) -> Optional[str]:
+    """Get a value from the process environment, falling back to ~/.hermes/.env."""
+    # Check environment first
+    if key in os.environ:
+        return os.environ[key]
+    
+    # Then check .env file
+    env_vars = load_env()
+    return env_vars.get(key)
+
+
+# =============================================================================
+# Config display
+# =============================================================================
+
+def redact_key(key: str) -> str:
+    """Redact an API key for display."""
+    if not key:
+        return color("(not set)", Colors.DIM)
+    if len(key) < 12:
+        return "***"
+    return key[:4] + "..." + key[-4:]
+
+
+def show_config():
+    """Display current configuration."""
+    config = load_config()
+    env_vars = load_env()
+    
+    print()
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.CYAN))
+    print(color("│              🦋 Hermes Configuration                    │", Colors.CYAN))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.CYAN))
+    
+    # Paths
+    print()
+    print(color("◆ Paths", Colors.CYAN, Colors.BOLD))
+    print(f"  Config:       {get_config_path()}")
+    print(f"  Secrets:      {get_env_path()}")
+    print(f"  Install:      {get_project_root()}")
+    
+    # API Keys
+    print()
+    print(color("◆ API Keys", Colors.CYAN, Colors.BOLD))
+    
+    keys = [
+        ("OPENROUTER_API_KEY", "OpenRouter"),
+        ("ANTHROPIC_API_KEY", "Anthropic"),
+        ("HERMES_OPENAI_API_KEY", "OpenAI (STT/TTS)"),
+        ("FIRECRAWL_API_KEY", "Firecrawl"),
+        ("BROWSERBASE_API_KEY", "Browserbase"),
+        ("FAL_KEY", "FAL"),
+    ]
+    
+    for env_key, name in keys:
+        value = get_env_value(env_key)
+        print(f"  {name:<14} {redact_key(value)}")
+    
+    # Model settings
+    print()
+    print(color("◆ Model", Colors.CYAN, Colors.BOLD))
+    print(f"  Model:        {config.get('model', 'not set')}")
+    print(f"  Max turns:    {config.get('max_turns', 100)}")
+    print(f"  Toolsets:     {', '.join(config.get('toolsets', ['all']))}")
+    
+    # Terminal
+    print()
+    print(color("◆ Terminal", Colors.CYAN, Colors.BOLD))
+    terminal = config.get('terminal', {})
+    print(f"  Backend:      {terminal.get('backend', 'local')}")
+    print(f"  Working dir:  {terminal.get('cwd', '.')}")
+    print(f"  Timeout:      {terminal.get('timeout', 180)}s")
+    
+    if terminal.get('backend') == 'docker':
+        print(f"  Docker image: {terminal.get('docker_image', 'python:3.11-slim')}")
+    elif terminal.get('backend') == 'singularity':
+        print(f"  Image:        {terminal.get('singularity_image', 'docker://python:3.11')}")
+    elif terminal.get('backend') == 'modal':
+        print(f"  Modal image:  {terminal.get('modal_image', 'python:3.11')}")
+        modal_token = get_env_value('MODAL_TOKEN_ID')
+        print(f"  Modal token:  {'configured' if modal_token else '(not set)'}")
+    elif terminal.get('backend') == 'ssh':
+        ssh_host = get_env_value('TERMINAL_SSH_HOST')
+        ssh_user = get_env_value('TERMINAL_SSH_USER')
+        print(f"  SSH host:     {ssh_host or '(not set)'}")
+        print(f"  SSH user:     {ssh_user or '(not set)'}")
+    
+    # Compression
+    print()
+    print(color("◆ Context Compression", Colors.CYAN, Colors.BOLD))
+    compression = config.get('compression', {})
+    enabled = compression.get('enabled', True)
+    print(f"  Enabled:      {'yes' if enabled else 'no'}")
+    if enabled:
+        print(f"  Threshold:    {compression.get('threshold', 0.85) * 100:.0f}%")
+        print(f"  Model:        {compression.get('summary_model', 'google/gemini-3-flash-preview')}")
+    
+    # Messaging
+    print()
+    print(color("◆ Messaging Platforms", Colors.CYAN, Colors.BOLD))
+    
+    telegram_token = get_env_value('TELEGRAM_BOT_TOKEN')
+    discord_token = get_env_value('DISCORD_BOT_TOKEN')
+    
+    print(f"  Telegram:     {'configured' if telegram_token else color('not configured', Colors.DIM)}")
+    print(f"  Discord:      {'configured' if discord_token else color('not configured', Colors.DIM)}")
+    
+    print()
+    print(color("─" * 60, Colors.DIM))
+    print(color("  hermes config edit     # Edit config file", Colors.DIM))
+    print(color("  hermes config set KEY VALUE", Colors.DIM))
+    print(color("  hermes setup           # Run setup wizard", Colors.DIM))
+    print()
+
+
+def edit_config():
+    """Open config file in user's editor."""
+    config_path = get_config_path()
+    
+    # Ensure config exists
+    if not config_path.exists():
+        save_config(DEFAULT_CONFIG)
+        print(f"Created {config_path}")
+    
+    # Find editor
+    editor = os.getenv('EDITOR') or os.getenv('VISUAL')
+    
+    if not editor:
+        # Try common editors
+        import shutil
+        for cmd in ['nano', 'vim', 'vi', 'code', 'notepad']:
+            if shutil.which(cmd):
+                editor = cmd
+                break
+    
+    if not editor:
+        print("No editor found. Config file is at:")
+        print(f"  {config_path}")
+        return
+    
+    print(f"Opening {config_path} in {editor}...")
+    subprocess.run([editor, str(config_path)])
+
+
+def set_config_value(key: str, value: str):
+    """Set a configuration value."""
+    # Check if it's an API key (goes to .env)
+    api_keys = [
+        'OPENROUTER_API_KEY', 'ANTHROPIC_API_KEY', 'HERMES_OPENAI_API_KEY',
+        'FIRECRAWL_API_KEY', 'BROWSERBASE_API_KEY', 'BROWSERBASE_PROJECT_ID',
+        'FAL_KEY', 'TELEGRAM_BOT_TOKEN', 'DISCORD_BOT_TOKEN',
+        'TERMINAL_SSH_HOST', 'TERMINAL_SSH_USER', 'TERMINAL_SSH_KEY',
+        'SUDO_PASSWORD', 'SLACK_BOT_TOKEN', 'SLACK_APP_TOKEN',
+    ]
+    
+    if key.upper() in api_keys or key.upper().startswith('TERMINAL_SSH'):
+        save_env_value(key.upper(), value)
+        print(f"✓ Set {key.upper()} in {get_env_path()}")
+        return
+    
+    # Otherwise it goes to config.yaml
+    # Read the raw user config (not merged with defaults) to avoid
+    # dumping all default values back to the file
+    config_path = get_config_path()
+    user_config = {}
+    if config_path.exists():
+        try:
+            with open(config_path) as f:
+                user_config = yaml.safe_load(f) or {}
+        except Exception:
+            user_config = {}
+    
+    # Handle nested keys (e.g., "tts.provider")
+    parts = key.split('.')
+    current = user_config
+    
+    for part in parts[:-1]:
+        if part not in current or not isinstance(current.get(part), dict):
+            current[part] = {}
+        current = current[part]
+    
+    # Convert value to appropriate type
+    if value.lower() in ('true', 'yes', 'on'):
+        value = True
+    elif value.lower() in ('false', 'no', 'off'):
+        value = False
+    elif value.isdigit():
+        value = int(value)
+    elif value.replace('.', '', 1).isdigit():
+        value = float(value)
+    
+    current[parts[-1]] = value
+    
+    # Write only user config back (not the full merged defaults)
+    ensure_hermes_home()
+    with open(config_path, 'w') as f:
+        yaml.dump(user_config, f, default_flow_style=False, sort_keys=False)
+    
+    print(f"✓ Set {key} = {value} in {config_path}")
+
+
+# =============================================================================
+# Command handler
+# =============================================================================
+
+def config_command(args):
+    """Handle config subcommands."""
+    subcmd = getattr(args, 'config_command', None)
+    
+    if subcmd is None or subcmd == "show":
+        show_config()
+    
+    elif subcmd == "edit":
+        edit_config()
+    
+    elif subcmd == "set":
+        key = getattr(args, 'key', None)
+        value = getattr(args, 'value', None)
+        if not key or not value:
+            print("Usage: hermes config set KEY VALUE")
+            print()
+            print("Examples:")
+            print("  hermes config set model anthropic/claude-sonnet-4")
+            print("  hermes config set terminal.backend docker")
+            print("  hermes config set OPENROUTER_API_KEY sk-or-...")
+            sys.exit(1)
+        set_config_value(key, value)
+    
+    elif subcmd == "path":
+        print(get_config_path())
+    
+    elif subcmd == "env-path":
+        print(get_env_path())
+    
+    elif subcmd == "migrate":
+        print()
+        print(color("🔄 Checking configuration for updates...", Colors.CYAN, Colors.BOLD))
+        print()
+        
+        # Check what's missing
+        missing_env = get_missing_env_vars(required_only=False)
+        missing_config = get_missing_config_fields()
+        current_ver, latest_ver = check_config_version()
+        
+        if not missing_env and not missing_config and current_ver >= latest_ver:
+            print(color("✓ Configuration is up to date!", Colors.GREEN))
+            print()
+            return
+        
+        # Show what needs to be updated
+        if current_ver < latest_ver:
+            print(f"  Config version: {current_ver} → {latest_ver}")
+        
+        if missing_config:
+            print(f"\n  {len(missing_config)} new config option(s) will be added with defaults")
+        
+        required_missing = [v for v in missing_env if v.get("is_required")]
+        optional_missing = [
+            v for v in missing_env
+            if not v.get("is_required") and not v.get("advanced")
+        ]
+        
+        if required_missing:
+            print(f"\n  ⚠️  {len(required_missing)} required API key(s) missing:")
+            for var in required_missing:
+                print(f"     • {var['name']}")
+        
+        if optional_missing:
+            print(f"\n  ℹ️  {len(optional_missing)} optional API key(s) not configured:")
+            for var in optional_missing:
+                tools = var.get("tools", [])
+                tools_str = f" (enables: {', '.join(tools[:2])})" if tools else ""
+                print(f"     • {var['name']}{tools_str}")
+        
+        print()
+        
+        # Run migration
+        results = migrate_config(interactive=True, quiet=False)
+        
+        print()
+        if results["env_added"] or results["config_added"]:
+            print(color("✓ Configuration updated!", Colors.GREEN))
+        
+        if results["warnings"]:
+            print()
+            for warning in results["warnings"]:
+                print(color(f"  ⚠️  {warning}", Colors.YELLOW))
+        
+        print()
+    
+    elif subcmd == "check":
+        # Non-interactive check for what's missing
+        print()
+        print(color("📋 Configuration Status", Colors.CYAN, Colors.BOLD))
+        print()
+        
+        current_ver, latest_ver = check_config_version()
+        if current_ver >= latest_ver:
+            print(f"  Config version: {current_ver} ✓")
+        else:
+            print(color(f"  Config version: {current_ver} → {latest_ver} (update available)", Colors.YELLOW))
+        
+        print()
+        print(color("  Required:", Colors.BOLD))
+        for var_name in REQUIRED_ENV_VARS:
+            if get_env_value(var_name):
+                print(f"    ✓ {var_name}")
+            else:
+                print(color(f"    ✗ {var_name} (missing)", Colors.RED))
+        
+        print()
+        print(color("  Optional:", Colors.BOLD))
+        for var_name, info in OPTIONAL_ENV_VARS.items():
+            if get_env_value(var_name):
+                print(f"    ✓ {var_name}")
+            else:
+                tools = info.get("tools", [])
+                tools_str = f" → {', '.join(tools[:2])}" if tools else ""
+                print(color(f"    ○ {var_name}{tools_str}", Colors.DIM))
+        
+        missing_config = get_missing_config_fields()
+        if missing_config:
+            print()
+            print(color(f"  {len(missing_config)} new config option(s) available", Colors.YELLOW))
+            print("    Run 'hermes config migrate' to add them")
+        
+        print()
+    
+    else:
+        print(f"Unknown config command: {subcmd}")
+        print()
+        print("Available commands:")
+        print("  hermes config           Show current configuration")
+        print("  hermes config edit      Open config in editor")
+        print("  hermes config set K V   Set a config value")
+        print("  hermes config check     Check for missing/outdated config")
+        print("  hermes config migrate   Update config with new options")
+        print("  hermes config path      Show config file path")
+        print("  hermes config env-path  Show .env file path")
+        sys.exit(1)
--- a/hermes_cli/cron.py
+++ b/hermes_cli/cron.py
@@ -0,0 +1,131 @@
+"""
+Cron subcommand for hermes CLI.
+
+Handles: hermes cron [list|daemon|tick]
+"""
+
+import json
+import sys
+import time
+from pathlib import Path
+from datetime import datetime
+
+PROJECT_ROOT = Path(__file__).parent.parent.resolve()
+sys.path.insert(0, str(PROJECT_ROOT))
+
+# ANSI colors
+class Colors:
+    RESET = "\033[0m"
+    BOLD = "\033[1m"
+    DIM = "\033[2m"
+    RED = "\033[31m"
+    GREEN = "\033[32m"
+    YELLOW = "\033[33m"
+    CYAN = "\033[36m"
+
+def color(text: str, *codes) -> str:
+    if not sys.stdout.isatty():
+        return text
+    return "".join(codes) + text + Colors.RESET
+
+
+def cron_list(show_all: bool = False):
+    """List all scheduled jobs."""
+    from cron.jobs import list_jobs
+    
+    jobs = list_jobs(include_disabled=show_all)
+    
+    if not jobs:
+        print(color("No scheduled jobs.", Colors.DIM))
+        print(color("Create one with: hermes cron add <schedule> <prompt>", Colors.DIM))
+        return
+    
+    print()
+    print(color("┌─────────────────────────────────────────────────────────────────────────┐", Colors.CYAN))
+    print(color("│                         Scheduled Jobs                                  │", Colors.CYAN))
+    print(color("└─────────────────────────────────────────────────────────────────────────┘", Colors.CYAN))
+    print()
+    
+    for job in jobs:
+        job_id = job.get("id", "?")[:8]
+        name = job.get("name", "(unnamed)")
+        schedule = job.get("schedule_display", job.get("schedule", {}).get("value", "?"))
+        enabled = job.get("enabled", True)
+        next_run = job.get("next_run_at", "?")
+        
+        # Repeat info
+        repeat_info = job.get("repeat", {})
+        repeat_times = repeat_info.get("times")
+        repeat_completed = repeat_info.get("completed", 0)
+        
+        if repeat_times:
+            repeat_str = f"{repeat_completed}/{repeat_times}"
+        else:
+            repeat_str = "∞"
+        
+        # Delivery targets
+        deliver = job.get("deliver", ["local"])
+        if isinstance(deliver, str):
+            deliver = [deliver]
+        deliver_str = ", ".join(deliver)
+        
+        # Status indicator
+        if not enabled:
+            status = color("[disabled]", Colors.RED)
+        else:
+            status = color("[active]", Colors.GREEN)
+        
+        print(f"  {color(job_id, Colors.YELLOW)} {status}")
+        print(f"    Name:      {name}")
+        print(f"    Schedule:  {schedule}")
+        print(f"    Repeat:    {repeat_str}")
+        print(f"    Next run:  {next_run}")
+        print(f"    Deliver:   {deliver_str}")
+        print()
+
+
+def cron_daemon(interval: int = 60):
+    """Run the cron daemon."""
+    from cron.scheduler import start_daemon
+    
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.CYAN))
+    print(color("│              🦋 Hermes Cron Daemon                      │", Colors.CYAN))
+    print(color("├─────────────────────────────────────────────────────────┤", Colors.CYAN))
+    print(color("│  Press Ctrl+C to stop                                   │", Colors.CYAN))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.CYAN))
+    print()
+    
+    try:
+        start_daemon(interval=interval)
+    except KeyboardInterrupt:
+        print()
+        print(color("Cron daemon stopped.", Colors.YELLOW))
+
+
+def cron_tick():
+    """Run due jobs once (for system cron integration)."""
+    from cron.scheduler import tick
+    
+    print(f"[{datetime.now().isoformat()}] Running cron tick...")
+    tick()
+
+
+def cron_command(args):
+    """Handle cron subcommands."""
+    subcmd = getattr(args, 'cron_command', None)
+    
+    if subcmd is None or subcmd == "list":
+        show_all = getattr(args, 'all', False)
+        cron_list(show_all)
+    
+    elif subcmd == "daemon":
+        interval = getattr(args, 'interval', 60)
+        cron_daemon(interval)
+    
+    elif subcmd == "tick":
+        cron_tick()
+    
+    else:
+        print(f"Unknown cron command: {subcmd}")
+        print("Usage: hermes cron [list|daemon|tick]")
+        sys.exit(1)
--- a/hermes_cli/doctor.py
+++ b/hermes_cli/doctor.py
@@ -0,0 +1,402 @@
+"""
+Doctor command for hermes CLI.
+
+Diagnoses issues with Hermes Agent setup.
+"""
+
+import os
+import sys
+import subprocess
+import shutil
+from pathlib import Path
+
+from hermes_cli.config import get_project_root, get_hermes_home, get_env_path
+
+PROJECT_ROOT = get_project_root()
+HERMES_HOME = get_hermes_home()
+
+# Load environment variables from ~/.hermes/.env so API key checks work
+from dotenv import load_dotenv
+_env_path = get_env_path()
+if _env_path.exists():
+    load_dotenv(_env_path)
+# Also try project .env as fallback
+load_dotenv(PROJECT_ROOT / ".env", override=False)
+
+# ANSI colors
+class Colors:
+    RESET = "\033[0m"
+    BOLD = "\033[1m"
+    DIM = "\033[2m"
+    RED = "\033[31m"
+    GREEN = "\033[32m"
+    YELLOW = "\033[33m"
+    CYAN = "\033[36m"
+
+def color(text: str, *codes) -> str:
+    if not sys.stdout.isatty():
+        return text
+    return "".join(codes) + text + Colors.RESET
+
+def check_ok(text: str, detail: str = ""):
+    print(f"  {color('✓', Colors.GREEN)} {text}" + (f" {color(detail, Colors.DIM)}" if detail else ""))
+
+def check_warn(text: str, detail: str = ""):
+    print(f"  {color('⚠', Colors.YELLOW)} {text}" + (f" {color(detail, Colors.DIM)}" if detail else ""))
+
+def check_fail(text: str, detail: str = ""):
+    print(f"  {color('✗', Colors.RED)} {text}" + (f" {color(detail, Colors.DIM)}" if detail else ""))
+
+def check_info(text: str):
+    print(f"    {color('→', Colors.CYAN)} {text}")
+
+
+def run_doctor(args):
+    """Run diagnostic checks."""
+    should_fix = getattr(args, 'fix', False)
+    
+    issues = []
+    
+    print()
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.CYAN))
+    print(color("│                 🩺 Hermes Doctor                        │", Colors.CYAN))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.CYAN))
+    
+    # =========================================================================
+    # Check: Python version
+    # =========================================================================
+    print()
+    print(color("◆ Python Environment", Colors.CYAN, Colors.BOLD))
+    
+    py_version = sys.version_info
+    if py_version >= (3, 11):
+        check_ok(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}")
+    elif py_version >= (3, 10):
+        check_ok(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}")
+        check_warn("Python 3.11+ recommended for RL Training tools (tinker requires >= 3.11)")
+    elif py_version >= (3, 8):
+        check_warn(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}", "(3.10+ recommended)")
+    else:
+        check_fail(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}", "(3.10+ required)")
+        issues.append("Upgrade Python to 3.10+")
+    
+    # Check if in virtual environment
+    in_venv = sys.prefix != sys.base_prefix
+    if in_venv:
+        check_ok("Virtual environment active")
+    else:
+        check_warn("Not in virtual environment", "(recommended)")
+    
+    # =========================================================================
+    # Check: Required packages
+    # =========================================================================
+    print()
+    print(color("◆ Required Packages", Colors.CYAN, Colors.BOLD))
+    
+    required_packages = [
+        # (import name, display name, pip package name)
+        ("openai", "OpenAI SDK", "openai"),
+        ("rich", "Rich (terminal UI)", "rich"),
+        ("dotenv", "python-dotenv", "python-dotenv"),
+        ("yaml", "PyYAML", "pyyaml"),
+        ("httpx", "HTTPX", "httpx"),
+    ]
+    optional_packages = [
+        ("croniter", "Croniter (cron expressions)"),
+        ("telegram", "python-telegram-bot"),
+        ("discord", "discord.py"),
+    ]
+    
+    for module, name, pip_name in required_packages:
+        try:
+            __import__(module)
+            check_ok(name)
+        except ImportError:
+            check_fail(name, "(missing)")
+            issues.append(f"Install {name}: uv pip install {pip_name}")
+    
+    for module, name in optional_packages:
+        try:
+            __import__(module)
+            check_ok(name, "(optional)")
+        except ImportError:
+            check_warn(name, "(optional, not installed)")
+    
+    # =========================================================================
+    # Check: Configuration files
+    # =========================================================================
+    print()
+    print(color("◆ Configuration Files", Colors.CYAN, Colors.BOLD))
+    
+    # Check ~/.hermes/.env (primary location for user config)
+    env_path = HERMES_HOME / '.env'
+    if env_path.exists():
+        check_ok("~/.hermes/.env file exists")
+        
+        # Check for common issues
+        content = env_path.read_text()
+        if "OPENROUTER_API_KEY" in content or "ANTHROPIC_API_KEY" in content:
+            check_ok("API key configured")
+        else:
+            check_warn("No API key found in ~/.hermes/.env")
+            issues.append("Run 'hermes setup' to configure API keys")
+    else:
+        # Also check project root as fallback
+        fallback_env = PROJECT_ROOT / '.env'
+        if fallback_env.exists():
+            check_ok(".env file exists (in project directory)")
+        else:
+            check_fail("~/.hermes/.env file missing")
+            check_info("Run 'hermes setup' to create one")
+            issues.append("Run 'hermes setup' to create .env")
+    
+    # Check ~/.hermes/config.yaml (primary) or project cli-config.yaml (fallback)
+    config_path = HERMES_HOME / 'config.yaml'
+    if config_path.exists():
+        check_ok("~/.hermes/config.yaml exists")
+    else:
+        fallback_config = PROJECT_ROOT / 'cli-config.yaml'
+        if fallback_config.exists():
+            check_ok("cli-config.yaml exists (in project directory)")
+        else:
+            check_warn("config.yaml not found", "(using defaults)")
+    
+    # =========================================================================
+    # Check: Directory structure
+    # =========================================================================
+    print()
+    print(color("◆ Directory Structure", Colors.CYAN, Colors.BOLD))
+    
+    hermes_home = HERMES_HOME
+    if hermes_home.exists():
+        check_ok("~/.hermes directory exists")
+    else:
+        check_warn("~/.hermes not found", "(will be created on first use)")
+    
+    # Check for SOUL.md persona file
+    soul_path = hermes_home / "SOUL.md"
+    if soul_path.exists():
+        content = soul_path.read_text(encoding="utf-8").strip()
+        # Check if it's just the template comments (no real content)
+        lines = [ln for ln in content.splitlines() if ln.strip() and not ln.strip().startswith(("<!--", "-->", "#"))]
+        if lines:
+            check_ok("~/.hermes/SOUL.md exists (persona configured)")
+        else:
+            check_info("~/.hermes/SOUL.md exists but is empty — edit it to customize personality")
+    else:
+        check_warn("~/.hermes/SOUL.md not found", "(create it to give Hermes a custom personality)")
+        if should_fix:
+            soul_path.parent.mkdir(parents=True, exist_ok=True)
+            soul_path.write_text("# Hermes Agent Persona\n\n<!-- Edit this file to customize how Hermes communicates. -->\n", encoding="utf-8")
+            check_ok("Created ~/.hermes/SOUL.md")
+    
+    logs_dir = PROJECT_ROOT / "logs"
+    if logs_dir.exists():
+        check_ok("logs/ directory exists")
+    else:
+        check_warn("logs/ not found", "(will be created on first use)")
+    
+    # =========================================================================
+    # Check: External tools
+    # =========================================================================
+    print()
+    print(color("◆ External Tools", Colors.CYAN, Colors.BOLD))
+    
+    # Git
+    if shutil.which("git"):
+        check_ok("git")
+    else:
+        check_warn("git not found", "(optional)")
+    
+    # ripgrep (optional, for faster file search)
+    if shutil.which("rg"):
+        check_ok("ripgrep (rg)", "(faster file search)")
+    else:
+        check_warn("ripgrep (rg) not found", "(file search uses grep fallback)")
+        check_info("Install for faster search: sudo apt install ripgrep")
+    
+    # Docker (optional)
+    terminal_env = os.getenv("TERMINAL_ENV", "local")
+    if terminal_env == "docker":
+        if shutil.which("docker"):
+            # Check if docker daemon is running
+            result = subprocess.run(["docker", "info"], capture_output=True)
+            if result.returncode == 0:
+                check_ok("docker", "(daemon running)")
+            else:
+                check_fail("docker daemon not running")
+                issues.append("Start Docker daemon")
+        else:
+            check_fail("docker not found", "(required for TERMINAL_ENV=docker)")
+            issues.append("Install Docker or change TERMINAL_ENV")
+    else:
+        if shutil.which("docker"):
+            check_ok("docker", "(optional)")
+        else:
+            check_warn("docker not found", "(optional)")
+    
+    # SSH (if using ssh backend)
+    if terminal_env == "ssh":
+        ssh_host = os.getenv("TERMINAL_SSH_HOST")
+        if ssh_host:
+            # Try to connect
+            result = subprocess.run(
+                ["ssh", "-o", "ConnectTimeout=5", "-o", "BatchMode=yes", ssh_host, "echo ok"],
+                capture_output=True,
+                text=True
+            )
+            if result.returncode == 0:
+                check_ok(f"SSH connection to {ssh_host}")
+            else:
+                check_fail(f"SSH connection to {ssh_host}")
+                issues.append(f"Check SSH configuration for {ssh_host}")
+        else:
+            check_fail("TERMINAL_SSH_HOST not set", "(required for TERMINAL_ENV=ssh)")
+            issues.append("Set TERMINAL_SSH_HOST in .env")
+    
+    # Node.js + agent-browser (for browser automation tools)
+    if shutil.which("node"):
+        check_ok("Node.js")
+        # Check if agent-browser is installed
+        agent_browser_path = PROJECT_ROOT / "node_modules" / "agent-browser"
+        if agent_browser_path.exists():
+            check_ok("agent-browser (Node.js)", "(browser automation)")
+        else:
+            check_warn("agent-browser not installed", "(run: npm install)")
+    else:
+        check_warn("Node.js not found", "(optional, needed for browser tools)")
+    
+    # =========================================================================
+    # Check: API connectivity
+    # =========================================================================
+    print()
+    print(color("◆ API Connectivity", Colors.CYAN, Colors.BOLD))
+    
+    openrouter_key = os.getenv("OPENROUTER_API_KEY")
+    if openrouter_key:
+        try:
+            import httpx
+            response = httpx.get(
+                "https://openrouter.ai/api/v1/models",
+                headers={"Authorization": f"Bearer {openrouter_key}"},
+                timeout=10
+            )
+            if response.status_code == 200:
+                check_ok("OpenRouter API")
+            elif response.status_code == 401:
+                check_fail("OpenRouter API", "(invalid API key)")
+                issues.append("Check OPENROUTER_API_KEY in .env")
+            else:
+                check_fail("OpenRouter API", f"(HTTP {response.status_code})")
+        except Exception as e:
+            check_fail("OpenRouter API", f"({e})")
+            issues.append("Check network connectivity")
+    else:
+        check_warn("OpenRouter API", "(not configured)")
+    
+    anthropic_key = os.getenv("ANTHROPIC_API_KEY")
+    if anthropic_key:
+        try:
+            import httpx
+            response = httpx.get(
+                "https://api.anthropic.com/v1/models",
+                headers={
+                    "x-api-key": anthropic_key,
+                    "anthropic-version": "2023-06-01"
+                },
+                timeout=10
+            )
+            if response.status_code == 200:
+                check_ok("Anthropic API")
+            elif response.status_code == 401:
+                check_fail("Anthropic API", "(invalid API key)")
+            else:
+                # Note: Anthropic may not have /models endpoint
+                check_warn("Anthropic API", "(couldn't verify)")
+        except Exception as e:
+            check_warn("Anthropic API", f"({e})")
+    
+    # =========================================================================
+    # Check: Submodules
+    # =========================================================================
+    print()
+    print(color("◆ Submodules", Colors.CYAN, Colors.BOLD))
+    
+    # mini-swe-agent (terminal tool backend)
+    mini_swe_dir = PROJECT_ROOT / "mini-swe-agent"
+    if mini_swe_dir.exists() and (mini_swe_dir / "pyproject.toml").exists():
+        try:
+            __import__("minisweagent")
+            check_ok("mini-swe-agent", "(terminal backend)")
+        except ImportError:
+            check_warn("mini-swe-agent found but not installed", "(run: uv pip install -e ./mini-swe-agent)")
+            issues.append("Install mini-swe-agent: uv pip install -e ./mini-swe-agent")
+    else:
+        check_warn("mini-swe-agent not found", "(run: git submodule update --init --recursive)")
+    
+    # tinker-atropos (RL training backend)
+    tinker_dir = PROJECT_ROOT / "tinker-atropos"
+    if tinker_dir.exists() and (tinker_dir / "pyproject.toml").exists():
+        if py_version >= (3, 11):
+            try:
+                __import__("tinker_atropos")
+                check_ok("tinker-atropos", "(RL training backend)")
+            except ImportError:
+                check_warn("tinker-atropos found but not installed", "(run: uv pip install -e ./tinker-atropos)")
+                issues.append("Install tinker-atropos: uv pip install -e ./tinker-atropos")
+        else:
+            check_warn("tinker-atropos requires Python 3.11+", f"(current: {py_version.major}.{py_version.minor})")
+    else:
+        check_warn("tinker-atropos not found", "(run: git submodule update --init --recursive)")
+    
+    # =========================================================================
+    # Check: Tool Availability
+    # =========================================================================
+    print()
+    print(color("◆ Tool Availability", Colors.CYAN, Colors.BOLD))
+    
+    try:
+        # Add project root to path for imports
+        sys.path.insert(0, str(PROJECT_ROOT))
+        from model_tools import check_tool_availability, TOOLSET_REQUIREMENTS
+        
+        available, unavailable = check_tool_availability()
+        
+        for tid in available:
+            info = TOOLSET_REQUIREMENTS.get(tid, {})
+            check_ok(info.get("name", tid))
+        
+        for item in unavailable:
+            if item["missing_vars"]:
+                vars_str = ", ".join(item["missing_vars"])
+                check_warn(item["name"], f"(missing {vars_str})")
+            else:
+                check_warn(item["name"], "(system dependency not met)")
+        
+        # Count disabled tools with API key requirements
+        api_disabled = [u for u in unavailable if u["missing_vars"]]
+        if api_disabled:
+            issues.append("Run 'hermes setup' to configure missing API keys for full tool access")
+    except Exception as e:
+        check_warn("Could not check tool availability", f"({e})")
+    
+    # =========================================================================
+    # Summary
+    # =========================================================================
+    print()
+    if issues:
+        print(color("─" * 60, Colors.YELLOW))
+        print(color(f"  Found {len(issues)} issue(s) to address:", Colors.YELLOW, Colors.BOLD))
+        print()
+        for i, issue in enumerate(issues, 1):
+            print(f"  {i}. {issue}")
+        print()
+        
+        if should_fix:
+            print(color("  Auto-fix is not yet implemented for these issues.", Colors.DIM))
+            print(color("  Please resolve issues manually.", Colors.DIM))
+    else:
+        print(color("─" * 60, Colors.GREEN))
+        print(color("  All checks passed! 🎉", Colors.GREEN, Colors.BOLD))
+    
+    print()
--- a/hermes_cli/gateway.py
+++ b/hermes_cli/gateway.py
@@ -0,0 +1,491 @@
+"""
+Gateway subcommand for hermes CLI.
+
+Handles: hermes gateway [run|start|stop|restart|status|install|uninstall]
+"""
+
+import asyncio
+import os
+import signal
+import subprocess
+import sys
+from pathlib import Path
+
+PROJECT_ROOT = Path(__file__).parent.parent.resolve()
+
+
+# =============================================================================
+# Process Management (for manual gateway runs)
+# =============================================================================
+
+def find_gateway_pids() -> list:
+    """Find PIDs of running gateway processes."""
+    pids = []
+    try:
+        # Look for gateway processes with multiple patterns
+        patterns = [
+            "hermes_cli.main gateway",
+            "hermes gateway",
+            "gateway/run.py",
+        ]
+        
+        result = subprocess.run(
+            ["ps", "aux"],
+            capture_output=True,
+            text=True
+        )
+        
+        for line in result.stdout.split('\n'):
+            # Skip grep; the current process is excluded by parsed PID below
+            if 'grep' in line:
+                continue
+            
+            for pattern in patterns:
+                if pattern in line:
+                    parts = line.split()
+                    if len(parts) > 1:
+                        try:
+                            pid = int(parts[1])
+                            if pid != os.getpid() and pid not in pids:
+                                pids.append(pid)
+                        except ValueError:
+                            continue
+                    break
+    except Exception:
+        pass
+    
+    return pids
+
+
+def kill_gateway_processes(force: bool = False) -> int:
+    """Kill any running gateway processes. Returns count killed."""
+    pids = find_gateway_pids()
+    killed = 0
+    
+    for pid in pids:
+        try:
+            if force:
+                os.kill(pid, signal.SIGKILL)
+            else:
+                os.kill(pid, signal.SIGTERM)
+            killed += 1
+        except ProcessLookupError:
+            # Process already gone
+            pass
+        except PermissionError:
+            print(f"⚠ Permission denied to kill PID {pid}")
+    
+    return killed
+
+
+def is_linux() -> bool:
+    return sys.platform.startswith('linux')
+
+def is_macos() -> bool:
+    return sys.platform == 'darwin'
+
+def is_windows() -> bool:
+    return sys.platform == 'win32'
+
+
+# =============================================================================
+# Service Configuration
+# =============================================================================
+
+SERVICE_NAME = "hermes-gateway"
+SERVICE_DESCRIPTION = "Hermes Agent Gateway - Messaging Platform Integration"
+
+def get_systemd_unit_path() -> Path:
+    return Path.home() / ".config" / "systemd" / "user" / f"{SERVICE_NAME}.service"
+
+def get_launchd_plist_path() -> Path:
+    return Path.home() / "Library" / "LaunchAgents" / "ai.hermes.gateway.plist"
+
+def get_python_path() -> str:
+    venv_python = PROJECT_ROOT / "venv" / "bin" / "python"
+    if venv_python.exists():
+        return str(venv_python)
+    return sys.executable
+
+def get_hermes_cli_path() -> str:
+    """Get the path to the hermes CLI."""
+    # Check if installed via pip
+    import shutil
+    hermes_bin = shutil.which("hermes")
+    if hermes_bin:
+        return hermes_bin
+    
+    # Fallback to direct module execution
+    return f"{get_python_path()} -m hermes_cli.main"
+
+
+# =============================================================================
+# Systemd (Linux)
+# =============================================================================
+
+def generate_systemd_unit() -> str:
+    python_path = get_python_path()
+    working_dir = str(PROJECT_ROOT)
+    
+    return f"""[Unit]
+Description={SERVICE_DESCRIPTION}
+After=network.target
+
+[Service]
+Type=simple
+ExecStart={python_path} -m hermes_cli.main gateway run
+WorkingDirectory={working_dir}
+Restart=on-failure
+RestartSec=10
+StandardOutput=journal
+StandardError=journal
+
+[Install]
+WantedBy=default.target
+"""
+
+def systemd_install(force: bool = False):
+    unit_path = get_systemd_unit_path()
+    
+    if unit_path.exists() and not force:
+        print(f"Service already installed at: {unit_path}")
+        print("Use --force to reinstall")
+        return
+    
+    unit_path.parent.mkdir(parents=True, exist_ok=True)
+    print(f"Installing systemd service to: {unit_path}")
+    unit_path.write_text(generate_systemd_unit())
+    
+    subprocess.run(["systemctl", "--user", "daemon-reload"], check=True)
+    subprocess.run(["systemctl", "--user", "enable", SERVICE_NAME], check=True)
+    
+    print()
+    print("✓ Service installed and enabled!")
+    print()
+    print("Next steps:")
+    print("  hermes gateway start              # Start the service")
+    print("  hermes gateway status             # Check status")
+    print(f"  journalctl --user -u {SERVICE_NAME} -f  # View logs")
+    print()
+    print("To enable lingering (keeps running after logout):")
+    print("  sudo loginctl enable-linger $USER")
+
+def systemd_uninstall():
+    subprocess.run(["systemctl", "--user", "stop", SERVICE_NAME], check=False)
+    subprocess.run(["systemctl", "--user", "disable", SERVICE_NAME], check=False)
+    
+    unit_path = get_systemd_unit_path()
+    if unit_path.exists():
+        unit_path.unlink()
+        print(f"✓ Removed {unit_path}")
+    
+    subprocess.run(["systemctl", "--user", "daemon-reload"], check=True)
+    print("✓ Service uninstalled")
+
+def systemd_start():
+    subprocess.run(["systemctl", "--user", "start", SERVICE_NAME], check=True)
+    print("✓ Service started")
+
+def systemd_stop():
+    subprocess.run(["systemctl", "--user", "stop", SERVICE_NAME], check=True)
+    print("✓ Service stopped")
+
+def systemd_restart():
+    subprocess.run(["systemctl", "--user", "restart", SERVICE_NAME], check=True)
+    print("✓ Service restarted")
+
+def systemd_status(deep: bool = False):
+    # Check if service unit file exists
+    unit_path = get_systemd_unit_path()
+    if not unit_path.exists():
+        print("✗ Gateway service is not installed")
+        print("  Run: hermes gateway install")
+        return
+    
+    # Show detailed status first
+    subprocess.run(
+        ["systemctl", "--user", "status", SERVICE_NAME, "--no-pager"],
+        capture_output=False
+    )
+    
+    # Check if service is active
+    result = subprocess.run(
+        ["systemctl", "--user", "is-active", SERVICE_NAME],
+        capture_output=True,
+        text=True
+    )
+    
+    status = result.stdout.strip()
+    
+    if status == "active":
+        print("✓ Gateway service is running")
+    else:
+        print(f"✗ Gateway service is not running (state: {status or 'unknown'})")
+        print("  Run: hermes gateway start")
+    
+    if deep:
+        print()
+        print("Recent logs:")
+        subprocess.run([
+            "journalctl", "--user", "-u", SERVICE_NAME,
+            "-n", "20", "--no-pager"
+        ])
+
+
+# =============================================================================
+# Launchd (macOS)
+# =============================================================================
+
+def generate_launchd_plist() -> str:
+    python_path = get_python_path()
+    working_dir = str(PROJECT_ROOT)
+    log_dir = Path.home() / ".hermes" / "logs"
+    log_dir.mkdir(parents=True, exist_ok=True)
+    
+    return f"""<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>ai.hermes.gateway</string>
+    
+    <key>ProgramArguments</key>
+    <array>
+        <string>{python_path}</string>
+        <string>-m</string>
+        <string>hermes_cli.main</string>
+        <string>gateway</string>
+        <string>run</string>
+    </array>
+    
+    <key>WorkingDirectory</key>
+    <string>{working_dir}</string>
+    
+    <key>RunAtLoad</key>
+    <true/>
+    
+    <key>KeepAlive</key>
+    <dict>
+        <key>SuccessfulExit</key>
+        <false/>
+    </dict>
+    
+    <key>StandardOutPath</key>
+    <string>{log_dir}/gateway.log</string>
+    
+    <key>StandardErrorPath</key>
+    <string>{log_dir}/gateway.error.log</string>
+</dict>
+</plist>
+"""
+
+def launchd_install(force: bool = False):
+    plist_path = get_launchd_plist_path()
+    
+    if plist_path.exists() and not force:
+        print(f"Service already installed at: {plist_path}")
+        print("Use --force to reinstall")
+        return
+    
+    plist_path.parent.mkdir(parents=True, exist_ok=True)
+    print(f"Installing launchd service to: {plist_path}")
+    plist_path.write_text(generate_launchd_plist())
+    
+    subprocess.run(["launchctl", "load", str(plist_path)], check=True)
+    
+    print()
+    print("✓ Service installed and loaded!")
+    print()
+    print("Next steps:")
+    print("  hermes gateway status             # Check status")
+    print("  tail -f ~/.hermes/logs/gateway.log  # View logs")
+
+def launchd_uninstall():
+    plist_path = get_launchd_plist_path()
+    subprocess.run(["launchctl", "unload", str(plist_path)], check=False)
+    
+    if plist_path.exists():
+        plist_path.unlink()
+        print(f"✓ Removed {plist_path}")
+    
+    print("✓ Service uninstalled")
+
+def launchd_start():
+    subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
+    print("✓ Service started")
+
+def launchd_stop():
+    subprocess.run(["launchctl", "stop", "ai.hermes.gateway"], check=True)
+    print("✓ Service stopped")
+
+def launchd_restart():
+    launchd_stop()
+    launchd_start()
+
+def launchd_status(deep: bool = False):
+    result = subprocess.run(
+        ["launchctl", "list", "ai.hermes.gateway"],
+        capture_output=True,
+        text=True
+    )
+    
+    if result.returncode == 0:
+        print("✓ Gateway service is loaded")
+        print(result.stdout)
+    else:
+        print("✗ Gateway service is not loaded")
+    
+    if deep:
+        log_file = Path.home() / ".hermes" / "logs" / "gateway.log"
+        if log_file.exists():
+            print()
+            print("Recent logs:")
+            subprocess.run(["tail", "-20", str(log_file)])
+
+
+# =============================================================================
+# Gateway Runner
+# =============================================================================
+
+def run_gateway(verbose: bool = False):
+    """Run the gateway in foreground."""
+    sys.path.insert(0, str(PROJECT_ROOT))
+    
+    from gateway.run import start_gateway
+    
+    print("┌─────────────────────────────────────────────────────────┐")
+    print("│           🦋 Hermes Gateway Starting...                 │")
+    print("├─────────────────────────────────────────────────────────┤")
+    print("│  Press Ctrl+C to stop                                   │")
+    print("└─────────────────────────────────────────────────────────┘")
+    print()
+    
+    # Exit with code 1 if the gateway fails to connect to any platform, so systemd
+    # (Restart=on-failure) and launchd (KeepAlive, SuccessfulExit=false) retry on transient errors
+    success = asyncio.run(start_gateway())
+    if not success:
+        sys.exit(1)
+
+
+# =============================================================================
+# Main Command Handler
+# =============================================================================
+
+def gateway_command(args):
+    """Handle gateway subcommands."""
+    subcmd = getattr(args, 'gateway_command', None)
+    
+    # Default to run if no subcommand
+    if subcmd is None or subcmd == "run":
+        verbose = getattr(args, 'verbose', False)
+        run_gateway(verbose)
+        return
+    
+    # Service management commands
+    if subcmd == "install":
+        force = getattr(args, 'force', False)
+        if is_linux():
+            systemd_install(force)
+        elif is_macos():
+            launchd_install(force)
+        else:
+            print("Service installation not supported on this platform.")
+            print("Run manually: hermes gateway run")
+            sys.exit(1)
+    
+    elif subcmd == "uninstall":
+        if is_linux():
+            systemd_uninstall()
+        elif is_macos():
+            launchd_uninstall()
+        else:
+            print("Not supported on this platform.")
+            sys.exit(1)
+    
+    elif subcmd == "start":
+        if is_linux():
+            systemd_start()
+        elif is_macos():
+            launchd_start()
+        else:
+            print("Not supported on this platform.")
+            sys.exit(1)
+    
+    elif subcmd == "stop":
+        # Try service first, fall back to killing processes directly
+        service_available = False
+        
+        if is_linux() and get_systemd_unit_path().exists():
+            try:
+                systemd_stop()
+                service_available = True
+            except subprocess.CalledProcessError:
+                pass  # Fall through to process kill
+        elif is_macos() and get_launchd_plist_path().exists():
+            try:
+                launchd_stop()
+                service_available = True
+            except subprocess.CalledProcessError:
+                pass
+        
+        if not service_available:
+            # Kill gateway processes directly
+            killed = kill_gateway_processes()
+            if killed:
+                print(f"✓ Stopped {killed} gateway process(es)")
+            else:
+                print("✗ No gateway processes found")
+    
+    elif subcmd == "restart":
+        # Try service first, fall back to killing and restarting
+        service_available = False
+        
+        if is_linux() and get_systemd_unit_path().exists():
+            try:
+                systemd_restart()
+                service_available = True
+            except subprocess.CalledProcessError:
+                pass
+        elif is_macos() and get_launchd_plist_path().exists():
+            try:
+                launchd_restart()
+                service_available = True
+            except subprocess.CalledProcessError:
+                pass
+        
+        if not service_available:
+            # Manual restart: kill existing processes
+            killed = kill_gateway_processes()
+            if killed:
+                print(f"✓ Stopped {killed} gateway process(es)")
+            
+            import time
+            time.sleep(2)
+            
+            # Start fresh
+            print("Starting gateway...")
+            run_gateway(verbose=False)
+    
+    elif subcmd == "status":
+        deep = getattr(args, 'deep', False)
+        
+        # Check for service first
+        if is_linux() and get_systemd_unit_path().exists():
+            systemd_status(deep)
+        elif is_macos() and get_launchd_plist_path().exists():
+            launchd_status(deep)
+        else:
+            # Check for manually running processes
+            pids = find_gateway_pids()
+            if pids:
+                print(f"✓ Gateway is running (PID: {', '.join(map(str, pids))})")
+                print("  (Running manually, not as a system service)")
+                print()
+                print("To install as a service:")
+                print("  hermes gateway install")
+            else:
+                print("✗ Gateway is not running")
+                print()
+                print("To start:")
+                print("  hermes gateway          # Run in foreground")
+                print("  hermes gateway install  # Install as service")
--- a/hermes_cli/main.py
+++ b/hermes_cli/main.py
@@ -0,0 +1,544 @@
+#!/usr/bin/env python3
+"""
+Hermes CLI - Main entry point.
+
+Usage:
+    hermes                     # Interactive chat (default)
+    hermes chat                # Interactive chat
+    hermes gateway             # Run gateway in foreground
+    hermes gateway start       # Start gateway as service
+    hermes gateway stop        # Stop gateway service
+    hermes gateway status      # Show gateway status
+    hermes gateway install     # Install gateway service
+    hermes gateway uninstall   # Uninstall gateway service
+    hermes setup               # Interactive setup wizard
+    hermes status              # Show status of all components
+    hermes cron                # Manage cron jobs
+    hermes cron list           # List cron jobs
+    hermes cron daemon         # Run cron daemon
+    hermes doctor              # Check configuration and dependencies
+    hermes version             # Show version
+    hermes update              # Update to latest version
+    hermes uninstall           # Uninstall Hermes Agent
+"""
+
+import argparse
+import os
+import sys
+from pathlib import Path
+
+# Add project root to path
+PROJECT_ROOT = Path(__file__).parent.parent.resolve()
+sys.path.insert(0, str(PROJECT_ROOT))
+
+# Load .env file
+from dotenv import load_dotenv
+env_path = PROJECT_ROOT / '.env'
+if env_path.exists():
+    load_dotenv(dotenv_path=env_path)
+
+from hermes_cli import __version__
+
+
+def cmd_chat(args):
+    """Run interactive chat CLI."""
+    # Import and run the CLI
+    from cli import main as cli_main
+    
+    # Build kwargs from args
+    kwargs = {
+        "model": args.model,
+        "toolsets": args.toolsets,
+        "verbose": args.verbose,
+        "query": args.query,
+    }
+    # Filter out None values
+    kwargs = {k: v for k, v in kwargs.items() if v is not None}
+    
+    cli_main(**kwargs)
+
+
+def cmd_gateway(args):
+    """Gateway management commands."""
+    from hermes_cli.gateway import gateway_command
+    gateway_command(args)
+
+
+def cmd_setup(args):
+    """Interactive setup wizard."""
+    from hermes_cli.setup import run_setup_wizard
+    run_setup_wizard(args)
+
+
+def cmd_status(args):
+    """Show status of all components."""
+    from hermes_cli.status import show_status
+    show_status(args)
+
+
+def cmd_cron(args):
+    """Cron job management."""
+    from hermes_cli.cron import cron_command
+    cron_command(args)
+
+
+def cmd_doctor(args):
+    """Check configuration and dependencies."""
+    from hermes_cli.doctor import run_doctor
+    run_doctor(args)
+
+
+def cmd_config(args):
+    """Configuration management."""
+    from hermes_cli.config import config_command
+    config_command(args)
+
+
+def cmd_version(args):
+    """Show version."""
+    print(f"Hermes Agent v{__version__}")
+    print(f"Project: {PROJECT_ROOT}")
+    
+    # Show Python version
+    print(f"Python: {sys.version.split()[0]}")
+    
+    # Check for key dependencies
+    try:
+        import openai
+        print(f"OpenAI SDK: {openai.__version__}")
+    except ImportError:
+        print("OpenAI SDK: Not installed")
+
+
+def cmd_uninstall(args):
+    """Uninstall Hermes Agent."""
+    from hermes_cli.uninstall import run_uninstall
+    run_uninstall(args)
+
+
+def cmd_update(args):
+    """Update Hermes Agent to the latest version."""
+    import subprocess
+    import shutil
+    
+    print("🦋 Updating Hermes Agent...")
+    print()
+    
+    # Check if we're in a git repo
+    git_dir = PROJECT_ROOT / '.git'
+    if not git_dir.exists():
+        print("✗ Not a git repository. Please reinstall:")
+        print("  curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash")
+        sys.exit(1)
+    
+    # Fetch and pull
+    try:
+        print("→ Fetching updates...")
+        subprocess.run(["git", "fetch", "origin"], cwd=PROJECT_ROOT, check=True)
+        
+        # Get current branch
+        result = subprocess.run(
+            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
+            cwd=PROJECT_ROOT,
+            capture_output=True,
+            text=True,
+            check=True
+        )
+        branch = result.stdout.strip()
+        
+        # Check if there are updates
+        result = subprocess.run(
+            ["git", "rev-list", f"HEAD..origin/{branch}", "--count"],
+            cwd=PROJECT_ROOT,
+            capture_output=True,
+            text=True,
+            check=True
+        )
+        commit_count = int(result.stdout.strip())
+        
+        if commit_count == 0:
+            print("✓ Already up to date!")
+            return
+        
+        print(f"→ Found {commit_count} new commit(s)")
+        print("→ Pulling updates...")
+        subprocess.run(["git", "pull", "origin", branch], cwd=PROJECT_ROOT, check=True)
+        
+        # Reinstall Python dependencies (prefer uv for speed, fall back to pip)
+        print("→ Updating Python dependencies...")
+        uv_bin = shutil.which("uv")
+        if uv_bin:
+            subprocess.run(
+                [uv_bin, "pip", "install", "-e", ".", "--quiet"],
+                cwd=PROJECT_ROOT, check=True,
+                env={**os.environ, "VIRTUAL_ENV": str(PROJECT_ROOT / "venv")}
+            )
+        else:
+            venv_pip = PROJECT_ROOT / "venv" / "bin" / "pip"
+            if venv_pip.exists():
+                subprocess.run([str(venv_pip), "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
+            else:
+                subprocess.run([sys.executable, "-m", "pip", "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
+        
+        # Check for Node.js deps (npm failures are non-fatal, hence check=False)
+        if (PROJECT_ROOT / "package.json").exists():
+            npm_bin = shutil.which("npm")
+            if npm_bin:
+                print("→ Updating Node.js dependencies...")
+                subprocess.run([npm_bin, "install", "--silent"], cwd=PROJECT_ROOT, check=False)
+        
+        print()
+        print("✓ Code updated!")
+        
+        # Check for config migrations
+        print()
+        print("→ Checking configuration for new options...")
+        
+        from hermes_cli.config import (
+            get_missing_env_vars, get_missing_config_fields, 
+            check_config_version, migrate_config
+        )
+        
+        missing_env = get_missing_env_vars(required_only=True)
+        missing_config = get_missing_config_fields()
+        current_ver, latest_ver = check_config_version()
+        
+        needs_migration = missing_env or missing_config or current_ver < latest_ver
+        
+        if needs_migration:
+            print()
+            if missing_env:
+                print(f"  ⚠️  {len(missing_env)} new required setting(s) need configuration")
+            if missing_config:
+                print(f"  ℹ️  {len(missing_config)} new config option(s) available")
+            
+            print()
+            response = input("Would you like to configure them now? [Y/n]: ").strip().lower()
+            
+            if response in ('', 'y', 'yes'):
+                print()
+                results = migrate_config(interactive=True, quiet=False)
+                
+                if results["env_added"] or results["config_added"]:
+                    print()
+                    print("✓ Configuration updated!")
+            else:
+                print()
+                print("Skipped. Run 'hermes config migrate' later to configure.")
+        else:
+            print("  ✓ Configuration is up to date")
+        
+        print()
+        print("✓ Update complete!")
+        print()
+        print("Note: If you have the gateway service running, restart it:")
+        print("  hermes gateway restart")
+        
+    except subprocess.CalledProcessError as e:
+        print(f"✗ Update failed: {e}")
+        sys.exit(1)
+
+
+def main():
+    """Main entry point for hermes CLI."""
+    parser = argparse.ArgumentParser(
+        prog="hermes",
+        description="Hermes Agent - AI assistant with tool-calling capabilities",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+    hermes                        Start interactive chat
+    hermes chat -q "Hello"        Single query mode
+    hermes setup                  Run setup wizard
+    hermes config                 View configuration
+    hermes config edit            Edit config in $EDITOR
+    hermes config set model gpt-4 Set a config value
+    hermes gateway                Run messaging gateway
+    hermes gateway install        Install as system service
+    hermes update                 Update to latest version
+
+For more help on a command:
+    hermes <command> --help
+"""
+    )
+    
+    parser.add_argument(
+        "--version", "-V",
+        action="store_true",
+        help="Show version and exit"
+    )
+    
+    subparsers = parser.add_subparsers(dest="command", help="Command to run")
+    
+    # =========================================================================
+    # chat command
+    # =========================================================================
+    chat_parser = subparsers.add_parser(
+        "chat",
+        help="Interactive chat with the agent",
+        description="Start an interactive chat session with Hermes Agent"
+    )
+    chat_parser.add_argument(
+        "-q", "--query",
+        help="Single query (non-interactive mode)"
+    )
+    chat_parser.add_argument(
+        "-m", "--model",
+        help="Model to use (e.g., anthropic/claude-sonnet-4)"
+    )
+    chat_parser.add_argument(
+        "-t", "--toolsets",
+        help="Comma-separated toolsets to enable"
+    )
+    chat_parser.add_argument(
+        "-v", "--verbose",
+        action="store_true",
+        help="Verbose output"
+    )
+    chat_parser.set_defaults(func=cmd_chat)
+    
+    # =========================================================================
+    # gateway command
+    # =========================================================================
+    gateway_parser = subparsers.add_parser(
+        "gateway",
+        help="Messaging gateway management",
+        description="Manage the messaging gateway (Telegram, Discord, WhatsApp)"
+    )
+    gateway_subparsers = gateway_parser.add_subparsers(dest="gateway_command")
+    
+    # gateway run (default)
+    gateway_run = gateway_subparsers.add_parser("run", help="Run gateway in foreground")
+    gateway_run.add_argument("-v", "--verbose", action="store_true")
+    
+    # gateway start
+    gateway_start = gateway_subparsers.add_parser("start", help="Start gateway service")
+    
+    # gateway stop
+    gateway_stop = gateway_subparsers.add_parser("stop", help="Stop gateway service")
+    
+    # gateway restart
+    gateway_restart = gateway_subparsers.add_parser("restart", help="Restart gateway service")
+    
+    # gateway status
+    gateway_status = gateway_subparsers.add_parser("status", help="Show gateway status")
+    gateway_status.add_argument("--deep", action="store_true", help="Deep status check")
+    
+    # gateway install
+    gateway_install = gateway_subparsers.add_parser("install", help="Install gateway as service")
+    gateway_install.add_argument("--force", action="store_true", help="Force reinstall")
+    
+    # gateway uninstall
+    gateway_uninstall = gateway_subparsers.add_parser("uninstall", help="Uninstall gateway service")
+    
+    gateway_parser.set_defaults(func=cmd_gateway)
+    
+    # =========================================================================
+    # setup command
+    # =========================================================================
+    setup_parser = subparsers.add_parser(
+        "setup",
+        help="Interactive setup wizard",
+        description="Configure Hermes Agent with an interactive wizard"
+    )
+    setup_parser.add_argument(
+        "--non-interactive",
+        action="store_true",
+        help="Non-interactive mode (use defaults/env vars)"
+    )
+    setup_parser.add_argument(
+        "--reset",
+        action="store_true",
+        help="Reset configuration to defaults"
+    )
+    setup_parser.set_defaults(func=cmd_setup)
+    
+    # =========================================================================
+    # status command
+    # =========================================================================
+    status_parser = subparsers.add_parser(
+        "status",
+        help="Show status of all components",
+        description="Display status of Hermes Agent components"
+    )
+    status_parser.add_argument(
+        "--all",
+        action="store_true",
+        help="Show all details (redacted for sharing)"
+    )
+    status_parser.add_argument(
+        "--deep",
+        action="store_true",
+        help="Run deep checks (may take longer)"
+    )
+    status_parser.set_defaults(func=cmd_status)
+    
+    # =========================================================================
+    # cron command
+    # =========================================================================
+    cron_parser = subparsers.add_parser(
+        "cron",
+        help="Cron job management",
+        description="Manage scheduled tasks"
+    )
+    cron_subparsers = cron_parser.add_subparsers(dest="cron_command")
+    
+    # cron list
+    cron_list = cron_subparsers.add_parser("list", help="List scheduled jobs")
+    cron_list.add_argument("--all", action="store_true", help="Include disabled jobs")
+    
+    # cron daemon
+    cron_daemon = cron_subparsers.add_parser("daemon", help="Run cron daemon")
+    cron_daemon.add_argument("--interval", type=int, default=60, help="Check interval in seconds")
+    
+    # cron tick
+    cron_tick = cron_subparsers.add_parser("tick", help="Run due jobs once (for system cron)")
+    
+    cron_parser.set_defaults(func=cmd_cron)
+    
+    # =========================================================================
+    # doctor command
+    # =========================================================================
+    doctor_parser = subparsers.add_parser(
+        "doctor",
+        help="Check configuration and dependencies",
+        description="Diagnose issues with Hermes Agent setup"
+    )
+    doctor_parser.add_argument(
+        "--fix",
+        action="store_true",
+        help="Attempt to fix issues automatically"
+    )
+    doctor_parser.set_defaults(func=cmd_doctor)
+    
+    # =========================================================================
+    # config command
+    # =========================================================================
+    config_parser = subparsers.add_parser(
+        "config",
+        help="View and edit configuration",
+        description="Manage Hermes Agent configuration"
+    )
+    config_subparsers = config_parser.add_subparsers(dest="config_command")
+    
+    # config show (default)
+    config_show = config_subparsers.add_parser("show", help="Show current configuration")
+    
+    # config edit
+    config_edit = config_subparsers.add_parser("edit", help="Open config file in editor")
+    
+    # config set
+    config_set = config_subparsers.add_parser("set", help="Set a configuration value")
+    config_set.add_argument("key", nargs="?", help="Configuration key (e.g., model, terminal.backend)")
+    config_set.add_argument("value", nargs="?", help="Value to set")
+    
+    # config path
+    config_path = config_subparsers.add_parser("path", help="Print config file path")
+    
+    # config env-path
+    config_env = config_subparsers.add_parser("env-path", help="Print .env file path")
+    
+    # config check
+    config_check = config_subparsers.add_parser("check", help="Check for missing/outdated config")
+    
+    # config migrate
+    config_migrate = config_subparsers.add_parser("migrate", help="Update config with new options")
+    
+    config_parser.set_defaults(func=cmd_config)
+    
+    # =========================================================================
+    # pairing command
+    # =========================================================================
+    pairing_parser = subparsers.add_parser(
+        "pairing",
+        help="Manage DM pairing codes for user authorization",
+        description="Approve or revoke user access via pairing codes"
+    )
+    pairing_sub = pairing_parser.add_subparsers(dest="pairing_action")
+
+    pairing_list_parser = pairing_sub.add_parser("list", help="Show pending + approved users")
+
+    pairing_approve_parser = pairing_sub.add_parser("approve", help="Approve a pairing code")
+    pairing_approve_parser.add_argument("platform", help="Platform name (telegram, discord, slack, whatsapp)")
+    pairing_approve_parser.add_argument("code", help="Pairing code to approve")
+
+    pairing_revoke_parser = pairing_sub.add_parser("revoke", help="Revoke user access")
+    pairing_revoke_parser.add_argument("platform", help="Platform name")
+    pairing_revoke_parser.add_argument("user_id", help="User ID to revoke")
+
+    pairing_clear_parser = pairing_sub.add_parser("clear-pending", help="Clear all pending codes")
+
+    def cmd_pairing(args):
+        from hermes_cli.pairing import pairing_command
+        pairing_command(args)
+
+    pairing_parser.set_defaults(func=cmd_pairing)
+
+    # =========================================================================
+    # version command
+    # =========================================================================
+    version_parser = subparsers.add_parser(
+        "version",
+        help="Show version information"
+    )
+    version_parser.set_defaults(func=cmd_version)
+    
+    # =========================================================================
+    # update command
+    # =========================================================================
+    update_parser = subparsers.add_parser(
+        "update",
+        help="Update Hermes Agent to the latest version",
+        description="Pull the latest changes from git and reinstall dependencies"
+    )
+    update_parser.set_defaults(func=cmd_update)
+    
+    # =========================================================================
+    # uninstall command
+    # =========================================================================
+    uninstall_parser = subparsers.add_parser(
+        "uninstall",
+        help="Uninstall Hermes Agent",
+        description="Remove Hermes Agent from your system. Can keep configs/data for reinstall."
+    )
+    uninstall_parser.add_argument(
+        "--full",
+        action="store_true",
+        help="Full uninstall - remove everything including configs and data"
+    )
+    uninstall_parser.add_argument(
+        "--yes", "-y",
+        action="store_true",
+        help="Skip confirmation prompts"
+    )
+    uninstall_parser.set_defaults(func=cmd_uninstall)
+    
+    # =========================================================================
+    # Parse and execute
+    # =========================================================================
+    args = parser.parse_args()
+    
+    # Handle --version flag
+    if args.version:
+        cmd_version(args)
+        return
+    
+    # Default to chat if no command specified
+    if args.command is None:
+        # No command = run chat
+        args.query = None
+        args.model = None
+        args.toolsets = None
+        args.verbose = False
+        cmd_chat(args)
+        return
+    
+    # Execute the command
+    if hasattr(args, 'func'):
+        args.func(args)
+    else:
+        parser.print_help()
+
+
+if __name__ == "__main__":
+    main()
--- a/hermes_cli/pairing.py
+++ b/hermes_cli/pairing.py
@@ -0,0 +1,100 @@
+"""
+CLI commands for the DM pairing system.
+
+Usage:
+    hermes pairing list              # Show all pending + approved users
+    hermes pairing approve <platform> <code>  # Approve a pairing code
+    hermes pairing revoke <platform> <user_id> # Revoke user access
+    hermes pairing clear-pending     # Clear all pending codes
+"""
+
+import sys
+
+
+def pairing_command(args):
+    """Handle hermes pairing subcommands."""
+    from gateway.pairing import PairingStore
+
+    store = PairingStore()
+    action = getattr(args, "pairing_action", None)
+
+    if action == "list":
+        _cmd_list(store)
+    elif action == "approve":
+        _cmd_approve(store, args.platform, args.code)
+    elif action == "revoke":
+        _cmd_revoke(store, args.platform, args.user_id)
+    elif action == "clear-pending":
+        _cmd_clear_pending(store)
+    else:
+        print("Usage: hermes pairing {list|approve|revoke|clear-pending}")
+        print("Run 'hermes pairing --help' for details.")
+
+
+def _cmd_list(store):
+    """List all pending and approved users."""
+    pending = store.list_pending()
+    approved = store.list_approved()
+
+    if not pending and not approved:
+        print("No pairing data found. No one has tried to pair yet~")
+        return
+
+    if pending:
+        print(f"\n  Pending Pairing Requests ({len(pending)}):")
+        print(f"  {'Platform':<12} {'Code':<10} {'User ID':<20} {'Name':<20} {'Age'}")
+        print(f"  {'--------':<12} {'----':<10} {'-------':<20} {'----':<20} {'---'}")
+        for p in pending:
+            print(
+                f"  {p['platform']:<12} {p['code']:<10} {p['user_id']:<20} "
+                f"{p.get('user_name', ''):<20} {p['age_minutes']}m ago"
+            )
+    else:
+        print("\n  No pending pairing requests.")
+
+    if approved:
+        print(f"\n  Approved Users ({len(approved)}):")
+        print(f"  {'Platform':<12} {'User ID':<20} {'Name':<20}")
+        print(f"  {'--------':<12} {'-------':<20} {'----':<20}")
+        for a in approved:
+            print(f"  {a['platform']:<12} {a['user_id']:<20} {a.get('user_name', ''):<20}")
+    else:
+        print("\n  No approved users.")
+
+    print()
+
+
+def _cmd_approve(store, platform: str, code: str):
+    """Approve a pairing code."""
+    platform = platform.lower().strip()
+    code = code.upper().strip()
+
+    result = store.approve_code(platform, code)
+    if result:
+        uid = result["user_id"]
+        name = result.get("user_name", "")
+        display = f"{name} ({uid})" if name else uid
+        print(f"\n  Approved! User {display} on {platform} can now use the bot~")
+        print("  They'll be recognized automatically on their next message.\n")
+    else:
+        print(f"\n  Code '{code}' not found or expired for platform '{platform}'.")
+        print("  Run 'hermes pairing list' to see pending codes.\n")
+
+
+def _cmd_revoke(store, platform: str, user_id: str):
+    """Revoke a user's access."""
+    platform = platform.lower().strip()
+
+    if store.revoke(platform, user_id):
+        print(f"\n  Revoked access for user {user_id} on {platform}.\n")
+    else:
+        print(f"\n  User {user_id} not found in approved list for {platform}.\n")
+
+
+def _cmd_clear_pending(store):
+    """Clear all pending pairing codes."""
+    count = store.clear_pending()
+    if count:
+        print(f"\n  Cleared {count} pending pairing request(s).\n")
+    else:
+        print("\n  No pending requests to clear.\n")
--- a/hermes_cli/setup.py
+++ b/hermes_cli/setup.py
--- a/hermes_cli/status.py
+++ b/hermes_cli/status.py
@@ -0,0 +1,251 @@
+"""
+Status command for hermes CLI.
+
+Shows the status of all Hermes Agent components.
+"""
+
+import os
+import sys
+import subprocess
+from pathlib import Path
+
+PROJECT_ROOT = Path(__file__).parent.parent.resolve()
+
+# ANSI colors
+class Colors:
+    RESET = "\033[0m"
+    BOLD = "\033[1m"
+    DIM = "\033[2m"
+    RED = "\033[31m"
+    GREEN = "\033[32m"
+    YELLOW = "\033[33m"
+    CYAN = "\033[36m"
+
+def color(text: str, *codes) -> str:
+    if not sys.stdout.isatty():
+        return text
+    return "".join(codes) + text + Colors.RESET
+
+def check_mark(ok: bool) -> str:
+    if ok:
+        return color("✓", Colors.GREEN)
+    return color("✗", Colors.RED)
+
+def redact_key(key: str) -> str:
+    """Redact an API key for display."""
+    if not key:
+        return "(not set)"
+    if len(key) < 12:
+        return "***"
+    return key[:4] + "..." + key[-4:]
+
+
+def show_status(args):
+    """Show status of all Hermes Agent components."""
+    show_all = getattr(args, 'all', False)
+    deep = getattr(args, 'deep', False)
+    
+    print()
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.CYAN))
+    print(color("│                 🦋 Hermes Agent Status                  │", Colors.CYAN))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.CYAN))
+    
+    # =========================================================================
+    # Environment
+    # =========================================================================
+    print()
+    print(color("◆ Environment", Colors.CYAN, Colors.BOLD))
+    print(f"  Project:      {PROJECT_ROOT}")
+    print(f"  Python:       {sys.version.split()[0]}")
+    
+    env_path = PROJECT_ROOT / '.env'
+    print(f"  .env file:    {check_mark(env_path.exists())} {'exists' if env_path.exists() else 'not found'}")
+    
+    # =========================================================================
+    # API Keys
+    # =========================================================================
+    print()
+    print(color("◆ API Keys", Colors.CYAN, Colors.BOLD))
+    
+    keys = {
+        "OpenRouter": "OPENROUTER_API_KEY",
+        "Anthropic": "ANTHROPIC_API_KEY", 
+        "OpenAI": "OPENAI_API_KEY",
+        "Firecrawl": "FIRECRAWL_API_KEY",
+        "Browserbase": "BROWSERBASE_API_KEY",
+        "FAL": "FAL_KEY",
+        "Tinker": "TINKER_API_KEY",
+        "WandB": "WANDB_API_KEY",
+        "ElevenLabs": "ELEVENLABS_API_KEY",
+    }
+    
+    for name, env_var in keys.items():
+        value = os.getenv(env_var, "")
+        has_key = bool(value)
+        display = value if (show_all and value) else redact_key(value)
+        print(f"  {name:<12}  {check_mark(has_key)} {display}")
+    
+    # =========================================================================
+    # Terminal Configuration
+    # =========================================================================
+    print()
+    print(color("◆ Terminal Backend", Colors.CYAN, Colors.BOLD))
+    
+    terminal_env = os.getenv("TERMINAL_ENV", "")
+    if not terminal_env:
+        # Fall back to config file value when env var isn't set
+        # (hermes status doesn't go through cli.py's config loading)
+        try:
+            from hermes_cli.config import load_config
+            _cfg = load_config()
+            terminal_env = _cfg.get("terminal", {}).get("backend", "local")
+        except Exception:
+            terminal_env = "local"
+    print(f"  Backend:      {terminal_env}")
+    
+    if terminal_env == "ssh":
+        ssh_host = os.getenv("TERMINAL_SSH_HOST", "")
+        ssh_user = os.getenv("TERMINAL_SSH_USER", "")
+        print(f"  SSH Host:     {ssh_host or '(not set)'}")
+        print(f"  SSH User:     {ssh_user or '(not set)'}")
+    elif terminal_env == "docker":
+        docker_image = os.getenv("TERMINAL_DOCKER_IMAGE", "python:3.11-slim")
+        print(f"  Docker Image: {docker_image}")
+    
+    sudo_password = os.getenv("SUDO_PASSWORD", "")
+    print(f"  Sudo:         {check_mark(bool(sudo_password))} {'enabled' if sudo_password else 'disabled'}")
+    
+    # =========================================================================
+    # Messaging Platforms
+    # =========================================================================
+    print()
+    print(color("◆ Messaging Platforms", Colors.CYAN, Colors.BOLD))
+    
+    platforms = {
+        "Telegram": ("TELEGRAM_BOT_TOKEN", "TELEGRAM_HOME_CHANNEL"),
+        "Discord": ("DISCORD_BOT_TOKEN", "DISCORD_HOME_CHANNEL"),
+        "WhatsApp": ("WHATSAPP_ENABLED", None),
+    }
+    
+    for name, (token_var, home_var) in platforms.items():
+        token = os.getenv(token_var, "")
+        has_token = token.strip().lower() not in ("", "0", "false", "no", "off")  # WHATSAPP_ENABLED is a flag, not a token
+        
+        home_channel = ""
+        if home_var:
+            home_channel = os.getenv(home_var, "")
+        
+        status = "configured" if has_token else "not configured"
+        if home_channel:
+            status += f" (home: {home_channel})"
+        
+        print(f"  {name:<12}  {check_mark(has_token)} {status}")
+    
+    # =========================================================================
+    # Gateway Status
+    # =========================================================================
+    print()
+    print(color("◆ Gateway Service", Colors.CYAN, Colors.BOLD))
+    
+    if sys.platform.startswith('linux'):
+        result = subprocess.run(
+            ["systemctl", "--user", "is-active", "hermes-gateway"],
+            capture_output=True,
+            text=True
+        )
+        is_active = result.stdout.strip() == "active"
+        print(f"  Status:       {check_mark(is_active)} {'running' if is_active else 'stopped'}")
+        print(f"  Manager:      systemd (user)")
+        
+    elif sys.platform == 'darwin':
+        result = subprocess.run(
+            ["launchctl", "list", "ai.hermes.gateway"],
+            capture_output=True,
+            text=True
+        )
+        is_loaded = result.returncode == 0
+        print(f"  Status:       {check_mark(is_loaded)} {'loaded' if is_loaded else 'not loaded'}")
+        print(f"  Manager:      launchd")
+    else:
+        print(f"  Status:       {color('N/A', Colors.DIM)}")
+        print(f"  Manager:      (not supported on this platform)")
+    
+    # =========================================================================
+    # Cron Jobs
+    # =========================================================================
+    print()
+    print(color("◆ Scheduled Jobs", Colors.CYAN, Colors.BOLD))
+    
+    jobs_file = Path.home() / ".hermes" / "cron" / "jobs.json"
+    if jobs_file.exists():
+        import json
+        try:
+            with open(jobs_file) as f:
+                data = json.load(f)
+                jobs = data.get("jobs", [])
+                enabled_jobs = [j for j in jobs if j.get("enabled", True)]
+                print(f"  Jobs:         {len(enabled_jobs)} active, {len(jobs)} total")
+        except Exception:
+            print(f"  Jobs:         (error reading jobs file)")
+    else:
+        print(f"  Jobs:         0")
+    
+    # =========================================================================
+    # Sessions
+    # =========================================================================
+    print()
+    print(color("◆ Sessions", Colors.CYAN, Colors.BOLD))
+    
+    sessions_file = Path.home() / ".hermes" / "sessions" / "sessions.json"
+    if sessions_file.exists():
+        import json
+        try:
+            with open(sessions_file) as f:
+                data = json.load(f)
+                print(f"  Active:       {len(data)} session(s)")
+        except Exception:
+            print(f"  Active:       (error reading sessions file)")
+    else:
+        print(f"  Active:       0")
+    
+    # =========================================================================
+    # Deep checks
+    # =========================================================================
+    if deep:
+        print()
+        print(color("◆ Deep Checks", Colors.CYAN, Colors.BOLD))
+        
+        # Check OpenRouter connectivity
+        openrouter_key = os.getenv("OPENROUTER_API_KEY", "")
+        if openrouter_key:
+            try:
+                import httpx
+                response = httpx.get(
+                    "https://openrouter.ai/api/v1/models",
+                    headers={"Authorization": f"Bearer {openrouter_key}"},
+                    timeout=10
+                )
+                ok = response.status_code == 200
+                print(f"  OpenRouter:   {check_mark(ok)} {'reachable' if ok else f'error ({response.status_code})'}")
+            except Exception as e:
+                print(f"  OpenRouter:   {check_mark(False)} error: {e}")
+        
+        # Check gateway port
+        try:
+            import socket
+            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+            sock.settimeout(1)
+            result = sock.connect_ex(('127.0.0.1', 18789))
+            sock.close()
+            # Port in use = gateway likely running
+            port_in_use = result == 0
+            # This is informational, not necessarily bad
+            print(f"  Port 18789:   {'in use' if port_in_use else 'available'}")
+        except Exception:
+            pass
+    
+    print()
+    print(color("─" * 60, Colors.DIM))
+    print(color("  Run 'hermes doctor' for detailed diagnostics", Colors.DIM))
+    print(color("  Run 'hermes setup' to configure", Colors.DIM))
+    print()
--- a/hermes_cli/uninstall.py
+++ b/hermes_cli/uninstall.py
@@ -0,0 +1,341 @@
+"""
+Hermes Agent Uninstaller.
+
+Provides options for:
+- Full uninstall: Remove everything including configs and data
+- Keep data: Remove code but keep ~/.hermes/ (configs, sessions, logs)
+"""
+
+import os
+import sys
+import shutil
+import subprocess
+from pathlib import Path
+from typing import Optional
+
+# ANSI colors
+class Colors:
+    RESET = "\033[0m"
+    BOLD = "\033[1m"
+    DIM = "\033[2m"
+    RED = "\033[31m"
+    GREEN = "\033[32m"
+    YELLOW = "\033[33m"
+    BLUE = "\033[34m"
+    MAGENTA = "\033[35m"
+    CYAN = "\033[36m"
+
+def color(text: str, *codes) -> str:
+    """Apply color codes to text (only in TTY)."""
+    if not sys.stdout.isatty():
+        return text
+    return "".join(codes) + text + Colors.RESET
+
+def log_info(msg: str):
+    print(f"{color('→', Colors.CYAN)} {msg}")
+
+def log_success(msg: str):
+    print(f"{color('✓', Colors.GREEN)} {msg}")
+
+def log_warn(msg: str):
+    print(f"{color('⚠', Colors.YELLOW)} {msg}")
+
+def log_error(msg: str):
+    print(f"{color('✗', Colors.RED)} {msg}")
+
+
+def get_project_root() -> Path:
+    """Get the project installation directory."""
+    return Path(__file__).parent.parent.resolve()
+
+
+def get_hermes_home() -> Path:
+    """Get the Hermes home directory (~/.hermes)."""
+    return Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
+
+
+def find_shell_configs() -> list:
+    """Find shell configuration files that might have PATH entries."""
+    home = Path.home()
+    configs = []
+    
+    candidates = [
+        home / ".bashrc",
+        home / ".bash_profile",
+        home / ".profile",
+        home / ".zshrc",
+        home / ".zprofile",
+    ]
+    
+    for config in candidates:
+        if config.exists():
+            configs.append(config)
+    
+    return configs
+
+
+def remove_path_from_shell_configs():
+    """Remove Hermes PATH entries from shell configuration files."""
+    configs = find_shell_configs()
+    removed_from = []
+    
+    for config_path in configs:
+        try:
+            content = config_path.read_text()
+            original_content = content
+            
+            # Remove lines containing hermes-agent or hermes PATH entries
+            new_lines = []
+            skip_next = False
+            
+            for line in content.split('\n'):
+                # Skip the "# Hermes Agent" comment and following line
+                if '# Hermes Agent' in line or '# hermes-agent' in line:
+                    skip_next = True
+                    continue
+                if skip_next and ('hermes' in line.lower() and 'PATH' in line):
+                    skip_next = False
+                    continue
+                skip_next = False
+                
+                # Remove any PATH line containing hermes
+                if 'hermes' in line.lower() and ('PATH=' in line or 'path=' in line.lower()):
+                    continue
+                    
+                new_lines.append(line)
+            
+            new_content = '\n'.join(new_lines)
+            
+            # Clean up multiple blank lines
+            while '\n\n\n' in new_content:
+                new_content = new_content.replace('\n\n\n', '\n\n')
+            
+            if new_content != original_content:
+                config_path.write_text(new_content)
+                removed_from.append(config_path)
+                
+        except Exception as e:
+            log_warn(f"Could not update {config_path}: {e}")
+    
+    return removed_from
+
+
+def remove_wrapper_script():
+    """Remove the hermes wrapper script if it exists."""
+    wrapper_paths = [
+        Path.home() / ".local" / "bin" / "hermes",
+        Path("/usr/local/bin/hermes"),
+    ]
+    
+    removed = []
+    for wrapper in wrapper_paths:
+        if wrapper.exists():
+            try:
+                # Check if it's our wrapper (contains hermes_cli reference)
+                content = wrapper.read_text()
+                if 'hermes_cli' in content or 'hermes-agent' in content:
+                    wrapper.unlink()
+                    removed.append(wrapper)
+            except Exception as e:
+                log_warn(f"Could not remove {wrapper}: {e}")
+    
+    return removed
+
+
+def uninstall_gateway_service():
+    """Stop and uninstall the gateway service if running."""
+    import platform
+    
+    if platform.system() != "Linux":
+        return False
+    
+    service_file = Path.home() / ".config" / "systemd" / "user" / "hermes-gateway.service"
+    
+    if not service_file.exists():
+        return False
+    
+    try:
+        # Stop the service
+        subprocess.run(
+            ["systemctl", "--user", "stop", "hermes-gateway"],
+            capture_output=True,
+            check=False
+        )
+        
+        # Disable the service
+        subprocess.run(
+            ["systemctl", "--user", "disable", "hermes-gateway"],
+            capture_output=True,
+            check=False
+        )
+        
+        # Remove service file
+        service_file.unlink()
+        
+        # Reload systemd
+        subprocess.run(
+            ["systemctl", "--user", "daemon-reload"],
+            capture_output=True,
+            check=False
+        )
+        
+        return True
+        
+    except Exception as e:
+        log_warn(f"Could not fully remove gateway service: {e}")
+        return False
+
+
+def run_uninstall(args):
+    """
+    Run the uninstall process.
+    
+    Options:
+    - Full uninstall: removes code + ~/.hermes/ (configs, data, logs)
+    - Keep data: removes code but keeps ~/.hermes/ for future reinstall
+    """
+    project_root = get_project_root()
+    hermes_home = get_hermes_home()
+    
+    print()
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.MAGENTA, Colors.BOLD))
+    print(color("│            🦋 Hermes Agent Uninstaller                  │", Colors.MAGENTA, Colors.BOLD))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.MAGENTA, Colors.BOLD))
+    print()
+    
+    # Show what will be affected
+    print(color("Current Installation:", Colors.CYAN, Colors.BOLD))
+    print(f"  Code:    {project_root}")
+    print(f"  Config:  {hermes_home / 'config.yaml'}")
+    print(f"  Secrets: {hermes_home / '.env'}")
+    print(f"  Data:    {hermes_home / 'cron/'}, {hermes_home / 'sessions/'}, {hermes_home / 'logs/'}")
+    print()
+    
+    # Ask for confirmation
+    print(color("Uninstall Options:", Colors.YELLOW, Colors.BOLD))
+    print()
+    print("  1) " + color("Keep data", Colors.GREEN) + " - Remove code only, keep configs/sessions/logs")
+    print("     (Recommended - you can reinstall later with your settings intact)")
+    print()
+    print("  2) " + color("Full uninstall", Colors.RED) + " - Remove everything including all data")
+    print("     (Warning: This deletes all configs, sessions, and logs permanently)")
+    print()
+    print("  3) " + color("Cancel", Colors.CYAN) + " - Don't uninstall")
+    print()
+    
+    try:
+        choice = input(color("Select option [1/2/3]: ", Colors.BOLD)).strip()
+    except (KeyboardInterrupt, EOFError):
+        print()
+        print("Cancelled.")
+        return
+    
+    if choice == "3" or choice.lower() in ("c", "cancel", "q", "quit", "n", "no"):
+        print()
+        print("Uninstall cancelled.")
+        return
+    
+    full_uninstall = (choice == "2")
+    
+    # Final confirmation
+    print()
+    if full_uninstall:
+        print(color("⚠️  WARNING: This will permanently delete ALL Hermes data!", Colors.RED, Colors.BOLD))
+        print(color("   Including: configs, API keys, sessions, scheduled jobs, logs", Colors.RED))
+    else:
+        print("This will remove the Hermes code but keep your configuration and data.")
+    
+    print()
+    try:
+        confirm = input(f"Type '{color('yes', Colors.YELLOW)}' to confirm: ").strip().lower()
+    except (KeyboardInterrupt, EOFError):
+        print()
+        print("Cancelled.")
+        return
+    
+    if confirm != "yes":
+        print()
+        print("Uninstall cancelled.")
+        return
+    
+    print()
+    print(color("Uninstalling...", Colors.CYAN, Colors.BOLD))
+    print()
+    
+    # 1. Stop and uninstall gateway service
+    log_info("Checking for gateway service...")
+    if uninstall_gateway_service():
+        log_success("Gateway service stopped and removed")
+    else:
+        log_info("No gateway service found")
+    
+    # 2. Remove PATH entries from shell configs
+    log_info("Removing PATH entries from shell configs...")
+    removed_configs = remove_path_from_shell_configs()
+    if removed_configs:
+        for config in removed_configs:
+            log_success(f"Updated {config}")
+    else:
+        log_info("No PATH entries found to remove")
+    
+    # 3. Remove wrapper script
+    log_info("Removing hermes command...")
+    removed_wrappers = remove_wrapper_script()
+    if removed_wrappers:
+        for wrapper in removed_wrappers:
+            log_success(f"Removed {wrapper}")
+    else:
+        log_info("No wrapper script found")
+    
+    # 4. Remove installation directory (code)
+    log_info("Removing installation directory...")
+    
+    # Check if we're running from within the install dir
+    # We need to be careful here
+    try:
+        if project_root.exists():
+            # If the install is inside ~/.hermes/, just remove the hermes-agent subdir
+            if hermes_home in project_root.parents or project_root.parent == hermes_home:
+                shutil.rmtree(project_root)
+                log_success(f"Removed {project_root}")
+            else:
+                # Installation is somewhere else entirely
+                shutil.rmtree(project_root)
+                log_success(f"Removed {project_root}")
+    except Exception as e:
+        log_warn(f"Could not fully remove {project_root}: {e}")
+        log_info("You may need to manually remove it")
+    
+    # 5. Optionally remove ~/.hermes/ data directory
+    if full_uninstall:
+        log_info("Removing configuration and data...")
+        try:
+            if hermes_home.exists():
+                shutil.rmtree(hermes_home)
+                log_success(f"Removed {hermes_home}")
+        except Exception as e:
+            log_warn(f"Could not fully remove {hermes_home}: {e}")
+            log_info("You may need to manually remove it")
+    else:
+        log_info(f"Keeping configuration and data in {hermes_home}")
+    
+    # Done
+    print()
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.GREEN, Colors.BOLD))
+    print(color("│              ✓ Uninstall Complete!                      │", Colors.GREEN, Colors.BOLD))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.GREEN, Colors.BOLD))
+    print()
+    
+    if not full_uninstall:
+        print(color("Your configuration and data have been preserved:", Colors.CYAN))
+        print(f"  {hermes_home}/")
+        print()
+        print("To reinstall later with your existing settings:")
+        print(color("  curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash", Colors.DIM))
+        print()
+    
+    print(color("Reload your shell to complete the process:", Colors.YELLOW))
+    print("  source ~/.bashrc  # or ~/.zshrc")
+    print()
+    print("Thank you for using Hermes Agent! 🦋")
+    print()
--- a/model_tools.py
+++ b/model_tools.py
--- a/package-lock.json
+++ b/package-lock.json
@@ -7,9 +7,13 @@
    "": {
      "name": "hermes-agent",
      "version": "1.0.0",
-      "license": "ISC",
+      "hasInstallScript": true,
+      "license": "MIT",
      "dependencies": {
        "agent-browser": "^0.7.6"
+      },
+      "engines": {
+        "node": ">=18.0.0"
      }
    },
    "node_modules/agent-browser": {
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -22,9 +22,13 @@ dependencies = [
  "requests",
  "jinja2",
  "pydantic>=2.0",
+  # Interactive CLI (prompt_toolkit is used directly by cli.py)
+  "prompt_toolkit",
  # Tools
  "firecrawl-py",
  "fal-client",
+  # Text-to-speech (Edge TTS is free, no API key needed)
+  "edge-tts",
  # mini-swe-agent deps (terminal tool)
  "litellm>=1.75.5",
  "typer",
@@ -32,14 +36,31 @@ dependencies = [
 ]

 [project.optional-dependencies]
-modal = ["modal", "boto3"]
+modal = ["swe-rex[modal]>=1.4.0"]
 dev = ["pytest", "pytest-asyncio"]
+messaging = ["python-telegram-bot>=20.0", "discord.py>=2.0", "aiohttp>=3.9.0", "slack-bolt>=1.18.0", "slack-sdk>=3.27.0"]
+cron = ["croniter"]
+slack = ["slack-bolt>=1.18.0", "slack-sdk>=3.27.0"]
+cli = ["simple-term-menu"]
+tts-premium = ["elevenlabs"]
+pty = ["ptyprocess>=0.7.0"]
+all = [
+  "hermes-agent[modal]",
+  "hermes-agent[messaging]",
+  "hermes-agent[cron]",
+  "hermes-agent[cli]",
+  "hermes-agent[dev]",
+  "hermes-agent[tts-premium]",
+  "hermes-agent[slack]",
+  "hermes-agent[pty]",
+]

 [project.scripts]
+hermes = "hermes_cli.main:main"
 hermes-agent = "run_agent:main"

 [tool.setuptools]
-py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajectory_compressor", "toolset_distributions"]
+py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajectory_compressor", "toolset_distributions", "cli"]

 [tool.setuptools.packages.find]
-include = ["tools"]
+include = ["tools", "hermes_cli", "gateway", "cron"]
--- a/requirements.txt
+++ b/requirements.txt
@@ -6,6 +6,10 @@ httpx
 rich
 tenacity
 prompt_toolkit
+pyyaml
+requests
+jinja2
+pydantic>=2.0

 # Web tools
 firecrawl-py
@@ -15,10 +19,6 @@ fal-client

 # mini-swe-agent dependencies (for terminal tool)
 # Note: Install mini-swe-agent itself with: pip install -e ./mini-swe-agent
-pyyaml
-requests
-jinja2
-pydantic>=2.0
 litellm>=1.75.5
 typer
 platformdirs
@@ -27,8 +27,23 @@ platformdirs
 # Requires Docker installed and user in 'docker' group

 # Optional: For Modal backend (cloud execution)
-# modal
-# boto3
+# swe-rex[modal]>=1.4.0  # Includes modal + boto3 + swe-rex runtime

-# Optional: Legacy Hecate terminal backend
-# git+ssh://git@github.com/NousResearch/hecate.git
+# Text-to-speech (Edge TTS is free, no API key needed)
+edge-tts
+
+# Optional: Premium TTS providers
+# elevenlabs  # Uncomment if using ElevenLabs TTS (needs ELEVENLABS_API_KEY)
+
+# Optional: For cron expression parsing (cronjob scheduling)
+croniter
+
+# Optional: For messaging platform integrations (gateway)
+# Telegram
+python-telegram-bot>=20.0
+
+# Discord
+discord.py>=2.0
+
+# WhatsApp bridge communication + general async HTTP (used by gateway)
+aiohttp>=3.9.0
--- a/rl_cli.py
+++ b/rl_cli.py
@@ -0,0 +1,448 @@
+#!/usr/bin/env python3
+"""
+RL Training CLI Runner
+
+Dedicated CLI runner for RL training workflows with:
+- Extended timeouts for long-running training
+- RL-focused system prompts
+- Full toolset including RL training tools
+- Special handling for 30-minute check intervals
+
+Usage:
+    python rl_cli.py "Train a model on GSM8k for math reasoning"
+    python rl_cli.py --interactive
+    python rl_cli.py --list-environments
+
+Environment Variables:
+    TINKER_API_KEY: API key for Tinker service (required)
+    WANDB_API_KEY: API key for WandB metrics (required)
+    OPENROUTER_API_KEY: API key for OpenRouter (required for agent)
+"""
+
+import asyncio
+import os
+import sys
+from pathlib import Path
+
+import fire
+import yaml
+
+# Load environment variables from .env file
+from dotenv import load_dotenv
+
+# Load from ~/.hermes/.env first, then local .env
+hermes_env_path = Path.home() / '.hermes' / '.env'
+local_env_path = Path(__file__).parent / '.env'
+
+if hermes_env_path.exists():
+    load_dotenv(dotenv_path=hermes_env_path)
+    print(f"✅ Loaded environment variables from {hermes_env_path}")
+elif local_env_path.exists():
+    load_dotenv(dotenv_path=local_env_path)
+    print(f"✅ Loaded environment variables from {local_env_path}")
+
+# Set terminal working directory to tinker-atropos submodule
+# This ensures terminal commands run in the right context for RL work
+tinker_atropos_dir = Path(__file__).parent / 'tinker-atropos'
+if tinker_atropos_dir.exists():
+    os.environ['TERMINAL_CWD'] = str(tinker_atropos_dir)
+    os.environ['HERMES_QUIET'] = '1'  # Disable temp subdirectory creation
+    print(f"📂 Terminal working directory: {tinker_atropos_dir}")
+else:
+    # Fall back to hermes-agent directory if submodule not found
+    os.environ['TERMINAL_CWD'] = str(Path(__file__).parent)
+    os.environ['HERMES_QUIET'] = '1'
+    print(f"⚠️  tinker-atropos submodule not found, using: {Path(__file__).parent}")
+
+# Import agent and tools
+from run_agent import AIAgent
+from model_tools import get_tool_definitions, check_toolset_requirements
+from tools.rl_training_tool import check_rl_api_keys, get_missing_keys
+
+
+# ============================================================================
+# Config Loading
+# ============================================================================
+
+DEFAULT_MODEL = "anthropic/claude-opus-4.5"
+DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"
+
+
+def load_hermes_config() -> dict:
+    """
+    Load configuration from ~/.hermes/config.yaml.
+    
+    Returns:
+        dict: Configuration with model, base_url, etc.
+    """
+    config_path = Path.home() / '.hermes' / 'config.yaml'
+    
+    config = {
+        "model": DEFAULT_MODEL,
+        "base_url": DEFAULT_BASE_URL,
+    }
+    
+    if config_path.exists():
+        try:
+            with open(config_path, "r") as f:
+                file_config = yaml.safe_load(f) or {}
+            
+            # Get model from config
+            if "model" in file_config:
+                if isinstance(file_config["model"], str):
+                    config["model"] = file_config["model"]
+                elif isinstance(file_config["model"], dict):
+                    config["model"] = file_config["model"].get("default", DEFAULT_MODEL)
+            
+            # Get base_url if specified
+            if "base_url" in file_config:
+                config["base_url"] = file_config["base_url"]
+                
+        except Exception as e:
+            print(f"⚠️  Warning: Failed to load config.yaml: {e}")
+    
+    return config
+
+
+# ============================================================================
+# RL-Specific Configuration
+# ============================================================================
+
+# Extended timeouts for long-running RL operations
+RL_MAX_ITERATIONS = 200  # Allow many more iterations for long workflows
+
+# RL-focused system prompt
+RL_SYSTEM_PROMPT = """You are an automated post-training engineer specializing in reinforcement learning for language models.
+
+## Your Capabilities
+
+You have access to RL training tools for running reinforcement learning on models through Tinker-Atropos:
+
+1. **DISCOVER**: Use `rl_list_environments` to see available RL environments
+2. **INSPECT**: Read environment files to understand how they work (verifiers, data loading, rewards)
+3. **INSPECT DATA**: Use terminal to explore HuggingFace datasets and understand their format
+4. **CREATE**: Copy existing environments as templates, modify for your needs
+5. **CONFIGURE**: Use `rl_select_environment` and `rl_edit_config` to set up training
+6. **TEST**: Always use `rl_test_inference` before full training to validate your setup
+7. **TRAIN**: Use `rl_start_training` to begin, `rl_check_status` to monitor
+8. **EVALUATE**: Use `rl_get_results` and analyze WandB metrics to assess performance
+
+## Environment Files
+
+Environment files are located in: `tinker-atropos/tinker_atropos/environments/`
+
+Study existing environments to learn patterns. Look for:
+- `load_dataset()` calls - how data is loaded
+- `score_answer()` / `score()` - verification logic
+- `get_next_item()` - prompt formatting
+- `system_prompt` - instruction format
+- `config_init()` - default configuration
+
+## Creating New Environments
+
+To create a new environment:
+1. Read an existing environment file (e.g., gsm8k_tinker.py)
+2. Use terminal to explore the target dataset format
+3. Copy the environment file as a template
+4. Modify the dataset loading, prompt formatting, and verifier logic
+5. Test with `rl_test_inference` before training
+
+## Important Guidelines
+
+- **Always test before training**: Training runs take hours - verify everything works first
+- **Monitor metrics**: Check WandB for reward/mean and percent_correct
+- **Status check intervals**: Wait at least 30 minutes between status checks
+- **Early stopping**: Stop training early if metrics look bad or stagnant
+- **Iterate quickly**: Start with small total_steps to validate, then scale up
+
+## Available Toolsets
+
+You have access to:
+- **RL tools**: Environment discovery, config management, training, testing
+- **Terminal**: Run commands, inspect files, explore datasets
+- **Web**: Search for information, documentation, papers
+- **File tools**: Read and modify code files
+
+When asked to train a model, follow this workflow:
+1. List available environments
+2. Select and configure the appropriate environment
+3. Test with sample prompts
+4. Start training with conservative settings
+5. Monitor progress and adjust as needed
+"""
+
+# Toolsets to enable for RL workflows
+RL_TOOLSETS = ["terminal", "web", "rl"]
+
+
+# ============================================================================
+# Helper Functions
+# ============================================================================
+
+def check_requirements():
+    """Check that all required environment variables and services are available."""
+    errors = []
+    
+    # Check API keys
+    if not os.getenv("OPENROUTER_API_KEY"):
+        errors.append("OPENROUTER_API_KEY not set - required for agent")
+    
+    missing_rl_keys = get_missing_keys()
+    if missing_rl_keys:
+        errors.append(f"Missing RL API keys: {', '.join(missing_rl_keys)}")
+    
+    if errors:
+        print("❌ Missing requirements:")
+        for error in errors:
+            print(f"   - {error}")
+        print("\nPlease set these environment variables in your .env file or shell.")
+        return False
+    
+    return True
+
+
+def check_tinker_atropos():
+    """Check if tinker-atropos submodule is properly set up."""
+    tinker_path = Path(__file__).parent / "tinker-atropos"
+    
+    if not tinker_path.exists():
+        return False, "tinker-atropos submodule not found. Run: git submodule update --init"
+    
+    envs_path = tinker_path / "tinker_atropos" / "environments"
+    if not envs_path.exists():
+        return False, f"environments directory not found at {envs_path}"
+    
+    env_files = list(envs_path.glob("*.py"))
+    env_files = [f for f in env_files if not f.name.startswith("_")]
+    
+    return True, {"path": str(tinker_path), "environments_count": len(env_files)}
+
+
+def list_environments_sync():
+    """List available environments (synchronous wrapper)."""
+    from tools.rl_training_tool import rl_list_environments
+    import json
+    
+    async def _list():
+        result = await rl_list_environments()
+        return json.loads(result)
+    
+    return asyncio.run(_list())
+
+
+# ============================================================================
+# Main CLI
+# ============================================================================
+
+def main(
+    task: str = None,
+    model: str = None,
+    api_key: str = None,
+    base_url: str = None,
+    max_iterations: int = RL_MAX_ITERATIONS,
+    interactive: bool = False,
+    list_environments: bool = False,
+    check_server: bool = False,
+    verbose: bool = False,
+    save_trajectories: bool = True,
+):
+    """
+    RL Training CLI - Dedicated runner for RL training workflows.
+    
+    Args:
+        task: The training task/goal (e.g., "Train a model on GSM8k for math")
+        model: Model to use for the agent (reads from ~/.hermes/config.yaml if not provided)
+        api_key: OpenRouter API key (uses OPENROUTER_API_KEY env var if not provided)
+        base_url: API base URL (reads from config or defaults to OpenRouter)
+        max_iterations: Maximum agent iterations (default: 200 for long workflows)
+        interactive: Run in interactive mode (multiple conversations)
+        list_environments: Just list available RL environments and exit
+        check_server: Check the tinker-atropos setup and RL API keys, then exit
+        verbose: Enable verbose logging
+        save_trajectories: Save conversation trajectories (default: True for RL)
+    
+    Examples:
+        # Train on a specific environment
+        python rl_cli.py "Train a model on GSM8k math problems"
+        
+        # Interactive mode
+        python rl_cli.py --interactive
+        
+        # List available environments
+        python rl_cli.py --list-environments
+        
+        # Check server status
+        python rl_cli.py --check-server
+    """
+    # Load config from ~/.hermes/config.yaml
+    config = load_hermes_config()
+    
+    # Use config values if not explicitly provided
+    if model is None:
+        model = config["model"]
+    if base_url is None:
+        base_url = config["base_url"]
+    
+    print("🎯 RL Training Agent")
+    print("=" * 60)
+    
+    # Handle setup check
+    if check_server:
+        print("\n🔍 Checking tinker-atropos setup...")
+        ok, result = check_tinker_atropos()
+        if ok:
+            print("✅ tinker-atropos submodule found")
+            print(f"   Path: {result.get('path')}")
+            print(f"   Environments found: {result.get('environments_count', 0)}")
+            
+            # Also check API keys
+            missing = get_missing_keys()
+            if missing:
+                print(f"\n⚠️  Missing API keys: {', '.join(missing)}")
+                print("   Add them to ~/.hermes/.env")
+            else:
+                print("✅ API keys configured")
+        else:
+            print(f"❌ tinker-atropos not set up: {result}")
+            print("\nTo set up:")
+            print("  git submodule update --init")
+            print("  pip install -e ./tinker-atropos")
+        return
+    
+    # Handle environment listing
+    if list_environments:
+        print("\n📋 Available RL Environments:")
+        print("-" * 40)
+        try:
+            data = list_environments_sync()
+            if "error" in data:
+                print(f"❌ Error: {data['error']}")
+                return
+            
+            envs = data.get("environments", [])
+            if not envs:
+                print("No environments found.")
+                print("\nMake sure tinker-atropos is set up:")
+                print("  git submodule update --init")
+                return
+            
+            for env in envs:
+                print(f"\n  📦 {env['name']}")
+                print(f"     Class: {env['class_name']}")
+                print(f"     Path: {env['file_path']}")
+                desc = env.get('description')
+                if desc:
+                    if len(desc) > 100:
+                        desc = desc[:100] + "..."
+                    print(f"     Description: {desc}")
+            
+            print(f"\n📊 Total: {len(envs)} environments")
+            print("\nUse `rl_select_environment(name)` to select an environment for training.")
+        except Exception as e:
+            print(f"❌ Error listing environments: {e}")
+            print("\nMake sure tinker-atropos is set up:")
+            print("  git submodule update --init")
+            print("  pip install -e ./tinker-atropos")
+        return
+    
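+    # Check requirements. An explicit --api-key counts as the OpenRouter key,
+    # so expose it to the environment check before that check runs.
+    if api_key and not os.getenv("OPENROUTER_API_KEY"):
+        os.environ["OPENROUTER_API_KEY"] = api_key
+    if not check_requirements():
+        sys.exit(1)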
+    
+    # Require a task unless running in interactive mode
+    if not task and not interactive:
+        print("\n⚠️  No task provided. Use --interactive for interactive mode or provide a task.")
+        print("\nExamples:")
+        print('  python rl_cli.py "Train a model on GSM8k math problems"')
+        print('  python rl_cli.py "Create an RL environment for code generation"')
+        print('  python rl_cli.py --interactive')
+        return
+    
+    # Get API key
+    api_key = api_key or os.getenv("OPENROUTER_API_KEY")
+    if not api_key:
+        print("❌ No API key provided. Set OPENROUTER_API_KEY or pass --api-key")
+        sys.exit(1)
+    
+    print(f"\n🤖 Model: {model}")
+    print(f"🔧 Max iterations: {max_iterations}")
+    print(f"📁 Toolsets: {', '.join(RL_TOOLSETS)}")
+    print("=" * 60)
+    
+    # Create agent with RL configuration
+    agent = AIAgent(
+        base_url=base_url,
+        api_key=api_key,
+        model=model,
+        max_iterations=max_iterations,
+        enabled_toolsets=RL_TOOLSETS,
+        save_trajectories=save_trajectories,
+        verbose_logging=verbose,
+        quiet_mode=False,
+        ephemeral_system_prompt=RL_SYSTEM_PROMPT,
+    )
+    
+    if interactive:
+        # Interactive mode - multiple conversations
+        print("\n🔄 Interactive RL Training Mode")
+        print("Type 'quit' or 'exit' to end the session.")
+        print("Type 'status' to check active training runs.")
+        print("-" * 40)
+        
+        while True:
+            try:
+                user_input = input("\n🎯 RL Task> ").strip()
+                
+                if not user_input:
+                    continue
+                
+                if user_input.lower() in ('quit', 'exit', 'q'):
+                    print("\n👋 Goodbye!")
+                    break
+                
+                if user_input.lower() == 'status':
+                    # Quick status check
+                    from tools.rl_training_tool import rl_list_runs
+                    import json
+                    result = asyncio.run(rl_list_runs())
+                    runs = json.loads(result)
+                    if isinstance(runs, list) and runs:
+                        print("\n📊 Active Runs:")
+                        for run in runs:
+                            print(f"  - {run['run_id']}: {run['environment']} ({run['status']})")
+                    else:
+                        print("\nNo active runs.")
+                    continue
+                
+                # Run the agent
+                print("\n" + "=" * 60)
+                response = agent.run_conversation(user_input)
+                print("\n" + "=" * 60)
+                
+            except KeyboardInterrupt:
+                print("\n\n👋 Interrupted. Goodbye!")
+                break
+            except Exception as e:
+                print(f"\n❌ Error: {e}")
+                if verbose:
+                    import traceback
+                    traceback.print_exc()
+    else:
+        # Single task mode
+        print(f"\n📝 Task: {task}")
+        print("-" * 40)
+        
+        try:
+            response = agent.run_conversation(task)
+            print("\n" + "=" * 60)
+            print("✅ Task completed")
+        except KeyboardInterrupt:
+            print("\n\n⚠️ Interrupted by user")
+        except Exception as e:
+            print(f"\n❌ Error: {e}")
+            if verbose:
+                import traceback
+                traceback.print_exc()
+            sys.exit(1)
+
+
+if __name__ == "__main__":
+    fire.Fire(main)
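Review note on `rl_cli.py`: `list_environments_sync` and the interactive `status` branch both wrap an async tool in `asyncio.run` and then decode its JSON result. A minimal sketch of a shared helper for that pattern follows. The names `run_json_tool_sync` and `fake_list_environments` are hypothetical and are not part of this change; the stub only stands in for tools such as `rl_list_environments`, which the code above treats as async callables returning a JSON string.

```python
import asyncio
import json


async def fake_list_environments() -> str:
    """Hypothetical stand-in for an async tool that returns a JSON string."""
    await asyncio.sleep(0)  # yield to the event loop once, as a real call would
    return json.dumps({"environments": [{"name": "gsm8k"}]})


def run_json_tool_sync(tool, *args, **kwargs):
    """Run an async, JSON-returning tool from synchronous code and decode it."""

    async def _call():
        return json.loads(await tool(*args, **kwargs))

    # asyncio.run() creates a fresh event loop, so this helper must not be
    # called from code that is already running inside an event loop.
    return asyncio.run(_call())


data = run_json_tool_sync(fake_list_environments)
print(data["environments"][0]["name"])
```

Whether a helper like this is worth adding depends on how many more call sites appear; with only two, the inline versions in the diff are also reasonable.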
--- a/run_agent.py
+++ b/run_agent.py
--- a/scripts/hermes-gateway
+++ b/scripts/hermes-gateway
@@ -0,0 +1,414 @@
+#!/usr/bin/env python3
+"""
+Hermes Gateway - Standalone messaging platform integration.
+
+This is the entry point for running the gateway as a service.
+It runs independently of the interactive CLI.
+
+Usage:
+    # Run in foreground (for testing)
+    ./scripts/hermes-gateway
+    
+    # Install as a user service (systemd on Linux, launchd on macOS)
+    ./scripts/hermes-gateway install
+    
+    # Manage the service
+    ./scripts/hermes-gateway start
+    ./scripts/hermes-gateway stop
+    ./scripts/hermes-gateway restart
+    ./scripts/hermes-gateway status
+    
+    # Uninstall
+    ./scripts/hermes-gateway uninstall
+"""
+
+import argparse
+import asyncio
+import os
+import subprocess
+import sys
+from pathlib import Path
+
+# Add parent directory to path
+SCRIPT_DIR = Path(__file__).parent.resolve()
+PROJECT_DIR = SCRIPT_DIR.parent
+sys.path.insert(0, str(PROJECT_DIR))
+
+# Load .env file
+from dotenv import load_dotenv
+env_path = PROJECT_DIR / '.env'
+if env_path.exists():
+    load_dotenv(dotenv_path=env_path)
+
+
+# =============================================================================
+# Service Configuration
+# =============================================================================
+
+SERVICE_NAME = "hermes-gateway"
+SERVICE_DESCRIPTION = "Hermes Agent Gateway - Messaging Platform Integration"
+
+def get_systemd_unit_path() -> Path:
+    """Get the path for the systemd user service file."""
+    return Path.home() / ".config" / "systemd" / "user" / f"{SERVICE_NAME}.service"
+
+def get_launchd_plist_path() -> Path:
+    """Get the path for the launchd plist file (macOS)."""
+    return Path.home() / "Library" / "LaunchAgents" / "ai.hermes.gateway.plist"
+
+def get_python_path() -> str:
+    """Get the path to the Python interpreter."""
+    # Prefer the venv if it exists
+    venv_python = PROJECT_DIR / "venv" / "bin" / "python"
+    if venv_python.exists():
+        return str(venv_python)
+    return sys.executable
+
+def get_gateway_script_path() -> str:
+    """Get the path to this script."""
+    return str(Path(__file__).resolve())
+
+
+# =============================================================================
+# Systemd Service (Linux)
+# =============================================================================
+
+def generate_systemd_unit() -> str:
+    """Generate the systemd unit file content."""
+    python_path = get_python_path()
+    script_path = get_gateway_script_path()
+    working_dir = str(PROJECT_DIR)
+    
+    return f"""[Unit]
+Description={SERVICE_DESCRIPTION}
+After=network.target
+
+[Service]
+Type=simple
+ExecStart={python_path} {script_path} run
+WorkingDirectory={working_dir}
+Restart=on-failure
+RestartSec=10
+StandardOutput=journal
+StandardError=journal
+
+# Environment (optional - can also use .env file)
+# Environment="TELEGRAM_BOT_TOKEN=your_token"
+# Environment="DISCORD_BOT_TOKEN=your_token"
+
+[Install]
+WantedBy=default.target
+"""
+
+def install_systemd():
+    """Install the systemd user service."""
+    unit_path = get_systemd_unit_path()
+    unit_path.parent.mkdir(parents=True, exist_ok=True)
+    
+    print(f"Installing systemd service to: {unit_path}")
+    unit_path.write_text(generate_systemd_unit())
+    
+    # Reload systemd
+    subprocess.run(["systemctl", "--user", "daemon-reload"], check=True)
+    
+    # Enable the service (starts at user login; at boot only if lingering is enabled)
+    subprocess.run(["systemctl", "--user", "enable", SERVICE_NAME], check=True)
+    
+    print("✓ Service installed and enabled")
+    print()
+    print("To start the service:")
+    print(f"  systemctl --user start {SERVICE_NAME}")
+    print()
+    print("To view logs:")
+    print(f"  journalctl --user -u {SERVICE_NAME} -f")
+    print()
+    print("To enable lingering (keeps service running after logout):")
+    print("  sudo loginctl enable-linger $USER")
+
+def uninstall_systemd():
+    """Uninstall the systemd user service."""
+    unit_path = get_systemd_unit_path()
+    
+    # Stop and disable first
+    subprocess.run(["systemctl", "--user", "stop", SERVICE_NAME], check=False)
+    subprocess.run(["systemctl", "--user", "disable", SERVICE_NAME], check=False)
+    
+    # Remove the unit file
+    if unit_path.exists():
+        unit_path.unlink()
+        print(f"✓ Removed {unit_path}")
+    
+    # Reload systemd
+    subprocess.run(["systemctl", "--user", "daemon-reload"], check=True)
+    print(f"✓ Service uninstalled")
+
+def systemd_status():
+    """Show systemd service status."""
+    subprocess.run(["systemctl", "--user", "status", SERVICE_NAME])
+
+def systemd_start():
+    """Start the systemd service."""
+    subprocess.run(["systemctl", "--user", "start", SERVICE_NAME], check=True)
+    print(f"✓ Service started")
+
+def systemd_stop():
+    """Stop the systemd service."""
+    subprocess.run(["systemctl", "--user", "stop", SERVICE_NAME], check=True)
+    print(f"✓ Service stopped")
+
+def systemd_restart():
+    """Restart the systemd service."""
+    subprocess.run(["systemctl", "--user", "restart", SERVICE_NAME], check=True)
+    print(f"✓ Service restarted")
+
+
+# =============================================================================
+# Launchd Service (macOS)
+# =============================================================================
+
+def generate_launchd_plist() -> str:
+    """Generate the launchd plist file content."""
+    python_path = get_python_path()
+    script_path = get_gateway_script_path()
+    working_dir = str(PROJECT_DIR)
+    log_dir = Path.home() / ".hermes" / "logs"
+    
+    return f"""<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>ai.hermes.gateway</string>
+    
+    <key>ProgramArguments</key>
+    <array>
+        <string>{python_path}</string>
+        <string>{script_path}</string>
+        <string>run</string>
+    </array>
+    
+    <key>WorkingDirectory</key>
+    <string>{working_dir}</string>
+    
+    <key>RunAtLoad</key>
+    <true/>
+    
+    <key>KeepAlive</key>
+    <dict>
+        <key>SuccessfulExit</key>
+        <false/>
+    </dict>
+    
+    <key>StandardOutPath</key>
+    <string>{log_dir}/gateway.log</string>
+    
+    <key>StandardErrorPath</key>
+    <string>{log_dir}/gateway.error.log</string>
+    
+    <key>EnvironmentVariables</key>
+    <dict>
+        <key>PATH</key>
+        <string>/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
+    </dict>
+</dict>
+</plist>
+"""
+
+def install_launchd():
+    """Install the launchd service (macOS)."""
+    plist_path = get_launchd_plist_path()
+    plist_path.parent.mkdir(parents=True, exist_ok=True)
+    
+    # Ensure log directory exists
+    log_dir = Path.home() / ".hermes" / "logs"
+    log_dir.mkdir(parents=True, exist_ok=True)
+    
+    print(f"Installing launchd service to: {plist_path}")
+    plist_path.write_text(generate_launchd_plist())
+    
+    # Load the service
+    subprocess.run(["launchctl", "load", str(plist_path)], check=True)
+    
+    print("✓ Service installed and loaded")
+    print()
+    print("To view logs:")
+    print("  tail -f ~/.hermes/logs/gateway.log")
+    print()
+    print("To manage the service:")
+    print("  launchctl start ai.hermes.gateway")
+    print("  launchctl stop ai.hermes.gateway")
+
+def uninstall_launchd():
+    """Uninstall the launchd service (macOS)."""
+    plist_path = get_launchd_plist_path()
+    
+    # Unload first
+    subprocess.run(["launchctl", "unload", str(plist_path)], check=False)
+    
+    # Remove the plist file
+    if plist_path.exists():
+        plist_path.unlink()
+        print(f"✓ Removed {plist_path}")
+    
+    print(f"✓ Service uninstalled")
+
+def launchd_status():
+    """Show launchd service status."""
+    subprocess.run(["launchctl", "list", "ai.hermes.gateway"])
+
+def launchd_start():
+    """Start the launchd service."""
+    subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
+    print(f"✓ Service started")
+
+def launchd_stop():
+    """Stop the launchd service."""
+    subprocess.run(["launchctl", "stop", "ai.hermes.gateway"], check=True)
+    print(f"✓ Service stopped")
+
+def launchd_restart():
+    """Restart the launchd service."""
+    launchd_stop()
+    launchd_start()
+
+
+# =============================================================================
+# Platform Detection
+# =============================================================================
+
+def is_linux() -> bool:
+    return sys.platform.startswith('linux')
+
+def is_macos() -> bool:
+    return sys.platform == 'darwin'
+
+def is_windows() -> bool:
+    return sys.platform == 'win32'
+
+
+# =============================================================================
+# Gateway Runner
+# =============================================================================
+
+def run_gateway():
+    """Run the gateway in foreground."""
+    from gateway.run import start_gateway
+    print("Starting Hermes Gateway...")
+    print("Press Ctrl+C to stop.")
+    print()
+    asyncio.run(start_gateway())
+
+
+# =============================================================================
+# Main CLI
+# =============================================================================
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Hermes Gateway - Messaging Platform Integration",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+    # Run in foreground (for testing)
+    ./scripts/hermes-gateway run
+    
+    # Install as a user service (systemd on Linux, launchd on macOS)
+    ./scripts/hermes-gateway install
+    
+    # Manage the service
+    ./scripts/hermes-gateway start
+    ./scripts/hermes-gateway stop
+    ./scripts/hermes-gateway restart
+    ./scripts/hermes-gateway status
+    
+    # Uninstall
+    ./scripts/hermes-gateway uninstall
+
+Configuration:
+    Set environment variables in .env file or system environment:
+    - TELEGRAM_BOT_TOKEN
+    - DISCORD_BOT_TOKEN
+    - WHATSAPP_ENABLED
+    
+    Or create ~/.hermes/gateway.json for advanced configuration.
+"""
+    )
+    
+    parser.add_argument(
+        "command",
+        choices=["run", "install", "uninstall", "start", "stop", "restart", "status"],
+        nargs="?",
+        default="run",
+        help="Command to execute (default: run)"
+    )
+    
+    parser.add_argument(
+        "--verbose", "-v",
+        action="store_true",
+        help="Verbose output"
+    )
+    
+    args = parser.parse_args()
+    
+    # Detect platform and dispatch command
+    if args.command == "run":
+        run_gateway()
+    
+    elif args.command == "install":
+        if is_linux():
+            install_systemd()
+        elif is_macos():
+            install_launchd()
+        else:
+            print("Service installation not supported on this platform.")
+            print("Please run manually: ./scripts/hermes-gateway run")
+            sys.exit(1)
+    
+    elif args.command == "uninstall":
+        if is_linux():
+            uninstall_systemd()
+        elif is_macos():
+            uninstall_launchd()
+        else:
+            print("Service uninstallation not supported on this platform.")
+            sys.exit(1)
+    
+    elif args.command == "start":
+        if is_linux():
+            systemd_start()
+        elif is_macos():
+            launchd_start()
+        else:
+            print("Not supported on this platform.")
+            sys.exit(1)
+    
+    elif args.command == "stop":
+        if is_linux():
+            systemd_stop()
+        elif is_macos():
+            launchd_stop()
+        else:
+            print("Not supported on this platform.")
+            sys.exit(1)
+    
+    elif args.command == "restart":
+        if is_linux():
+            systemd_restart()
+        elif is_macos():
+            launchd_restart()
+        else:
+            print("Not supported on this platform.")
+            sys.exit(1)
+    
+    elif args.command == "status":
+        if is_linux():
+            systemd_status()
+        elif is_macos():
+            launchd_status()
+        else:
+            print("Not supported on this platform.")
+            sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
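Review note on `scripts/hermes-gateway`: the `install`, `uninstall`, `start`, `stop`, `restart`, and `status` branches in `main()` repeat the same Linux/macOS/unsupported check. One possible alternative is a lookup table keyed by backend and command. The sketch below is illustrative only: `service_backend`, `resolve_handler`, and the lambda handlers are hypothetical names, and in the real script the table values would be the existing functions such as `systemd_start` and `launchd_start`.

```python
import sys
from typing import Callable, Dict, Optional, Tuple


def service_backend(platform: str = sys.platform) -> Optional[str]:
    """Map a sys.platform value to a service manager name, or None if unsupported."""
    if platform.startswith("linux"):
        return "systemd"
    if platform == "darwin":
        return "launchd"
    return None


def resolve_handler(
    command: str,
    handlers: Dict[Tuple[str, str], Callable[[], None]],
    platform: str = sys.platform,
) -> Optional[Callable[[], None]]:
    """Return the handler for (backend, command), or None when unsupported."""
    backend = service_backend(platform)
    if backend is None:
        return None
    return handlers.get((backend, command))


# Hypothetical handlers that only record which action would have run.
calls = []
HANDLERS = {
    ("systemd", "start"): lambda: calls.append("systemd start"),
    ("launchd", "start"): lambda: calls.append("launchd start"),
}

handler = resolve_handler("start", HANDLERS, platform="linux")
if handler is not None:
    handler()
print(calls)
```

With a table like this, the unsupported-platform message and `sys.exit(1)` would live in one place, at the cost of one more level of indirection than the explicit chain in the diff.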
--- a/scripts/install.ps1
+++ b/scripts/install.ps1
@@ -0,0 +1,629 @@
+# ============================================================================
+# Hermes Agent Installer for Windows
+# ============================================================================
+# Installation script for Windows (PowerShell).
+# Uses uv for fast Python provisioning and package management.
+#
+# Usage:
+#   irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
+#
+# Or download and run with options:
+#   .\install.ps1 -NoVenv -SkipSetup
+#
+# ============================================================================
+
+param(
+    [switch]$NoVenv,
+    [switch]$SkipSetup,
+    [string]$Branch = "main",
+    [string]$HermesHome = "$env:USERPROFILE\.hermes",
+    [string]$InstallDir = "$env:USERPROFILE\.hermes\hermes-agent"
+)
+
+$ErrorActionPreference = "Stop"
+
+# ============================================================================
+# Configuration
+# ============================================================================
+
+$RepoUrlSsh = "git@github.com:NousResearch/hermes-agent.git"
+$RepoUrlHttps = "https://github.com/NousResearch/hermes-agent.git"
+$PythonVersion = "3.11"
+
+# ============================================================================
+# Helper functions
+# ============================================================================
+
+function Write-Banner {
+    Write-Host ""
+    Write-Host "┌─────────────────────────────────────────────────────────┐" -ForegroundColor Magenta
+    Write-Host "│             🦋 Hermes Agent Installer                   │" -ForegroundColor Magenta
+    Write-Host "├─────────────────────────────────────────────────────────┤" -ForegroundColor Magenta
+    Write-Host "│  I'm just a butterfly with a lot of tools.             │" -ForegroundColor Magenta
+    Write-Host "└─────────────────────────────────────────────────────────┘" -ForegroundColor Magenta
+    Write-Host ""
+}
+
+function Write-Info {
+    param([string]$Message)
+    Write-Host "→ $Message" -ForegroundColor Cyan
+}
+
+function Write-Success {
+    param([string]$Message)
+    Write-Host "✓ $Message" -ForegroundColor Green
+}
+
+function Write-Warn {
+    param([string]$Message)
+    Write-Host "⚠ $Message" -ForegroundColor Yellow
+}
+
+function Write-Err {
+    param([string]$Message)
+    Write-Host "✗ $Message" -ForegroundColor Red
+}
+
+# ============================================================================
+# Dependency checks
+# ============================================================================
+
+function Install-Uv {
+    Write-Info "Checking for uv package manager..."
+    
+    # Check if uv is already available
+    if (Get-Command uv -ErrorAction SilentlyContinue) {
+        $version = uv --version
+        $script:UvCmd = "uv"
+        Write-Success "uv found ($version)"
+        return $true
+    }
+    
+    # Check common install locations
+    $uvPaths = @(
+        "$env:USERPROFILE\.local\bin\uv.exe",
+        "$env:USERPROFILE\.cargo\bin\uv.exe"
+    )
+    foreach ($uvPath in $uvPaths) {
+        if (Test-Path $uvPath) {
+            $script:UvCmd = $uvPath
+            $version = & $uvPath --version
+            Write-Success "uv found at $uvPath ($version)"
+            return $true
+        }
+    }
+    
+    # Install uv
+    Write-Info "Installing uv (fast Python package manager)..."
+    try {
+        powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" 2>&1 | Out-Null
+        
+        # Find the installed binary
+        $uvExe = "$env:USERPROFILE\.local\bin\uv.exe"
+        if (-not (Test-Path $uvExe)) {
+            $uvExe = "$env:USERPROFILE\.cargo\bin\uv.exe"
+        }
+        if (-not (Test-Path $uvExe)) {
+            # Refresh PATH and try again
+            $env:Path = [Environment]::GetEnvironmentVariable("Path", "User") + ";" + [Environment]::GetEnvironmentVariable("Path", "Machine")
+            if (Get-Command uv -ErrorAction SilentlyContinue) {
+                $uvExe = (Get-Command uv).Source
+            }
+        }
+        
+        if (Test-Path $uvExe) {
+            $script:UvCmd = $uvExe
+            $version = & $uvExe --version
+            Write-Success "uv installed ($version)"
+            return $true
+        }
+        
+        Write-Err "uv installed but not found on PATH"
+        Write-Info "Try restarting your terminal and re-running"
+        return $false
+    } catch {
+        Write-Err "Failed to install uv"
+        Write-Info "Install manually: https://docs.astral.sh/uv/getting-started/installation/"
+        return $false
+    }
+}
+
+function Test-Python {
+    Write-Info "Checking Python $PythonVersion..."
+    
+    # Let uv find or install Python
+    try {
+        $pythonPath = & $UvCmd python find $PythonVersion 2>$null
+        if ($pythonPath) {
+            $ver = & $pythonPath --version 2>$null
+            Write-Success "Python found: $ver"
+            return $true
+        }
+    } catch { }
+    
+    # Python not found — use uv to install it (no admin needed!)
+    Write-Info "Python $PythonVersion not found, installing via uv..."
+    try {
+        & $UvCmd python install $PythonVersion 2>&1 | Out-Null
+        $pythonPath = & $UvCmd python find $PythonVersion 2>$null
+        if ($pythonPath) {
+            $ver = & $pythonPath --version 2>$null
+            Write-Success "Python installed: $ver"
+            return $true
+        }
+    } catch { }
+    
+    Write-Err "Failed to install Python $PythonVersion"
+    Write-Info "Install Python $PythonVersion manually, then re-run this script"
+    return $false
+}
+
+function Test-Git {
+    Write-Info "Checking Git..."
+    
+    if (Get-Command git -ErrorAction SilentlyContinue) {
+        $version = git --version
+        Write-Success "Git found ($version)"
+        return $true
+    }
+    
+    Write-Err "Git not found"
+    Write-Info "Please install Git from:"
+    Write-Info "  https://git-scm.com/download/win"
+    return $false
+}
+
+function Test-Node {
+    Write-Info "Checking Node.js (optional, for browser tools)..."
+    
+    if (Get-Command node -ErrorAction SilentlyContinue) {
+        $version = node --version
+        Write-Success "Node.js $version found"
+        $script:HasNode = $true
+        return $true
+    }
+    
+    Write-Warn "Node.js not found (browser tools will be limited)"
+    Write-Info "To install Node.js (optional):"
+    Write-Info "  https://nodejs.org/en/download/"
+    $script:HasNode = $false
+    return $true  # Don't fail - Node is optional
+}
+
+function Test-Ripgrep {
+    Write-Info "Checking ripgrep (optional, for faster file search)..."
+    
+    if (Get-Command rg -ErrorAction SilentlyContinue) {
+        $version = rg --version | Select-Object -First 1
+        Write-Success "$version found"
+        $script:HasRipgrep = $true
+        return $true
+    }
+    
+    Write-Warn "ripgrep not found (file search will use findstr fallback)"
+    
+    # Check what package managers are available
+    $hasWinget = Get-Command winget -ErrorAction SilentlyContinue
+    $hasChoco = Get-Command choco -ErrorAction SilentlyContinue
+    $hasScoop = Get-Command scoop -ErrorAction SilentlyContinue
+    
+    # Offer to install
+    Write-Host ""
+    $response = Read-Host "Would you like to install ripgrep? (faster search, recommended) [Y/n]"
+    
+    if ($response -eq "" -or $response -match "^[Yy]") {
+        Write-Info "Installing ripgrep..."
+        
+        if ($hasWinget) {
+            try {
+                winget install BurntSushi.ripgrep.MSVC --silent 2>&1 | Out-Null
+                if ($LASTEXITCODE -eq 0) {
+                    Write-Success "ripgrep installed via winget"
+                    $script:HasRipgrep = $true
+                    return $true
+                }
+            } catch { }
+        }
+        
+        if ($hasChoco) {
+            try {
+                choco install ripgrep -y 2>&1 | Out-Null
+                if ($LASTEXITCODE -eq 0) {
+                    Write-Success "ripgrep installed via chocolatey"
+                    $script:HasRipgrep = $true
+                    return $true
+                }
+            } catch { }
+        }
+        
+        if ($hasScoop) {
+            try {
+                scoop install ripgrep 2>&1 | Out-Null
+                if ($LASTEXITCODE -eq 0) {
+                    Write-Success "ripgrep installed via scoop"
+                    $script:HasRipgrep = $true
+                    return $true
+                }
+            } catch { }
+        }
+        
+        Write-Warn "Auto-install failed. You can install manually:"
+    } else {
+        Write-Info "Skipping ripgrep installation. To install manually:"
+    }
+    
+    # Show manual install instructions
+    Write-Info "  winget install BurntSushi.ripgrep.MSVC"
+    Write-Info "  Or: choco install ripgrep"
+    Write-Info "  Or: scoop install ripgrep"
+    Write-Info "  Or download from: https://github.com/BurntSushi/ripgrep/releases"
+    
+    $script:HasRipgrep = $false
+    return $true  # Don't fail - ripgrep is optional
+}
+
+function Test-Ffmpeg {
+    Write-Info "Checking ffmpeg (optional, for TTS voice messages)..."
+    
+    if (Get-Command ffmpeg -ErrorAction SilentlyContinue) {
+        $version = ffmpeg -version 2>&1 | Select-Object -First 1
+        Write-Success "ffmpeg found"
+        $script:HasFfmpeg = $true
+        return $true
+    }
+    
+    Write-Warn "ffmpeg not found (TTS voice bubbles on Telegram will send as audio files instead)"
+    Write-Info "  Install with: winget install ffmpeg"
+    Write-Info "  Or: choco install ffmpeg"
+    Write-Info "  Or download from: https://ffmpeg.org/download.html"
+    
+    $script:HasFfmpeg = $false
+    return $true  # Don't fail - ffmpeg is optional
+}
+
+# ============================================================================
+# Installation
+# ============================================================================
+
+function Install-Repository {
+    Write-Info "Installing to $InstallDir..."
+    
+    if (Test-Path $InstallDir) {
+        if (Test-Path "$InstallDir\.git") {
+            Write-Info "Existing installation found, updating..."
+            Push-Location $InstallDir
+            git fetch origin
+            git checkout $Branch
+            git pull origin $Branch
+            Pop-Location
+        } else {
+            Write-Err "Directory exists but is not a git repository: $InstallDir"
+            Write-Info "Remove it or choose a different directory with -InstallDir"
+            exit 1
+        }
+    } else {
+        # Try SSH first (for private repo access), fall back to HTTPS
+        Write-Info "Trying SSH clone..."
+        $sshResult = git clone --branch $Branch --recurse-submodules $RepoUrlSsh $InstallDir 2>&1
+        
+        if ($LASTEXITCODE -eq 0) {
+            Write-Success "Cloned via SSH"
+        } else {
+            Write-Info "SSH failed, trying HTTPS..."
+            $httpsResult = git clone --branch $Branch --recurse-submodules $RepoUrlHttps $InstallDir 2>&1
+            
+            if ($LASTEXITCODE -eq 0) {
+                Write-Success "Cloned via HTTPS"
+            } else {
+                Write-Err "Failed to clone repository"
+                Write-Info "For private repo access, load your SSH key and make sure it is registered with GitHub:"
+                Write-Info "  ssh-add ~/.ssh/id_rsa"
+                Write-Info "  ssh -T git@github.com  # Test connection"
+                exit 1
+            }
+        }
+    }
+    
+    # Ensure submodules are initialized and updated
+    Write-Info "Initializing submodules (mini-swe-agent, tinker-atropos)..."
+    Push-Location $InstallDir
+    git submodule update --init --recursive
+    Pop-Location
+    Write-Success "Submodules ready"
+    
+    Write-Success "Repository ready"
+}
+
+function Install-Venv {
+    if ($NoVenv) {
+        Write-Info "Skipping virtual environment (-NoVenv)"
+        return
+    }
+    
+    Write-Info "Creating virtual environment with Python $PythonVersion..."
+    
+    Push-Location $InstallDir
+    
+    if (Test-Path "venv") {
+        Write-Info "Virtual environment already exists, recreating..."
+        Remove-Item -Recurse -Force "venv"
+    }
+    
+    # uv creates the venv and pins the Python version in one step
+    & $UvCmd venv venv --python $PythonVersion
+    
+    Pop-Location
+    
+    Write-Success "Virtual environment ready (Python $PythonVersion)"
+}
+
+function Install-Dependencies {
+    Write-Info "Installing dependencies..."
+    
+    Push-Location $InstallDir
+    
+    if (-not $NoVenv) {
+        # Tell uv to install into our venv (no activation needed)
+        $env:VIRTUAL_ENV = "$InstallDir\venv"
+    }
+    
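+    # Install main package with all extras; fall back to the base package.
+    # Native commands don't throw on failure, so check the exit code explicitly.
+    & $UvCmd pip install -e ".[all]" 2>&1 | Out-Null
+    if ($LASTEXITCODE -ne 0) {
+        & $UvCmd pip install -e "." 2>&1 | Out-Null
+    }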
+    
+    Write-Success "Main package installed"
+    
+    # Install submodules
+    Write-Info "Installing mini-swe-agent (terminal tool backend)..."
+    if (Test-Path "mini-swe-agent\pyproject.toml") {
+        & $UvCmd pip install -e ".\mini-swe-agent" 2>&1 | Out-Null
+        if ($LASTEXITCODE -eq 0) {
+            Write-Success "mini-swe-agent installed"
+        } else {
+            Write-Warn "mini-swe-agent install failed (terminal tools may not work)"
+        }
+    } else {
+        Write-Warn "mini-swe-agent not found (run: git submodule update --init)"
+    }
+    
+    Write-Info "Installing tinker-atropos (RL training backend)..."
+    if (Test-Path "tinker-atropos\pyproject.toml") {
+        & $UvCmd pip install -e ".\tinker-atropos" 2>&1 | Out-Null
+        if ($LASTEXITCODE -eq 0) {
+            Write-Success "tinker-atropos installed"
+        } else {
+            Write-Warn "tinker-atropos install failed (RL tools may not work)"
+        }
+    } else {
+        Write-Warn "tinker-atropos not found (run: git submodule update --init)"
+    }
+    
+    Pop-Location
+    
+    Write-Success "All dependencies installed"
+}
+
+function Set-PathVariable {
+    Write-Info "Setting up hermes command..."
+    
+    if ($NoVenv) {
+        $hermesBin = "$InstallDir"
+    } else {
+        $hermesBin = "$InstallDir\venv\Scripts"
+    }
+    
+    # Add the venv Scripts dir to user PATH so hermes is globally available
+    # On Windows, the hermes.exe in venv\Scripts\ has the venv Python baked in
+    $currentPath = [Environment]::GetEnvironmentVariable("Path", "User")
+    
+    if ($currentPath -notlike "*$hermesBin*") {
+        [Environment]::SetEnvironmentVariable(
+            "Path",
+            "$hermesBin;$currentPath",
+            "User"
+        )
+        Write-Success "Added to user PATH: $hermesBin"
+    } else {
+        Write-Info "PATH already configured"
+    }
+    
+    # Update current session
+    $env:Path = "$hermesBin;$env:Path"
+    
+    Write-Success "hermes command ready"
+}
+
+function Copy-ConfigTemplates {
+    Write-Info "Setting up configuration files..."
+    
+    # Create ~/.hermes directory structure
+    New-Item -ItemType Directory -Force -Path "$HermesHome\cron" | Out-Null
+    New-Item -ItemType Directory -Force -Path "$HermesHome\sessions" | Out-Null
+    New-Item -ItemType Directory -Force -Path "$HermesHome\logs" | Out-Null
+    New-Item -ItemType Directory -Force -Path "$HermesHome\pairing" | Out-Null
+    New-Item -ItemType Directory -Force -Path "$HermesHome\hooks" | Out-Null
+    New-Item -ItemType Directory -Force -Path "$HermesHome\image_cache" | Out-Null
+    New-Item -ItemType Directory -Force -Path "$HermesHome\audio_cache" | Out-Null
+    
+    # Create .env
+    $envPath = "$HermesHome\.env"
+    if (-not (Test-Path $envPath)) {
+        $examplePath = "$InstallDir\.env.example"
+        if (Test-Path $examplePath) {
+            Copy-Item $examplePath $envPath
+            Write-Success "Created ~/.hermes/.env from template"
+        } else {
+            New-Item -ItemType File -Force -Path $envPath | Out-Null
+            Write-Success "Created ~/.hermes/.env"
+        }
+    } else {
+        Write-Info "~/.hermes/.env already exists, keeping it"
+    }
+    
+    # Create config.yaml
+    $configPath = "$HermesHome\config.yaml"
+    if (-not (Test-Path $configPath)) {
+        $examplePath = "$InstallDir\cli-config.yaml.example"
+        if (Test-Path $examplePath) {
+            Copy-Item $examplePath $configPath
+            Write-Success "Created ~/.hermes/config.yaml from template"
+        }
+    } else {
+        Write-Info "~/.hermes/config.yaml already exists, keeping it"
+    }
+    
+    # Create SOUL.md if it doesn't exist (global persona file)
+    $soulPath = "$HermesHome\SOUL.md"
+    if (-not (Test-Path $soulPath)) {
+        @"
+# Hermes Agent Persona
+
+<!-- 
+This file defines the agent's personality and tone.
+The agent will embody whatever you write here.
+Edit this to customize how Hermes communicates with you.
+
+Examples:
+  - "You are a warm, playful assistant who uses kaomoji occasionally."
+  - "You are a concise technical expert. No fluff, just facts."
+  - "You speak like a friendly coworker who happens to know everything."
+
+This file is loaded fresh each message -- no restart needed.
+Delete the contents (or this file) to use the default personality.
+-->
+"@ | Set-Content -Path $soulPath -Encoding UTF8
+        Write-Success "Created ~/.hermes/SOUL.md (edit to customize personality)"
+    }
+    
+    Write-Success "Configuration directory ready: ~/.hermes/"
+}
+
+function Install-NodeDeps {
+    if (-not $HasNode) {
+        Write-Info "Skipping Node.js dependencies (Node not installed)"
+        return
+    }
+    
+    Push-Location $InstallDir
+    
+    if (Test-Path "package.json") {
+        Write-Info "Installing Node.js dependencies..."
+        npm install --silent 2>&1 | Out-Null
+        if ($LASTEXITCODE -eq 0) {
+            Write-Success "Node.js dependencies installed"
+        } else {
+            Write-Warn "npm install failed (browser tools may not work)"
+        }
+    }
+    
+    Pop-Location
+}
+
+function Invoke-SetupWizard {
+    if ($SkipSetup) {
+        Write-Info "Skipping setup wizard (-SkipSetup)"
+        return
+    }
+    
+    Write-Host ""
+    Write-Info "Starting setup wizard..."
+    Write-Host ""
+    
+    Push-Location $InstallDir
+    
+    # Run hermes setup using the venv Python directly (no activation needed)
+    if (-not $NoVenv) {
+        & ".\venv\Scripts\python.exe" -m hermes_cli.main setup
+    } else {
+        python -m hermes_cli.main setup
+    }
+    
+    Pop-Location
+}
+
+function Write-Completion {
+    Write-Host ""
+    Write-Host "┌─────────────────────────────────────────────────────────┐" -ForegroundColor Green
+    Write-Host "│              ✓ Installation Complete!                   │" -ForegroundColor Green
+    Write-Host "└─────────────────────────────────────────────────────────┘" -ForegroundColor Green
+    Write-Host ""
+    
+    # Show file locations
+    Write-Host "📁 Your files (all in ~/.hermes/):" -ForegroundColor Cyan
+    Write-Host ""
+    Write-Host "   Config:    " -NoNewline -ForegroundColor Yellow
+    Write-Host "$HermesHome\config.yaml"
+    Write-Host "   API Keys:  " -NoNewline -ForegroundColor Yellow
+    Write-Host "$HermesHome\.env"
+    Write-Host "   Data:      " -NoNewline -ForegroundColor Yellow
+    Write-Host "$HermesHome\cron\, sessions\, logs\"
+    Write-Host "   Code:      " -NoNewline -ForegroundColor Yellow
+    Write-Host "$InstallDir\"
+    Write-Host ""
+    
+    Write-Host "─────────────────────────────────────────────────────────" -ForegroundColor Cyan
+    Write-Host ""
+    Write-Host "🚀 Commands:" -ForegroundColor Cyan
+    Write-Host ""
+    Write-Host "   hermes              " -NoNewline -ForegroundColor Green
+    Write-Host "Start chatting"
+    Write-Host "   hermes setup        " -NoNewline -ForegroundColor Green
+    Write-Host "Configure API keys & settings"
+    Write-Host "   hermes config       " -NoNewline -ForegroundColor Green
+    Write-Host "View/edit configuration"
+    Write-Host "   hermes config edit  " -NoNewline -ForegroundColor Green
+    Write-Host "Open config in editor"
+    Write-Host "   hermes gateway      " -NoNewline -ForegroundColor Green
+    Write-Host "Run messaging gateway"
+    Write-Host "   hermes update       " -NoNewline -ForegroundColor Green
+    Write-Host "Update to latest version"
+    Write-Host ""
+    
+    Write-Host "─────────────────────────────────────────────────────────" -ForegroundColor Cyan
+    Write-Host ""
+    Write-Host "⚡ Restart your terminal for PATH changes to take effect" -ForegroundColor Yellow
+    Write-Host ""
+    
+    if (-not $HasNode) {
+        Write-Host "Note: Node.js was not found. Browser automation tools" -ForegroundColor Yellow
+        Write-Host "will have limited functionality." -ForegroundColor Yellow
+        Write-Host ""
+    }
+    
+    if (-not $HasRipgrep) {
+        Write-Host "Note: ripgrep (rg) was not found. File search will use" -ForegroundColor Yellow
+        Write-Host "findstr as a fallback. For faster search:" -ForegroundColor Yellow
+        Write-Host "  winget install BurntSushi.ripgrep.MSVC" -ForegroundColor Yellow
+        Write-Host ""
+    }
+}
+
+# ============================================================================
+# Main
+# ============================================================================
+
+function Main {
+    Write-Banner
+    
+    if (-not (Install-Uv)) { exit 1 }
+    if (-not (Test-Python)) { exit 1 }
+    if (-not (Test-Git)) { exit 1 }
+    $null = Test-Node      # Optional, doesn't fail (return value discarded)
+    $null = Test-Ripgrep   # Optional, doesn't fail (return value discarded)
+    $null = Test-Ffmpeg    # Optional, doesn't fail (return value discarded)
+    
+    Install-Repository
+    Install-Venv
+    Install-Dependencies
+    Install-NodeDeps
+    Set-PathVariable
+    Copy-ConfigTemplates
+    Invoke-SetupWizard
+    
+    Write-Completion
+}
+
+Main
--- a/scripts/install.sh
+++ b/scripts/install.sh
@@ -0,0 +1,786 @@
+#!/bin/bash
+# ============================================================================
+# Hermes Agent Installer
+# ============================================================================
+# Installation script for Linux and macOS.
+# Uses uv for fast Python provisioning and package management.
+#
+# Usage:
+#   curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
+#
+# Or with options:
+#   curl -fsSL ... | bash -s -- --no-venv --skip-setup
+#
+# ============================================================================
+
+set -e
+
+# Colors
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[0;33m'
+BLUE='\033[0;34m'
+MAGENTA='\033[0;35m'
+CYAN='\033[0;36m'
+NC='\033[0m' # No Color
+BOLD='\033[1m'
+
+# Configuration
+REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
+REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
+HERMES_HOME="$HOME/.hermes"
+INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
+PYTHON_VERSION="3.11"
+
+# Options
+USE_VENV=true
+RUN_SETUP=true
+BRANCH="main"
+
+# Parse arguments
+while [[ $# -gt 0 ]]; do
+    case $1 in
+        --no-venv)
+            USE_VENV=false
+            shift
+            ;;
+        --skip-setup)
+            RUN_SETUP=false
+            shift
+            ;;
+        --branch)
+            BRANCH="$2"
+            shift 2
+            ;;
+        --dir)
+            INSTALL_DIR="$2"
+            shift 2
+            ;;
+        -h|--help)
+            echo "Hermes Agent Installer"
+            echo ""
+            echo "Usage: install.sh [OPTIONS]"
+            echo ""
+            echo "Options:"
+            echo "  --no-venv      Don't create virtual environment"
+            echo "  --skip-setup   Skip interactive setup wizard"
+            echo "  --branch NAME  Git branch to install (default: main)"
+            echo "  --dir PATH     Installation directory (default: ~/.hermes/hermes-agent)"
+            echo "  -h, --help     Show this help"
+            exit 0
+            ;;
+        *)
+            echo "Unknown option: $1"
+            exit 1
+            ;;
+    esac
+done
+
+# ============================================================================
+# Helper functions
+# ============================================================================
+
+print_banner() {
+    echo ""
+    echo -e "${MAGENTA}${BOLD}"
+    echo "┌─────────────────────────────────────────────────────────┐"
+    echo "│             🦋 Hermes Agent Installer                   │"
+    echo "├─────────────────────────────────────────────────────────┤"
+    echo "│  I'm just a butterfly with a lot of tools.             │"
+    echo "└─────────────────────────────────────────────────────────┘"
+    echo -e "${NC}"
+}
+
+log_info() {
+    echo -e "${CYAN}→${NC} $1"
+}
+
+log_success() {
+    echo -e "${GREEN}✓${NC} $1"
+}
+
+log_warn() {
+    echo -e "${YELLOW}⚠${NC} $1"
+}
+
+log_error() {
+    echo -e "${RED}✗${NC} $1"
+}
+
+# ============================================================================
+# System detection
+# ============================================================================
+
+detect_os() {
+    case "$(uname -s)" in
+        Linux*)
+            OS="linux"
+            if [ -f /etc/os-release ]; then
+                . /etc/os-release
+                DISTRO="$ID"
+            else
+                DISTRO="unknown"
+            fi
+            ;;
+        Darwin*)
+            OS="macos"
+            DISTRO="macos"
+            ;;
+        CYGWIN*|MINGW*|MSYS*)
+            OS="windows"
+            DISTRO="windows"
+            log_error "Windows detected. Please use the PowerShell installer:"
+            log_info "  irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex"
+            exit 1
+            ;;
+        *)
+            OS="unknown"
+            DISTRO="unknown"
+            log_warn "Unknown operating system"
+            ;;
+    esac
+    
+    log_success "Detected: $OS ($DISTRO)"
+}
+
+# ============================================================================
+# Dependency checks
+# ============================================================================
+
+install_uv() {
+    log_info "Checking for uv package manager..."
+    
+    # Check common locations for uv
+    if command -v uv &> /dev/null; then
+        UV_CMD="uv"
+        UV_VERSION=$($UV_CMD --version 2>/dev/null)
+        log_success "uv found ($UV_VERSION)"
+        return 0
+    fi
+    
+    # Check ~/.local/bin (default uv install location) even if not on PATH yet
+    if [ -x "$HOME/.local/bin/uv" ]; then
+        UV_CMD="$HOME/.local/bin/uv"
+        UV_VERSION=$($UV_CMD --version 2>/dev/null)
+        log_success "uv found at ~/.local/bin ($UV_VERSION)"
+        return 0
+    fi
+    
+    # Check ~/.cargo/bin (alternative uv install location)
+    if [ -x "$HOME/.cargo/bin/uv" ]; then
+        UV_CMD="$HOME/.cargo/bin/uv"
+        UV_VERSION=$($UV_CMD --version 2>/dev/null)
+        log_success "uv found at ~/.cargo/bin ($UV_VERSION)"
+        return 0
+    fi
+    
+    # Install uv
+    log_info "Installing uv (fast Python package manager)..."
+    if curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null; then
+        # uv installs to ~/.local/bin by default
+        if [ -x "$HOME/.local/bin/uv" ]; then
+            UV_CMD="$HOME/.local/bin/uv"
+        elif [ -x "$HOME/.cargo/bin/uv" ]; then
+            UV_CMD="$HOME/.cargo/bin/uv"
+        elif command -v uv &> /dev/null; then
+            UV_CMD="uv"
+        else
+            log_error "uv installed but not found on PATH"
+            log_info "Try adding ~/.local/bin to your PATH and re-running"
+            exit 1
+        fi
+        UV_VERSION=$($UV_CMD --version 2>/dev/null)
+        log_success "uv installed ($UV_VERSION)"
+    else
+        log_error "Failed to install uv"
+        log_info "Install manually: https://docs.astral.sh/uv/getting-started/installation/"
+        exit 1
+    fi
+}
+
+check_python() {
+    log_info "Checking Python $PYTHON_VERSION..."
+    
+    # Let uv handle Python — it can download and manage Python versions
+    # First check if a suitable Python is already available
+    if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
+        PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
+        PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+        log_success "Python found: $PYTHON_FOUND_VERSION"
+        return 0
+    fi
+    
+    # Python not found — use uv to install it (no sudo needed!)
+    log_info "Python $PYTHON_VERSION not found, installing via uv..."
+    if $UV_CMD python install "$PYTHON_VERSION"; then
+        PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
+        PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+        log_success "Python installed: $PYTHON_FOUND_VERSION"
+    else
+        log_error "Failed to install Python $PYTHON_VERSION"
+        log_info "Install Python $PYTHON_VERSION manually, then re-run this script"
+        exit 1
+    fi
+}
+
+check_git() {
+    log_info "Checking Git..."
+    
+    if command -v git &> /dev/null; then
+        GIT_VERSION=$(git --version | awk '{print $3}')
+        log_success "Git $GIT_VERSION found"
+        return 0
+    fi
+    
+    log_error "Git not found"
+    log_info "Please install Git:"
+    
+    case "$OS" in
+        linux)
+            case "$DISTRO" in
+                ubuntu|debian)
+                    log_info "  sudo apt update && sudo apt install git"
+                    ;;
+                fedora)
+                    log_info "  sudo dnf install git"
+                    ;;
+                arch)
+                    log_info "  sudo pacman -S git"
+                    ;;
+                *)
+                    log_info "  Use your package manager to install git"
+                    ;;
+            esac
+            ;;
+        macos)
+            log_info "  xcode-select --install"
+            log_info "  Or: brew install git"
+            ;;
+    esac
+    
+    exit 1
+}
+
+check_node() {
+    log_info "Checking Node.js (optional, for browser tools)..."
+    
+    if command -v node &> /dev/null; then
+        NODE_VERSION=$(node --version)
+        log_success "Node.js $NODE_VERSION found"
+        HAS_NODE=true
+        return 0
+    fi
+    
+    log_warn "Node.js not found (browser tools will be limited)"
+    log_info "To install Node.js (optional):"
+    
+    case "$OS" in
+        linux)
+            case "$DISTRO" in
+                ubuntu|debian)
+                    log_info "  curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -"
+                    log_info "  sudo apt install -y nodejs"
+                    ;;
+                fedora)
+                    log_info "  sudo dnf install nodejs"
+                    ;;
+                arch)
+                    log_info "  sudo pacman -S nodejs npm"
+                    ;;
+                *)
+                    log_info "  https://nodejs.org/en/download/"
+                    ;;
+            esac
+            ;;
+        macos)
+            log_info "  brew install node"
+            log_info "  Or: https://nodejs.org/en/download/"
+            ;;
+    esac
+    
+    HAS_NODE=false
+    # Don't exit - Node is optional
+}
+
+check_ripgrep() {
+    log_info "Checking ripgrep (optional, for faster file search)..."
+    
+    if command -v rg &> /dev/null; then
+        RG_VERSION=$(rg --version | head -1)
+        log_success "$RG_VERSION found"
+        HAS_RIPGREP=true
+        return 0
+    fi
+    
+    log_warn "ripgrep not found (file search will use grep fallback)"
+    
+    # Offer to install (read from the terminal: stdin is the script itself under `curl | bash`)
+    echo ""
+    read -p "Would you like to install ripgrep? (faster search, recommended) [Y/n] " -n 1 -r < /dev/tty || REPLY="n"
+    echo
+    
+    if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
+        log_info "Installing ripgrep..."
+        
+        # Check if we can use sudo
+        CAN_SUDO=false
+        if command -v sudo &> /dev/null; then
+            if sudo -n true 2>/dev/null || sudo -v 2>/dev/null; then
+                CAN_SUDO=true
+            fi
+        fi
+        
+        case "$OS" in
+            linux)
+                if [ "$CAN_SUDO" = true ]; then
+                    case "$DISTRO" in
+                        ubuntu|debian)
+                            if sudo apt install -y ripgrep 2>/dev/null; then
+                                log_success "ripgrep installed"
+                                HAS_RIPGREP=true
+                                return 0
+                            fi
+                            ;;
+                        fedora)
+                            if sudo dnf install -y ripgrep 2>/dev/null; then
+                                log_success "ripgrep installed"
+                                HAS_RIPGREP=true
+                                return 0
+                            fi
+                            ;;
+                        arch)
+                            if sudo pacman -S --noconfirm ripgrep 2>/dev/null; then
+                                log_success "ripgrep installed"
+                                HAS_RIPGREP=true
+                                return 0
+                            fi
+                            ;;
+                    esac
+                else
+                    log_warn "sudo not available - cannot auto-install system packages"
+                    if command -v cargo &> /dev/null; then
+                        log_info "Trying cargo install (no sudo required)..."
+                        if cargo install ripgrep 2>/dev/null; then
+                            log_success "ripgrep installed via cargo"
+                            HAS_RIPGREP=true
+                            return 0
+                        fi
+                    fi
+                fi
+                ;;
+            macos)
+                if command -v brew &> /dev/null; then
+                    if brew install ripgrep 2>/dev/null; then
+                        log_success "ripgrep installed"
+                        HAS_RIPGREP=true
+                        return 0
+                    fi
+                fi
+                ;;
+        esac
+        log_warn "Auto-install failed. You can install manually later:"
+    else
+        log_info "Skipping ripgrep installation. To install manually:"
+    fi
+    
+    # Show manual install instructions
+    case "$OS" in
+        linux)
+            case "$DISTRO" in
+                ubuntu|debian)
+                    log_info "  sudo apt install ripgrep"
+                    ;;
+                fedora)
+                    log_info "  sudo dnf install ripgrep"
+                    ;;
+                arch)
+                    log_info "  sudo pacman -S ripgrep"
+                    ;;
+                *)
+                    log_info "  https://github.com/BurntSushi/ripgrep#installation"
+                    ;;
+            esac
+            if command -v cargo &> /dev/null; then
+                log_info "  Or without sudo: cargo install ripgrep"
+            fi
+            ;;
+        macos)
+            log_info "  brew install ripgrep"
+            ;;
+    esac
+    
+    HAS_RIPGREP=false
+    # Don't exit - ripgrep is optional (grep fallback exists)
+}
+
+check_ffmpeg() {
+    log_info "Checking ffmpeg (optional, for TTS voice messages)..."
+    
+    if command -v ffmpeg &> /dev/null; then
+        local ffmpeg_version=$(ffmpeg -version 2>/dev/null | head -1 | awk '{print $3}')
+        log_success "ffmpeg found: $ffmpeg_version"
+        HAS_FFMPEG=true
+        return
+    fi
+    
+    log_warn "ffmpeg not found"
+    log_info "ffmpeg is needed for Telegram voice bubbles when using the default Edge TTS provider."
+    log_info "Without it, Edge TTS audio is sent as a file instead of a voice bubble."
+    log_info "(OpenAI and ElevenLabs TTS produce Opus natively and don't need ffmpeg.)"
+    log_info ""
+    log_info "To install ffmpeg:"
+    
+    case "$OS" in
+        linux)
+            case "$DISTRO" in
+                ubuntu|debian)
+                    log_info "  sudo apt install ffmpeg"
+                    ;;
+                fedora)
+                    log_info "  sudo dnf install ffmpeg"
+                    ;;
+                arch)
+                    log_info "  sudo pacman -S ffmpeg"
+                    ;;
+                *)
+                    log_info "  https://ffmpeg.org/download.html"
+                    ;;
+            esac
+            ;;
+        macos)
+            log_info "  brew install ffmpeg"
+            ;;
+    esac
+    
+    HAS_FFMPEG=false
+    # Don't exit - ffmpeg is optional
+}
+
+# ============================================================================
+# Installation
+# ============================================================================
+
+clone_repo() {
+    log_info "Installing to $INSTALL_DIR..."
+    
+    if [ -d "$INSTALL_DIR" ]; then
+        if [ -d "$INSTALL_DIR/.git" ]; then
+            log_info "Existing installation found, updating..."
+            cd "$INSTALL_DIR"
+            git fetch origin
+            git checkout "$BRANCH"
+            git pull origin "$BRANCH"
+        else
+            log_error "Directory exists but is not a git repository: $INSTALL_DIR"
+            log_info "Remove it or choose a different directory with --dir"
+            exit 1
+        fi
+    else
+        # Try SSH first (for private repo access), fall back to HTTPS
+        # Use --recurse-submodules to also clone mini-swe-agent and tinker-atropos
+        log_info "Trying SSH clone..."
+        if git clone --branch "$BRANCH" --recurse-submodules "$REPO_URL_SSH" "$INSTALL_DIR" 2>/dev/null; then
+            log_success "Cloned via SSH"
+        else
+            log_info "SSH failed, trying HTTPS..."
+            if git clone --branch "$BRANCH" --recurse-submodules "$REPO_URL_HTTPS" "$INSTALL_DIR"; then
+                log_success "Cloned via HTTPS"
+            else
+                log_error "Failed to clone repository"
+                log_info "For private repo access, load your SSH key and make sure it is registered with GitHub:"
+                log_info "  ssh-add ~/.ssh/id_rsa"
+                log_info "  ssh -T git@github.com  # Test connection"
+                exit 1
+            fi
+        fi
+    fi
+    
+    cd "$INSTALL_DIR"
+    
+    # Ensure submodules are initialized and updated (for existing installs or if --recurse failed)
+    log_info "Initializing submodules (mini-swe-agent, tinker-atropos)..."
+    git submodule update --init --recursive
+    log_success "Submodules ready"
+    
+    log_success "Repository ready"
+}
+
+setup_venv() {
+    if [ "$USE_VENV" = false ]; then
+        log_info "Skipping virtual environment (--no-venv)"
+        return 0
+    fi
+    
+    log_info "Creating virtual environment with Python $PYTHON_VERSION..."
+    
+    if [ -d "venv" ]; then
+        log_info "Virtual environment already exists, recreating..."
+        rm -rf venv
+    fi
+    
+    # uv creates the venv and pins the Python version in one step
+    $UV_CMD venv venv --python "$PYTHON_VERSION"
+    
+    log_success "Virtual environment ready (Python $PYTHON_VERSION)"
+}
+
+install_deps() {
+    log_info "Installing dependencies..."
+    
+    if [ "$USE_VENV" = true ]; then
+        # Tell uv to install into our venv (no need to activate)
+        export VIRTUAL_ENV="$INSTALL_DIR/venv"
+    fi
+    
+    # Install the main package in editable mode with all extras
+    $UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
+    
+    log_success "Main package installed"
+    
+    # Install submodules
+    log_info "Installing mini-swe-agent (terminal tool backend)..."
+    if [ -d "mini-swe-agent" ] && [ -f "mini-swe-agent/pyproject.toml" ]; then
+        $UV_CMD pip install -e "./mini-swe-agent" && log_success "mini-swe-agent installed" \
+            || log_warn "mini-swe-agent install failed (terminal tools may not work)"
+    else
+        log_warn "mini-swe-agent not found (run: git submodule update --init)"
+    fi
+    
+    log_info "Installing tinker-atropos (RL training backend)..."
+    if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
+        $UV_CMD pip install -e "./tinker-atropos" && log_success "tinker-atropos installed" \
+            || log_warn "tinker-atropos install failed (RL tools may not work)"
+    else
+        log_warn "tinker-atropos not found (run: git submodule update --init)"
+    fi
+    
+    log_success "All dependencies installed"
+}
+
+setup_path() {
+    log_info "Setting up hermes command..."
+    
+    if [ "$USE_VENV" = true ]; then
+        HERMES_BIN="$INSTALL_DIR/venv/bin/hermes"
+    else
+        HERMES_BIN="$(command -v hermes 2>/dev/null || echo "")"
+        if [ -z "$HERMES_BIN" ]; then
+            log_warn "hermes not found on PATH after install"
+            return 0
+        fi
+    fi
+    
+    # Create symlink in ~/.local/bin (standard user binary location, usually on PATH)
+    mkdir -p "$HOME/.local/bin"
+    ln -sf "$HERMES_BIN" "$HOME/.local/bin/hermes"
+    log_success "Symlinked hermes → ~/.local/bin/hermes"
+    
+    # Check if ~/.local/bin is on PATH; if not, add it to shell config
+    if ! echo "$PATH" | tr ':' '\n' | grep -qxF "$HOME/.local/bin"; then
+        SHELL_CONFIG=""
+        # This script always runs under bash, so detect the user's shell via $SHELL
+        if [[ "${SHELL:-}" == */zsh ]]; then
+            SHELL_CONFIG="$HOME/.zshrc"
+        elif [ -f "$HOME/.bashrc" ]; then
+            SHELL_CONFIG="$HOME/.bashrc"
+        elif [ -f "$HOME/.bash_profile" ]; then
+            SHELL_CONFIG="$HOME/.bash_profile"
+        fi
+        
+        PATH_LINE='export PATH="$HOME/.local/bin:$PATH"'
+        
+        if [ -n "$SHELL_CONFIG" ]; then
+            if ! grep -q '\.local/bin' "$SHELL_CONFIG" 2>/dev/null; then
+                echo "" >> "$SHELL_CONFIG"
+                echo "# Hermes Agent — ensure ~/.local/bin is on PATH" >> "$SHELL_CONFIG"
+                echo "$PATH_LINE" >> "$SHELL_CONFIG"
+                log_success "Added ~/.local/bin to PATH in $SHELL_CONFIG"
+            else
+                log_info "~/.local/bin already referenced in $SHELL_CONFIG"
+            fi
+        fi
+    else
+        log_info "~/.local/bin already on PATH"
+    fi
+    
+    # Export for current session so hermes works immediately
+    export PATH="$HOME/.local/bin:$PATH"
+    
+    log_success "hermes command ready"
+}
+
+copy_config_templates() {
+    log_info "Setting up configuration files..."
+    
+    # Create ~/.hermes directory structure (config at top level, code in subdir)
+    mkdir -p "$HERMES_HOME"/{cron,sessions,logs,pairing,hooks,image_cache,audio_cache}
+    
+    # Create .env at ~/.hermes/.env (top level, easy to find)
+    if [ ! -f "$HERMES_HOME/.env" ]; then
+        if [ -f "$INSTALL_DIR/.env.example" ]; then
+            cp "$INSTALL_DIR/.env.example" "$HERMES_HOME/.env"
+            log_success "Created ~/.hermes/.env from template"
+        else
+            touch "$HERMES_HOME/.env"
+            log_success "Created ~/.hermes/.env"
+        fi
+    else
+        log_info "~/.hermes/.env already exists, keeping it"
+    fi
+    
+    # Create config.yaml at ~/.hermes/config.yaml (top level, easy to find)
+    if [ ! -f "$HERMES_HOME/config.yaml" ]; then
+        if [ -f "$INSTALL_DIR/cli-config.yaml.example" ]; then
+            cp "$INSTALL_DIR/cli-config.yaml.example" "$HERMES_HOME/config.yaml"
+            log_success "Created ~/.hermes/config.yaml from template"
+        fi
+    else
+        log_info "~/.hermes/config.yaml already exists, keeping it"
+    fi
+    
+    # Create SOUL.md if it doesn't exist (global persona file)
+    if [ ! -f "$HERMES_HOME/SOUL.md" ]; then
+        cat > "$HERMES_HOME/SOUL.md" << 'SOUL_EOF'
+# Hermes Agent Persona
+
+<!-- 
+This file defines the agent's personality and tone.
+The agent will embody whatever you write here.
+Edit this to customize how Hermes communicates with you.
+
+Examples:
+  - "You are a warm, playful assistant who uses kaomoji occasionally."
+  - "You are a concise technical expert. No fluff, just facts."
+  - "You speak like a friendly coworker who happens to know everything."
+
+This file is loaded fresh each message -- no restart needed.
+Delete the contents (or this file) to use the default personality.
+-->
+SOUL_EOF
+        log_success "Created ~/.hermes/SOUL.md (edit to customize personality)"
+    fi
+    
+    log_success "Configuration directory ready: ~/.hermes/"
+}
+
+install_node_deps() {
+    if [ "$HAS_NODE" = false ]; then
+        log_info "Skipping Node.js dependencies (Node not installed)"
+        return 0
+    fi
+    
+    if [ -f "$INSTALL_DIR/package.json" ]; then
+        log_info "Installing Node.js dependencies..."
+        cd "$INSTALL_DIR"
+        npm install --silent 2>/dev/null || {
+            log_warn "npm install failed (browser tools may not work)"
+            return 0
+        }
+        log_success "Node.js dependencies installed"
+    fi
+}
+
+run_setup_wizard() {
+    if [ "$RUN_SETUP" = false ]; then
+        log_info "Skipping setup wizard (--skip-setup)"
+        return 0
+    fi
+    
+    echo ""
+    log_info "Starting setup wizard..."
+    echo ""
+    
+    cd "$INSTALL_DIR"
+    
+    # Run hermes setup using the venv Python directly (no activation needed)
+    if [ "$USE_VENV" = true ]; then
+        "$INSTALL_DIR/venv/bin/python" -m hermes_cli.main setup
+    else
+        python -m hermes_cli.main setup
+    fi
+}
+
+print_success() {
+    echo ""
+    echo -e "${GREEN}${BOLD}"
+    echo "┌─────────────────────────────────────────────────────────┐"
+    echo "│              ✓ Installation Complete!                   │"
+    echo "└─────────────────────────────────────────────────────────┘"
+    echo -e "${NC}"
+    echo ""
+    
+    # Show file locations
+    echo -e "${CYAN}${BOLD}📁 Your files (all in ~/.hermes/):${NC}"
+    echo ""
+    echo -e "   ${YELLOW}Config:${NC}    ~/.hermes/config.yaml"
+    echo -e "   ${YELLOW}API Keys:${NC}  ~/.hermes/.env"
+    echo -e "   ${YELLOW}Data:${NC}      ~/.hermes/cron/, sessions/, logs/"
+    echo -e "   ${YELLOW}Code:${NC}      ~/.hermes/hermes-agent/"
+    echo ""
+    
+    echo -e "${CYAN}─────────────────────────────────────────────────────────${NC}"
+    echo ""
+    echo -e "${CYAN}${BOLD}🚀 Commands:${NC}"
+    echo ""
+    echo -e "   ${GREEN}hermes${NC}              Start chatting"
+    echo -e "   ${GREEN}hermes setup${NC}        Configure API keys & settings"
+    echo -e "   ${GREEN}hermes config${NC}       View/edit configuration"
+    echo -e "   ${GREEN}hermes config edit${NC}  Open config in editor"
+    echo -e "   ${GREEN}hermes gateway${NC}      Run messaging gateway"
+    echo -e "   ${GREEN}hermes update${NC}       Update to latest version"
+    echo ""
+    
+    echo -e "${CYAN}─────────────────────────────────────────────────────────${NC}"
+    echo ""
+    echo -e "${YELLOW}⚡ Reload your shell to use 'hermes' command:${NC}"
+    echo ""
+    echo "   source ~/.bashrc   # or ~/.zshrc"
+    echo ""
+    
+    # Show Node.js warning if not installed
+    if [ "$HAS_NODE" = false ]; then
+        echo -e "${YELLOW}"
+        echo "Note: Node.js was not found. Browser automation tools"
+        echo "will have limited functionality. Install Node.js later"
+        echo "if you need full browser support."
+        echo -e "${NC}"
+    fi
+    
+    # Show ripgrep note if not installed
+    if [ "$HAS_RIPGREP" = false ]; then
+        echo -e "${YELLOW}"
+        echo "Note: ripgrep (rg) was not found. File search will use"
+        echo "grep as a fallback. For faster search in large codebases,"
+        echo "install ripgrep: sudo apt install ripgrep (or brew install ripgrep)"
+        echo -e "${NC}"
+    fi
+}
+
+# ============================================================================
+# Main
+# ============================================================================
+
+main() {
+    print_banner
+    
+    detect_os
+    install_uv
+    check_python
+    check_git
+    check_node
+    check_ripgrep
+    check_ffmpeg
+    
+    clone_repo
+    setup_venv
+    install_deps
+    install_node_deps
+    setup_path
+    copy_config_templates
+    run_setup_wizard
+    
+    print_success
+}
+
+main
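Review note on the PATH check in this installer: the intent is an exact match of `~/.local/bin` against each PATH entry, not a substring or regex match. Below is a minimal Python sketch of that comparison, assuming POSIX-style paths; `is_on_path` and the sample directories are hypothetical names used only for illustration, not part of the installer.

```python
import os


def is_on_path(directory: str, path_value: str) -> bool:
    """Return True if `directory` is exactly one of the entries in a PATH-style string."""
    target = os.path.normpath(os.path.expanduser(directory))
    # Split on the platform separator and ignore empty entries (e.g. "a::b").
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normpath(os.path.expanduser(entry)) == target for entry in entries)


# Hypothetical PATH value: note the look-alike entry ".local/bin-old".
sample_path = os.pathsep.join(["/usr/bin", "/home/alice/.local/bin-old", "/opt/tools"])

print(is_on_path("/home/alice/.local/bin", sample_path))  # look-alike entry must not match
print(is_on_path("/opt/tools/", sample_path))             # trailing slash is normalized away
```

Comparing whole entries avoids false positives from look-alike directories, which is the same reason the shell version benefits from a whole-line, fixed-string match.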
--- a/scripts/kill_modal.sh
+++ b/scripts/kill_modal.sh
@@ -0,0 +1,34 @@
+#!/bin/bash
+# Stop running Modal apps: the swe-rex sandbox app by default, or every app with --all.
+#
+# Usage:
+#   bash scripts/kill_modal.sh          # Stop swe-rex (the sandbox app)
+#   bash scripts/kill_modal.sh --all    # Stop ALL Modal apps
+
+set -uo pipefail
+
+echo "Fetching Modal app list..."
+APP_LIST=$(modal app list 2>/dev/null)
+
+if [[ "${1:-}" == "--all" ]]; then
+    echo "Stopping ALL Modal apps..."
+    echo "$APP_LIST" | grep -oE 'ap-[A-Za-z0-9]+' | sort -u | while read -r app_id; do
+        echo "  Stopping $app_id"
+        modal app stop "$app_id" 2>/dev/null || true
+    done
+else
+    echo "Stopping swe-rex sandboxes..."
+    APPS=$(echo "$APP_LIST" | grep 'swe-rex' | grep -oE 'ap-[A-Za-z0-9]+' || true)
+    if [[ -z "$APPS" ]]; then
+        echo "  No swe-rex apps found."
+    else
+        echo "$APPS" | while read -r app_id; do
+            echo "  Stopping $app_id"
+            modal app stop "$app_id" 2>/dev/null || true
+        done
+    fi
+fi
+
+echo ""
+echo "Current swe-rex status:"
+modal app list 2>/dev/null | grep -E 'State|swe-rex' || echo "  (none)"
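Review note on `kill_modal.sh`: the script finds app IDs by pattern-matching `ap-...` tokens in the `modal app list` output, optionally restricted to lines mentioning `swe-rex`. The sketch below mirrors that extraction in Python so the matching can be checked without a Modal account. The sample listing is invented for illustration and is not real `modal` output, and `app_ids` is a hypothetical helper. Unlike the script's `sort -u`, it de-duplicates while keeping first-seen order.

```python
import re
from typing import List, Optional

# Same pattern the script passes to `grep -oE`.
APP_ID = re.compile(r"ap-[A-Za-z0-9]+")


def app_ids(listing: str, name_filter: Optional[str] = None) -> List[str]:
    """Collect unique app IDs, optionally only from lines containing `name_filter`."""
    found: List[str] = []
    for line in listing.splitlines():
        if name_filter is not None and name_filter not in line:
            continue
        for app_id in APP_ID.findall(line):
            if app_id not in found:
                found.append(app_id)
    return found


# Invented sample text; the real table layout may differ.
SAMPLE_LISTING = """\
ap-AbC123  swe-rex     deployed
ap-XyZ789  other-app   deployed
ap-AbC123  swe-rex     deployed
"""

print(app_ids(SAMPLE_LISTING))             # every app ID, as with --all
print(app_ids(SAMPLE_LISTING, "swe-rex"))  # only the sandbox app
```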
--- a/setup-hermes.sh
+++ b/setup-hermes.sh
@@ -0,0 +1,275 @@
+#!/bin/bash
+# ============================================================================
+# Hermes Agent Setup Script
+# ============================================================================
+# Quick setup for developers who cloned the repo manually.
+# Uses uv for fast Python provisioning and package management.
+#
+# Usage:
+#   ./setup-hermes.sh
+#
+# This script:
+# 1. Installs uv if not present
+# 2. Creates a virtual environment with Python 3.11 via uv
+# 3. Installs all dependencies (main package + submodules)
+# 4. Creates .env from template (if it does not exist)
+# 5. Symlinks the 'hermes' CLI command into ~/.local/bin
+# 6. Runs the setup wizard (optional)
+# ============================================================================
+
+set -e
+
+# Colors
+GREEN='\033[0;32m'
+YELLOW='\033[0;33m'
+CYAN='\033[0;36m'
+RED='\033[0;31m'
+NC='\033[0m'
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$SCRIPT_DIR"
+
+PYTHON_VERSION="3.11"
+
+echo ""
+echo -e "${CYAN}🦋 Hermes Agent Setup${NC}"
+echo ""
+
+# ============================================================================
+# Install / locate uv
+# ============================================================================
+
+echo -e "${CYAN}→${NC} Checking for uv..."
+
+UV_CMD=""
+if command -v uv &> /dev/null; then
+    UV_CMD="uv"
+elif [ -x "$HOME/.local/bin/uv" ]; then
+    UV_CMD="$HOME/.local/bin/uv"
+elif [ -x "$HOME/.cargo/bin/uv" ]; then
+    UV_CMD="$HOME/.cargo/bin/uv"
+fi
+
+if [ -n "$UV_CMD" ]; then
+    UV_VERSION=$($UV_CMD --version 2>/dev/null)
+    echo -e "${GREEN}✓${NC} uv found ($UV_VERSION)"
+else
+    echo -e "${CYAN}→${NC} Installing uv..."
+    if curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null; then
+        if [ -x "$HOME/.local/bin/uv" ]; then
+            UV_CMD="$HOME/.local/bin/uv"
+        elif [ -x "$HOME/.cargo/bin/uv" ]; then
+            UV_CMD="$HOME/.cargo/bin/uv"
+        fi
+        
+        if [ -n "$UV_CMD" ]; then
+            UV_VERSION=$($UV_CMD --version 2>/dev/null)
+            echo -e "${GREEN}✓${NC} uv installed ($UV_VERSION)"
+        else
+            echo -e "${RED}✗${NC} uv installed but not found. Add ~/.local/bin to PATH and retry."
+            exit 1
+        fi
+    else
+        echo -e "${RED}✗${NC} Failed to install uv. Visit https://docs.astral.sh/uv/"
+        exit 1
+    fi
+fi
+
+# ============================================================================
+# Python check (uv can provision it automatically)
+# ============================================================================
+
+echo -e "${CYAN}→${NC} Checking Python $PYTHON_VERSION..."
+
+if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
+    PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
+    PYTHON_FOUND_VERSION=$("$PYTHON_PATH" --version 2>/dev/null)
+    echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION found"
+else
+    echo -e "${CYAN}→${NC} Python $PYTHON_VERSION not found, installing via uv..."
+    $UV_CMD python install "$PYTHON_VERSION"
+    PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
+    PYTHON_FOUND_VERSION=$("$PYTHON_PATH" --version 2>/dev/null)
+    echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION installed"
+fi
+
+# ============================================================================
+# Virtual environment
+# ============================================================================
+
+echo -e "${CYAN}→${NC} Setting up virtual environment..."
+
+if [ -d "venv" ]; then
+    echo -e "${CYAN}→${NC} Removing old venv..."
+    rm -rf venv
+fi
+
+$UV_CMD venv venv --python "$PYTHON_VERSION"
+echo -e "${GREEN}✓${NC} venv created (Python $PYTHON_VERSION)"
+
+# Tell uv to install into this venv (no activation needed for uv)
+export VIRTUAL_ENV="$SCRIPT_DIR/venv"
+
+# ============================================================================
+# Dependencies
+# ============================================================================
+
+echo -e "${CYAN}→${NC} Installing dependencies..."
+
+$UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
+
+echo -e "${GREEN}✓${NC} Dependencies installed"
+
+# ============================================================================
+# Submodules (terminal backend + RL training)
+# ============================================================================
+
+echo -e "${CYAN}→${NC} Installing submodules..."
+
+# mini-swe-agent (terminal tool backend)
+if [ -d "mini-swe-agent" ] && [ -f "mini-swe-agent/pyproject.toml" ]; then
+    $UV_CMD pip install -e "./mini-swe-agent" && \
+        echo -e "${GREEN}✓${NC} mini-swe-agent installed" || \
+        echo -e "${YELLOW}⚠${NC} mini-swe-agent install failed (terminal tools may not work)"
+else
+    echo -e "${YELLOW}⚠${NC} mini-swe-agent not found (run: git submodule update --init --recursive)"
+fi
+
+# tinker-atropos (RL training backend)
+if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
+    $UV_CMD pip install -e "./tinker-atropos" && \
+        echo -e "${GREEN}✓${NC} tinker-atropos installed" || \
+        echo -e "${YELLOW}⚠${NC} tinker-atropos install failed (RL tools may not work)"
+else
+    echo -e "${YELLOW}⚠${NC} tinker-atropos not found (run: git submodule update --init --recursive)"
+fi
+
+# ============================================================================
+# Optional: ripgrep (for faster file search)
+# ============================================================================
+
+echo -e "${CYAN}→${NC} Checking ripgrep (optional, for faster search)..."
+
+if command -v rg &> /dev/null; then
+    echo -e "${GREEN}✓${NC} ripgrep found"
+else
+    echo -e "${YELLOW}⚠${NC} ripgrep not found (file search will use grep fallback)"
+    read -p "Install ripgrep for faster search? [Y/n] " -n 1 -r || REPLY="n"  # no TTY: skip instead of aborting under set -e
+    echo
+    if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
+        INSTALLED=false
+        
+        # Check if sudo is available
+        if command -v sudo &> /dev/null && sudo -n true 2>/dev/null; then
+            if command -v apt &> /dev/null; then
+                sudo apt install -y ripgrep && INSTALLED=true
+            elif command -v dnf &> /dev/null; then
+                sudo dnf install -y ripgrep && INSTALLED=true
+            fi
+        fi
+        
+        # Try brew (no sudo needed)
+        if [ "$INSTALLED" = false ] && command -v brew &> /dev/null; then
+            brew install ripgrep && INSTALLED=true
+        fi
+        
+        # Try cargo (no sudo needed)
+        if [ "$INSTALLED" = false ] && command -v cargo &> /dev/null; then
+            echo -e "${CYAN}→${NC} Trying cargo install (no sudo required)..."
+            cargo install ripgrep && INSTALLED=true
+        fi
+        
+        if [ "$INSTALLED" = true ]; then
+            echo -e "${GREEN}✓${NC} ripgrep installed"
+        else
+            echo -e "${YELLOW}⚠${NC} Auto-install failed. Install options:"
+            echo "    sudo apt install ripgrep     # Debian/Ubuntu"
+            echo "    brew install ripgrep         # macOS"
+            echo "    cargo install ripgrep        # With Rust (no sudo)"
+            echo "    https://github.com/BurntSushi/ripgrep#installation"
+        fi
+    fi
+fi
+
+# ============================================================================
+# Environment file
+# ============================================================================
+
+if [ ! -f ".env" ]; then
+    if [ -f ".env.example" ]; then
+        cp .env.example .env
+        echo -e "${GREEN}✓${NC} Created .env from template"
+    fi
+else
+    echo -e "${GREEN}✓${NC} .env exists"
+fi
+
+# ============================================================================
+# PATH setup — symlink hermes into ~/.local/bin
+# ============================================================================
+
+echo -e "${CYAN}→${NC} Setting up hermes command..."
+
+HERMES_BIN="$SCRIPT_DIR/venv/bin/hermes"
+mkdir -p "$HOME/.local/bin"
+ln -sf "$HERMES_BIN" "$HOME/.local/bin/hermes"
+echo -e "${GREEN}✓${NC} Symlinked hermes → ~/.local/bin/hermes"
+
+# Ensure ~/.local/bin is on PATH in shell config
+SHELL_CONFIG=""
+if [ -f "$HOME/.zshrc" ]; then
+    SHELL_CONFIG="$HOME/.zshrc"
+elif [ -f "$HOME/.bashrc" ]; then
+    SHELL_CONFIG="$HOME/.bashrc"
+elif [ -f "$HOME/.bash_profile" ]; then
+    SHELL_CONFIG="$HOME/.bash_profile"
+fi
+
+if [ -n "$SHELL_CONFIG" ]; then
+    if ! echo "$PATH" | tr ':' '\n' | grep -qxF "$HOME/.local/bin"; then
+        if ! grep -q '\.local/bin' "$SHELL_CONFIG" 2>/dev/null; then
+            echo "" >> "$SHELL_CONFIG"
+            echo "# Hermes Agent — ensure ~/.local/bin is on PATH" >> "$SHELL_CONFIG"
+            echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$SHELL_CONFIG"
+            echo -e "${GREEN}✓${NC} Added ~/.local/bin to PATH in $SHELL_CONFIG"
+        else
+            echo -e "${GREEN}✓${NC} ~/.local/bin already in $SHELL_CONFIG"
+        fi
+    else
+        echo -e "${GREEN}✓${NC} ~/.local/bin already on PATH"
+    fi
+fi
+
+# ============================================================================
+# Done
+# ============================================================================
+
+echo ""
+echo -e "${GREEN}✓ Setup complete!${NC}"
+echo ""
+echo "Next steps:"
+echo ""
+echo "  1. Reload your shell:"
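+[ -n "$SHELL_CONFIG" ] && echo "     source $SHELL_CONFIG" || echo '     export PATH="$HOME/.local/bin:$PATH"   # no shell config file was found'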
+echo ""
+echo "  2. Run the setup wizard to configure API keys:"
+echo "     hermes setup"
+echo ""
+echo "  3. Start chatting:"
+echo "     hermes"
+echo ""
+echo "Other commands:"
+echo "  hermes status        # Check configuration"
+echo "  hermes gateway       # Start messaging gateway"
+echo "  hermes cron daemon   # Run cron daemon"
+echo "  hermes doctor        # Diagnose issues"
+echo ""
+
+# Ask if they want to run setup wizard now
+read -p "Would you like to run the setup wizard now? [Y/n] " -n 1 -r || REPLY="n"  # no TTY: skip instead of failing under set -e
+echo
+if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
+    echo ""
+    # Run directly with venv Python (no activation needed)
+    "$SCRIPT_DIR/venv/bin/python" -m hermes_cli.main setup
+fi
--- a/skills/diagramming/DESCRIPTION.md
+++ b/skills/diagramming/DESCRIPTION.md
@@ -0,0 +1,3 @@
+---
+description: Diagram creation skills for generating visual diagrams, flowcharts, architecture diagrams, and illustrations using tools like Excalidraw.
+---
--- a/skills/diagramming/excalidraw/SKILL.md
+++ b/skills/diagramming/excalidraw/SKILL.md
@@ -0,0 +1,191 @@
+---
+name: excalidraw
+description: Create hand-drawn style diagrams using Excalidraw JSON format. Generate .excalidraw files for architecture diagrams, flowcharts, sequence diagrams, concept maps, and more. Files can be opened at excalidraw.com or uploaded for shareable links.
+version: 1.0.0
+author: Hermes Agent
+license: MIT
+tags: [Excalidraw, Diagrams, Flowcharts, Architecture, Visualization, JSON]
+dependencies: []
+related_skills: []
+---
+
+# Excalidraw Diagram Skill
+
+Create diagrams by writing standard Excalidraw element JSON and saving it as a `.excalidraw` file. These files can be dragged and dropped onto [excalidraw.com](https://excalidraw.com) for viewing and editing. No accounts, no API keys, no rendering libraries -- just JSON.
+
+## Workflow
+
+1. **Load this skill** (you already did)
+2. **Write the elements JSON** -- an array of Excalidraw element objects
+3. **Save the file** using `write_file` to create a `.excalidraw` file
+4. **Optionally upload** for a shareable link using `scripts/upload.py` via `terminal`
+
+### Saving a Diagram
+
+Wrap your elements array in the standard `.excalidraw` envelope and save with `write_file`:
+
+```json
+{
+  "type": "excalidraw",
+  "version": 2,
+  "source": "hermes-agent",
+  "elements": [ ...your elements array here... ],
+  "appState": {
+    "viewBackgroundColor": "#ffffff"
+  }
+}
+```
+
+Save to any path, e.g. `~/diagrams/my_diagram.excalidraw`.
+
+### Uploading for a Shareable Link
+
+Run the upload script (located in this skill's `scripts/` directory) via terminal:
+
+```bash
+python skills/diagramming/excalidraw/scripts/upload.py ~/diagrams/my_diagram.excalidraw
+```
+
+This uploads to excalidraw.com (no account needed) and prints a shareable URL. Requires the `cryptography` pip package (`pip install cryptography`).
+
+---
+
+## Element Format Reference
+
+### Required Fields (all elements)
+`type`, `id` (unique string), `x`, `y`, `width`, `height` (standalone text may omit `width`/`height`, as in the example below)
+
+### Defaults (skip these -- they're applied automatically)
+- `strokeColor`: `"#1e1e1e"`
+- `backgroundColor`: `"transparent"`
+- `fillStyle`: `"solid"`
+- `strokeWidth`: `2`
+- `roughness`: `1` (hand-drawn look)
+- `opacity`: `100`
+
+Canvas background is white.
+
+### Element Types
+
+**Rectangle**:
+```json
+{ "type": "rectangle", "id": "r1", "x": 100, "y": 100, "width": 200, "height": 100 }
+```
+- `roundness: { "type": 3 }` for rounded corners
+- `backgroundColor: "#a5d8ff"`, `fillStyle: "solid"` for filled
+
+**Ellipse**:
+```json
+{ "type": "ellipse", "id": "e1", "x": 100, "y": 100, "width": 150, "height": 150 }
+```
+
+**Diamond**:
+```json
+{ "type": "diamond", "id": "d1", "x": 100, "y": 100, "width": 150, "height": 150 }
+```
+
+**Labeled shape (container binding)** -- create a text element bound to the shape:
+
+> **WARNING:** Do NOT use `"label": { "text": "..." }` on shapes. This is NOT a valid
+> Excalidraw property and will be silently ignored, producing blank shapes. You MUST
+> use the container binding approach below.
+
+The shape needs `boundElements` listing the text, and the text needs `containerId` pointing back:
+```json
+{ "type": "rectangle", "id": "r1", "x": 100, "y": 100, "width": 200, "height": 80,
+  "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
+  "boundElements": [{ "id": "t_r1", "type": "text" }] },
+{ "type": "text", "id": "t_r1", "x": 105, "y": 110, "width": 190, "height": 25,
+  "text": "Hello", "fontSize": 20, "fontFamily": 1, "strokeColor": "#1e1e1e",
+  "textAlign": "center", "verticalAlign": "middle",
+  "containerId": "r1", "originalText": "Hello", "autoResize": true }
+```
+- Works on rectangle, ellipse, diamond
+- Text is auto-centered by Excalidraw when `containerId` is set
+- The text `x`/`y`/`width`/`height` are approximate -- Excalidraw recalculates them on load
+- `originalText` should match `text`
+- Always include `fontFamily: 1` (Virgil/hand-drawn font)
+
+**Labeled arrow** -- same container binding approach:
+```json
+{ "type": "arrow", "id": "a1", "x": 300, "y": 150, "width": 200, "height": 0,
+  "points": [[0,0],[200,0]], "endArrowhead": "arrow",
+  "boundElements": [{ "id": "t_a1", "type": "text" }] },
+{ "type": "text", "id": "t_a1", "x": 370, "y": 130, "width": 60, "height": 20,
+  "text": "connects", "fontSize": 16, "fontFamily": 1, "strokeColor": "#1e1e1e",
+  "textAlign": "center", "verticalAlign": "middle",
+  "containerId": "a1", "originalText": "connects", "autoResize": true }
+```
+
+**Standalone text** (titles and annotations only -- no container):
+```json
+{ "type": "text", "id": "t1", "x": 150, "y": 138, "text": "Hello", "fontSize": 20,
+  "fontFamily": 1, "strokeColor": "#1e1e1e", "originalText": "Hello", "autoResize": true }
+```
+- `x` is the LEFT edge. To center at position `cx`, estimate the text width as `text.length * fontSize * 0.5` and set `x = cx - (text.length * fontSize * 0.5) / 2`
+- Do NOT rely on `textAlign` or `width` for positioning
+
+**Arrow**:
+```json
+{ "type": "arrow", "id": "a1", "x": 300, "y": 150, "width": 200, "height": 0,
+  "points": [[0,0],[200,0]], "endArrowhead": "arrow" }
+```
+- `points`: `[dx, dy]` offsets from element `x`, `y`
+- `endArrowhead`: `null` | `"arrow"` | `"bar"` | `"dot"` | `"triangle"`
+- `strokeStyle`: `"solid"` (default) | `"dashed"` | `"dotted"`
+
+### Arrow Bindings (connect arrows to shapes)
+
+```json
+{
+  "type": "arrow", "id": "a1", "x": 300, "y": 150, "width": 150, "height": 0,
+  "points": [[0,0],[150,0]], "endArrowhead": "arrow",
+  "startBinding": { "elementId": "r1", "fixedPoint": [1, 0.5] },
+  "endBinding": { "elementId": "r2", "fixedPoint": [0, 0.5] }
+}
+```
+
+`fixedPoint` coordinates: `top=[0.5,0]`, `bottom=[0.5,1]`, `left=[0,0.5]`, `right=[1,0.5]`
+
+### Drawing Order (z-order)
+- Array order = z-order (first = back, last = front)
+- Emit progressively: background zones → shape → its bound text → its arrows → next shape
+- BAD: all rectangles, then all texts, then all arrows
+- GOOD: bg_zone → shape1 → text_for_shape1 → arrow1 → arrow_label_text → shape2 → text_for_shape2 → ...
+- Always place the bound text element immediately after its container shape
+
+### Sizing Guidelines
+
+**Font sizes:**
+- Minimum `fontSize`: **16** for body text, labels, descriptions
+- Minimum `fontSize`: **20** for titles and headings
+- Minimum `fontSize`: **14** for secondary annotations only (sparingly)
+- NEVER use `fontSize` below 14
+
+**Element sizes:**
+- Minimum shape size: 120x60 for labeled rectangles/ellipses
+- Leave gaps of at least 20-30px between elements
+- Prefer fewer, larger elements over many tiny ones
+
+### Color Palette
+
+See `references/colors.md` for full color tables. Quick reference:
+
+| Use | Fill Color | Hex |
+|-----|-----------|-----|
+| Primary / Input | Light Blue | `#a5d8ff` |
+| Success / Output | Light Green | `#b2f2bb` |
+| Warning / External | Light Orange | `#ffd8a8` |
+| Processing / Special | Light Purple | `#d0bfff` |
+| Error / Critical | Light Red | `#ffc9c9` |
+| Notes / Decisions | Light Yellow | `#fff3bf` |
+| Storage / Data | Light Teal | `#c3fae8` |
+
+### Tips
+- Use the color palette consistently across the diagram
+- **Text contrast is CRITICAL** -- never use light gray on white backgrounds. Minimum text color on white: `#757575`
+- Do NOT use emoji in text -- they don't render in Excalidraw's font
+- For dark mode diagrams, see `references/dark-mode.md`
+- For larger examples, see `references/examples.md`
+
+
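Review note on `SKILL.md`: the container-binding rule (shape lists the text in `boundElements`, text points back via `containerId`), the centering estimate for standalone text, and the file envelope can be generated programmatically. The Python sketch below is one possible way to do that. The helper names (`labeled_rectangle`, `centered_text_x`, `envelope`) and the label offsets are assumptions made for illustration; the skill itself only asks for hand-written JSON.

```python
import json


def labeled_rectangle(shape_id, x, y, width, height, text, fill="#a5d8ff", font_size=20):
    """Return [shape, label] linked by container binding, in z-order (shape first)."""
    text_id = f"t_{shape_id}"
    shape = {
        "type": "rectangle", "id": shape_id, "x": x, "y": y,
        "width": width, "height": height,
        "roundness": {"type": 3}, "backgroundColor": fill, "fillStyle": "solid",
        "boundElements": [{"id": text_id, "type": "text"}],
    }
    label = {
        "type": "text", "id": text_id,
        # Approximate geometry only; per the skill, Excalidraw recalculates it on load.
        "x": x + 5, "y": y + 10, "width": width - 10, "height": 25,
        "text": text, "fontSize": font_size, "fontFamily": 1,
        "strokeColor": "#1e1e1e", "textAlign": "center", "verticalAlign": "middle",
        "containerId": shape_id, "originalText": text, "autoResize": True,
    }
    return [shape, label]


def centered_text_x(cx, text, font_size):
    """Left edge for standalone text centered at cx, using the skill's width estimate."""
    return cx - (len(text) * font_size * 0.5) / 2


def envelope(elements):
    """Wrap an elements array in the standard .excalidraw envelope."""
    return {
        "type": "excalidraw", "version": 2, "source": "hermes-agent",
        "elements": elements,
        "appState": {"viewBackgroundColor": "#ffffff"},
    }


document = envelope(labeled_rectangle("r1", 100, 100, 200, 80, "Hello"))
print(json.dumps(document, indent=2))
```

Returning the shape and its label as an adjacent pair keeps the bound text immediately after its container, which matches the drawing-order guidance.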
--- a/skills/diagramming/excalidraw/references/colors.md
+++ b/skills/diagramming/excalidraw/references/colors.md
@@ -0,0 +1,44 @@
+# Excalidraw Color Palette
+
+Use these colors consistently across diagrams.
+
+## Primary Colors (for strokes, arrows, and accents)
+
+| Name | Hex | Use |
+|------|-----|-----|
+| Blue | `#4a9eed` | Primary actions, links, data series 1 |
+| Amber | `#f59e0b` | Warnings, highlights, data series 2 |
+| Green | `#22c55e` | Success, positive, data series 3 |
+| Red | `#ef4444` | Errors, negative, data series 4 |
+| Purple | `#8b5cf6` | Accents, special items, data series 5 |
+| Pink | `#ec4899` | Decorative, data series 6 |
+| Cyan | `#06b6d4` | Info, secondary, data series 7 |
+| Lime | `#84cc16` | Extra, data series 8 |
+
+## Pastel Fills (for shape backgrounds)
+
+| Color | Hex | Good For |
+|-------|-----|----------|
+| Light Blue | `#a5d8ff` | Input, sources, primary nodes |
+| Light Green | `#b2f2bb` | Success, output, completed |
+| Light Orange | `#ffd8a8` | Warning, pending, external |
+| Light Purple | `#d0bfff` | Processing, middleware, special |
+| Light Red | `#ffc9c9` | Error, critical, alerts |
+| Light Yellow | `#fff3bf` | Notes, decisions, planning |
+| Light Teal | `#c3fae8` | Storage, data, memory |
+| Light Pink | `#eebefa` | Analytics, metrics |
+
+## Background Zones (use with opacity: 30-35 for layered diagrams)
+
+| Color | Hex | Good For |
+|-------|-----|----------|
+| Blue zone | `#dbe4ff` | UI / frontend layer |
+| Purple zone | `#e5dbff` | Logic / agent layer |
+| Green zone | `#d3f9d8` | Data / tool layer |
+
+## Text Contrast Rules
+
+- **On white backgrounds**: minimum text color is `#757575`. Default `#1e1e1e` is best.
+- **Colored text on light fills**: use dark variants (`#15803d` not `#22c55e`, `#2563eb` not `#4a9eed`)
+- **White text**: only on dark backgrounds (`#9a5030` not `#c4795b`)
+- **Never**: light gray (`#b0b0b0`, `#999`) on white -- unreadable
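Review note on the text contrast rules: the `#757575` floor on white can be sanity-checked numerically. The sketch below uses the WCAG 2.x relative-luminance and contrast-ratio formulas with the 4.5:1 AA threshold. Tying the palette rule to WCAG is an assumption on my part, since the file does not say where the floor comes from, and the function names are illustrative.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rgb or #rrggbb color."""
    digits = hex_color.lstrip("#")
    if len(digits) == 3:  # expand shorthand such as #999
        digits = "".join(ch * 2 for ch in digits)
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0 (black on white)."""
    l1, l2 = relative_luminance(foreground), relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


for color in ("#1e1e1e", "#757575", "#999", "#b0b0b0"):
    ratio = contrast_ratio(color, "#ffffff")
    verdict = "ok" if ratio >= 4.5 else "too light"
    print(f"{color} on white: {ratio:.2f} ({verdict})")
```

Under that assumption, `#757575` clears 4.5:1 on white while `#999` and `#b0b0b0` do not, which is consistent with the rules listed above.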
--- a/skills/diagramming/excalidraw/references/dark-mode.md
+++ b/skills/diagramming/excalidraw/references/dark-mode.md
@@ -0,0 +1,68 @@
+# Excalidraw Dark Mode Diagrams
+
+To create a dark-themed diagram, use a massive dark background rectangle as the **first element** in the array. Make it large enough to cover any viewport:
+
+```json
+{
+  "type": "rectangle", "id": "darkbg",
+  "x": -4000, "y": -3000, "width": 10000, "height": 7500,
+  "backgroundColor": "#1e1e2e", "fillStyle": "solid",
+  "strokeColor": "transparent", "strokeWidth": 0
+}
+```
+
+Then use the following color palettes for elements on the dark background.
+
+## Text Colors (on dark)
+
+| Color | Hex | Use |
+|-------|-----|-----|
+| White | `#e5e5e5` | Primary text, titles |
+| Muted | `#a0a0a0` | Secondary text, annotations |
+| NEVER | `#555` or darker | Invisible on dark bg! |
+
+## Shape Fills (on dark)
+
+| Color | Hex | Good For |
+|-------|-----|----------|
+| Dark Blue | `#1e3a5f` | Primary nodes |
+| Dark Green | `#1a4d2e` | Success, output |
+| Dark Purple | `#2d1b69` | Processing, special |
+| Dark Orange | `#5c3d1a` | Warning, pending |
+| Dark Red | `#5c1a1a` | Error, critical |
+| Dark Teal | `#1a4d4d` | Storage, data |
+
+## Stroke and Arrow Colors (on dark)
+
+Use the standard Primary Colors from the main color palette -- they're bright enough on dark backgrounds:
+- Blue `#4a9eed`, Amber `#f59e0b`, Green `#22c55e`, Red `#ef4444`, Purple `#8b5cf6`
+
+For subtle shape borders, use `#555555`.
+
+## Example: Dark mode labeled rectangle
+
+Use container binding (NOT the `"label"` property, which doesn't work). On dark backgrounds, set text `strokeColor` to `"#e5e5e5"` so it's visible:
+
+```json
+[
+  {
+    "type": "rectangle", "id": "r1",
+    "x": 100, "y": 100, "width": 200, "height": 80,
+    "backgroundColor": "#1e3a5f", "fillStyle": "solid",
+    "strokeColor": "#4a9eed", "strokeWidth": 2,
+    "roundness": { "type": 3 },
+    "boundElements": [{ "id": "t_r1", "type": "text" }]
+  },
+  {
+    "type": "text", "id": "t_r1",
+    "x": 105, "y": 120, "width": 190, "height": 25,
+    "text": "Dark Node", "fontSize": 20, "fontFamily": 1,
+    "strokeColor": "#e5e5e5",
+    "textAlign": "center", "verticalAlign": "middle",
+    "containerId": "r1", "originalText": "Dark Node", "autoResize": true
+  }
+]
+```
+
+Note: For standalone text elements on dark backgrounds, always set `"strokeColor": "#e5e5e5"` explicitly. The default `#1e1e1e` is invisible on dark.
+
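Review note on the dark-mode reference: its two mechanical rules are that the dark background rectangle goes first in the array and that text using the default stroke color gets `#e5e5e5`. Both can be applied to an existing light-mode element list. The sketch below is a rough illustration under those two rules only; `to_dark_mode` is a hypothetical helper, and it deliberately leaves shape fills untouched.

```python
DARK_TEXT = "#e5e5e5"
DEFAULT_STROKE = "#1e1e1e"


def dark_background():
    """The oversized background rectangle described in the dark-mode reference."""
    return {
        "type": "rectangle", "id": "darkbg",
        "x": -4000, "y": -3000, "width": 10000, "height": 7500,
        "backgroundColor": "#1e1e2e", "fillStyle": "solid",
        "strokeColor": "transparent", "strokeWidth": 0,
    }


def to_dark_mode(elements):
    """Return a new list: background first, default-colored text switched to light."""
    themed = [dark_background()]
    for element in elements:
        copy = dict(element)  # shallow copy so the caller's list is not modified
        is_text = copy.get("type") == "text"
        if is_text and copy.get("strokeColor", DEFAULT_STROKE) == DEFAULT_STROKE:
            copy["strokeColor"] = DARK_TEXT
        themed.append(copy)
    return themed


light = [
    {"type": "text", "id": "t1", "x": 150, "y": 138, "text": "Hello",
     "fontSize": 20, "fontFamily": 1, "originalText": "Hello", "autoResize": True},
    {"type": "text", "id": "t2", "x": 150, "y": 180, "text": "Note",
     "fontSize": 16, "fontFamily": 1, "strokeColor": "#a0a0a0",
     "originalText": "Note", "autoResize": True},
]
dark = to_dark_mode(light)
print([element["id"] for element in dark])
```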
--- a/skills/diagramming/excalidraw/references/examples.md
+++ b/skills/diagramming/excalidraw/references/examples.md
@@ -0,0 +1,141 @@
+# Excalidraw Diagram Examples
+
+Complete, copy-pasteable examples. Wrap each in the `.excalidraw` envelope before saving:
+
+```json
+{
+  "type": "excalidraw",
+  "version": 2,
+  "source": "hermes-agent",
+  "elements": [ ...elements from examples below... ],
+  "appState": { "viewBackgroundColor": "#ffffff" }
+}
+```
+
+> **IMPORTANT:** All text labels on shapes and arrows use container binding (`containerId` + `boundElements`).
+> Do NOT use the non-existent `"label"` property -- it will be silently ignored, producing blank shapes.
+
+---
+
+## Example 1: Two Connected Labeled Boxes
+
+A minimal flowchart with two boxes and an arrow between them.
+
+```json
+[
+  { "type": "text", "id": "title", "x": 280, "y": 30, "text": "Simple Flow", "fontSize": 28, "fontFamily": 1, "strokeColor": "#1e1e1e", "originalText": "Simple Flow", "autoResize": true },
+  { "type": "rectangle", "id": "b1", "x": 100, "y": 100, "width": 200, "height": 100, "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid", "boundElements": [{ "id": "t_b1", "type": "text" }, { "id": "a1", "type": "arrow" }] },
+  { "type": "text", "id": "t_b1", "x": 105, "y": 130, "width": 190, "height": 25, "text": "Start", "fontSize": 20, "fontFamily": 1, "strokeColor": "#1e1e1e", "textAlign": "center", "verticalAlign": "middle", "containerId": "b1", "originalText": "Start", "autoResize": true },
+  { "type": "rectangle", "id": "b2", "x": 450, "y": 100, "width": 200, "height": 100, "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid", "boundElements": [{ "id": "t_b2", "type": "text" }, { "id": "a1", "type": "arrow" }] },
+  { "type": "text", "id": "t_b2", "x": 455, "y": 130, "width": 190, "height": 25, "text": "End", "fontSize": 20, "fontFamily": 1, "strokeColor": "#1e1e1e", "textAlign": "center", "verticalAlign": "middle", "containerId": "b2", "originalText": "End", "autoResize": true },
+  { "type": "arrow", "id": "a1", "x": 300, "y": 150, "width": 150, "height": 0, "points": [[0,0],[150,0]], "endArrowhead": "arrow", "startBinding": { "elementId": "b1", "fixedPoint": [1, 0.5] }, "endBinding": { "elementId": "b2", "fixedPoint": [0, 0.5] } }
+]
+```
+
+---
+
+## Example 2: Photosynthesis Process Diagram
+
+A larger diagram with background zones, multiple nodes, and directional arrows showing inputs/outputs.
+
+```json
+[
+  {"type":"text","id":"ti","x":280,"y":10,"text":"Photosynthesis","fontSize":28,"fontFamily":1,"strokeColor":"#1e1e1e","originalText":"Photosynthesis","autoResize":true},
+  {"type":"text","id":"fo","x":245,"y":48,"text":"6CO2 + 6H2O --> C6H12O6 + 6O2","fontSize":16,"fontFamily":1,"strokeColor":"#757575","originalText":"6CO2 + 6H2O --> C6H12O6 + 6O2","autoResize":true},
+  {"type":"rectangle","id":"lf","x":150,"y":90,"width":520,"height":380,"backgroundColor":"#d3f9d8","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#22c55e","strokeWidth":1,"opacity":35},
+  {"type":"text","id":"lfl","x":170,"y":96,"text":"Inside the Leaf","fontSize":16,"fontFamily":1,"strokeColor":"#15803d","originalText":"Inside the Leaf","autoResize":true},
+
+  {"type":"rectangle","id":"lr","x":190,"y":190,"width":160,"height":70,"backgroundColor":"#fff3bf","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#f59e0b","boundElements":[{"id":"t_lr","type":"text"},{"id":"a1","type":"arrow"},{"id":"a2","type":"arrow"},{"id":"a3","type":"arrow"},{"id":"a5","type":"arrow"}]},
+  {"type":"text","id":"t_lr","x":195,"y":205,"width":150,"height":20,"text":"Light Reactions","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"lr","originalText":"Light Reactions","autoResize":true},
+
+  {"type":"arrow","id":"a1","x":350,"y":225,"width":120,"height":0,"points":[[0,0],[120,0]],"strokeColor":"#1e1e1e","strokeWidth":2,"endArrowhead":"arrow","boundElements":[{"id":"t_a1","type":"text"}]},
+  {"type":"text","id":"t_a1","x":390,"y":205,"width":40,"height":20,"text":"ATP","fontSize":14,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"a1","originalText":"ATP","autoResize":true},
+
+  {"type":"rectangle","id":"cc","x":470,"y":190,"width":160,"height":70,"backgroundColor":"#d0bfff","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#8b5cf6","boundElements":[{"id":"t_cc","type":"text"},{"id":"a1","type":"arrow"},{"id":"a4","type":"arrow"},{"id":"a6","type":"arrow"}]},
+  {"type":"text","id":"t_cc","x":475,"y":205,"width":150,"height":20,"text":"Calvin Cycle","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"cc","originalText":"Calvin Cycle","autoResize":true},
+
+  {"type":"rectangle","id":"sl","x":10,"y":200,"width":120,"height":50,"backgroundColor":"#fff3bf","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#f59e0b","boundElements":[{"id":"t_sl","type":"text"},{"id":"a2","type":"arrow"}]},
+  {"type":"text","id":"t_sl","x":15,"y":210,"width":110,"height":20,"text":"Sunlight","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"sl","originalText":"Sunlight","autoResize":true},
+
+  {"type":"arrow","id":"a2","x":130,"y":225,"width":60,"height":0,"points":[[0,0],[60,0]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":"arrow"},
+
+  {"type":"rectangle","id":"wa","x":200,"y":360,"width":140,"height":50,"backgroundColor":"#a5d8ff","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#4a9eed","boundElements":[{"id":"t_wa","type":"text"},{"id":"a3","type":"arrow"}]},
+  {"type":"text","id":"t_wa","x":205,"y":370,"width":130,"height":20,"text":"Water (H2O)","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"wa","originalText":"Water (H2O)","autoResize":true},
+
+  {"type":"arrow","id":"a3","x":270,"y":360,"width":0,"height":-100,"points":[[0,0],[0,-100]],"strokeColor":"#4a9eed","strokeWidth":2,"endArrowhead":"arrow"},
+
+  {"type":"rectangle","id":"co","x":480,"y":360,"width":130,"height":50,"backgroundColor":"#ffd8a8","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#f59e0b","boundElements":[{"id":"t_co","type":"text"},{"id":"a4","type":"arrow"}]},
+  {"type":"text","id":"t_co","x":485,"y":370,"width":120,"height":20,"text":"CO2","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"co","originalText":"CO2","autoResize":true},
+
+  {"type":"arrow","id":"a4","x":545,"y":360,"width":0,"height":-100,"points":[[0,0],[0,-100]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":"arrow"},
+
+  {"type":"rectangle","id":"ox","x":540,"y":100,"width":100,"height":40,"backgroundColor":"#ffc9c9","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#ef4444","boundElements":[{"id":"t_ox","type":"text"},{"id":"a5","type":"arrow"}]},
+  {"type":"text","id":"t_ox","x":545,"y":105,"width":90,"height":20,"text":"O2","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"ox","originalText":"O2","autoResize":true},
+
+  {"type":"arrow","id":"a5","x":310,"y":190,"width":230,"height":-50,"points":[[0,0],[230,-50]],"strokeColor":"#ef4444","strokeWidth":2,"endArrowhead":"arrow"},
+
+  {"type":"rectangle","id":"gl","x":690,"y":195,"width":120,"height":60,"backgroundColor":"#c3fae8","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#22c55e","boundElements":[{"id":"t_gl","type":"text"},{"id":"a6","type":"arrow"}]},
+  {"type":"text","id":"t_gl","x":695,"y":210,"width":110,"height":25,"text":"Glucose","fontSize":18,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"gl","originalText":"Glucose","autoResize":true},
+
+  {"type":"arrow","id":"a6","x":630,"y":225,"width":60,"height":0,"points":[[0,0],[60,0]],"strokeColor":"#22c55e","strokeWidth":2,"endArrowhead":"arrow"},
+
+  {"type":"ellipse","id":"sun","x":30,"y":110,"width":50,"height":50,"backgroundColor":"#fff3bf","fillStyle":"solid","strokeColor":"#f59e0b","strokeWidth":2},
+  {"type":"arrow","id":"r1","x":55,"y":108,"width":0,"height":-14,"points":[[0,0],[0,-14]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":null,"startArrowhead":null},
+  {"type":"arrow","id":"r2","x":55,"y":162,"width":0,"height":14,"points":[[0,0],[0,14]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":null,"startArrowhead":null},
+  {"type":"arrow","id":"r3","x":28,"y":135,"width":-14,"height":0,"points":[[0,0],[-14,0]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":null,"startArrowhead":null},
+  {"type":"arrow","id":"r4","x":82,"y":135,"width":14,"height":0,"points":[[0,0],[14,0]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":null,"startArrowhead":null}
+]
+```
+
+---
+
+## Example 3: Sequence Diagram (UML-style)
+
+Demonstrates a sequence diagram with actors, dashed lifelines, and message arrows.
+
+```json
+[
+  {"type":"text","id":"title","x":200,"y":15,"text":"MCP Apps -- Sequence Flow","fontSize":24,"fontFamily":1,"strokeColor":"#1e1e1e","originalText":"MCP Apps -- Sequence Flow","autoResize":true},
+
+  {"type":"rectangle","id":"uHead","x":60,"y":60,"width":100,"height":40,"backgroundColor":"#a5d8ff","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#4a9eed","strokeWidth":2,"boundElements":[{"id":"t_uHead","type":"text"}]},
+  {"type":"text","id":"t_uHead","x":65,"y":65,"width":90,"height":20,"text":"User","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"uHead","originalText":"User","autoResize":true},
+
+  {"type":"arrow","id":"uLine","x":110,"y":100,"width":0,"height":400,"points":[[0,0],[0,400]],"strokeColor":"#b0b0b0","strokeWidth":1,"strokeStyle":"dashed","endArrowhead":null},
+
+  {"type":"rectangle","id":"aHead","x":230,"y":60,"width":100,"height":40,"backgroundColor":"#d0bfff","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#8b5cf6","strokeWidth":2,"boundElements":[{"id":"t_aHead","type":"text"}]},
+  {"type":"text","id":"t_aHead","x":235,"y":65,"width":90,"height":20,"text":"Agent","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"aHead","originalText":"Agent","autoResize":true},
+
+  {"type":"arrow","id":"aLine","x":280,"y":100,"width":0,"height":400,"points":[[0,0],[0,400]],"strokeColor":"#b0b0b0","strokeWidth":1,"strokeStyle":"dashed","endArrowhead":null},
+
+  {"type":"rectangle","id":"sHead","x":420,"y":60,"width":130,"height":40,"backgroundColor":"#ffd8a8","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#f59e0b","strokeWidth":2,"boundElements":[{"id":"t_sHead","type":"text"}]},
+  {"type":"text","id":"t_sHead","x":425,"y":65,"width":120,"height":20,"text":"Server","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"sHead","originalText":"Server","autoResize":true},
+
+  {"type":"arrow","id":"sLine","x":485,"y":100,"width":0,"height":400,"points":[[0,0],[0,400]],"strokeColor":"#b0b0b0","strokeWidth":1,"strokeStyle":"dashed","endArrowhead":null},
+
+  {"type":"arrow","id":"m1","x":110,"y":150,"width":170,"height":0,"points":[[0,0],[170,0]],"strokeColor":"#1e1e1e","strokeWidth":2,"endArrowhead":"arrow","boundElements":[{"id":"t_m1","type":"text"}]},
+  {"type":"text","id":"t_m1","x":165,"y":130,"width":60,"height":20,"text":"request","fontSize":14,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"m1","originalText":"request","autoResize":true},
+
+  {"type":"arrow","id":"m2","x":280,"y":200,"width":205,"height":0,"points":[[0,0],[205,0]],"strokeColor":"#8b5cf6","strokeWidth":2,"endArrowhead":"arrow","boundElements":[{"id":"t_m2","type":"text"}]},
+  {"type":"text","id":"t_m2","x":352,"y":180,"width":60,"height":20,"text":"tools/call","fontSize":14,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"m2","originalText":"tools/call","autoResize":true},
+
+  {"type":"arrow","id":"m3","x":485,"y":260,"width":-205,"height":0,"points":[[0,0],[-205,0]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":"arrow","strokeStyle":"dashed","boundElements":[{"id":"t_m3","type":"text"}]},
+  {"type":"text","id":"t_m3","x":352,"y":240,"width":60,"height":20,"text":"result","fontSize":14,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"m3","originalText":"result","autoResize":true},
+
+  {"type":"arrow","id":"m4","x":280,"y":320,"width":-170,"height":0,"points":[[0,0],[-170,0]],"strokeColor":"#8b5cf6","strokeWidth":2,"endArrowhead":"arrow","strokeStyle":"dashed","boundElements":[{"id":"t_m4","type":"text"}]},
+  {"type":"text","id":"t_m4","x":165,"y":300,"width":60,"height":20,"text":"response","fontSize":14,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"m4","originalText":"response","autoResize":true}
+]
+```
+
+---
+
+## Common Mistakes to Avoid
+
+- **Do NOT use `"label"` property** -- this is the #1 mistake. It is NOT part of the Excalidraw file format and will be silently ignored, producing blank shapes with no visible text. Always use container binding (`containerId` + `boundElements`) as shown in the examples above.
+- **Every bound text needs both sides linked** -- the shape needs `boundElements: [{"id": "t_xxx", "type": "text"}]` AND the text needs `containerId: "shape_id"`. If either is missing, the binding won't work.
+- **Include `originalText` and `autoResize: true`** on all text elements -- Excalidraw uses these for proper text reflow.
+- **Include `fontFamily: 1`** on all text elements -- without it, text may not render with the expected hand-drawn font.
+- **Elements overlap when y-coordinates are close** -- always check that text, boxes, and labels don't stack on top of each other.
+- **Arrow labels need space** -- long labels like "ATP + NADPH" overflow short arrows. Keep labels short or make arrows wider.
+- **Center titles relative to the diagram** -- estimate total width and center the title text over it.
+- **Draw decorations LAST** -- cute illustrations (sun, stars, icons) should appear at the end of the array so they're drawn on top.
+
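The two-sided binding rule in the list above can be checked mechanically before a diagram is saved. The following is a minimal sketch, assuming the elements array has already been parsed from JSON into a list of dicts. `find_binding_errors` is a hypothetical helper written for illustration; it is not part of this skill or of Excalidraw itself.

```python
from typing import Any, Dict, List


def find_binding_errors(elements: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems with text/container bindings."""
    by_id = {el["id"]: el for el in elements if "id" in el}
    errors: List[str] = []

    for el in elements:
        el_id = el.get("id")

        # A "label" key is flagged because the list above says it is ignored.
        if "label" in el:
            errors.append(f"{el_id}: uses unsupported 'label' property")

        # Text side: containerId must name an element that lists this text.
        if el.get("type") == "text" and el.get("containerId"):
            container = by_id.get(el["containerId"])
            if container is None:
                errors.append(f"{el_id}: containerId '{el['containerId']}' not found")
            else:
                bound_ids = {b.get("id") for b in container.get("boundElements") or []}
                if el_id not in bound_ids:
                    errors.append(f"{el_id}: container does not list it in boundElements")

        # Container side: every bound text must point back via containerId.
        for bound in el.get("boundElements") or []:
            if bound.get("type") != "text":
                continue
            text = by_id.get(bound.get("id"))
            if text is None or text.get("containerId") != el_id:
                errors.append(f"{el_id}: bound text '{bound.get('id')}' does not point back")

    return errors
```

Running it over an elements array before writing the `.excalidraw` file gives an empty list when every label is linked from both sides, and one message per broken link otherwise.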
--- a/skills/diagramming/excalidraw/scripts/upload.py
+++ b/skills/diagramming/excalidraw/scripts/upload.py
@@ -0,0 +1,133 @@
+#!/usr/bin/env python3
+"""
+Upload an .excalidraw file to excalidraw.com and print a shareable URL.
+
+No account required. The diagram is encrypted client-side (AES-GCM) before
+upload -- the encryption key is embedded in the URL fragment, so the server
+never sees plaintext.
+
+Requirements:
+    pip install cryptography
+
+Usage:
+    python upload.py <path-to-file.excalidraw>
+
+Example:
+    python upload.py ~/diagrams/architecture.excalidraw
+    # prints: https://excalidraw.com/#json=abc123,encryptionKeyHere
+"""
+
+import json
+import os
+import struct
+import sys
+import zlib
+import base64
+import urllib.request
+
+try:
+    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
+except ImportError:
+    print("Error: 'cryptography' package is required for upload.")
+    print("Install it with: pip install cryptography")
+    sys.exit(1)
+
+# Excalidraw public upload endpoint (no auth needed)
+UPLOAD_URL = "https://json.excalidraw.com/api/v2/post/"
+
+
+def concat_buffers(*buffers: bytes) -> bytes:
+    """
+    Build the Excalidraw v2 concat-buffers binary format.
+
+    Layout: [version=1 (4B big-endian)] then for each buffer:
+            [length (4B big-endian)] [data bytes]
+    """
+    parts = [struct.pack(">I", 1)]  # version = 1
+    for buf in buffers:
+        parts.append(struct.pack(">I", len(buf)))
+        parts.append(buf)
+    return b"".join(parts)
+
+
+def upload(excalidraw_json: str) -> str:
+    """
+    Encrypt and upload Excalidraw JSON to excalidraw.com.
+
+    Args:
+        excalidraw_json: The full .excalidraw file content as a string.
+
+    Returns:
+        Shareable URL string.
+    """
+    # 1. Inner payload: concat_buffers(file_metadata, data)
+    file_metadata = json.dumps({}).encode("utf-8")
+    data_bytes = excalidraw_json.encode("utf-8")
+    inner_payload = concat_buffers(file_metadata, data_bytes)
+
+    # 2. Compress with zlib
+    compressed = zlib.compress(inner_payload)
+
+    # 3. AES-GCM 128-bit encrypt
+    raw_key = os.urandom(16)   # 128-bit key
+    iv = os.urandom(12)        # 12-byte nonce
+    aesgcm = AESGCM(raw_key)
+    encrypted = aesgcm.encrypt(iv, compressed, None)
+
+    # 4. Encoding metadata
+    encoding_meta = json.dumps({
+        "version": 2,
+        "compression": "pako@1",
+        "encryption": "AES-GCM",
+    }).encode("utf-8")
+
+    # 5. Outer payload: concat_buffers(encoding_meta, iv, encrypted)
+    payload = concat_buffers(encoding_meta, iv, encrypted)
+
+    # 6. Upload
+    req = urllib.request.Request(UPLOAD_URL, data=payload, method="POST")
+    with urllib.request.urlopen(req, timeout=30) as resp:
+        if resp.status != 200:
+            raise RuntimeError(f"Upload failed with HTTP {resp.status}")
+        result = json.loads(resp.read().decode("utf-8"))
+
+    file_id = result.get("id")
+    if not file_id:
+        raise RuntimeError(f"Upload returned no file ID. Response: {result}")
+
+    # 7. Key as base64url (JWK 'k' format, no padding)
+    key_b64 = base64.urlsafe_b64encode(raw_key).rstrip(b"=").decode("ascii")
+
+    return f"https://excalidraw.com/#json={file_id},{key_b64}"
+
+
+def main():
+    if len(sys.argv) < 2:
+        print("Usage: python upload.py <path-to-file.excalidraw>")
+        sys.exit(1)
+
+    file_path = sys.argv[1]
+
+    if not os.path.isfile(file_path):
+        print(f"Error: File not found: {file_path}")
+        sys.exit(1)
+
+    with open(file_path, "r", encoding="utf-8") as f:
+        content = f.read()
+
+    # Basic validation: should be valid JSON with an "elements" key
+    try:
+        doc = json.loads(content)
+    except json.JSONDecodeError as e:
+        print(f"Error: File is not valid JSON: {e}")
+        sys.exit(1)
+
+    if not isinstance(doc, dict) or "elements" not in doc:
+        print("Warning: File does not contain an 'elements' key. Uploading anyway.", file=sys.stderr)
+
+    url = upload(content)
+    print(url)
+
+
+if __name__ == "__main__":
+    main()
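The concat-buffers layout described in the `concat_buffers` docstring (a 4-byte big-endian version, then length-prefixed chunks) is easiest to verify with a round trip. Below is a minimal sketch: `concat_buffers` is copied from the script, while `split_buffers` is a hypothetical inverse added only for local checking. It is not part of `upload.py` and is not claimed to match Excalidraw's own decoder.

```python
import struct
from typing import List


def concat_buffers(*buffers: bytes) -> bytes:
    """Same layout as the script: version header, then length-prefixed chunks."""
    parts = [struct.pack(">I", 1)]  # version = 1
    for buf in buffers:
        parts.append(struct.pack(">I", len(buf)))
        parts.append(buf)
    return b"".join(parts)


def split_buffers(payload: bytes) -> List[bytes]:
    """Hypothetical inverse of concat_buffers, for round-trip checks."""
    if len(payload) < 4:
        raise ValueError("payload too short for version header")
    (version,) = struct.unpack_from(">I", payload, 0)
    if version != 1:
        raise ValueError(f"unsupported concat-buffers version: {version}")

    chunks: List[bytes] = []
    offset = 4
    while offset < len(payload):
        if offset + 4 > len(payload):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        if offset + length > len(payload):
            raise ValueError("truncated chunk")
        chunks.append(payload[offset:offset + length])
        offset += length
    return chunks
```

A round trip such as `split_buffers(concat_buffers(meta, iv, ciphertext))` should return the three inputs unchanged, which is a cheap way to catch an off-by-four mistake in the framing before anything is uploaded.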
--- a/skills/mlops/DESCRIPTION.md
+++ b/skills/mlops/DESCRIPTION.md
@@ -0,0 +1,3 @@
+---
+description: Knowledge and Tools for Machine Learning Operations - tools and frameworks for training, fine-tuning, deploying, and optimizing ML/AI models
+---
--- a/skills/note-taking/DESCRIPTION.md
+++ b/skills/note-taking/DESCRIPTION.md
@@ -0,0 +1,3 @@
+---
+description: Note-taking skills for saving information, assisting with research, and collaborating on multi-session planning and information sharing.
+---
--- a/skills/note-taking/obsidian/SKILL.md
+++ b/skills/note-taking/obsidian/SKILL.md
@@ -0,0 +1,57 @@
+---
+name: obsidian
+description: Read, search, and create notes in the Obsidian vault.
+---
+
+# Obsidian Vault
+
+**Location:** `/home/teknium/Documents/Primary Vault`
+
+Note: Path contains a space - always quote it.
+
+## Read a note
+
+```bash
+cat "/home/teknium/Documents/Primary Vault/Note Name.md"
+```
+
+## List notes
+
+```bash
+# All notes
+find "/home/teknium/Documents/Primary Vault" -name "*.md" -type f
+
+# In a specific folder
+ls "/home/teknium/Documents/Primary Vault/AI Research/"
+```
+
+## Search
+
+```bash
+# By filename
+find "/home/teknium/Documents/Primary Vault" -name "*.md" -iname "*keyword*"
+
+# By content
+grep -rli "keyword" "/home/teknium/Documents/Primary Vault" --include="*.md"
+```
+
+## Create a note
+
+```bash
+cat > "/home/teknium/Documents/Primary Vault/New Note.md" << 'ENDNOTE'
+# Title
+
+Content here.
+ENDNOTE
+```
+
+## Append to a note
+
+```bash
+echo "
+New content here." >> "/home/teknium/Documents/Primary Vault/Existing Note.md"
+```
+
+## Wikilinks
+
+Obsidian links notes with `[[Note Name]]` syntax. When creating notes, use these to link related content.
--- a/1
+++ b/1
--- a/tools/__init__.py
+++ b/tools/__init__.py
@@ -31,6 +31,8 @@ from .terminal_tool import (
    cleanup_vm,
    cleanup_all_environments,
    get_active_environments_info,
+    register_task_env_overrides,
+    clear_task_env_overrides,
    TERMINAL_TOOL_DESCRIPTION
 )

@@ -57,7 +59,6 @@ from .image_generation_tool import (
 )

 from .skills_tool import (
-    skills_categories,
    skills_list,
    skill_view,
    check_skills_requirements,
@@ -83,6 +84,56 @@ from .browser_tool import (
    BROWSER_TOOL_SCHEMAS
 )

+# Cronjob management tools (CLI-only, hermes-cli toolset)
+from .cronjob_tools import (
+    schedule_cronjob,
+    list_cronjobs,
+    remove_cronjob,
+    check_cronjob_requirements,
+    get_cronjob_tool_definitions,
+    SCHEDULE_CRONJOB_SCHEMA,
+    LIST_CRONJOBS_SCHEMA,
+    REMOVE_CRONJOB_SCHEMA
+)
+
+# RL Training tools (Tinker-Atropos)
+from .rl_training_tool import (
+    rl_list_environments,
+    rl_select_environment,
+    rl_get_current_config,
+    rl_edit_config,
+    rl_start_training,
+    rl_check_status,
+    rl_stop_training,
+    rl_get_results,
+    rl_list_runs,
+    rl_test_inference,
+    check_rl_api_keys,
+    get_missing_keys,
+)
+
+# File manipulation tools (read, write, patch, search)
+from .file_tools import (
+    read_file_tool,
+    write_file_tool,
+    patch_tool,
+    search_tool,
+    get_file_tools,
+    clear_file_ops_cache,
+)
+
+# Text-to-speech tools (Edge TTS / ElevenLabs / OpenAI)
+from .tts_tool import (
+    text_to_speech_tool,
+    check_tts_requirements,
+)
+
+# File tools have no external requirements - they use the terminal backend
+def check_file_requirements():
+    """File tools only require terminal backend to be available."""
+    from .terminal_tool import check_terminal_requirements
+    return check_terminal_requirements()
+
 __all__ = [
    # Web tools
    'web_search_tool',
@@ -95,6 +146,8 @@ __all__ = [
    'cleanup_vm',
    'cleanup_all_environments',
    'get_active_environments_info',
+    'register_task_env_overrides',
+    'clear_task_env_overrides',
    'TERMINAL_TOOL_DESCRIPTION',
    # Terminal tools (Hecate/MorphCloud backend)
    'terminal_hecate_tool',
@@ -110,7 +163,6 @@ __all__ = [
    'image_generate_tool',
    'check_image_generation_requirements',
    # Skills tools
-    'skills_categories',
    'skills_list',
    'skill_view',
    'check_skills_requirements',
@@ -131,5 +183,38 @@ __all__ = [
    'get_active_browser_sessions',
    'check_browser_requirements',
    'BROWSER_TOOL_SCHEMAS',
+    # Cronjob management tools (CLI-only)
+    'schedule_cronjob',
+    'list_cronjobs',
+    'remove_cronjob',
+    'check_cronjob_requirements',
+    'get_cronjob_tool_definitions',
+    'SCHEDULE_CRONJOB_SCHEMA',
+    'LIST_CRONJOBS_SCHEMA',
+    'REMOVE_CRONJOB_SCHEMA',
+    # RL Training tools
+    'rl_list_environments',
+    'rl_select_environment',
+    'rl_get_current_config',
+    'rl_edit_config',
+    'rl_start_training',
+    'rl_check_status',
+    'rl_stop_training',
+    'rl_get_results',
+    'rl_list_runs',
+    'rl_test_inference',
+    'check_rl_api_keys',
+    'get_missing_keys',
+    # File manipulation tools
+    'read_file_tool',
+    'write_file_tool',
+    'patch_tool',
+    'search_tool',
+    'get_file_tools',
+    'clear_file_ops_cache',
+    'check_file_requirements',
+    # Text-to-speech tools
+    'text_to_speech_tool',
+    'check_tts_requirements',
 ]

--- a/tools/browser_tool.py
+++ b/tools/browser_tool.py
@@ -51,6 +51,9 @@ import subprocess
 import shutil
 import sys
 import asyncio
+import tempfile
+import threading
+import time
 import requests
 from typing import Dict, Any, Optional, List
 from pathlib import Path
@@ -86,6 +89,22 @@ _active_sessions: Dict[str, Dict[str, str]] = {}  # task_id -> {session_name, bb
 # Flag to track if cleanup has been done
 _cleanup_done = False

+# =============================================================================
+# Inactivity Timeout Configuration
+# =============================================================================
+
+# Session inactivity timeout (seconds) - cleanup if no activity for this long
+# Default: 2 minutes. Can be configured via environment variable.
+BROWSER_SESSION_INACTIVITY_TIMEOUT = int(os.environ.get("BROWSER_INACTIVITY_TIMEOUT", "120"))
+
+# Track last activity time per session
+_session_last_activity: Dict[str, float] = {}
+
+# Background cleanup thread state
+_cleanup_thread = None
+_cleanup_running = False
+_cleanup_lock = threading.Lock()
+

 def _emergency_cleanup_all_sessions():
    """
@@ -157,6 +176,100 @@ except (OSError, AttributeError):
    pass  # Signal handling not available (e.g., Windows or worker process)


+# =============================================================================
+# Inactivity Cleanup Functions
+# =============================================================================
+
+def _cleanup_inactive_browser_sessions():
+    """
+    Clean up browser sessions that have been inactive for longer than the timeout.
+    
+    This function is called periodically by the background cleanup thread to
+    automatically close sessions that haven't been used recently, preventing
+    orphaned Browserbase sessions from accumulating.
+    """
+    current_time = time.time()
+    sessions_to_cleanup = []
+    
+    with _cleanup_lock:
+        for task_id, last_time in list(_session_last_activity.items()):
+            if current_time - last_time > BROWSER_SESSION_INACTIVITY_TIMEOUT:
+                sessions_to_cleanup.append(task_id)
+    
+    for task_id in sessions_to_cleanup:
+        try:
+            if not os.getenv("HERMES_QUIET"):
+                elapsed = int(current_time - _session_last_activity.get(task_id, current_time))
+                print(f"[browser_tool] Cleaning up inactive session for task: {task_id} "
+                      f"(inactive for {elapsed}s)", file=sys.stderr)
+            cleanup_browser(task_id)
+            with _cleanup_lock:
+                if task_id in _session_last_activity:
+                    del _session_last_activity[task_id]
+        except Exception as e:
+            if not os.getenv("HERMES_QUIET"):
+                print(f"[browser_tool] Error cleaning up inactive session {task_id}: {e}", file=sys.stderr)
+
+
+def _browser_cleanup_thread_worker():
+    """
+    Background thread that periodically cleans up inactive browser sessions.
+    
+    Runs every 30 seconds and checks for sessions that haven't been used
+    within the BROWSER_SESSION_INACTIVITY_TIMEOUT period.
+    """
+    global _cleanup_running
+    
+    while _cleanup_running:
+        try:
+            _cleanup_inactive_browser_sessions()
+        except Exception as e:
+            if not os.getenv("HERMES_QUIET"):
+                print(f"[browser_tool] Cleanup thread error: {e}", file=sys.stderr)
+        
+        # Sleep in 1-second intervals so we can stop quickly if needed
+        for _ in range(30):
+            if not _cleanup_running:
+                break
+            time.sleep(1)
+
+
+def _start_browser_cleanup_thread():
+    """Start the background cleanup thread if not already running."""
+    global _cleanup_thread, _cleanup_running
+    
+    with _cleanup_lock:
+        if _cleanup_thread is None or not _cleanup_thread.is_alive():
+            _cleanup_running = True
+            _cleanup_thread = threading.Thread(
+                target=_browser_cleanup_thread_worker,
+                daemon=True,
+                name="browser-cleanup"
+            )
+            _cleanup_thread.start()
+            if not os.getenv("HERMES_QUIET"):
+                print(f"[browser_tool] Started inactivity cleanup thread "
+                      f"(timeout: {BROWSER_SESSION_INACTIVITY_TIMEOUT}s)", file=sys.stderr)
+
+
+def _stop_browser_cleanup_thread():
+    """Stop the background cleanup thread."""
+    global _cleanup_running
+    _cleanup_running = False
+    if _cleanup_thread is not None:
+        _cleanup_thread.join(timeout=5)
+
+
+def _update_session_activity(task_id: str):
+    """Update the last activity timestamp for a session."""
+    with _cleanup_lock:
+        _session_last_activity[task_id] = time.time()
+
+
+# Register cleanup thread stop on exit
+atexit.register(_stop_browser_cleanup_thread)
+
+
 # ============================================================================
 # Tool Schemas
 # ============================================================================
@@ -461,6 +574,7 @@ def _get_session_info(task_id: Optional[str] = None) -> Dict[str, str]:
    Get or create session info for the given task.
    
    Creates a Browserbase session with proxies enabled if one doesn't exist.
+    Also starts the inactivity cleanup thread and updates activity tracking.
    
    Args:
        task_id: Unique identifier for the task
@@ -471,6 +585,12 @@ def _get_session_info(task_id: Optional[str] = None) -> Dict[str, str]:
    if task_id is None:
        task_id = "default"
    
+    # Start the cleanup thread if not running (handles inactivity timeouts)
+    _start_browser_cleanup_thread()
+    
+    # Update activity timestamp for this session
+    _update_session_activity(task_id)
+    
    # Check if we already have a session for this task
    if task_id in _active_sessions:
        return _active_sessions[task_id]
@@ -525,17 +645,25 @@ def _find_agent_browser() -> str:
    """
    Find the agent-browser CLI executable.
    
+    Checks in order: PATH, local node_modules/.bin/, npx fallback.
+    
    Returns:
        Path to agent-browser executable
        
    Raises:
        FileNotFoundError: If agent-browser is not installed
    """
-    # Check if it's in PATH
+    # Check if it's in PATH (global install)
    which_result = shutil.which("agent-browser")
    if which_result:
        return which_result
    
+    # Check local node_modules/.bin/ (npm install in repo root)
+    repo_root = Path(__file__).parent.parent
+    local_bin = repo_root / "node_modules" / ".bin" / "agent-browser"
+    if local_bin.exists():
+        return str(local_bin)
+    
    # Check common npx locations
    npx_path = shutil.which("npx")
    if npx_path:
@@ -543,6 +671,7 @@ def _find_agent_browser() -> str:
    
    raise FileNotFoundError(
        "agent-browser CLI not found. Install it with: npm install -g agent-browser\n"
+        "Or run 'npm install' in the repo root to install locally.\n"
        "Or ensure npx is available in your PATH."
    )

@@ -589,12 +718,26 @@ def _run_browser_command(
    ] + args
    
    try:
+        # Give each task its own socket directory to prevent concurrency conflicts.
+        # Without this, parallel workers fight over the same default socket path,
+        # causing "Failed to create socket directory: Permission denied" errors.
+        task_socket_dir = os.path.join(
+            tempfile.gettempdir(), 
+            f"agent-browser-{session_info['session_name']}"
+        )
+        os.makedirs(task_socket_dir, exist_ok=True)
+        
+        browser_env = {
+            **os.environ,
+            "AGENT_BROWSER_SOCKET_DIR": task_socket_dir,
+        }
+        
        result = subprocess.run(
            cmd_parts,
            capture_output=True,
            text=True,
            timeout=timeout,
-            env={**os.environ}
+            env=browser_env,
        )
        
        # Parse JSON output
@@ -1334,7 +1477,7 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
    """
    Clean up browser session for a task.
    
-    Called automatically when a task completes.
+    Called automatically when a task completes or when inactivity timeout is reached.
    Closes both the agent-browser session and the Browserbase session.
    
    Args:
@@ -1368,11 +1511,23 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
        except Exception as e:
            print(f"[browser_tool] Exception during BrowserBase session close: {e}", file=sys.stderr)
        
+        # Clean up per-task socket directory
+        session_name = session_info.get("session_name", "")
+        if session_name:
+            socket_dir = os.path.join(tempfile.gettempdir(), f"agent-browser-{session_name}")
+            if os.path.exists(socket_dir):
+                shutil.rmtree(socket_dir, ignore_errors=True)
+        
        del _active_sessions[task_id]
        if not os.getenv("HERMES_QUIET"):
            print(f"[browser_tool] Removed task {task_id} from active sessions", file=sys.stderr)
    elif not os.getenv("HERMES_QUIET"):
        print(f"[browser_tool] No active session found for task_id: {task_id}", file=sys.stderr)
+    
+    # Clean up activity tracking
+    with _cleanup_lock:
+        if task_id in _session_last_activity:
+            del _session_last_activity[task_id]


 def cleanup_all_browsers() -> None:
@@ -1383,6 +1538,10 @@ def cleanup_all_browsers() -> None:
    """
    for task_id in list(_active_sessions.keys()):
        cleanup_browser(task_id)
+    
+    # Clear any remaining activity tracking
+    with _cleanup_lock:
+        _session_last_activity.clear()


 def get_active_browser_sessions() -> Dict[str, Dict[str, str]]:
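The inactivity timeout added to `browser_tool.py` follows a common pattern: record a timestamp on every use, then periodically collect ids whose timestamp is older than the timeout. The sketch below isolates that bookkeeping so it can be tested without a browser. `InactivityTracker` and its injectable `clock` parameter are hypothetical names introduced here for illustration; the diff itself uses module-level globals and `time.time()` directly.

```python
import threading
import time
from typing import Callable, Dict, List


class InactivityTracker:
    """Hypothetical stand-in for _session_last_activity plus its lock."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.time):
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so tests need not sleep
        self._last_seen: Dict[str, float] = {}
        self._lock = threading.Lock()

    def touch(self, task_id: str) -> None:
        """Record activity for a task (mirrors _update_session_activity)."""
        with self._lock:
            self._last_seen[task_id] = self._clock()

    def pop_expired(self) -> List[str]:
        """Remove and return task ids idle for longer than the timeout."""
        now = self._clock()
        with self._lock:
            expired = [
                task_id
                for task_id, seen in self._last_seen.items()
                if now - seen > self._timeout_s
            ]
            for task_id in expired:
                del self._last_seen[task_id]
        return expired
```

The lock is held only while selecting and removing ids, so the slow per-session cleanup can run afterwards without blocking `touch`. That is the same reason the diff gathers `sessions_to_cleanup` under `_cleanup_lock` and calls `cleanup_browser` outside it.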
--- a/tools/cronjob_tools.py
+++ b/tools/cronjob_tools.py
@@ -0,0 +1,374 @@
+"""
+Cron job management tools for Hermes Agent.
+
+These tools allow the agent to schedule, list, and remove automated tasks.
+Only available when running via CLI (hermes-cli toolset).
+
+IMPORTANT: Cronjobs run in isolated sessions with NO prior context.
+The prompt must contain ALL necessary information.
+"""
+
+import json
+import os
+from typing import Optional
+
+# Add the repo root to sys.path so the cron module resolves when not installed as a package
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from cron.jobs import create_job, get_job, list_jobs, remove_job
+
+
+# =============================================================================
+# Tool: schedule_cronjob
+# =============================================================================
+
+def schedule_cronjob(
+    prompt: str,
+    schedule: str,
+    name: Optional[str] = None,
+    repeat: Optional[int] = None,
+    deliver: Optional[str] = None,
+    task_id: Optional[str] = None
+) -> str:
+    """
+    Schedule an automated task to run the agent on a schedule.
+    
+    IMPORTANT: When the cronjob runs, it starts a COMPLETELY FRESH session.
+    The agent will have NO memory of this conversation or any prior context.
+    Therefore, the prompt MUST contain ALL necessary information:
+    - Full context of what needs to be done
+    - Specific file paths, URLs, or identifiers
+    - Clear success criteria
+    - Any relevant background information
+    
+    BAD prompt:  "Check on that server issue"
+    GOOD prompt: "SSH into server 192.168.1.100 as user 'deploy', check if nginx 
+                  is running with 'systemctl status nginx', and verify the site 
+                  https://example.com returns HTTP 200. Report any issues found."
+    
+    Args:
+        prompt: Complete, self-contained instructions for the future agent.
+                Must include ALL context needed - the agent won't remember anything.
+        schedule: When to run. Either:
+                  - Duration for one-shot: "30m", "2h", "1d" (runs once)
+                  - Interval: "every 30m", "every 2h" (recurring)
+                  - Cron expression: "0 9 * * *" (daily at 9am)
+                  - ISO timestamp: "2026-02-03T14:00:00" (one-shot at specific time)
+        name: Optional human-friendly name for the job (for listing/management)
+        repeat: How many times to run. Omit for default behavior:
+                - One-shot schedules default to repeat=1 (run once)
+                - Intervals/cron default to forever
+                - Set repeat=5 to run 5 times then auto-delete
+        deliver: Where to send the output. Options:
+                 - "origin": Back to where this job was created (default on messaging platforms)
+                 - "local": Save to local files only (~/.hermes/cron/output/; default in CLI)
+                 - "telegram": Send to Telegram home channel
+                 - "discord": Send to Discord home channel
+                 - "telegram:123456": Send to specific chat ID
+    
+    Returns:
+        JSON with job_id, next_run time, and confirmation
+    """
+    # Get origin info from environment if available
+    origin = None
+    origin_platform = os.getenv("HERMES_SESSION_PLATFORM")
+    origin_chat_id = os.getenv("HERMES_SESSION_CHAT_ID")
+    if origin_platform and origin_chat_id:
+        origin = {
+            "platform": origin_platform,
+            "chat_id": origin_chat_id,
+            "chat_name": os.getenv("HERMES_SESSION_CHAT_NAME"),
+        }
+    
+    try:
+        job = create_job(
+            prompt=prompt,
+            schedule=schedule,
+            name=name,
+            repeat=repeat,
+            deliver=deliver,
+            origin=origin
+        )
+        
+        # Format repeat info for display
+        times = job["repeat"].get("times")
+        if times is None:
+            repeat_display = "forever"
+        elif times == 1:
+            repeat_display = "once"
+        else:
+            repeat_display = f"{times} times"
+        
+        return json.dumps({
+            "success": True,
+            "job_id": job["id"],
+            "name": job["name"],
+            "schedule": job["schedule_display"],
+            "repeat": repeat_display,
+            "deliver": job.get("deliver", "local"),
+            "next_run_at": job["next_run_at"],
+            "message": f"Cronjob '{job['name']}' created. It will run {repeat_display}, deliver to {job.get('deliver', 'local')}, next at {job['next_run_at']}."
+        }, indent=2)
+        
+    except Exception as e:
+        return json.dumps({
+            "success": False,
+            "error": str(e)
+        }, indent=2)
+
+
+SCHEDULE_CRONJOB_SCHEMA = {
+    "name": "schedule_cronjob",
+    "description": """Schedule an automated task to run the agent on a schedule.
+
+⚠️ CRITICAL: The cronjob runs in a FRESH SESSION with NO CONTEXT from this conversation.
+The prompt must be COMPLETELY SELF-CONTAINED with ALL necessary information including:
+- Full context and background
+- Specific file paths, URLs, server addresses
+- Clear instructions and success criteria
+- Any credentials or configuration details
+
+The future agent will NOT remember anything from the current conversation.
+
+SCHEDULE FORMATS:
+- One-shot: "30m", "2h", "1d" (runs once after delay)
+- Interval: "every 30m", "every 2h" (recurring)  
+- Cron: "0 9 * * *" (cron expression for precise scheduling)
+- Timestamp: "2026-02-03T14:00:00" (specific date/time)
+
+REPEAT BEHAVIOR:
+- One-shot schedules: run once by default
+- Intervals/cron: run forever by default
+- Set repeat=N to run exactly N times then auto-delete
+
+DELIVERY OPTIONS (where output goes):
+- "origin": Back to current chat (default if in messaging platform)
+- "local": Save to local files only (default if in CLI)
+- "telegram": Send to Telegram home channel
+- "discord": Send to Discord home channel
+- "telegram:123456": Send to specific chat (if user provides ID)
+
+Use for: reminders, periodic checks, scheduled reports, automated maintenance.""",
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "prompt": {
+                "type": "string",
+                "description": "Complete, self-contained instructions. Must include ALL context - the future agent will have NO memory of this conversation."
+            },
+            "schedule": {
+                "type": "string",
+                "description": "When to run: '30m' (once in 30min), 'every 30m' (recurring), '0 9 * * *' (cron), or ISO timestamp"
+            },
+            "name": {
+                "type": "string",
+                "description": "Optional human-friendly name for the job"
+            },
+            "repeat": {
+                "type": "integer",
+                "description": "How many times to run. Omit for default (once for one-shot, forever for recurring). Set to N for exactly N runs."
+            },
+            "deliver": {
+                "type": "string",
+                "description": "Where to send output: 'origin' (back to this chat), 'local' (files only), 'telegram', 'discord', or 'platform:chat_id'"
+            }
+        },
+        "required": ["prompt", "schedule"]
+    }
+}
+
+
+# =============================================================================
+# Tool: list_cronjobs
+# =============================================================================
+
+def list_cronjobs(include_disabled: bool = False, task_id: Optional[str] = None) -> str:
+    """
+    List all scheduled cronjobs.
+    
+    Returns information about each job including:
+    - Job ID (needed for removal)
+    - Name
+    - Schedule (human-readable)
+    - Repeat status (completed/total or 'forever')
+    - Next scheduled run time
+    - Last run time and status (if any)
+    
+    Args:
+        include_disabled: Whether to include disabled/completed jobs
+    
+    Returns:
+        JSON array of all scheduled jobs
+    """
+    try:
+        jobs = list_jobs(include_disabled=include_disabled)
+        
+        formatted_jobs = []
+        for job in jobs:
+            # Format repeat status
+            times = job["repeat"].get("times")
+            completed = job["repeat"].get("completed", 0)
+            if times is None:
+                repeat_status = "forever"
+            else:
+                repeat_status = f"{completed}/{times}"
+            
+            formatted_jobs.append({
+                "job_id": job["id"],
+                "name": job["name"],
+                "prompt_preview": (job["prompt"][:100] + "...") if len(job["prompt"]) > 100 else job["prompt"],
+                "schedule": job["schedule_display"],
+                "repeat": repeat_status,
+                "deliver": job.get("deliver", "local"),
+                "next_run_at": job.get("next_run_at"),
+                "last_run_at": job.get("last_run_at"),
+                "last_status": job.get("last_status"),
+                "enabled": job.get("enabled", True)
+            })
+        
+        return json.dumps({
+            "success": True,
+            "count": len(formatted_jobs),
+            "jobs": formatted_jobs
+        }, indent=2)
+        
+    except Exception as e:
+        return json.dumps({
+            "success": False,
+            "error": str(e)
+        }, indent=2)
+
+
+LIST_CRONJOBS_SCHEMA = {
+    "name": "list_cronjobs",
+    "description": """List all scheduled cronjobs with their IDs, schedules, and status.
+
+Use this to:
+- See what jobs are currently scheduled
+- Find job IDs for removal with remove_cronjob
+- Check job status and next run times
+
+Returns job_id, name, schedule, repeat status, next/last run times.""",
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "include_disabled": {
+                "type": "boolean",
+                "description": "Include disabled/completed jobs in the list (default: false)"
+            }
+        },
+        "required": []
+    }
+}
+
+
+# =============================================================================
+# Tool: remove_cronjob
+# =============================================================================
+
+def remove_cronjob(job_id: str, task_id: Optional[str] = None) -> str:
+    """
+    Remove a scheduled cronjob by its ID.
+    
+    Use list_cronjobs first to find the job_id of the job you want to remove.
+    
+    Args:
+        job_id: The ID of the job to remove (from list_cronjobs output)
+    
+    Returns:
+        JSON confirmation of removal
+    """
+    try:
+        job = get_job(job_id)
+        if not job:
+            return json.dumps({
+                "success": False,
+                "error": f"Job with ID '{job_id}' not found. Use list_cronjobs to see available jobs."
+            }, indent=2)
+        
+        removed = remove_job(job_id)
+        if removed:
+            return json.dumps({
+                "success": True,
+                "message": f"Cronjob '{job['name']}' (ID: {job_id}) has been removed.",
+                "removed_job": {
+                    "id": job_id,
+                    "name": job["name"],
+                    "schedule": job["schedule_display"]
+                }
+            }, indent=2)
+        else:
+            return json.dumps({
+                "success": False,
+                "error": f"Failed to remove job '{job_id}'"
+            }, indent=2)
+            
+    except Exception as e:
+        return json.dumps({
+            "success": False,
+            "error": str(e)
+        }, indent=2)
+
+
+REMOVE_CRONJOB_SCHEMA = {
+    "name": "remove_cronjob",
+    "description": """Remove a scheduled cronjob by its ID.
+
+Use list_cronjobs first to find the job_id of the job you want to remove.
+Jobs that have completed their repeat count are auto-removed, but you can
+use this to cancel a job before it completes.""",
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "job_id": {
+                "type": "string",
+                "description": "The ID of the cronjob to remove (from list_cronjobs output)"
+            }
+        },
+        "required": ["job_id"]
+    }
+}
+
+
+# =============================================================================
+# Requirements check
+# =============================================================================
+
+def check_cronjob_requirements() -> bool:
+    """
+    Check if cronjob tools can be used.
+    
+    Only available in interactive CLI mode (HERMES_INTERACTIVE=1).
+    """
+    return os.getenv("HERMES_INTERACTIVE") == "1"
+
+
+# =============================================================================
+# Exports
+# =============================================================================
+
+def get_cronjob_tool_definitions():
+    """Return tool definitions for cronjob management."""
+    return [
+        SCHEDULE_CRONJOB_SCHEMA,
+        LIST_CRONJOBS_SCHEMA,
+        REMOVE_CRONJOB_SCHEMA
+    ]
+
+
+# For direct testing
+if __name__ == "__main__":
+    # Test the tools
+    print("Testing schedule_cronjob:")
+    result = schedule_cronjob(
+        prompt="Test prompt for cron job",
+        schedule="5m",
+        name="Test Job"
+    )
+    print(result)
+    
+    print("\nTesting list_cronjobs:")
+    result = list_cronjobs()
+    print(result)
--- a/tools/file_operations.py
+++ b/tools/file_operations.py
@@ -0,0 +1,940 @@
+#!/usr/bin/env python3
+"""
+File Operations Module
+
+Provides file manipulation capabilities (read, write, patch, search) that work
+across all terminal backends (local, docker, singularity, ssh, modal).
+
+The key insight is that all file operations can be expressed as shell commands,
+so we wrap the terminal backend's execute() interface to provide a unified file API.
+
+Usage:
+    from tools.file_operations import ShellFileOperations
+    from tools.terminal_tool import _active_environments
+    
+    # Get file operations for a terminal environment
+    file_ops = ShellFileOperations(terminal_env)
+    
+    # Read a file
+    result = file_ops.read_file("/path/to/file.py")
+    
+    # Write a file
+    result = file_ops.write_file("/path/to/new.py", "print('hello')")
+    
+    # Search for content
+    result = file_ops.search("TODO", path=".", file_glob="*.py")
+"""
+
+import os
+import re
+import json
+import uuid
+import difflib
+from abc import ABC, abstractmethod
+from dataclasses import dataclass, field
+from typing import Optional, List, Dict, Any, Tuple
+from pathlib import Path
+
+
+# =============================================================================
+# Result Data Classes
+# =============================================================================
+
+@dataclass
+class ReadResult:
+    """Result from reading a file."""
+    content: str = ""
+    total_lines: int = 0
+    file_size: int = 0
+    truncated: bool = False
+    hint: Optional[str] = None
+    is_binary: bool = False
+    is_image: bool = False
+    base64_content: Optional[str] = None
+    mime_type: Optional[str] = None
+    dimensions: Optional[str] = None  # For images: "WIDTHxHEIGHT"
+    error: Optional[str] = None
+    similar_files: List[str] = field(default_factory=list)
+    
+    def to_dict(self) -> dict:
+        return {k: v for k, v in self.__dict__.items() if v is not None and v != [] and v != ""}
+
+
+@dataclass
+class WriteResult:
+    """Result from writing a file."""
+    bytes_written: int = 0
+    dirs_created: bool = False
+    error: Optional[str] = None
+    warning: Optional[str] = None
+    
+    def to_dict(self) -> dict:
+        return {k: v for k, v in self.__dict__.items() if v is not None}
+
+
+@dataclass
+class PatchResult:
+    """Result from patching a file."""
+    success: bool = False
+    diff: str = ""
+    files_modified: List[str] = field(default_factory=list)
+    files_created: List[str] = field(default_factory=list)
+    files_deleted: List[str] = field(default_factory=list)
+    lint: Optional[Dict[str, Any]] = None
+    error: Optional[str] = None
+    
+    def to_dict(self) -> dict:
+        result = {"success": self.success}
+        if self.diff:
+            result["diff"] = self.diff
+        if self.files_modified:
+            result["files_modified"] = self.files_modified
+        if self.files_created:
+            result["files_created"] = self.files_created
+        if self.files_deleted:
+            result["files_deleted"] = self.files_deleted
+        if self.lint:
+            result["lint"] = self.lint
+        if self.error:
+            result["error"] = self.error
+        return result
+
+
+@dataclass
+class SearchMatch:
+    """A single search match."""
+    path: str
+    line_number: int
+    content: str
+    mtime: float = 0.0  # Modification time for sorting
+
+
+@dataclass
+class SearchResult:
+    """Result from searching."""
+    matches: List[SearchMatch] = field(default_factory=list)
+    files: List[str] = field(default_factory=list)
+    counts: Dict[str, int] = field(default_factory=dict)
+    total_count: int = 0
+    truncated: bool = False
+    error: Optional[str] = None
+    
+    def to_dict(self) -> dict:
+        result = {"total_count": self.total_count}
+        if self.matches:
+            result["matches"] = [
+                {"path": m.path, "line": m.line_number, "content": m.content}
+                for m in self.matches
+            ]
+        if self.files:
+            result["files"] = self.files
+        if self.counts:
+            result["counts"] = self.counts
+        if self.truncated:
+            result["truncated"] = True
+        if self.error:
+            result["error"] = self.error
+        return result
+
+
+@dataclass
+class LintResult:
+    """Result from linting a file."""
+    success: bool = True
+    skipped: bool = False
+    output: str = ""
+    message: str = ""
+    
+    def to_dict(self) -> dict:
+        if self.skipped:
+            return {"status": "skipped", "message": self.message}
+        return {
+            "status": "ok" if self.success else "error",
+            "output": self.output
+        }
+
+
+@dataclass
+class ExecuteResult:
+    """Result from executing a shell command."""
+    stdout: str = ""
+    exit_code: int = 0
+
+
+# =============================================================================
+# Abstract Interface
+# =============================================================================
+
+class FileOperations(ABC):
+    """Abstract interface for file operations across terminal backends."""
+    
+    @abstractmethod
+    def read_file(self, path: str, offset: int = 1, limit: int = 500) -> ReadResult:
+        """Read a file with pagination support."""
+        ...
+    
+    @abstractmethod
+    def write_file(self, path: str, content: str) -> WriteResult:
+        """Write content to a file, creating directories as needed."""
+        ...
+    
+    @abstractmethod
+    def patch_replace(self, path: str, old_string: str, new_string: str, 
+                      replace_all: bool = False) -> PatchResult:
+        """Replace text in a file using fuzzy matching."""
+        ...
+    
+    @abstractmethod
+    def patch_v4a(self, patch_content: str) -> PatchResult:
+        """Apply a V4A format patch."""
+        ...
+    
+    @abstractmethod
+    def search(self, pattern: str, path: str = ".", target: str = "content",
+               file_glob: Optional[str] = None, limit: int = 50, offset: int = 0,
+               output_mode: str = "content", context: int = 0) -> SearchResult:
+        """Search for content or files."""
+        ...
+
+
+# =============================================================================
+# Shell-based Implementation
+# =============================================================================
+
+# Binary file extensions (fast path check)
+BINARY_EXTENSIONS = {
+    # Images
+    '.png', '.jpg', '.jpeg', '.gif', '.webp', '.bmp', '.ico', '.tiff', '.tif',
+    '.svg',  # SVG is text but often treated as binary
+    # Audio/Video
+    '.mp3', '.mp4', '.wav', '.avi', '.mov', '.mkv', '.flac', '.ogg', '.webm',
+    # Archives
+    '.zip', '.tar', '.gz', '.bz2', '.xz', '.7z', '.rar',
+    # Documents
+    '.pdf', '.doc', '.docx', '.xls', '.xlsx', '.ppt', '.pptx',
+    # Compiled/Binary
+    '.exe', '.dll', '.so', '.dylib', '.o', '.a', '.pyc', '.pyo', '.class',
+    '.wasm', '.bin',
+    # Fonts
+    '.ttf', '.otf', '.woff', '.woff2', '.eot',
+    # Other
+    '.db', '.sqlite', '.sqlite3',
+}
+
+# Image extensions (subset of binary that we can return as base64)
+IMAGE_EXTENSIONS = {'.png', '.jpg', '.jpeg', '.gif', '.webp', '.bmp', '.ico'}
+
+# Linters by file extension
+LINTERS = {
+    '.py': 'python -m py_compile {file} 2>&1',
+    '.js': 'node --check {file} 2>&1',
+    '.ts': 'npx tsc --noEmit {file} 2>&1',
+    '.go': 'go vet {file} 2>&1',
+    '.rs': 'rustfmt --check {file} 2>&1',
+}
+
+# Max limits for read operations
+MAX_LINES = 2000
+MAX_LINE_LENGTH = 2000
+MAX_FILE_SIZE = 50 * 1024  # 50KB
+
+
+class ShellFileOperations(FileOperations):
+    """
+    File operations implemented via shell commands.
+    
+    Works with ANY terminal backend that has an execute(command, cwd) method.
+    This includes local, docker, singularity, ssh, and modal environments.
+    """
+    
+    def __init__(self, terminal_env, cwd: Optional[str] = None):
+        """
+        Initialize file operations with a terminal environment.
+        
+        Args:
+            terminal_env: Any object with an execute(command, cwd) method.
+                         Returns {"output": str, "returncode": int}
+            cwd: Working directory (defaults to env's cwd or current directory)
+        """
+        self.env = terminal_env
+        # Determine cwd from various possible sources.
+        # IMPORTANT: do NOT fall back to os.getcwd() -- that's the HOST's local
+        # path which doesn't exist inside container/cloud backends (modal, docker).
+        # If nothing provides a cwd, use "/" as a safe universal default.
+        self.cwd = cwd or getattr(terminal_env, 'cwd', None) or \
+                   getattr(getattr(terminal_env, 'config', None), 'cwd', None) or "/"
+        
+        # Cache for command availability checks
+        self._command_cache: Dict[str, bool] = {}
+    
+    def _exec(self, command: str, cwd: Optional[str] = None, timeout: Optional[int] = None) -> ExecuteResult:
+        """Execute command via terminal backend."""
+        kwargs = {}
+        if timeout:
+            kwargs['timeout'] = timeout
+        
+        result = self.env.execute(command, cwd=cwd or self.cwd, **kwargs)
+        return ExecuteResult(
+            stdout=result.get("output", ""),
+            exit_code=result.get("returncode", 0)
+        )
+    
+    def _has_command(self, cmd: str) -> bool:
+        """Check if a command exists in the environment (cached)."""
+        if cmd not in self._command_cache:
+            result = self._exec(f"command -v {cmd} >/dev/null 2>&1 && echo 'yes'")
+            self._command_cache[cmd] = result.stdout.strip() == 'yes'
+        return self._command_cache[cmd]
+    
+    def _is_likely_binary(self, path: str, content_sample: Optional[str] = None) -> bool:
+        """
+        Check if a file is likely binary.
+        
+        Uses extension check (fast) + content analysis (fallback).
+        """
+        ext = os.path.splitext(path)[1].lower()
+        if ext in BINARY_EXTENSIONS:
+            return True
+        
+        # Content analysis: >30% non-printable chars = binary
+        if content_sample:
+            # Inspect at most the first 1000 characters of the sample
+            sample = content_sample[:1000]
+            non_printable = sum(1 for c in sample
+                               if ord(c) < 32 and c not in '\n\r\t')
+            return non_printable / len(sample) > 0.30
+        
+        return False
+    
+    def _is_image(self, path: str) -> bool:
+        """Check if file is an image we can return as base64."""
+        ext = os.path.splitext(path)[1].lower()
+        return ext in IMAGE_EXTENSIONS
+    
+    def _add_line_numbers(self, content: str, start_line: int = 1) -> str:
+        """Add line numbers to content in LINE_NUM|CONTENT format."""
+        lines = content.split('\n')
+        numbered = []
+        for i, line in enumerate(lines, start=start_line):
+            # Truncate long lines
+            if len(line) > MAX_LINE_LENGTH:
+                line = line[:MAX_LINE_LENGTH] + "... [truncated]"
+            numbered.append(f"{i:6d}|{line}")
+        return '\n'.join(numbered)
+    
+    def _expand_path(self, path: str) -> str:
+        """
+        Expand shell-style paths like ~ and ~user to absolute paths.
+        
+        This must be done BEFORE shell escaping, since ~ doesn't expand
+        inside single quotes.
+        """
+        if not path:
+            return path
+        
+        # Handle ~ and ~user
+        if path.startswith('~'):
+            # Get home directory via the terminal environment
+            result = self._exec("echo $HOME")
+            if result.exit_code == 0 and result.stdout.strip():
+                home = result.stdout.strip()
+                if path == '~':
+                    return home
+                elif path.startswith('~/'):
+                    return home + path[1:]  # Replace ~ with home
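+                # ~username format - let shell expand it (only if free of shell metacharacters)
+                expand_result = self._exec(f"echo {path}") if re.fullmatch(r"~[\w./-]+", path) else None
+                if expand_result is not None and expand_result.exit_code == 0:
+                    return expand_result.stdout.strip()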
+        
+        return path
+    
+    def _escape_shell_arg(self, arg: str) -> str:
+        """Escape a string for safe use in shell commands."""
+        # Use single quotes and escape any single quotes in the string
+        return "'" + arg.replace("'", "'\"'\"'") + "'"
+    
+    def _unified_diff(self, old_content: str, new_content: str, filename: str) -> str:
+        """Generate unified diff between old and new content."""
+        old_lines = old_content.splitlines(keepends=True)
+        new_lines = new_content.splitlines(keepends=True)
+        diff = difflib.unified_diff(
+            old_lines, new_lines,
+            fromfile=f"a/{filename}",
+            tofile=f"b/{filename}"
+        )
+        return ''.join(diff)
+    
+    # =========================================================================
+    # READ Implementation
+    # =========================================================================
+    
+    def read_file(self, path: str, offset: int = 1, limit: int = 500) -> ReadResult:
+        """
+        Read a file with pagination, binary detection, and line numbers.
+        
+        Args:
+            path: File path (absolute or relative to cwd)
+            offset: Line number to start from (1-indexed, default 1)
+            limit: Maximum lines to return (default 500, max 2000)
+        
+        Returns:
+            ReadResult with content, metadata, or error info
+        """
+        # Expand ~ and other shell paths
+        path = self._expand_path(path)
+        
+        # Clamp offset and limit (sed line addresses start at 1)
+        offset, limit = max(1, offset), max(1, min(limit, MAX_LINES))
+        
+        # Check if file exists and get metadata
+        stat_cmd = f"stat -c '%s' {self._escape_shell_arg(path)} 2>/dev/null"
+        stat_result = self._exec(stat_cmd)
+        
+        if stat_result.exit_code != 0:
+            # File not found - try to suggest similar files
+            return self._suggest_similar_files(path)
+        
+        try:
+            file_size = int(stat_result.stdout.strip())
+        except ValueError:
+            file_size = 0
+        
+        # Check if file is too large
+        if file_size > MAX_FILE_SIZE:
+            # Still try to read, but warn
+            pass
+        
+        # Check if it's an image - return base64
+        if self._is_image(path):
+            return self._read_image(path)
+        
+        # Read a sample to check for binary content
+        sample_cmd = f"head -c 1000 {self._escape_shell_arg(path)} 2>/dev/null"
+        sample_result = self._exec(sample_cmd)
+        
+        if self._is_likely_binary(path, sample_result.stdout):
+            return ReadResult(
+                is_binary=True,
+                file_size=file_size,
+                error="Binary file - cannot display as text. Use appropriate tools to handle this file type."
+            )
+        
+        # Read with pagination using sed
+        end_line = offset + limit - 1
+        read_cmd = f"sed -n '{offset},{end_line}p' {self._escape_shell_arg(path)}"
+        read_result = self._exec(read_cmd)
+        
+        if read_result.exit_code != 0:
+            return ReadResult(error=f"Failed to read file: {read_result.stdout}")
+        
+        # Get total line count
+        wc_cmd = f"wc -l < {self._escape_shell_arg(path)}"
+        wc_result = self._exec(wc_cmd)
+        try:
+            total_lines = int(wc_result.stdout.strip())
+        except ValueError:
+            total_lines = 0
+        
+        # Check if truncated
+        truncated = total_lines > end_line
+        hint = None
+        if truncated:
+            hint = f"Use offset={end_line + 1} to continue reading (showing {offset}-{end_line} of {total_lines} lines)"
+        
+        return ReadResult(
+            content=self._add_line_numbers(read_result.stdout, offset),
+            total_lines=total_lines,
+            file_size=file_size,
+            truncated=truncated,
+            hint=hint
+        )
+    
+    def _read_image(self, path: str) -> ReadResult:
+        """Read an image file, returning base64 content."""
+        # Get file size
+        stat_cmd = f"stat -c '%s' {self._escape_shell_arg(path)} 2>/dev/null"
+        stat_result = self._exec(stat_cmd)
+        try:
+            file_size = int(stat_result.stdout.strip())
+        except ValueError:
+            file_size = 0
+        
+        # Get base64 content
+        b64_cmd = f"base64 -w 0 {self._escape_shell_arg(path)} 2>/dev/null"
+        b64_result = self._exec(b64_cmd, timeout=30)
+        
+        if b64_result.exit_code != 0:
+            return ReadResult(
+                is_image=True,
+                is_binary=True,
+                file_size=file_size,
+                error=f"Failed to read image: {b64_result.stdout}"
+            )
+        
+        # Try to get dimensions (requires ImageMagick)
+        dimensions = None
+        if self._has_command('identify'):
+            dim_cmd = f"identify -format '%wx%h' {self._escape_shell_arg(path)} 2>/dev/null"
+            dim_result = self._exec(dim_cmd)
+            if dim_result.exit_code == 0:
+                dimensions = dim_result.stdout.strip()
+        
+        # Determine MIME type from extension
+        ext = os.path.splitext(path)[1].lower()
+        mime_types = {
+            '.png': 'image/png',
+            '.jpg': 'image/jpeg',
+            '.jpeg': 'image/jpeg',
+            '.gif': 'image/gif',
+            '.webp': 'image/webp',
+            '.bmp': 'image/bmp',
+            '.ico': 'image/x-icon',
+        }
+        mime_type = mime_types.get(ext, 'application/octet-stream')
+        
+        return ReadResult(
+            is_image=True,
+            is_binary=True,
+            file_size=file_size,
+            base64_content=b64_result.stdout,
+            mime_type=mime_type,
+            dimensions=dimensions
+        )
+    
+    def _suggest_similar_files(self, path: str) -> ReadResult:
+        """Suggest similar files when the requested file is not found."""
+        # Get directory and filename
+        dir_path = os.path.dirname(path) or "."
+        filename = os.path.basename(path)
+        
+        # List files in directory
+        ls_cmd = f"ls -1 {self._escape_shell_arg(dir_path)} 2>/dev/null | head -20"
+        ls_result = self._exec(ls_cmd)
+        
+        similar = []
+        if ls_result.exit_code == 0 and ls_result.stdout.strip():
+            files = ls_result.stdout.strip().split('\n')
+            # Simple similarity: files that share some characters with the target
+            for f in files:
+                # Check if filenames share significant overlap
+                common = set(filename.lower()) & set(f.lower())
+                if len(common) >= len(filename) * 0.5:  # 50% character overlap
+                    similar.append(os.path.join(dir_path, f))
+        
+        return ReadResult(
+            error=f"File not found: {path}",
+            similar_files=similar[:5]  # Limit to 5 suggestions
+        )
+    
+    # =========================================================================
+    # WRITE Implementation
+    # =========================================================================
+    
+    def write_file(self, path: str, content: str) -> WriteResult:
+        """
+        Write content to a file, creating parent directories as needed.
+        
+        Uses heredoc with unique marker for safe shell execution.
+        
+        Args:
+            path: File path to write
+            content: Content to write
+        
+        Returns:
+            WriteResult with bytes written or error
+        """
+        # Expand ~ and other shell paths
+        path = self._expand_path(path)
+        
+        # Create parent directories
+        parent = os.path.dirname(path)
+        dirs_created = False
+        
+        if parent:
+            mkdir_cmd = f"mkdir -p {self._escape_shell_arg(parent)}"
+            mkdir_result = self._exec(mkdir_cmd)
+            if mkdir_result.exit_code == 0:
+                dirs_created = True
+        
+        # Generate unique marker for heredoc that won't appear in content
+        marker = f"HERMES_EOF_{uuid.uuid4().hex[:8]}"
+        while marker in content:
+            marker = f"HERMES_EOF_{uuid.uuid4().hex[:8]}"
+        
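+        # Write using a heredoc with a single-quoted marker, which disables
+        # variable, command, and arithmetic expansion inside the body.
+        # Note: the heredoc terminates the body with a newline, so the file
+        # gets a trailing newline appended after content.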
+        write_cmd = f"cat > {self._escape_shell_arg(path)} << '{marker}'\n{content}\n{marker}"
+        write_result = self._exec(write_cmd)
+        
+        if write_result.exit_code != 0:
+            return WriteResult(error=f"Failed to write file: {write_result.stdout}")
+        
+        # Get bytes written
+        stat_cmd = f"stat -c '%s' {self._escape_shell_arg(path)} 2>/dev/null"
+        stat_result = self._exec(stat_cmd)
+        
+        try:
+            bytes_written = int(stat_result.stdout.strip())
+        except ValueError:
+            bytes_written = len(content.encode('utf-8'))
+        
+        return WriteResult(
+            bytes_written=bytes_written,
+            dirs_created=dirs_created
+        )
+    
+    # =========================================================================
+    # PATCH Implementation (Replace Mode)
+    # =========================================================================
+    
+    def patch_replace(self, path: str, old_string: str, new_string: str,
+                      replace_all: bool = False) -> PatchResult:
+        """
+        Replace text in a file using fuzzy matching.
+        
+        Args:
+            path: File path to modify
+            old_string: Text to find (must be unique unless replace_all=True)
+            new_string: Replacement text
+            replace_all: If True, replace all occurrences
+        
+        Returns:
+            PatchResult with diff and lint results
+        """
+        # Expand ~ and other shell paths
+        path = self._expand_path(path)
+        
+        # Read current content
+        read_cmd = f"cat {self._escape_shell_arg(path)} 2>/dev/null"
+        read_result = self._exec(read_cmd)
+        
+        if read_result.exit_code != 0:
+            return PatchResult(error=f"Failed to read file: {path}")
+        
+        content = read_result.stdout
+        
+        # Import and use fuzzy matching
+        from tools.fuzzy_match import fuzzy_find_and_replace
+        
+        new_content, match_count, error = fuzzy_find_and_replace(
+            content, old_string, new_string, replace_all
+        )
+        
+        if error:
+            return PatchResult(error=error)
+        
+        if match_count == 0:
+            return PatchResult(error=f"Could not find match for old_string in {path}")
+        
+        # Write back
+        write_result = self.write_file(path, new_content)
+        if write_result.error:
+            return PatchResult(error=f"Failed to write changes: {write_result.error}")
+        
+        # Generate diff
+        diff = self._unified_diff(content, new_content, path)
+        
+        # Auto-lint
+        lint_result = self._check_lint(path)
+        
+        return PatchResult(
+            success=True,
+            diff=diff,
+            files_modified=[path],
+            lint=lint_result.to_dict() if lint_result else None
+        )
+    
+    def patch_v4a(self, patch_content: str) -> PatchResult:
+        """
+        Apply a V4A format patch.
+        
+        V4A format:
+            *** Begin Patch
+            *** Update File: path/to/file.py
+            @@ context hint @@
+             context line
+            -removed line
+            +added line
+            *** End Patch
+        
+        Args:
+            patch_content: V4A format patch string
+        
+        Returns:
+            PatchResult with changes made
+        """
+        # Import patch parser
+        from tools.patch_parser import parse_v4a_patch, apply_v4a_operations
+        
+        operations, parse_error = parse_v4a_patch(patch_content)
+        if parse_error:
+            return PatchResult(error=f"Failed to parse patch: {parse_error}")
+        
+        # Apply operations
+        result = apply_v4a_operations(operations, self)
+        return result
+    
+    def _check_lint(self, path: str) -> LintResult:
+        """
+        Run syntax check on a file after editing.
+        
+        Args:
+            path: File path to lint
+        
+        Returns:
+            LintResult with status and any errors
+        """
+        ext = os.path.splitext(path)[1].lower()
+        
+        if ext not in LINTERS:
+            return LintResult(skipped=True, message=f"No linter for {ext} files")
+        
+        # Check if linter command is available
+        linter_cmd = LINTERS[ext]
+        # Extract the base command (first word)
+        base_cmd = linter_cmd.split()[0]
+        
+        if not self._has_command(base_cmd):
+            return LintResult(skipped=True, message=f"{base_cmd} not available")
+        
+        # Run linter
+        cmd = linter_cmd.format(file=self._escape_shell_arg(path))
+        result = self._exec(cmd, timeout=30)
+        
+        return LintResult(
+            success=result.exit_code == 0,
+            output=result.stdout.strip()
+        )
+    
+    # =========================================================================
+    # SEARCH Implementation
+    # =========================================================================
+    
+    def search(self, pattern: str, path: str = ".", target: str = "content",
+               file_glob: Optional[str] = None, limit: int = 50, offset: int = 0,
+               output_mode: str = "content", context: int = 0) -> SearchResult:
+        """
+        Search for content or files.
+        
+        Args:
+            pattern: Regex (for content) or glob pattern (for files)
+            path: Directory/file to search (default: cwd)
+            target: "content" (grep) or "files" (glob)
+            file_glob: File pattern filter for content search (e.g., "*.py")
+            limit: Max results (default 50)
+            offset: Skip first N results
+            output_mode: "content", "files_only", or "count"
+            context: Lines of context around matches
+        
+        Returns:
+            SearchResult with matches or file list
+        """
+        # Expand ~ and other shell paths
+        path = self._expand_path(path)
+        
+        if target == "files":
+            return self._search_files(pattern, path, limit, offset)
+        else:
+            return self._search_content(pattern, path, file_glob, limit, offset, 
+                                        output_mode, context)
+    
+    def _search_files(self, pattern: str, path: str, limit: int, offset: int) -> SearchResult:
+        """Search for files by name pattern (glob-like)."""
+        # Check if find is available (not on Windows without Git Bash/WSL)
+        if not self._has_command('find'):
+            return SearchResult(
+                error="File search requires 'find' command. "
+                      "On Windows, use Git Bash, WSL, or install Unix tools."
+            )
+        
+        # find -name matches against the basename only, so drop any directory
+        # prefix (e.g. "**/" or "src/") and keep the final path component
+        search_pattern = pattern.split('/')[-1]
+        
+        # Use find with modification time sorting
+        # -printf '%T@ %p\n' outputs: timestamp path
+        # sort -rn sorts by timestamp descending (newest first)
+        cmd = f"find {self._escape_shell_arg(path)} -type f -name {self._escape_shell_arg(search_pattern)} " \
+              f"-printf '%T@ %p\\n' 2>/dev/null | sort -rn | tail -n +{offset + 1} | head -n {limit}"
+        
+        result = self._exec(cmd, timeout=60)
+        
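+        if not result.stdout.strip():
+            # Retry without -printf (BSD find compatibility). Without pipefail, the
+            # pipeline's exit status normally comes from the last command (head),
+            # not find, so an empty result is the only reliable failure signal.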
+            cmd_simple = f"find {self._escape_shell_arg(path)} -type f -name {self._escape_shell_arg(search_pattern)} " \
+                        f"2>/dev/null | head -n {limit + offset} | tail -n +{offset + 1}"
+            result = self._exec(cmd_simple, timeout=60)
+        
+        files = []
+        for line in result.stdout.strip().split('\n'):
+            if not line:
+                continue
+            # Parse "timestamp path" format
+            parts = line.split(' ', 1)
+            if len(parts) == 2 and parts[0].replace('.', '').isdigit():
+                files.append(parts[1])
+            else:
+                files.append(line)
+        
+        return SearchResult(
+            files=files,
+            total_count=len(files)
+        )
+    
+    def _search_content(self, pattern: str, path: str, file_glob: Optional[str],
+                        limit: int, offset: int, output_mode: str, context: int) -> SearchResult:
+        """Search for content inside files (grep-like)."""
+        # Try ripgrep first (fast); fall back to grep (slower but widely available)
+        if self._has_command('rg'):
+            return self._search_with_rg(pattern, path, file_glob, limit, offset, 
+                                        output_mode, context)
+        elif self._has_command('grep'):
+            return self._search_with_grep(pattern, path, file_glob, limit, offset,
+                                          output_mode, context)
+        else:
+            # Neither rg nor grep available (Windows without Git Bash, etc.)
+            return SearchResult(
+                error="Content search requires ripgrep (rg) or grep. "
+                      "Install ripgrep: https://github.com/BurntSushi/ripgrep#installation"
+            )
+    
+    def _search_with_rg(self, pattern: str, path: str, file_glob: Optional[str],
+                        limit: int, offset: int, output_mode: str, context: int) -> SearchResult:
+        """Search using ripgrep."""
+        cmd_parts = ["rg", "--line-number", "--no-heading"]
+        
+        # Add context if requested
+        if context > 0:
+            cmd_parts.extend(["-C", str(context)])
+        
+        # Add file glob filter
+        if file_glob:
+            cmd_parts.extend(["--glob", file_glob])
+        
+        # Output mode handling
+        if output_mode == "files_only":
+            cmd_parts.append("-l")  # Files only
+        elif output_mode == "count":
+            cmd_parts.append("-c")  # Count per file
+        
+        # Add pattern and path
+        cmd_parts.append(self._escape_shell_arg(pattern))
+        cmd_parts.append(self._escape_shell_arg(path))
+        
+        # Limit results
+        cmd_parts.extend(["|", "head", "-n", str(limit + offset)])
+        
+        cmd = " ".join(cmd_parts)
+        result = self._exec(cmd, timeout=60)
+        
+        # Parse results based on output mode
+        if output_mode == "files_only":
+            files = [f for f in result.stdout.strip().split('\n') if f][offset:]
+            return SearchResult(files=files[:limit], total_count=len(files))
+        
+        elif output_mode == "count":
+            counts = {}
+            for line in result.stdout.strip().split('\n'):
+                if ':' in line:
+                    parts = line.rsplit(':', 1)
+                    if len(parts) == 2:
+                        try:
+                            counts[parts[0]] = int(parts[1])
+                        except ValueError:
+                            pass
+            return SearchResult(counts=counts, total_count=sum(counts.values()))
+        
+        else:
+            # Parse content matches
+            matches = []
+            for line in result.stdout.strip().split('\n')[offset:]:
+                if not line:
+                    continue
+                # Format: file:line:content
+                parts = line.split(':', 2)
+                if len(parts) >= 3:
+                    try:
+                        matches.append(SearchMatch(
+                            path=parts[0],
+                            line_number=int(parts[1]),
+                            content=parts[2][:500]  # Truncate long lines
+                        ))
+                    except ValueError:
+                        # Line number not an int, skip
+                        pass
+            
+            return SearchResult(
+                matches=matches[:limit],
+                total_count=len(matches),
+                truncated=len(matches) > limit
+            )
+    
+    def _search_with_grep(self, pattern: str, path: str, file_glob: Optional[str],
+                          limit: int, offset: int, output_mode: str, context: int) -> SearchResult:
+        """Fallback search using grep."""
+        cmd_parts = ["grep", "-rn"]
+        
+        # Add context if requested
+        if context > 0:
+            cmd_parts.extend(["-C", str(context)])
+        
+        # Add file pattern filter
+        if file_glob:
+            cmd_parts.extend(["--include", file_glob])
+        
+        # Output mode handling
+        if output_mode == "files_only":
+            cmd_parts.append("-l")
+        elif output_mode == "count":
+            cmd_parts.append("-c")
+        
+        # Add pattern and path
+        cmd_parts.append(self._escape_shell_arg(pattern))
+        cmd_parts.append(self._escape_shell_arg(path))
+        
+        # Limit and offset
+        cmd_parts.extend(["|", "tail", "-n", f"+{offset + 1}", "|", "head", "-n", str(limit)])
+        
+        cmd = " ".join(cmd_parts)
+        result = self._exec(cmd, timeout=60)
+        
+        # Parse results (same format as rg)
+        if output_mode == "files_only":
+            files = [f for f in result.stdout.strip().split('\n') if f]
+            return SearchResult(files=files, total_count=len(files))
+        
+        elif output_mode == "count":
+            counts = {}
+            for line in result.stdout.strip().split('\n'):
+                if ':' in line:
+                    parts = line.rsplit(':', 1)
+                    if len(parts) == 2:
+                        try:
+                            counts[parts[0]] = int(parts[1])
+                        except ValueError:
+                            pass
+            return SearchResult(counts=counts, total_count=sum(counts.values()))
+        
+        else:
+            matches = []
+            for line in result.stdout.strip().split('\n'):
+                if not line:
+                    continue
+                parts = line.split(':', 2)
+                if len(parts) >= 3:
+                    try:
+                        matches.append(SearchMatch(
+                            path=parts[0],
+                            line_number=int(parts[1]),
+                            content=parts[2][:500]
+                        ))
+                    except ValueError:
+                        pass
+            
+            return SearchResult(
+                matches=matches,
+                total_count=len(matches)
+            )
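The `file:line:content` parsing shared by `_search_with_rg` and `_search_with_grep` can be sketched on its own, as below. This is a minimal illustration, not the code from this change: `Match` is a hypothetical stand-in for `SearchMatch`, and `parse_grep_lines` is a name introduced here only for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    """Hypothetical stand-in for the SearchMatch type used in the diff."""
    path: str
    line_number: int
    content: str


def parse_grep_lines(output: str, max_len: int = 500) -> List[Match]:
    """Parse `path:line:content` lines such as `grep -rn` prints."""
    matches = []
    for line in output.strip().split("\n"):
        if not line:
            continue
        # Split on the first two colons only, so colons in the content survive.
        parts = line.split(":", 2)
        if len(parts) < 3:
            # Context lines (path-line-content) and "--" separators land here.
            continue
        try:
            line_number = int(parts[1])
        except ValueError:
            # Second field is not a line number; skip the line.
            continue
        matches.append(Match(parts[0], line_number, parts[2][:max_len]))
    return matches


sample = (
    "src/app.py:12:import os\n"
    "src/app.py-13-import sys\n"
    "--\n"
    "README.md:3:See http://example.com: docs\n"
)
for m in parse_grep_lines(sample):
    print(m)
```

Two limitations of this approach are worth noting. Lines produced by `-C` context output are dropped rather than attached to a match. A path that itself contains a colon (for example a Windows drive letter) fails the integer check and is skipped.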