Add file manipulation tools and enhance setup scripts

- Introduced file manipulation capabilities in `model_tools.py`, including functions for reading, writing, patching, and searching files.
- Added a new `file` toolset in `toolsets.py` and updated distributions to include file tools.
- Enhanced `setup-hermes.sh` and `install.sh` scripts to check for and optionally install `ripgrep` for faster file searching.
- Implemented a new `file_operations.py` module to encapsulate file operations using shell commands.
- Updated `doctor.py` and `install.ps1` to check for `ripgrep` and provide installation guidance if not found.
- Added fuzzy matching and patch parsing capabilities to improve file manipulation accuracy and flexibility.
Enhance RL test inference with WandB integration and real-time output streaming
2026-02-05 03:49:46 -08:00 · 2026-02-04 21:07:07 -08:00 · 2026-02-04 13:57:59 -08:00 · 2026-02-04 10:36:01 -08:00 · 2026-02-04 09:36:51 -08:00 · 2026-02-03 23:41:26 -08:00
121 changed files with 18424 additions and 20731 deletions
--- a/.clinerules
+++ b/.clinerules
@@ -1,115 +0,0 @@
-# Cline's Memory Bank
-
-I am Cline, an expert software engineer with a unique characteristic: my memory resets completely between sessions. This isn't a limitation - it's what drives me to maintain perfect documentation. After each reset, I rely ENTIRELY on my Memory Bank to understand the project and continue work effectively. I MUST read ALL memory bank files at the start of EVERY task - this is not optional.
-
-## Memory Bank Structure
-
-The Memory Bank consists of core files and optional context files, all in Markdown format. Files build upon each other in a clear hierarchy:
-
-flowchart TD
-    PB[projectbrief.md] --> PC[productContext.md]
-    PB --> SP[systemPatterns.md]
-    PB --> TC[techContext.md]
-
-    PC --> AC[activeContext.md]
-    SP --> AC
-    TC --> AC
-
-    AC --> P[progress.md]
-
-### Core Files (Required)
-1. `projectbrief.md`
-   - Foundation document that shapes all other files
-   - Created at project start if it doesn't exist
-   - Defines core requirements and goals
-   - Source of truth for project scope
-
-2. `productContext.md`
-   - Why this project exists
-   - Problems it solves
-   - How it should work
-   - User experience goals
-
-3. `activeContext.md`
-   - Current work focus
-   - Recent changes
-   - Next steps
-   - Active decisions and considerations
-   - Important patterns and preferences
-   - Learnings and project insights
-
-4. `systemPatterns.md`
-   - System architecture
-   - Key technical decisions
-   - Design patterns in use
-   - Component relationships
-   - Critical implementation paths
-
-5. `techContext.md`
-   - Technologies used
-   - Development setup
-   - Technical constraints
-   - Dependencies
-   - Tool usage patterns
-
-6. `progress.md`
-   - What works
-   - What's left to build
-   - Current status
-   - Known issues
-   - Evolution of project decisions
-
-### Additional Context
-Create additional files/folders within memory-bank/ when they help organize:
-- Complex feature documentation
-- Integration specifications
-- API documentation
-- Testing strategies
-- Deployment procedures
-
-## Core Workflows
-
-### Plan Mode
-flowchart TD
-    Start[Start] --> ReadFiles[Read Memory Bank]
-    ReadFiles --> CheckFiles{Files Complete?}
-
-    CheckFiles -->|No| Plan[Create Plan]
-    Plan --> Document[Document in Chat]
-
-    CheckFiles -->|Yes| Verify[Verify Context]
-    Verify --> Strategy[Develop Strategy]
-    Strategy --> Present[Present Approach]
-
-### Act Mode
-flowchart TD
-    Start[Start] --> Context[Check Memory Bank]
-    Context --> Update[Update Documentation]
-    Update --> Execute[Execute Task]
-    Execute --> Document[Document Changes]
-
-## Documentation Updates
-
-Memory Bank updates occur when:
-1. Discovering new project patterns
-2. After implementing significant changes
-3. When user requests with **update memory bank** (MUST review ALL files)
-4. When context needs clarification
-
-flowchart TD
-    Start[Update Process]
-
-    subgraph Process
-        P1[Review ALL Files]
-        P2[Document Current State]
-        P3[Clarify Next Steps]
-        P4[Document Insights & Patterns]
-
-        P1 --> P2 --> P3 --> P4
-    end
-
-    Start --> Process
-
-Note: When triggered by **update memory bank**, I MUST review every memory bank file, even if some don't require updates. Focus particularly on activeContext.md and progress.md as they track current state.
-
-REMEMBER: After every memory reset, I begin completely fresh. The Memory Bank is my only link to previous work. It must be maintained with precision and clarity, as my effectiveness depends entirely on its accuracy.
--- a/.cursorrules
+++ b/.cursorrules
@@ -1,201 +0,0 @@
-Hermes-Agent is an agent harness for LLMs with an interactive CLI.
-
-## Development Environment
-
-**IMPORTANT**: Always use the virtual environment if it exists:
-```bash
-source venv/bin/activate  # Before running any Python commands
-```
-
-## Project Structure
-
-- `hermes` - CLI launcher script (run with `./hermes`)
-- `cli.py` - Interactive CLI with Rich UI, prompt_toolkit, animated spinners
-- `cli-config.yaml` - CLI configuration (model, terminal, toolsets, personalities)
-- `tools/` - Individual tool implementations (web, terminal, browser, vision, etc.)
-- `tools/__init__.py` - Exports all tools for importing
-- `model_tools.py` - Consolidates tool schemas and handlers for the agent
-- `toolsets.py` - Groups tools into logical toolsets (web, terminal, browser, etc.)
-- `toolset_distributions.py` - Probability-based tool selection for data generation
-- `run_agent.py` - Primary agent runner with AIAgent class and KawaiiSpinner
-- `batch_runner.py` - Parallel batch processing with checkpointing
-- `tests/` - Test scripts
-
-## File Dependency Chain
-
-```
-tools/*.py → tools/__init__.py → model_tools.py → toolsets.py → toolset_distributions.py
-                                       ↑
-run_agent.py ──────────────────────────┘
-cli.py → run_agent.py (uses AIAgent with quiet_mode=True)
-batch_runner.py → run_agent.py + toolset_distributions.py
-```
-
-Always ensure consistency between tools, model_tools.py, and toolsets.py when changing any of them.
-
-## CLI Architecture (cli.py)
-
-The interactive CLI uses:
-- **Rich** - For the welcome banner and styled panels
-- **prompt_toolkit** - For fixed input area with history and `patch_stdout`
-- **KawaiiSpinner** (in run_agent.py) - Animated feedback during API calls and tool execution
-
-Key components:
-- `HermesCLI` class - Main CLI controller with commands and conversation loop
-- `load_cli_config()` - Loads `cli-config.yaml`, sets environment variables for terminal
-- `build_welcome_banner()` - Displays ASCII art logo, tools, and skills summary
-- `/commands` - Process user commands like `/help`, `/clear`, `/personality`, etc.
-
-CLI uses `quiet_mode=True` when creating AIAgent to suppress verbose logging and enable kawaii-style feedback instead.
-
-### Adding CLI Commands
-
-1. Add to `COMMANDS` dict with description
-2. Add handler in `process_command()` method
-3. For persistent settings, use `save_config_value()` to update `cli-config.yaml`
-
-## Adding a New Tool
-
-Follow this strict order to maintain consistency:
-
-1. Create `tools/your_tool.py` with:
-   - Handler function (sync or async) returning a JSON string via `json.dumps()`
-   - `check_*_requirements()` function to verify dependencies (e.g., API keys)
-   - Schema definition following OpenAI function-calling format
-
-2. Export in `tools/__init__.py`:
-   - Import the handler and check function
-   - Add to `__all__` list
-
-3. Register in `model_tools.py`:
-   - Create `get_*_tool_definitions()` function or add to existing
-   - Add routing in `handle_function_call()` dispatcher
-   - Update `get_all_tool_names()` with the tool name
-   - Update `get_toolset_for_tool()` mapping
-   - Update `get_available_toolsets()` and `check_toolset_requirements()`
-
-4. Add to toolset in `toolsets.py`:
-   - Add to existing toolset or create new one in TOOLSETS dict
-
-5. Optionally add to `toolset_distributions.py` for batch processing
-
-## Tool Implementation Pattern
-
-```python
-# tools/example_tool.py
-import json
-import os
-
-def check_example_requirements() -> bool:
-    """Check if required API keys/dependencies are available."""
-    return bool(os.getenv("EXAMPLE_API_KEY"))
-
-def example_tool(param: str, task_id: str = None) -> str:
-    """Execute the tool and return JSON string result."""
-    try:
-        result = {"success": True, "data": "..."}
-        return json.dumps(result, ensure_ascii=False)
-    except Exception as e:
-        return json.dumps({"error": str(e)}, ensure_ascii=False)
-```
-
-All tool handlers MUST return a JSON string. Never return raw dicts.
-
-## Stateful Tools
-
-Tools that maintain state (terminal, browser) require:
-- `task_id` parameter for session isolation between concurrent tasks
-- `cleanup_*()` function to release resources
-- Cleanup is called automatically in run_agent.py after conversation completes
-
-## Environment Variables
-
-API keys are loaded from `.env` file in repo root:
-- `OPENROUTER_API_KEY` - Main LLM API access (primary provider)
-- `FIRECRAWL_API_KEY` - Web search/extract tools
-- `BROWSERBASE_API_KEY` / `BROWSERBASE_PROJECT_ID` - Browser automation
-- `FAL_KEY` - Image generation (FLUX model)
-- `NOUS_API_KEY` - Vision and Mixture-of-Agents tools
-
-Terminal tool configuration (can also be set in `cli-config.yaml`):
-- `TERMINAL_ENV` - Backend: local, docker, singularity, modal, or ssh
-- `TERMINAL_CWD` - Working directory
-- `TERMINAL_SSH_HOST`, `TERMINAL_SSH_USER`, `TERMINAL_SSH_KEY` - For SSH backend
-
-## Agent Loop (run_agent.py)
-
-The AIAgent class handles:
-- Processing enabled toolsets to provide to the model
-- Piping prompts to the agent
-- Looping LLM calls when tools are invoked, until natural language response
-- Returning the final response
-
-Uses OpenAI-compatible API (primarily OpenRouter) with the OpenAI Python SDK.
-
-## Reasoning Model Support
-
-For models that support chain-of-thought reasoning:
-- Extract `reasoning_content` from API responses
-- Store in `assistant_msg["reasoning"]` for trajectory export
-- Pass back via `reasoning_content` field on subsequent turns
-
-## Trajectory Format
-
-Conversations are saved in ShareGPT format for training:
-```json
-{"from": "system", "value": "System prompt with <tools>...</tools>"}
-{"from": "human", "value": "User message"}
-{"from": "gpt", "value": "<think>reasoning</think>\n<tool_call>{...}</tool_call>"}
-{"from": "tool", "value": "<tool_response>{...}</tool_response>"}
-{"from": "gpt", "value": "Final response"}
-```
-
-Tool calls use `<tool_call>` XML tags, responses use `<tool_response>` tags, reasoning uses `<think>` tags.
-
-## Batch Processing (batch_runner.py)
-
-For processing multiple prompts:
-- Parallel execution with multiprocessing
-- Content-based resume for fault tolerance (matches on prompt text, not indices)
-- Toolset distributions control probabilistic tool availability per prompt
-- Output: `data/<run_name>/trajectories.jsonl` (combined) + individual batch files
-
-## Logging
-
-Trajectories restructure tools as a system prompt for storage in a format suitable for later training use.
-
-## Skills System
-
-Skills are on-demand knowledge documents the agent can load. Located in `skills/` directory:
-
-```
-skills/
-├── mlops/                    # Category folder
-│   ├── axolotl/             # Skill folder
-│   │   ├── SKILL.md         # Main instructions (required)
-│   │   ├── references/      # Additional docs, API specs
-│   │   └── templates/       # Output formats, configs
-│   └── vllm/
-│       └── SKILL.md
-└── example-skill/
-    └── SKILL.md
-```
-
-**Progressive disclosure** (token-efficient):
-1. `skills_categories()` - List category names (~50 tokens)
-2. `skills_list(category)` - Name + description per skill (~3k tokens)
-3. `skill_view(name)` - Full content + tags + linked files
-
-SKILL.md files use YAML frontmatter:
-```yaml
----
-name: skill-name
-description: Brief description for listing
-tags: [tag1, tag2]
-related_skills: [other-skill]
-version: 1.0.0
----
-# Skill Content...
-```
-
-Tool files: `tools/skills_tool.py` → `model_tools.py` → `toolsets.py`
--- a/.env.example
+++ b/.env.example
@@ -1,68 +1,12 @@
 # Hermes Agent Environment Configuration
 # Copy this file to .env and fill in your API keys

-# =============================================================================
-# CORE SETTINGS
-# =============================================================================
-# Agent backend:
-# - openai  : default Hermes-Agent loop (OpenAI function-calling via OpenAI SDK)
-# - atropos : Atroposlib ServerManager/ManagedServer-backed loop (training/env integration)
-HERMES_BACKEND=openai
-
-
-# =============================================================================
-# LOCAL / SELF-HOSTED OPENAI-COMPATIBLE ENDPOINTS (vLLM, SGLang, llama.cpp, etc.)
-# =============================================================================
-# For local development (matches the Atropos test env defaults):
-# ATROPOS_SERVER_BASE_URL=http://127.0.0.1:8080
-# ATROPOS_SERVER_MODEL=hermes-4-36b
-# For hosted inference (Nous Research inference API):
-ATROPOS_SERVER_BASE_URL=
-ATROPOS_SERVER_MODEL=
-ATROPOS_TOKENIZER_NAME=
-# Set this to your Nous API key (Bearer token).
-ATROPOS_SERVER_API_KEY=
-
-# Debugging (prints to stdout; use with care)
-# HERMES_DEBUG_ATROPOS_REQUEST=1
-# HERMES_DEBUG_ATROPOS_RESPONSE=1
-# HERMES_DEBUG_OPENAI_REQUEST=1
-# HERMES_DEBUG_OPENAI_RESPONSE=1
-
-# =============================================================================
-# LOCAL / SELF-HOSTED OPENAI-COMPATIBLE ENDPOINTS (vLLM, SGLang, llama.cpp, etc.)
-# =============================================================================
-# If you set ATROPOS_SERVER_BASE_URL or OPENAI_BASE_URL, Hermes will use it instead
-# of OpenRouter.
-#
-# Local server convenience (base URL without /v1):
-# llama.cpp example (see `Hermes-Agent/scripts/launch_llama_cpp_hermes_4_36b.sh`):
-# ATROPOS_SERVER_BASE_URL=http://127.0.0.1:8080
-# ATROPOS_SERVER_MODEL=hermes-4-36b
-# ATROPOS_TOKENIZER_NAME=NousResearch/Hermes-4.3-36B
-# ATROPOS_SERVER_API_KEY=local
-#
-# Hosted Nous inference API:
-# ATROPOS_SERVER_BASE_URL=https://inference-api.nousresearch.com
-# ATROPOS_SERVER_MODEL=Hermes-4.3-36B
-# ATROPOS_TOKENIZER_NAME=NousResearch/Hermes-4.3-36B
-# ATROPOS_SERVER_API_KEY=sk-... (Bearer token)
-#
-# If you plan to run GRPO-style group sampling (e.g. `--env.group_size 4`) against
-# llama.cpp, start the server with at least that many slots, e.g.:
-#   LLAMA_CPP_PARALLEL=4 Hermes-Agent/scripts/launch_llama_cpp_hermes_4_36b.sh
-#
-# Generic OpenAI-compatible (base URL should include /v1):
-# OPENAI_BASE_URL=http://127.0.0.1:8080/v1
-# OPENAI_API_KEY=local
-
 # =============================================================================
 # LLM PROVIDER (OpenRouter)
 # =============================================================================
 # OpenRouter provides access to many models through one API
 # All LLM calls go through OpenRouter - no direct provider keys needed
 # Get your key at: https://openrouter.ai/keys
-OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
 OPENROUTER_API_KEY=

 # Default model to use (OpenRouter format: provider/model)
@@ -96,6 +40,7 @@ FAL_KEY=
 # - modal: Runs in Modal cloud sandboxes (scalable, requires Modal account)
 TERMINAL_ENV=local

+
 # Container images (for singularity/docker/modal backends)
 TERMINAL_DOCKER_IMAGE=python:3.11
 TERMINAL_SINGULARITY_IMAGE=docker://python:3.11
@@ -144,87 +89,12 @@ TERMINAL_LIFETIME_SECONDS=300
 # SUDO_PASSWORD=your_password_here

 # =============================================================================
-# MODAL CLOUD BACKEND (for TERMINAL_ENV=modal)
+# MODAL CLOUD BACKEND (Optional - for TERMINAL_ENV=modal)
 # =============================================================================
-# Modal provides cloud sandboxes with per-second billing and auto-scaling.
-# This implementation uses a warm pool of sandboxes for cost efficiency.
-#
-# SETUP:
-#   pip install modal && modal setup
-#   (Authenticates via browser, stores credentials locally)
-#
-# FEATURES:
-# - Auto-scaling warm sandbox pool (no cold start after first use)
-# - Named sandbox recovery (reconnects after restart)
-# - Profile-based heterogeneous environments (CPU, GPU, different images)
-# - Server-side idle_timeout protection against orphaned sandboxes
-
-# Modal app name (groups all sandboxes, used for recovery)
-TERMINAL_MODAL_APP_NAME=hermes-sandbox
-
-# Default profile when none specified
-TERMINAL_MODAL_DEFAULT_PROFILE=default
-
-# Profile config file (optional - YAML format, see modal_profiles.yaml)
-# TERMINAL_MODAL_PROFILES_FILE=modal_profiles.yaml
-
-# --- Default Profile Settings (used if no YAML file) ---
-# These apply when no profile is specified or for the "default" profile
-TERMINAL_MODAL_IMAGE=python:3.11
-TERMINAL_MODAL_MIN_POOL=1
-TERMINAL_MODAL_MAX_POOL=5
-TERMINAL_MODAL_IDLE_TIMEOUT=120
-TERMINAL_MODAL_MAX_LIFETIME=3600
-TERMINAL_MODAL_SCALE_DOWN_IDLE=180
-
-# --- Custom Profile Example: pytorch-gpu ---
-# Uncomment to enable a GPU profile for ML tasks
-# Usage: terminal_tool("python train.py", profile="pytorch-gpu")
-#
-# TERMINAL_MODAL_PROFILE_pytorch_gpu_IMAGE=pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
-# TERMINAL_MODAL_PROFILE_pytorch_gpu_GPU=T4
-# TERMINAL_MODAL_PROFILE_pytorch_gpu_MEMORY=16384
-# TERMINAL_MODAL_PROFILE_pytorch_gpu_MIN_POOL=0
-# TERMINAL_MODAL_PROFILE_pytorch_gpu_MAX_POOL=2
-# TERMINAL_MODAL_PROFILE_pytorch_gpu_IDLE_TIMEOUT=60
-
-# --- Custom Profile Example: node ---
-# Uncomment to enable a Node.js profile
-# Usage: terminal_tool("npm test", profile="node")
-#
-# TERMINAL_MODAL_PROFILE_node_IMAGE=node:18
-# TERMINAL_MODAL_PROFILE_node_MIN_POOL=0
-# TERMINAL_MODAL_PROFILE_node_MAX_POOL=3
-
-# =============================================================================
-# MODAL SECRETS (Secure credential injection)
-# =============================================================================
-# Modal Secrets allow you to securely pass API keys, passwords, and other
-# sensitive data to your sandboxes without exposing them in code or logs.
-#
-# SETUP SECRETS:
-#   1. Via Dashboard: https://modal.com/secrets
-#   2. Via CLI: modal secret create my-secret KEY1=value1 KEY2=value2
-#   3. Via CLI with env: modal secret create my-secret API_KEY="$API_KEY"
-#
-# LIST SECRETS:
-#   modal secret list
-#
-# DELETE SECRETS:
-#   modal secret delete my-secret
-
-# Global secrets applied to ALL profiles (comma-separated secret names)
-# These secrets must be created on Modal dashboard or via CLI first
-# TERMINAL_MODAL_SECRETS=my-api-keys,database-creds
-
-# Per-profile secrets (comma-separated secret names)
-# TERMINAL_MODAL_PROFILE_pytorch_gpu_SECRETS=huggingface-token,wandb-key
-
-# Per-profile environment variables (semicolon-separated KEY=VALUE pairs)
-# TERMINAL_MODAL_PROFILE_default_ENV_VARS=DEBUG=1;LOG_LEVEL=info
-
-# Load local .env file into sandbox (useful for development)
-# TERMINAL_MODAL_PROFILE_default_USE_DOTENV=true
+# Modal uses CLI authentication, not environment variables.
+# Run: pip install modal && modal setup
+# This will authenticate via browser and store credentials locally.
+# No API key needed in .env - Modal handles auth automatically.

 # =============================================================================
 # BROWSER TOOL CONFIGURATION (agent-browser + Browserbase)
@@ -285,3 +155,31 @@ WEB_TOOLS_DEBUG=false
 VISION_TOOLS_DEBUG=false
 MOA_TOOLS_DEBUG=false
 IMAGE_TOOLS_DEBUG=false
+
+# =============================================================================
+# CONTEXT COMPRESSION (Auto-shrinks long conversations)
+# =============================================================================
+# When a conversation approaches the model's context limit, middle turns are
+# automatically summarized to free up space.
+#
+# CONTEXT_COMPRESSION_ENABLED=true        # Enable auto-compression (default: true)
+# CONTEXT_COMPRESSION_THRESHOLD=0.85      # Compress at 85% of context limit
+# CONTEXT_COMPRESSION_MODEL=google/gemini-2.0-flash-001  # Fast model for summaries
+
+# =============================================================================
+# RL TRAINING (Tinker + Atropos)
+# =============================================================================
+# Run reinforcement learning training on language models using the Tinker API.
+# Requires the rl-server to be running (from tinker-atropos package).
+
+# Tinker API Key - RL training service
+# Get at: https://tinker-console.thinkingmachines.ai/keys
+TINKER_API_KEY=
+
+# Weights & Biases API Key - Experiment tracking and metrics
+# Get at: https://wandb.ai/authorize
+WANDB_API_KEY=
+
+# RL API Server URL (default: http://localhost:8080)
+# Change if running the rl-server on a different host/port
+# RL_API_URL=http://localhost:8080
--- a/.gitignore
+++ b/.gitignore
@@ -42,23 +42,3 @@ images/

 # CLI config (may contain sensitive SSH paths)
 cli-config.yaml
-
-.DS_Store
-
-# artifacts
-*.jsonl
-*.html
-*.json
-*.log
-*.csv
-
-# Singularity/Apptainer images (large binary files)
-*.sif
-
-# Test files
-test_singularity_*.py
-test_*.py
-!tests/test_*.py
-
-# Nomad data
-/tmp/NomadClient*/
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,3 +1,6 @@
 [submodule "mini-swe-agent"]
 	path = mini-swe-agent
 	url = https://github.com/SWE-agent/mini-swe-agent
+[submodule "tinker-atropos"]
+	path = tinker-atropos
+	url = https://github.com/nousresearch/tinker-atropos
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,533 @@
+# Hermes Agent - Development Guide
+
+Instructions for AI coding assistants (GitHub Copilot, Cursor, etc.) and human developers.
+
+Hermes-Agent is an AI agent harness with tool-calling capabilities, interactive CLI, messaging integrations, and scheduled tasks.
+
+## Development Environment
+
+**IMPORTANT**: Always use the virtual environment if it exists:
+```bash
+source venv/bin/activate  # Before running any Python commands
+```
+
+## Project Structure
+
+```
+hermes-agent/
+├── hermes_cli/           # Unified CLI commands
+│   ├── main.py           # Entry point, command dispatcher
+│   ├── setup.py          # Interactive setup wizard
+│   ├── config.py         # Config management & migration
+│   ├── status.py         # Status display
+│   ├── doctor.py         # Diagnostics
+│   ├── gateway.py        # Gateway management
+│   ├── uninstall.py      # Uninstaller
+│   └── cron.py           # Cron job management
+├── tools/                # Tool implementations
+├── gateway/              # Messaging platform adapters
+├── cron/                 # Scheduler implementation
+├── skills/               # Knowledge documents
+├── cli.py                # Interactive CLI (Rich UI)
+├── run_agent.py          # Agent runner with AIAgent class
+├── model_tools.py        # Tool schemas and handlers
+├── toolsets.py           # Tool groupings
+├── toolset_distributions.py  # Probability-based tool selection
+└── batch_runner.py       # Parallel batch processing
+```
+
+**User Configuration** (stored in `~/.hermes/`):
+- `~/.hermes/config.yaml` - Settings (model, terminal, toolsets, etc.)
+- `~/.hermes/.env` - API keys and secrets
+
+## File Dependency Chain
+
+```
+tools/*.py → tools/__init__.py → model_tools.py → toolsets.py → toolset_distributions.py
+                                       ↑
+run_agent.py ──────────────────────────┘
+cli.py → run_agent.py (uses AIAgent with quiet_mode=True)
+batch_runner.py → run_agent.py + toolset_distributions.py
+```
+
+Always ensure consistency between tools, model_tools.py, and toolsets.py when changing any of them.
+
+---
+
+## AIAgent Class
+
+The main agent is implemented in `run_agent.py`:
+
+```python
+class AIAgent:
+    def __init__(
+        self,
+        model: str = "anthropic/claude-sonnet-4",
+        api_key: str = None,
+        base_url: str = "https://openrouter.ai/api/v1",
+        max_iterations: int = 60,        # Max tool-calling loops
+        enabled_toolsets: list = None,
+        disabled_toolsets: list = None,
+        verbose_logging: bool = False,
+        quiet_mode: bool = False,         # Suppress progress output
+        tool_progress_callback: callable = None,  # Called on each tool use
+    ):
+        # Initialize OpenAI client, load tools based on toolsets
+        ...
+    
+    def chat(self, user_message: str, task_id: str = None) -> str:
+        # Main entry point - runs the agent loop
+        ...
+```
+
+### Agent Loop
+
+The core loop in `_run_agent_loop()`:
+
+```
+1. Add user message to conversation
+2. Call LLM with tools
+3. If LLM returns tool calls:
+   - Execute each tool
+   - Add tool results to conversation
+   - Go to step 2
+4. If LLM returns text response:
+   - Return response to user
+```
+
+```python
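+# Simplified sketch; execute_tool() and tool_result_message() are
+# illustrative placeholders, not the actual helper names.
+turns = 0
+while turns < max_iterations:
+    response = client.chat.completions.create(
+        model=model,
+        messages=messages,
+        tools=tool_schemas,
+    )
+    message = response.choices[0].message
+    
+    if message.tool_calls:
+        # Record the assistant turn so each tool result has a matching call
+        messages.append(message)
+        for tool_call in message.tool_calls:
+            result = execute_tool(tool_call)
+            messages.append(tool_result_message(tool_call, result))
+        turns += 1
+    else:
+        return message.content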
+```
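Review note: the loop described above can be exercised without any network access. The following is a minimal sketch, assuming a simplified tool-call shape (`id`, `name`, `arguments`) rather than the full OpenAI response objects; `run_agent_loop`, `fake_llm`, and `echo` are hypothetical names introduced only for illustration and are not taken from `run_agent.py`.

```python
import json
from typing import Callable


def run_agent_loop(
    call_llm: Callable[[list], dict],
    tools: dict,
    messages: list,
    max_iterations: int = 60,
) -> str:
    """Loop until the model answers in plain text or the budget runs out."""
    for _ in range(max_iterations):
        reply = call_llm(messages)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]
        # Record the assistant turn, then append one tool message per call.
        messages.append({"role": "assistant", "content": None, "tool_calls": tool_calls})
        for call in tool_calls:
            handler = tools[call["name"]]
            result = handler(**json.loads(call["arguments"]))
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    return "Stopped: reached max_iterations without a final answer."


def fake_llm(messages: list) -> dict:
    """Hypothetical stand-in for the API: request one tool call, then answer."""
    if messages[-1]["role"] == "tool":
        return {"content": f"Tool said: {messages[-1]['content']}"}
    arguments = json.dumps({"text": "hi"})
    return {"tool_calls": [{"id": "call_1", "name": "echo", "arguments": arguments}]}


def echo(text: str) -> str:
    """Toy tool handler; returns a JSON string like the real handlers do."""
    return json.dumps({"success": True, "data": text})


history = [{"role": "user", "content": "Say hi"}]
final = run_agent_loop(fake_llm, {"echo": echo}, history)
print(final)
```

Injecting `call_llm` as a parameter keeps the loop testable; a real caller would presumably wrap `client.chat.completions.create(...)` and adapt its response to this shape.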
+
+### Conversation Management
+
+Messages are stored as a list of dicts following OpenAI format:
+
+```python
+messages = [
+    {"role": "system", "content": "You are a helpful assistant..."},
+    {"role": "user", "content": "Search for Python tutorials"},
+    {"role": "assistant", "content": None, "tool_calls": [...]},
+    {"role": "tool", "tool_call_id": "...", "content": "..."},
+    {"role": "assistant", "content": "Here's what I found..."},
+]
+```
+
+### Reasoning Model Support
+
+For models that support chain-of-thought reasoning:
+- Extract `reasoning_content` from API responses
+- Store in `assistant_msg["reasoning"]` for trajectory export
+- Pass back via `reasoning_content` field on subsequent turns
+
+---
+
+## CLI Architecture (cli.py)
+
+The interactive CLI uses:
+- **Rich** - For the welcome banner and styled panels
+- **prompt_toolkit** - For fixed input area with history and `patch_stdout`
+- **KawaiiSpinner** (in run_agent.py) - Animated feedback during API calls and tool execution
+
+Key components:
+- `HermesCLI` class - Main CLI controller with commands and conversation loop
+- `load_cli_config()` - Loads config, sets environment variables for terminal
+- `build_welcome_banner()` - Displays ASCII art logo, tools, and skills summary
+- `/commands` - Process user commands like `/help`, `/clear`, `/personality`, etc.
+
+CLI uses `quiet_mode=True` when creating AIAgent to suppress verbose logging.
+
+### Adding CLI Commands
+
+1. Add to `COMMANDS` dict with description
+2. Add handler in `process_command()` method
+3. For persistent settings, use `save_config_value()` to update config
+
+---
+
+## Hermes CLI Commands
+
+The unified `hermes` command provides all functionality:
+
+| Command | Description |
+|---------|-------------|
+| `hermes` | Interactive chat (default) |
+| `hermes chat -q "..."` | Single query mode |
+| `hermes setup` | Configure API keys and settings |
+| `hermes config` | View current configuration |
+| `hermes config edit` | Open config in editor |
+| `hermes config set KEY VAL` | Set a specific value |
+| `hermes config check` | Check for missing config |
+| `hermes config migrate` | Prompt for missing config interactively |
+| `hermes status` | Show configuration status |
+| `hermes doctor` | Diagnose issues |
+| `hermes update` | Update to latest (checks for new config) |
+| `hermes uninstall` | Uninstall (can keep configs for reinstall) |
+| `hermes gateway` | Start messaging gateway |
+| `hermes cron list` | View scheduled jobs |
+| `hermes version` | Show version info |
+
+---
+
+## Messaging Gateway
+
+The gateway connects Hermes to Telegram, Discord, and WhatsApp.
+
+### Configuration (in `~/.hermes/.env`):
+
+```bash
+# Telegram
+TELEGRAM_BOT_TOKEN=123456:ABC-DEF...      # From @BotFather
+TELEGRAM_ALLOWED_USERS=123456789,987654   # Comma-separated user IDs (from @userinfobot)
+
+# Discord  
+DISCORD_BOT_TOKEN=MTIz...                 # From Developer Portal
+DISCORD_ALLOWED_USERS=123456789012345678  # Comma-separated user IDs
+
+# Agent Behavior
+HERMES_MAX_ITERATIONS=60                  # Max tool-calling iterations
+MESSAGING_CWD=/home/myuser                # Terminal working directory for messaging
+
+# Tool Progress (optional)
+HERMES_TOOL_PROGRESS=true                 # Send progress messages
+HERMES_TOOL_PROGRESS_MODE=new             # "new" or "all"
+```
+
+### Working Directory Behavior
+
+- **CLI (`hermes` command)**: Uses current directory (`.` → `os.getcwd()`)
+- **Messaging (Telegram/Discord)**: Uses `MESSAGING_CWD` (default: home directory)
+
+This is intentional: CLI users are in a terminal and expect the agent to work in their current directory, while messaging users need a consistent starting location.
+
+### Security (User Allowlists):
+
+**IMPORTANT**: Without an allowlist, anyone who finds your bot can use it!
+
+The gateway checks `{PLATFORM}_ALLOWED_USERS` environment variables:
+- If set: Only listed user IDs can interact with the bot
+- If unset: All users are allowed (dangerous with terminal access!)
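Review note: the allowlist rule above could be sketched roughly as follows. This is an assumption about how the check might look, not the gateway's actual code; `is_user_allowed` is a hypothetical helper, and only the `{PLATFORM}_ALLOWED_USERS` variable naming and the comma-separated format come from this guide.

```python
import os
from typing import Mapping


def is_user_allowed(platform: str, user_id: str, env: Mapping[str, str] = os.environ) -> bool:
    """Return True if user_id may talk to the bot on the given platform."""
    raw = env.get(f"{platform.upper()}_ALLOWED_USERS", "").strip()
    if not raw:
        # Unset or empty: everyone is allowed, matching the documented behavior.
        return True
    allowed = {part.strip() for part in raw.split(",") if part.strip()}
    return str(user_id) in allowed


example_env = {"TELEGRAM_ALLOWED_USERS": "123456789, 987654"}
print(is_user_allowed("telegram", "987654", example_env))
print(is_user_allowed("telegram", "42", example_env))
```

Comparing IDs as strings avoids surprises when a platform SDK reports them as integers.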
+
+Users can find their IDs:
+- **Telegram**: Message [@userinfobot](https://t.me/userinfobot)
+- **Discord**: Enable Developer Mode, right-click name → Copy ID
+
+### Tool Progress Notifications
+
+When `HERMES_TOOL_PROGRESS=true`, the bot sends status messages as it works:
+- `💻 \`ls -la\`...` (terminal commands show the actual command)
+- `🔍 web_search...`
+- `📄 web_extract...`
+
+Modes:
+- `new`: Only when switching to a different tool (less spam)
+- `all`: Every single tool call
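Review note: the difference between the two modes can be illustrated with a small filter. This is a sketch under the assumption that `new` compares each tool name with the previous one; `ProgressFilter` is a hypothetical class, not something defined in the gateway.

```python
from typing import Optional


class ProgressFilter:
    """Decide whether a tool call should produce a progress message."""

    def __init__(self, mode: str = "new") -> None:
        if mode not in ("new", "all"):
            raise ValueError(f"unknown progress mode: {mode!r}")
        self.mode = mode
        self._last_tool: Optional[str] = None

    def should_notify(self, tool_name: str) -> bool:
        changed = tool_name != self._last_tool
        self._last_tool = tool_name
        # "all" reports every call; "new" only reports a switch of tool.
        return self.mode == "all" or changed


calls = ["terminal", "terminal", "web_search", "terminal"]
new_filter = ProgressFilter("new")
new_results = [new_filter.should_notify(name) for name in calls]
all_filter = ProgressFilter("all")
all_results = [all_filter.should_notify(name) for name in calls]
print(new_results, all_results)
```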
+
+### Typing Indicator
+
+The gateway keeps the "typing..." indicator active throughout processing, refreshing every 4 seconds. This lets users know the bot is working even during long tool-calling sequences.
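Review note: one way to keep an indicator alive during long work is a background task that is cancelled when processing ends. The sketch below assumes an async `send_typing` callable supplied by the platform adapter; `keep_typing` and `process_with_typing` are hypothetical names, and the demo uses a short interval in place of the 4 seconds mentioned above.

```python
import asyncio
import contextlib
from typing import Awaitable, Callable


async def keep_typing(send_typing: Callable[[], Awaitable[None]], interval: float = 4.0) -> None:
    """Re-send the typing indicator until this task is cancelled."""
    while True:
        await send_typing()
        await asyncio.sleep(interval)


async def process_with_typing(send_typing, work, interval: float = 4.0):
    """Run work() while the typing indicator is refreshed in the background."""
    typing_task = asyncio.create_task(keep_typing(send_typing, interval))
    try:
        return await work()
    finally:
        # Stop refreshing as soon as the work finishes or fails.
        typing_task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await typing_task


sent = []


async def demo_send_typing() -> None:
    sent.append("typing")


async def demo_work() -> str:
    await asyncio.sleep(0.05)  # stands in for a long tool-calling sequence
    return "done"


outcome = asyncio.run(process_with_typing(demo_send_typing, demo_work, interval=0.01))
print(outcome, len(sent))
```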
+
+### Platform Toolsets:
+
+Each platform has a dedicated toolset in `toolsets.py`:
+- `hermes-telegram`: Full tools including terminal (with safety checks)
+- `hermes-discord`: Full tools including terminal
+- `hermes-whatsapp`: Full tools including terminal
+
+---
+
+## Configuration System
+
+Configuration files are stored in `~/.hermes/` for easy user access:
+- `~/.hermes/config.yaml` - All settings (model, terminal, compression, etc.)
+- `~/.hermes/.env` - API keys and secrets
+
+### Adding New Configuration Options
+
+When adding new configuration variables, you MUST follow this process:
+
+#### For config.yaml options:
+
+1. Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`
+2. **CRITICAL**: Bump `_config_version` in `DEFAULT_CONFIG` when adding required fields
+3. This triggers migration prompts for existing users on next `hermes update` or `hermes setup`
+
+Example:
+```python
+DEFAULT_CONFIG = {
+    # ... existing config ...
+    
+    "new_feature": {
+        "enabled": True,
+        "option": "default_value",
+    },
+    
+    # BUMP THIS when adding required fields
+    "_config_version": 2,  # Was 1, now 2
+}
+```
+
+#### For .env variables (API keys/secrets):
+
+1. Add to `REQUIRED_ENV_VARS` or `OPTIONAL_ENV_VARS` in `hermes_cli/config.py`
+2. Include metadata for the migration system:
+
+```python
+OPTIONAL_ENV_VARS = {
+    # ... existing vars ...
+    "NEW_API_KEY": {
+        "description": "What this key is for",
+        "prompt": "Display name in prompts",
+        "url": "https://where-to-get-it.com/",
+        "tools": ["tools_it_enables"],  # What tools need this
+        "password": True,  # Mask input
+    },
+}
+```
+
+#### Update related files:
+
+- `hermes_cli/setup.py` - Add prompts in the setup wizard
+- `cli-config.yaml.example` - Add example with comments
+- `README.md` - Update if the change is user-facing
+
+### Config Version Migration
+
+The system uses `_config_version` to detect outdated configs:
+
+1. `check_for_missing_config()` compares user config to `DEFAULT_CONFIG`
+2. `migrate_config()` interactively prompts for missing values
+3. Called automatically by `hermes update` and optionally by `hermes setup`
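+
+The detection step can be sketched as follows, assuming the config is a plain nested dict. `find_missing_keys` and `is_outdated` are hypothetical helpers for illustration and not the actual `check_for_missing_config()` implementation:
+
+```python
+def find_missing_keys(user: dict, defaults: dict, prefix: str = "") -> list:
+    """Return dotted paths that exist in `defaults` but not in `user`."""
+    missing = []
+    for key, default_value in defaults.items():
+        path = f"{prefix}{key}"
+        if key not in user:
+            missing.append(path)
+        elif isinstance(default_value, dict) and isinstance(user[key], dict):
+            # Recurse into nested sections such as "terminal" or "new_feature".
+            missing.extend(find_missing_keys(user[key], default_value, prefix=f"{path}."))
+    return missing
+
+
+def is_outdated(user: dict, defaults: dict) -> bool:
+    """A config is outdated when its version is lower than the default's."""
+    return user.get("_config_version", 0) < defaults.get("_config_version", 0)
+```
+
+A migration step could then prompt only for the paths returned by `find_missing_keys()`.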
+
+---
+
+## Environment Variables
+
+API keys are loaded from `~/.hermes/.env`:
+- `OPENROUTER_API_KEY` - Main LLM API access (primary provider)
+- `FIRECRAWL_API_KEY` - Web search/extract tools
+- `BROWSERBASE_API_KEY` / `BROWSERBASE_PROJECT_ID` - Browser automation
+- `FAL_KEY` - Image generation (FLUX model)
+- `NOUS_API_KEY` - Vision and Mixture-of-Agents tools
+
+Terminal tool configuration (in `~/.hermes/config.yaml`):
+- `terminal.backend` - Backend: local, docker, singularity, modal, or ssh
+- `terminal.cwd` - Working directory for CLI ("." = current directory)
+- `terminal.docker_image` - Image for Docker backend
+- `terminal.singularity_image` - Image for Singularity backend
+- `terminal.modal_image` - Image for Modal backend
+- SSH: `TERMINAL_SSH_HOST`, `TERMINAL_SSH_USER`, `TERMINAL_SSH_KEY` in .env
+
+Agent behavior (in `~/.hermes/.env`):
+- `HERMES_MAX_ITERATIONS` - Max tool-calling iterations (default: 60)
+- `MESSAGING_CWD` - Working directory for messaging platforms (default: ~)
+- `HERMES_TOOL_PROGRESS` - Enable tool progress messages (`true`/`false`)
+- `HERMES_TOOL_PROGRESS_MODE` - Progress mode: `new` (tool changes) or `all`
+
+### Dangerous Command Approval
+
+The terminal tool includes safety checks for potentially destructive commands (e.g., `rm -rf`, `DROP TABLE`, `chmod 777`):
+
+**Behavior by Backend:**
+- **Docker/Singularity/Modal**: Commands run unrestricted (isolated containers)
+- **Local/SSH**: Dangerous commands trigger approval flow
+
+**Approval Flow (CLI):**
+```
+⚠️  Potentially dangerous command detected: recursive delete
+    rm -rf /tmp/test
+
+    [o]nce  |  [s]ession  |  [a]lways  |  [d]eny
+    Choice [o/s/a/D]: 
+```
+
+**Approval Flow (Messaging):**
+- Command is blocked with explanation
+- Agent explains the command was blocked for safety
+- User must add the pattern to their allowlist via `hermes config edit` or run the command directly on their machine
+
+**Configuration:**
+- `command_allowlist` in `~/.hermes/config.yaml` stores permanently allowed patterns
+- Add patterns via "always" approval or edit directly
+
+**Sudo Handling (Messaging):**
+- If sudo fails over messaging, output includes tip to add `SUDO_PASSWORD` to `~/.hermes/.env`
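+
+A rough sketch of pattern-based detection with the backend rule described above. The patterns, labels, and the idea of allowlisting by label are assumptions made for illustration; the real list and allowlist format live in the terminal tool:
+
+```python
+import re
+from typing import Optional
+
+# Hypothetical patterns; each maps a regex to a human-readable label.
+DANGEROUS_PATTERNS = [
+    (re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*\b"), "recursive delete"),
+    (re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE), "SQL table drop"),
+    (re.compile(r"\bchmod\s+777\b"), "world-writable permissions"),
+]
+
+# Isolated container backends run commands unrestricted.
+ISOLATED_BACKENDS = {"docker", "singularity", "modal"}
+
+
+def detect_dangerous(command: str) -> Optional[str]:
+    """Return the label of the first matching pattern, or None."""
+    for pattern, label in DANGEROUS_PATTERNS:
+        if pattern.search(command):
+            return label
+    return None
+
+
+def needs_approval(command: str, backend: str, allowlist=()) -> bool:
+    """Only local/SSH backends trigger the approval flow."""
+    if backend in ISOLATED_BACKENDS:
+        return False
+    label = detect_dangerous(command)
+    return label is not None and label not in allowlist
+```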
+
+---
+
+## Adding New Tools
+
+Follow this strict order to maintain consistency:
+
+1. Create `tools/your_tool.py` with:
+   - Handler function (sync or async) returning a JSON string via `json.dumps()`
+   - `check_*_requirements()` function to verify dependencies (e.g., API keys)
+   - Schema definition following OpenAI function-calling format
+
+2. Export in `tools/__init__.py`:
+   - Import the handler and check function
+   - Add to `__all__` list
+
+3. Register in `model_tools.py`:
+   - Add to `TOOLSET_REQUIREMENTS` if it needs API keys
+   - Create `get_*_tool_definitions()` function or add to existing
+   - Add routing in `handle_function_call()` dispatcher
+   - Update `get_all_tool_names()` with the tool name
+   - Update `get_toolset_for_tool()` mapping
+   - Update `get_available_toolsets()` and `check_toolset_requirements()`
+
+4. Add to toolset in `toolsets.py`:
+   - Add to existing toolset or create new one in TOOLSETS dict
+
+5. If the tool requires an API key:
+   - Add to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py`
+   - The tool will be auto-disabled if the key is missing
+
+6. Optionally add to `toolset_distributions.py` for batch processing
+
+### Tool Implementation Pattern
+
+```python
+# tools/example_tool.py
+import json
+import os
+from typing import Optional
+
+def check_example_requirements() -> bool:
+    """Check if required API keys/dependencies are available."""
+    return bool(os.getenv("EXAMPLE_API_KEY"))
+
+def example_tool(param: str, task_id: Optional[str] = None) -> str:
+    """Execute the tool and return JSON string result."""
+    try:
+        result = {"success": True, "data": "..."}
+        return json.dumps(result, ensure_ascii=False)
+    except Exception as e:
+        return json.dumps({"error": str(e)}, ensure_ascii=False)
+```
+
+All tool handlers MUST return a JSON string. Never return raw dicts.
+
+### Dynamic Tool Availability
+
+Tools are automatically disabled when their API keys are missing:
+
+```python
+# In model_tools.py
+TOOLSET_REQUIREMENTS = {
+    "web": {"env_vars": ["FIRECRAWL_API_KEY"]},
+    "browser": {"env_vars": ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]},
+    "creative": {"env_vars": ["FAL_KEY"]},
+}
+```
+
+The `check_tool_availability()` function determines which tools to include.
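+
+As a sketch of how such a check might work, assuming a toolset is available only when all of its listed variables are set. `missing_env_vars` and `available_toolsets` are hypothetical helpers, not the real `check_tool_availability()`:
+
+```python
+import os
+
+TOOLSET_REQUIREMENTS = {
+    "web": {"env_vars": ["FIRECRAWL_API_KEY"]},
+    "browser": {"env_vars": ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]},
+    "creative": {"env_vars": ["FAL_KEY"]},
+}
+
+
+def missing_env_vars(toolset: str, env=None) -> list:
+    """List required variables that are unset or empty for a toolset."""
+    env = os.environ if env is None else env
+    required = TOOLSET_REQUIREMENTS.get(toolset, {}).get("env_vars", [])
+    return [name for name in required if not env.get(name)]
+
+
+def available_toolsets(env=None) -> list:
+    """Toolsets whose requirements are fully satisfied."""
+    return [name for name in TOOLSET_REQUIREMENTS if not missing_env_vars(name, env)]
+```
+
+Passing `env` explicitly keeps the check easy to test without touching the real environment.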
+
+### Stateful Tools
+
+Tools that maintain state (terminal, browser) require:
+- `task_id` parameter for session isolation between concurrent tasks
+- `cleanup_*()` function to release resources
+- No manual cleanup call: `run_agent.py` calls cleanup automatically after the conversation completes
+
+---
+
+## Trajectory Format
+
+Conversations are saved in ShareGPT format for training:
+```json
+{"from": "system", "value": "System prompt with <tools>...</tools>"}
+{"from": "human", "value": "User message"}
+{"from": "gpt", "value": "<think>reasoning</think>\n<tool_call>{...}</tool_call>"}
+{"from": "tool", "value": "<tool_response>{...}</tool_response>"}
+{"from": "gpt", "value": "Final response"}
+```
+
+Tool calls use `<tool_call>` XML tags, responses use `<tool_response>` tags, reasoning uses `<think>` tags.
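+
+A small sketch of building these turns. The `{"name": ..., "arguments": ...}` payload shape inside `<tool_call>` is an assumption, and both helper names are hypothetical:
+
+```python
+import json
+
+
+def tool_call_turn(reasoning: str, name: str, arguments: dict) -> dict:
+    """Build a "gpt" turn containing reasoning followed by one tool call."""
+    call = json.dumps({"name": name, "arguments": arguments}, ensure_ascii=False)
+    return {
+        "from": "gpt",
+        "value": f"<think>{reasoning}</think>\n<tool_call>{call}</tool_call>",
+    }
+
+
+def tool_response_turn(result_json: str) -> dict:
+    """Wrap a tool's JSON string result in a "tool" turn."""
+    return {"from": "tool", "value": f"<tool_response>{result_json}</tool_response>"}
+```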
+
+### Trajectory Export
+
+```python
+agent = AIAgent(save_trajectories=True)
+agent.chat("Do something")
+# Saves to trajectories/*.jsonl in ShareGPT format
+```
+
+---
+
+## Batch Processing (batch_runner.py)
+
+For processing multiple prompts:
+- Parallel execution with multiprocessing
+- Content-based resume for fault tolerance (matches on prompt text, not indices)
+- Toolset distributions control probabilistic tool availability per prompt
+- Output: `data/<run_name>/trajectories.jsonl` (combined) + individual batch files
+
+```bash
+python batch_runner.py \
+    --dataset_file=prompts.jsonl \
+    --batch_size=20 \
+    --num_workers=4 \
+    --run_name=my_run
+```
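+
+Content-based resume can be sketched as below, assuming each saved trajectory record carries the original text under a `"prompt"` field. That field name and both helpers are assumptions for illustration, not the actual `batch_runner.py` code:
+
+```python
+import json
+from pathlib import Path
+
+
+def load_completed_prompts(trajectory_path: str) -> set:
+    """Collect prompt texts that already have a saved trajectory."""
+    completed = set()
+    path = Path(trajectory_path)
+    if not path.exists():
+        return completed  # Fresh run: nothing to skip.
+    with path.open(encoding="utf-8") as handle:
+        for line in handle:
+            line = line.strip()
+            if not line:
+                continue
+            prompt = json.loads(line).get("prompt")
+            if prompt is not None:
+                completed.add(prompt)
+    return completed
+
+
+def pending_prompts(prompts: list, completed: set) -> list:
+    """Keep only prompts without a result, preserving dataset order."""
+    return [prompt for prompt in prompts if prompt not in completed]
+```
+
+Matching on text rather than index means a reordered or extended dataset still resumes correctly.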
+
+---
+
+## Skills System
+
+Skills are on-demand knowledge documents the agent can load. They live in the `skills/` directory:
+
+```
+skills/
+├── mlops/                    # Category folder
+│   ├── axolotl/             # Skill folder
+│   │   ├── SKILL.md         # Main instructions (required)
+│   │   ├── references/      # Additional docs, API specs
+│   │   └── templates/       # Output formats, configs
+│   └── vllm/
+│       └── SKILL.md
+└── example-skill/
+    └── SKILL.md
+```
+
+**Progressive disclosure** (token-efficient):
+1. `skills_categories()` - List category names (~50 tokens)
+2. `skills_list(category)` - Name + description per skill (~3k tokens)
+3. `skill_view(name)` - Full content + tags + linked files
+
+SKILL.md files use YAML frontmatter:
+```yaml
+---
+name: skill-name
+description: Brief description for listing
+tags: [tag1, tag2]
+related_skills: [other-skill]
+version: 1.0.0
+---
+# Skill Content...
+```
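+
+A deliberately simplified sketch of reading this frontmatter. It handles only flat `key: value` pairs and inline `[a, b]` lists, which is an assumption for illustration; the real skills tool may well use a full YAML parser, and `parse_frontmatter` is a hypothetical name:
+
+```python
+def parse_frontmatter(text: str):
+    """Split a SKILL.md string into (metadata dict, body)."""
+    lines = text.splitlines()
+    if not lines or lines[0].strip() != "---":
+        return {}, text  # No frontmatter block present.
+    meta = {}
+    for index, line in enumerate(lines[1:], start=1):
+        if line.strip() == "---":
+            # Closing delimiter: everything after it is the skill body.
+            return meta, "\n".join(lines[index + 1:])
+        key, sep, value = line.partition(":")
+        if not sep:
+            continue
+        value = value.strip()
+        if value.startswith("[") and value.endswith("]"):
+            meta[key.strip()] = [item.strip() for item in value[1:-1].split(",") if item.strip()]
+        else:
+            meta[key.strip()] = value
+    return {}, text  # Unterminated frontmatter: treat the file as plain content.
+```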
+
+Tool files: `tools/skills_tool.py` → `model_tools.py` → `toolsets.py`
+
+---
+
+## Testing Changes
+
+After making changes:
+
+1. Run `hermes doctor` to check setup
+2. Run `hermes config check` to verify config
+3. Test with `hermes chat -q "test message"`
+4. For new config options, test fresh install: `rm -rf ~/.hermes && hermes setup`
--- a/README.md
+++ b/README.md
--- a/TODO.md
+++ b/TODO.md
@@ -4,84 +4,6 @@

 ---

-## 🚨 HIGH PRIORITY - Immediate Fixes
-
-These items need to be addressed ASAP:
-
-### 1. SUDO Breaking Terminal Tool 🔐 ✅ COMPLETE
- [x] **Problem:** SUDO commands break the terminal tool execution (hangs indefinitely)
- [x] **Fix:** Created custom environment wrappers in `tools/terminal_tool.py`
-  - `stdin=subprocess.DEVNULL` prevents hanging on interactive prompts
-  - Sudo fails gracefully with clear error if no password configured
-  - Same UX as Claude Code - agent sees error, tells user to run it themselves
- [x] **All 5 environments now have consistent behavior:**
-  - `_LocalEnvironment` - local execution
-  - `_DockerEnvironment` - Docker containers
-  - `_SingularityEnvironment` - Singularity/Apptainer containers
-  - `_ModalEnvironment` - Modal cloud sandboxes
-  - `_SSHEnvironment` - remote SSH execution
- [x] **Optional sudo support via `SUDO_PASSWORD` env var:**
-  - Shared `_transform_sudo_command()` helper used by all environments
-  - If set, auto-transforms `sudo cmd` → pipes password via `sudo -S`
-  - Documented in `.env.example`, `cli-config.yaml`, and README
-  - Works for chained commands: `cmd1 && sudo cmd2`
- [x] **Interactive sudo prompt in CLI mode:**
-  - When sudo detected and no password configured, prompts user
-  - 45-second timeout (auto-skips if no input)
-  - Hidden password input via `getpass` (password not visible)
-  - Password cached for session (don't ask repeatedly)
-  - Spinner pauses during prompt for clean UX
-  - Uses `HERMES_INTERACTIVE` env var to detect CLI mode
-
-### 2. Fix `browser_get_images` Tool 🖼️ ✅ VERIFIED WORKING
- [x] **Tested:** Tool works correctly on multiple sites
- [x] **Results:** Successfully extracts image URLs, alt text, dimensions
- [x] **Note:** Some sites (Pixabay, etc.) have Cloudflare bot protection that blocks headless browsers - this is expected behavior, not a bug
-
-### 3. Better Action Logging for Debugging 📝 ✅ COMPLETE
- [x] **Problem:** Need better logging of agent actions for debugging
- [x] **Implementation:**
-  - Save full session trajectories to `logs/` directory as JSON
-  - Each session gets a unique file: `session_YYYYMMDD_HHMMSS_UUID.json`
-  - Logs all messages, tool calls with inputs/outputs, timestamps
-  - Structured JSON format for easy parsing and replay
-  - Automatic on CLI runs (configurable)
-
-### 4. Stream Thinking Summaries in Real-Time 💭 ⏸️ DEFERRED
- [ ] **Problem:** Thinking/reasoning summaries not shown while streaming
- [ ] **Complexity:** This is a significant refactor - leaving for later
-
-**OpenRouter Streaming Info:**
- Uses `stream=True` with OpenAI SDK
- Reasoning comes in `choices[].delta.reasoning_details` chunks
- Types: `reasoning.summary`, `reasoning.text`, `reasoning.encrypted`
- Tool call arguments stream as partial JSON (need accumulation)
- Items paradigm: same ID emitted multiple times with updated content
-
-**Key Challenges:**
- Tool call JSON accumulation (partial `{"query": "wea` → `{"query": "weather"}`)
- Multiple concurrent outputs (thinking + tool calls + text simultaneously)
- State management for partial responses
- Error handling if connection drops mid-stream
- Deciding when tool calls are "complete" enough to execute
-
-**UX Questions to Resolve:**
- Show raw thinking text or summarized?
- Live expanding text vs. spinner replacement?
- Markdown rendering while streaming?
- How to handle thinking + tool call display simultaneously?
-
-**Implementation Options:**
- New `run_conversation_streaming()` method (keep non-streaming as fallback)
- Wrapper that handles streaming internally
- Big refactor of existing `run_conversation()`
-
-**References:**
- https://openrouter.ai/docs/api/reference/streaming
- https://openrouter.ai/docs/guides/best-practices/reasoning-tokens#streaming-response
-
---
-
 ## 1. Subagent Architecture (Context Isolation) 🎯

 **Problem:** Long-running tools (terminal commands, browser automation, complex file operations) consume massive context. A single `ls -la` can add hundreds of lines. Browser snapshots, debugging sessions, and iterative terminal work quickly bloat the main conversation, leaving less room for actual reasoning.
@@ -160,87 +82,48 @@ These items need to be addressed ASAP:

 ---

-## 2. Context Management (complements Subagents)
+## 2. Planning & Task Management 📋

-**Problem:** Context grows unbounded during long conversations. Trajectory compression exists for training data post-hoc, but live conversations lack intelligent context management.
+**Problem:** Agent handles tasks reactively without explicit planning. Complex multi-step tasks lack structure, progress tracking, and the ability to decompose work into manageable chunks.

 **Ideas:**
- [ ] **Incremental summarization** - Compress old tool outputs on-the-fly during conversations
-  - Trigger when context exceeds threshold (e.g., 80% of max tokens)
-  - Preserve recent turns fully, summarize older tool responses
-  - Could reuse logic from `trajectory_compressor.py`
+- [ ] **Task decomposition tool** - Break complex requests into subtasks:
+  ```
+  User: "Set up a new Python project with FastAPI, tests, and Docker"
  
- [ ] **Semantic memory retrieval** - Vector store for long conversation recall
-  - Embed important facts/findings as conversation progresses
-  - Retrieve relevant memories when needed instead of keeping everything in context
-  - Consider lightweight solutions: ChromaDB, FAISS, or even a simple embedding cache
+  Agent creates plan:
+  ├── 1. Create project structure and requirements.txt
+  ├── 2. Implement FastAPI app skeleton
+  ├── 3. Add pytest configuration and initial tests
+  ├── 4. Create Dockerfile and docker-compose.yml
+  └── 5. Verify everything works together
+  ```
+  - Each subtask becomes a trackable unit
+  - Agent can report progress: "Completed 3/5 tasks"
  
- [ ] **Working vs. episodic memory** distinction
-  - Working memory: Current task state, recent tool results (always in context)
-  - Episodic memory: Past findings, tried approaches (retrieved on demand)
-  - Clear eviction policies for each
+- [ ] **Progress checkpoints** - Periodic self-assessment:
+  - After N tool calls or time elapsed, pause to evaluate
+  - "What have I accomplished? What remains? Am I on track?"
+  - Detect if stuck in loops or making no progress
+  - Could trigger replanning if approach isn't working
+  
+- [ ] **Explicit plan storage** - Persist plan in conversation:
+  - Store as structured data (not just in context)
+  - Update status as tasks complete
+  - User can ask "What's the plan?" or "What's left?"
+  - Survives context compression (plans are protected)

-**Files to modify:** `run_agent.py` (add memory manager), possibly new `tools/memory_tool.py`
+- [ ] **Failure recovery with replanning** - When things go wrong:
+  - Record what failed and why
+  - Revise plan to work around the issue
+  - "Step 3 failed because X, adjusting approach to Y"
+  - Prevents repeating failed strategies
+
+**Files to modify:** `run_agent.py` (add planning hooks), new `tools/planning_tool.py`

 ---

-## 3. Self-Reflection & Course Correction 🔄
-
-**Problem:** Current retry logic handles malformed outputs but not semantic failures. Agent doesn't reason about *why* something failed.
-
-**Ideas:**
- [ ] **Meta-reasoning after failures** - When a tool returns an error or unexpected result:
-  ```
-  Tool failed → Reflect: "Why did this fail? What assumptions were wrong?"
-  → Adjust approach → Retry with new strategy
-  ```
-  - Could be a lightweight LLM call or structured self-prompt
-  
- [ ] **Planning/replanning module** - For complex multi-step tasks:
-  - Generate plan before execution
-  - After each step, evaluate: "Am I on track? Should I revise the plan?"
-  - Store plan in working memory, update as needed
-  
- [ ] **Approach memory** - Remember what didn't work:
-  - "I tried X for this type of problem and it failed because Y"
-  - Prevents repeating failed strategies in the same conversation
-
-**Files to modify:** `run_agent.py` (add reflection hooks in tool loop), new `tools/reflection_tool.py`
-
---
-
-## 4. Tool Composition & Learning 🔧
-
-**Problem:** Tools are atomic. Complex tasks require repeated manual orchestration of the same tool sequences.
-
-**Ideas:**
- [ ] **Macro tools / Tool chains** - Define reusable tool sequences:
-  ```yaml
-  research_topic:
-    description: "Deep research on a topic"
-    steps:
-      - web_search: {query: "$topic"}
-      - web_extract: {urls: "$search_results.urls[:3]"}
-      - summarize: {content: "$extracted"}
-  ```
-  - Could be defined in skills or a new `macros/` directory
-  - Agent can invoke macro as single tool call
-  
- [ ] **Tool failure patterns** - Learn from failures:
-  - Track: tool, input pattern, error type, what worked instead
-  - Before calling a tool, check: "Has this pattern failed before?"
-  - Persistent across sessions (stored in skills or separate DB)
-  
- [ ] **Parallel tool execution** - When tools are independent, run concurrently:
-  - Detect independence (no data dependencies between calls)
-  - Use `asyncio.gather()` for parallel execution
-  - Already have async support in some tools, just need orchestration
-
-**Files to modify:** `model_tools.py`, `toolsets.py`, new `tool_macros.py`
-
---
-
-## 5. Dynamic Skills Expansion 📚
+## 3. Dynamic Skills Expansion 📚

 **Problem:** Skills system is elegant but static. Skills must be manually created and added.

@@ -269,21 +152,7 @@ These items need to be addressed ASAP:

 ---

-## 6. Task Continuation Hints 🎯
-
-**Problem:** Could be more helpful by suggesting logical next steps.
-
-**Ideas:**
- [ ] **Suggest next steps** - At end of a task, suggest logical continuations:
-  - "Code is written. Want me to also write tests / docs / deploy?"
-  - Based on common workflows for task type
-  - Non-intrusive, just offer options
-
-**Files to modify:** `run_agent.py`, response generation logic
-
---
-
-## 7. Interactive Clarifying Questions Tool ❓
+## 4. Interactive Clarifying Questions Tool ❓

 **Problem:** Agent sometimes makes assumptions or guesses when it should ask the user. Currently can only ask via text, which gets lost in long outputs.

@@ -319,25 +188,7 @@ These items need to be addressed ASAP:

 ---

-## 8. Resource Awareness & Efficiency 💰
-
-**Problem:** No awareness of costs, time, or resource usage. Could be smarter about efficiency.
-
-**Ideas:**
- [ ] **Tool result caching** - Don't repeat identical operations:
-  - Cache web searches, extractions within a session
-  - Invalidation based on time-sensitivity of query
-  - Hash-based lookup: same input → cached output
-
- [ ] **Lazy evaluation** - Don't fetch everything upfront:
-  - Get summaries first, full content only if needed
-  - "I found 5 relevant pages. Want me to deep-dive on any?"
-
-**Files to modify:** `model_tools.py`, new `resource_tracker.py`
-
---
-
-## 9. Collaborative Problem Solving 🤝
+## 5. Collaborative Problem Solving 🤝

 **Problem:** Interaction is command/response. Complex problems benefit from dialogue.

@@ -356,7 +207,7 @@ These items need to be addressed ASAP:

 ---

-## 10. Project-Local Context 💾
+## 6. Project-Local Context 💾

 **Problem:** Valuable context lost between sessions.

@@ -374,30 +225,7 @@ These items need to be addressed ASAP:

 **Files to modify:** New `project_context.py`, auto-load in `run_agent.py`

---
-
-## 11. Graceful Degradation & Robustness 🛡️
-
-**Problem:** When things go wrong, recovery is limited. Should fail gracefully.
-
-**Ideas:**
- [ ] **Fallback chains** - When primary approach fails, have backups:
-  - `web_extract` fails → try `browser_navigate` → try `web_search` for cached version
-  - Define fallback order per tool type
-  
- [ ] **Partial progress preservation** - Don't lose work on failure:
-  - Long task fails midway → save what we've got
-  - "I completed 3/5 steps before the error. Here's what I have..."
-  
- [ ] **Self-healing** - Detect and recover from bad states:
-  - Browser stuck → close and retry
-  - Terminal hung → timeout and reset
-
-**Files to modify:** `model_tools.py`, tool implementations, new `fallback_manager.py`
-
---
-
-## 12. Tools & Skills Wishlist 🧰
+## 6. Tools & Skills Wishlist 🧰

 *Things that would need new tool implementations (can't do well with current tools):*

@@ -464,7 +292,7 @@ These items need to be addressed ASAP:

 ---

-## 13. Messaging Platform Integrations 💬
+## 7. Messaging Platform Integrations 💬 ✅ COMPLETE

 **Problem:** Agent currently only works via `cli.py` which requires direct terminal access. Users may want to interact via messaging apps from their phone or other devices.

@@ -485,75 +313,41 @@ These items need to be addressed ASAP:
 ```

 **Platform support (each user sets up their own credentials):**
- [ ] **Telegram** - via `python-telegram-bot` or `grammy` equivalent
+- [x] **Telegram** - via `python-telegram-bot`
  - Bot token from @BotFather
  - Easiest to set up, good for personal use
- [ ] **Discord** - via `discord.py`
+- [x] **Discord** - via `discord.py`
  - Bot token from Discord Developer Portal
  - Can work in servers (group sessions) or DMs
- [ ] **WhatsApp** - via `baileys` (WhatsApp Web protocol)
-  - QR code scan to authenticate
+- [x] **WhatsApp** - via Node.js bridge (whatsapp-web.js/baileys)
+  - Requires Node.js bridge setup
  - More complex, but reaches most people

 **Session management:**
- [ ] **Session store** - JSONL persistence per session key
-  - `~/.hermes/sessions/{session_key}.jsonl`
-  - Session keys: `telegram:dm:{user_id}`, `discord:channel:{id}`, etc.
- [ ] **Session expiry** - Configurable reset policies
-  - Daily reset (default 4am) OR idle timeout (e.g., 2 hours)
+- [x] **Session store** - JSONL persistence per session key
+  - `~/.hermes/sessions/{session_id}.jsonl`
+  - Session keys: `agent:main:telegram:dm`, `agent:main:discord:group:123`, etc.
+- [x] **Session expiry** - Configurable reset policies
+  - Daily reset (default 4am) OR idle timeout (default 2 hours)
  - Manual reset via `/reset` or `/new` command in chat
- [ ] **Session continuity** - Conversations persist across messages until reset
+  - Per-platform and per-type overrides
+- [x] **Session continuity** - Conversations persist across messages until reset

-**Files to create:** `monitors/telegram_monitor.py`, `monitors/discord_monitor.py`, `monitors/session_store.py`
+**Files created:** `gateway/`, `gateway/platforms/`, `gateway/config.py`, `gateway/session.py`, `gateway/delivery.py`, `gateway/run.py`
+
+**Configuration:**
+- Environment variables: `TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN`, etc.
+- Config file: `~/.hermes/gateway.json`
+- CLI commands: `/platforms` to check status, `--gateway` to start
+
+**Dynamic context injection:**
+- Agent knows its source platform and chat
+- Agent knows connected platforms and home channels
+- Agent can deliver cron outputs to specific platforms

 ---

-## 14. Scheduled Tasks / Cron Jobs ⏰
-
-**Problem:** Agent only runs on-demand. Some tasks benefit from scheduled execution (daily summaries, monitoring, reminders).
-
-**Ideas:**
- [ ] **Cron-style scheduler** - Run agent turns on a schedule
-  - Store jobs in `~/.hermes/cron/jobs.json`
-  - Each job: `{ id, schedule, prompt, session_mode, delivery }`
-  - Uses APScheduler or similar Python library
-  
- [ ] **Session modes:**
-  - `isolated` - Fresh session each run (no history, clean context)
-  - `main` - Append to main session (agent remembers previous scheduled runs)
-  
- [ ] **Delivery options:**
-  - Write output to file (`~/.hermes/cron/output/{job_id}/{timestamp}.md`)
-  - Send to messaging channel (if integrations enabled)
-  - Both
-  
- [ ] **CLI interface:**
-  ```bash
-  # List scheduled jobs
-  python cli.py --cron list
-  
-  # Add a job (runs daily at 9am)
-  python cli.py --cron add "Summarize my email inbox" --schedule "0 9 * * *"
-  
-  # Quick syntax for simple intervals  
-  python cli.py --cron add "Check server status" --every 30m
-  
-  # Remove a job
-  python cli.py --cron remove <job_id>
-  ```
-
- [ ] **Agent self-scheduling** - Let the agent create its own cron jobs
-  - New tool: `schedule_task(prompt, schedule, session_mode)`
-  - "Remind me to check the deployment tomorrow at 9am"
-  - Agent can set follow-up tasks for itself
-
- [ ] **In-chat command:** `/cronjob {prompt} {frequency}` when using messaging integrations
-
-**Files to create:** `cron/scheduler.py`, `cron/jobs.py`, `tools/schedule_tool.py`
-
---
-
-## 15. Text-to-Speech (TTS) 🔊
+## 8. Text-to-Speech (TTS) 🔊

 **Problem:** Agent can only respond with text. Some users prefer audio responses (accessibility, hands-free use, podcasts).

@@ -584,7 +378,7 @@ These items need to be addressed ASAP:

 ---

-## 16. Speech-to-Text / Audio Transcription 🎤
+## 13. Speech-to-Text / Audio Transcription 🎤

 **Problem:** Users may want to send voice memos instead of typing. Agent is blind to audio content.

@@ -613,103 +407,6 @@ These items need to be addressed ASAP:

 **Files to create:** `tools/transcribe_tool.py`, integrate with messaging monitors

---
-
-## Priority Order (Suggested)
-
-1. **🎯 Subagent Architecture** - Critical for context management, enables everything else
-2. **Memory & Context Management** - Complements subagents for remaining context
-3. **Self-Reflection** - Improves reliability and reduces wasted tool calls  
-4. **Project-Local Context** - Practical win, keeps useful info across sessions
-5. **Messaging Integrations** - Unlocks mobile access, new interaction patterns
-6. **Scheduled Tasks / Cron Jobs** - Enables automation, reminders, monitoring
-7. **Tool Composition** - Quality of life, builds on other improvements
-8. **Dynamic Skills** - Force multiplier for repeated tasks
-9. **Interactive Clarifying Questions** - Better UX for ambiguous tasks
-10. **TTS / Audio Transcription** - Accessibility, hands-free use
-
---
-
-## Removed Items (Unrealistic)
-
-The following were removed because they're architecturally impossible:
-
- ~~Proactive suggestions / Prefetching~~ - Agent only runs on user request, can't interject
- ~~Clipboard integration~~ - No access to user's local system clipboard
-
-The following **moved to active TODO** (now possible with new architecture):
-
- ~~Session save/restore~~ → See **Messaging Integrations** (session persistence)
- ~~Voice/TTS playback~~ → See **TTS** (can generate audio files, send via messaging)
- ~~Set reminders~~ → See **Scheduled Tasks / Cron Jobs**
-
-The following were removed because they're **already possible**:
-
- ~~HTTP/API Client~~ → Use `curl` or Python `requests` in terminal
- ~~Structured Data Manipulation~~ → Use `pandas` in terminal
- ~~Git-Native Operations~~ → Use `git` CLI in terminal
- ~~Symbolic Math~~ → Use `SymPy` in terminal
- ~~Code Quality Tools~~ → Run linters (`eslint`, `black`, `mypy`) in terminal
- ~~Testing Framework~~ → Run `pytest`, `jest`, etc. in terminal
- ~~Translation~~ → LLM handles this fine, or use translation APIs
-
---
-
---
-
-## 🧪 Brainstorm Ideas (Not Yet Fleshed Out)
-
-*These are early-stage ideas that need more thinking before implementation. Captured here so they don't get lost.*
-
-### Remote/Distributed Execution 🌐
-
-**Concept:** Run agent on a powerful remote server while interacting from a thin client.
-
-**Why interesting:**
- Run on beefy GPU server for local LLM inference
- Agent has access to remote machine's resources (files, tools, internet)
- User interacts via lightweight client (phone, low-power laptop)
-
-**Open questions:**
- How does this differ from just SSH + running cli.py on remote?
- Would need secure communication channel (WebSocket? gRPC?)
- How to handle tool outputs that reference remote paths?
- Credential management for remote execution
- Latency considerations for interactive use
-
-**Possible architecture:**
-```
-┌─────────────┐         ┌─────────────────────────┐
-│ Thin Client │ ◄─────► │ Remote Hermes Server    │
-│ (phone/web) │  WS/API │ - Full agent + tools    │
-└─────────────┘         │ - GPU for local LLM     │
-                        │ - Access to server files│
-                        └─────────────────────────┘
-```
-
-**Related to:** Messaging integrations (could be the "server" that monitors receive from)
-
---
-
-### Multi-Agent Parallel Execution 🤖🤖
-
-**Concept:** Extension of Subagent Architecture (Section 1) - run multiple subagents in parallel.
-
-**Why interesting:**
- Independent subtasks don't need to wait for each other
- "Research X while setting up Y" - both run simultaneously
- Faster completion for complex multi-part tasks
-
-**Open questions:**
- How to detect which tasks are truly independent?
- Resource management (API rate limits, concurrent connections)
- How to merge results when parallel tasks have conflicts?
- Cost implications of multiple parallel LLM calls
-
-*Note: Basic subagent delegation (Section 1) should be implemented first, parallel execution is an optimization on top.*
-
---
-
 ### Plugin/Extension System 🔌

 **Concept:** Allow users to add custom tools/skills without modifying core code.
@@ -726,4 +423,167 @@ The following were removed because they're **already possible**:

 ---

+## Recently Completed ✅
+
+### Dangerous Command Approval System
+**Implemented:** Dangerous command detection and approval for terminal tool.
+
+**Features:**
+- Pattern-based detection of dangerous commands (rm -rf, DROP TABLE, chmod 777, etc.)
+- CLI prompt with options: `[o]nce | [s]ession | [a]lways | [d]eny`
+- Session caching (approved patterns don't re-prompt)
+- Permanent allowlist in `~/.hermes/config.yaml`
+- Force flag for agent to bypass after user confirmation
+- Skip check for isolated backends (Docker, Singularity, Modal)
+- Helpful sudo failure messages for messaging platforms
+
+**Files:** `tools/terminal_tool.py`, `model_tools.py`, `hermes_cli/config.py`
+
+---
+
+## 14. Learning Machine / Dynamic Memory System 🧠
+
+*Inspired by [Dash](~/agent-codebases/dash) - a self-learning data agent.*
+
+**Problem:** Agent starts fresh every session. Valuable learnings from debugging, error patterns, successful approaches, and user preferences are lost.
+
+**Dash's Key Insight:** Separate **Knowledge** (static, curated) from **Learnings** (dynamic, discovered):
+
+| System | What It Stores | How It Evolves |
+|--------|---------------|----------------|
+| **Knowledge** (Skills) | Validated approaches, templates, best practices | Curated by user |
+| **Learnings** | Error patterns, gotchas, discovered fixes | Managed automatically |
+
+**Tools to implement:**
+- [ ] `save_learning(topic, learning, context?)` - Record a discovered pattern
+  ```python
+  save_learning(
+    topic="python-ssl",
+    learning="On Ubuntu 22.04, SSL certificate errors often fixed by: apt install ca-certificates",
+    context="Debugging requests SSL failure"
+  )
+  ```
+- [ ] `search_learnings(query)` - Find relevant past learnings
+  ```python
+  search_learnings("SSL certificate error Python")
+  # Returns: "On Ubuntu 22.04, SSL certificate errors often fixed by..."
+  ```
+
+**User Profile & Memory:**
+- [ ] `user_profile` - Structured facts about user preferences
+  ```yaml
+  # ~/.hermes/user_profile.yaml
+  coding_style:
+    python_formatter: black
+    type_hints: always
+    test_framework: pytest
+  preferences:
+    verbosity: detailed
+    confirm_destructive: true
+  environment:
+    os: linux
+    shell: bash
+    default_python: "3.11"  # Quoted so YAML keeps it a string (3.10 would otherwise parse as 3.1)
+  ```
+- [ ] `user_memory` - Unstructured observations the agent learns
+  ```yaml
+  # ~/.hermes/user_memory.yaml
+  - "User prefers tabs over spaces despite black's defaults"
+  - "User's main project is ~/work/myapp - a Django app"
+  - "User often works late - don't ask about timezone"
+  ```
+
+**When to learn:**
+- After fixing an error that took multiple attempts
+- When user corrects the agent's approach
+- When a workaround is discovered for a tool limitation
+- When user expresses a preference
+
+**Storage:** Vector database (ChromaDB) or simple YAML with embedding search.
+
+**Files to create:** `tools/learning_tools.py`, `learning/store.py`, `~/.hermes/learnings/`
+
+---
+
+## 15. Layered Context Architecture 📊
+
+*Inspired by Dash's "Six Layers of Context" - grounding responses in multiple sources.*
+
+**Problem:** Context sources are ad-hoc. No clear hierarchy or strategy for what context to include when.
+
+**Proposed Layers for Hermes:**
+
+| Layer | Source | When Loaded | Example |
+|-------|--------|-------------|---------|
+| 1. **Project Context** | `.hermes/context.md` | Auto on cwd | "This is a FastAPI project using PostgreSQL" |
+| 2. **Skills** | `skills/*.md` | On request | "How to set up React project" |
+| 3. **User Profile** | `~/.hermes/user_profile.yaml` | Always | "User prefers pytest, uses black" |
+| 4. **Learnings** | `~/.hermes/learnings/` | Semantic search | "SSL fix for Ubuntu" |
+| 5. **External Knowledge** | Web search, docs | On demand | Current API docs, Stack Overflow |
+| 6. **Runtime Introspection** | Tool calls | Real-time | File contents, terminal output |
+
+**Benefits:**
+- Clear mental model for what context is available
+- Prioritization: local > learned > external
+- Debugging: "Why did agent do X?" → check which layers contributed
+
+**Files to modify:** `run_agent.py` (context loading), new `context/layers.py`
+
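The `local > learned > external` prioritization above could be sketched as a loader that walks layers in priority order and records which ones contributed. `ContextLayer`, `build_context`, and the character budget are hypothetical names and choices for illustration; the priorities reuse the layer numbers from the table.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ContextLayer:
    name: str
    priority: int  # lower number is consulted first
    load: Callable[[str], Optional[str]]  # takes the query, returns text or None


def build_context(
    layers: List[ContextLayer], query: str, max_chars: int = 2000
) -> List[Tuple[str, str]]:
    """Collect (layer name, text) pairs in priority order within a size budget."""
    used = 0
    contributed = []
    for layer in sorted(layers, key=lambda l: l.priority):
        text = layer.load(query)
        if not text:
            continue  # layer has nothing relevant for this query
        if used + len(text) > max_chars:
            continue  # skip layers that would exceed the budget
        contributed.append((layer.name, text))
        used += len(text)
    return contributed


layers = [
    ContextLayer("external_knowledge", 5, lambda q: "Current API docs"),
    ContextLayer("project_context", 1, lambda q: "This is a FastAPI project using PostgreSQL"),
    ContextLayer("learnings", 4, lambda q: "SSL fix for Ubuntu" if "ssl" in q.lower() else None),
    ContextLayer("user_profile", 3, lambda q: "User prefers pytest, uses black"),
]

context = build_context(layers, "How do I add a test?")
names = [name for name, _ in context]
```

Returning the layer names alongside the text is what would make the "Why did agent do X?" debugging question answerable: the list shows exactly which layers fed a given response.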
+---
+
+## 16. Evaluation System with LLM Grading 📏
+
+*Inspired by Dash's evaluation framework.*
+
+**Problem:** `batch_runner.py` runs test cases but lacks quality assessment.
+
+**Dash's Approach:**
+- **String matching** (default) - Check if expected strings appear
+- **LLM grader** (-g flag) - GPT evaluates response quality
+- **Result comparison** (-r flag) - Compare against golden output
+
+**Implementation for Hermes:**
+
+- [ ] **Test case format:**
+  ```python
+  TestCase(
+    name="create_python_project",
+    prompt="Create a new Python project with FastAPI and tests",
+    expected_strings=["requirements.txt", "main.py", "test_"],  # Basic check
+    golden_actions=["write:main.py", "write:requirements.txt", "terminal:pip install"],
+    grader_criteria="Should create complete project structure with working code"
+  )
+  ```
+
+- [ ] **LLM grader mode:**
+  ```python
+  def grade_response(response: str, criteria: str) -> Grade:
+      """Use GPT to evaluate response quality."""
+      prompt = f"""
+      Evaluate this agent response against the criteria.
+      Criteria: {criteria}
+      Response: {response}
+      
+      Score (1-5) and explain why.
+      """
+      # Returns: Grade(score=4, explanation="Created all files but tests are minimal")
+  ```
+
+- [ ] **Action comparison mode:**
+  - Record tool calls made during test
+  - Compare against expected actions
+  - "Expected terminal call to pip install, got npm install"
+
+- [ ] **CLI flags:**
+  ```bash
+  python batch_runner.py eval test_cases.yaml       # String matching
+  python batch_runner.py eval test_cases.yaml -g    # + LLM grading
+  python batch_runner.py eval test_cases.yaml -r    # + Result comparison
+  python batch_runner.py eval test_cases.yaml -v    # Verbose (show responses)
+  ```
+
+**Files to modify:** `batch_runner.py`, new `evals/test_cases.py`, new `evals/grader.py`
+
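The three grading modes above can be sketched as plain functions. This is an illustration under stated assumptions: the `llm` callable, the `"<score>: <explanation>"` reply format, and prefix matching for actions are all hypothetical choices, and `TestCase` only mirrors the fields from the proposed format.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TestCase:
    name: str
    prompt: str
    expected_strings: List[str] = field(default_factory=list)
    golden_actions: List[str] = field(default_factory=list)
    grader_criteria: str = ""


@dataclass
class Grade:
    score: int
    explanation: str


def check_strings(response: str, expected_strings: List[str]) -> List[str]:
    """String matching: return the expected strings missing from the response."""
    return [s for s in expected_strings if s not in response]


def compare_actions(recorded: List[str], golden: List[str]) -> List[str]:
    """Action comparison: return golden actions that no recorded action starts with."""
    return [g for g in golden if not any(r.startswith(g) for r in recorded)]


def grade_response(response: str, criteria: str, llm: Callable[[str], str]) -> Grade:
    """LLM grading: `llm` is any function mapping a prompt to a reply string."""
    prompt = (
        "Evaluate this agent response against the criteria.\n"
        f"Criteria: {criteria}\n"
        f"Response: {response}\n\n"
        "Reply as '<score 1-5>: <explanation>'."
    )
    score_text, _, explanation = llm(prompt).partition(":")
    return Grade(score=int(score_text.strip()), explanation=explanation.strip())


case = TestCase(
    name="create_python_project",
    prompt="Create a new Python project with FastAPI and tests",
    expected_strings=["requirements.txt", "main.py", "test_"],
    golden_actions=["write:main.py", "write:requirements.txt", "terminal:pip install"],
    grader_criteria="Should create complete project structure with working code",
)

response = "Created requirements.txt and main.py"
recorded = ["write:main.py", "write:requirements.txt", "terminal:npm install"]

missing_strings = check_strings(response, case.expected_strings)
missing_actions = compare_actions(recorded, case.golden_actions)
# A canned reply stands in for a real model call.
grade = grade_response(response, case.grader_criteria, lambda p: "4: Created all files but tests are minimal")
```

Passing the model in as a callable keeps the grader testable without network access; a real run would swap the lambda for a client call.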
+---
+
 *Last updated: $(date +%Y-%m-%d)* 🤖
--- a/atropos/Dockerfile
+++ b/atropos/Dockerfile
@@ -1,41 +0,0 @@
-# Dockerfile for atropos-agent sandbox server
-# Runs inside Nomad containers to handle tool execution
-# Includes bubblewrap for namespace-based slot isolation
-
-FROM python:3.11-slim
-
-# Install system dependencies
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    # Bubblewrap for namespace isolation
-    bubblewrap \
-    # `script` for PTY allocation (used for stable tmux+asciinema startup)
-    util-linux \
-    # Git for SWE-style tasks (cloning repos)
-    git \
-    # tmux for stateful terminal sessions (Phase 4.7+)
-    tmux \
-    # Common tools agents might need
-    curl \
-    wget \
-    jq \
-    # Cleanup
-    && rm -rf /var/lib/apt/lists/*
-
-# Install Python dependencies (sandbox server + optional terminal recording)
-RUN pip install --no-cache-dir aiohttp asciinema
-
-# Copy the sandbox server
-COPY sandbox_server.py /app/sandbox_server.py
-
-WORKDIR /app
-
-# Create data directory for slot workspaces
-RUN mkdir -p /data
-
-# Verify bubblewrap is installed and working
-RUN bwrap --version
-
-EXPOSE 8080
-
-# Default command - can be overridden by Nomad job spec
-CMD ["python", "sandbox_server.py", "--port", "8080", "--slots", "10", "--data-dir", "/data"]
--- a/atropos/__init__.py
+++ b/atropos/__init__.py
@@ -1,46 +0,0 @@
-"""
-Atropos integration for Hermes-Agent.
-
-This package is intentionally optional: Hermes-Agent should work without Atropos.
-If you import anything from `atropos.*` without having `atroposlib` installed,
-we raise a clear error with install instructions.
-
-Install (recommended, from repo checkout):
-  uv sync --extra atropos
-
-Or (pip / editable):
-  pip install -e '.[atropos]'
-"""
-
-from __future__ import annotations
-
-
-def _require_atroposlib() -> None:
-    try:
-        import atroposlib  # noqa: F401
-    except ModuleNotFoundError as exc:  # pragma: no cover
-        raise ModuleNotFoundError(
-            "Hermes-Agent Atropos integration requires `atroposlib`, but it is not installed.\n"
-            "Install it with:\n"
-            "  uv sync --extra atropos\n"
-            "or:\n"
-            "  pip install -e '.[atropos]'\n"
-        ) from exc
-
-
-_require_atroposlib()
-
-# Re-export the most commonly used pieces for convenience.
-from .agent import AgentConfig, AgentResult, AgentStep, AtroposAgent, SequenceData  # noqa: E402
-from .envs import AgentEnv, AgentEnvConfig  # noqa: E402
-
-__all__ = [
-    "AtroposAgent",
-    "AgentConfig",
-    "AgentResult",
-    "AgentStep",
-    "SequenceData",
-    "AgentEnv",
-    "AgentEnvConfig",
-]
-
--- a/atropos/agent/__init__.py
+++ b/atropos/agent/__init__.py
@@ -1,15 +0,0 @@
-"""
-Agent abstractions for atropos-agent.
-
-Provides the core AtroposAgent class for running ReACT-style agent loops.
-"""
-
-from .atropos_agent import AgentConfig, AgentResult, AgentStep, AtroposAgent, SequenceData
-
-__all__ = [
-    "AtroposAgent",
-    "AgentConfig",
-    "AgentResult",
-    "AgentStep",
-    "SequenceData",
-]
--- a/atropos/agent/atropos_agent.py
+++ b/atropos/agent/atropos_agent.py
@@ -1,850 +0,0 @@
-"""
-ReACT-style agent implementation for atropos-agent.
-
-This module provides the core AtroposAgent class that implements a basic
-Reason-Act-Observe loop with tool calling capabilities.
-
-Uses ManagedServer from atroposlib for automatic token/logprob tracking,
-making trajectories ready for RL training.
-
-The agent uses Hermes-style XML tags for tool calls:
-- <think>...</think> for reasoning
-- <tool_call>{"name": "...", "arguments": {...}}</tool_call> for actions
-- <tool_response>...</tool_response> for observations
-"""
-
-import asyncio
-import os
-import json
-import time
-from contextlib import asynccontextmanager
-from dataclasses import dataclass, field
-from uuid import uuid4
-from typing import Any, AsyncGenerator, Awaitable, Callable, Dict, List, Optional, Union
-
-from dotenv import load_dotenv
-import httpx
-
-from ..tools import ToolCall, ToolRegistry, ToolResult
-from atroposlib.envs.server_handling.managed_server import ManagedServer
-
-load_dotenv()
-
-
-# Default system prompt with tool calling instructions.
-AGENT_SYSTEM_PROMPT = """You are a deep thinking AI. You MUST enclose your internal reasoning inside <think>...</think> tags.
-
-You are a function calling AI model.
-
-You are provided with function signatures within <tools></tools> XML tags.
-You must call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.
-You can ONLY respond without a tool call if you are totally certain you have the final answer to the user's question or task
-After calling & executing a function, you will be provided with function results within <tool_response></tool_response> XML tags.
-
-Here are the available tools:
-<tools>
-{tools_json}
-</tools>
-
-Use the following JSON schema for each tool call you will make:
-{"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]}
-
-## REQUIRED TOOL FORMAT
-
-When you decide to call a tool, your assistant message MUST be:
-1) exactly one <think>...</think> block, followed by
-2) one or more <tool_call>...</tool_call> blocks,
-and NOTHING else in that message.
-
-If you need to explain anything, put it inside <think>. Do NOT write natural language outside <think> or <tool_call>.
-
-For each function call return a JSON object with function name and arguments within <tool_call></tool_call> XML tags as follows:
-<tool_call>
-{"name": "<function-name>", "arguments": {"arg1": "value1"}}
-</tool_call>
-
-Each <tool_call> must be on its own and contain ONLY the JSON object (no extra text).
-The JSON inside <tool_call> MUST be valid JSON with double quotes.
-
-Do NOT output <tool_response> in an assistant message.
-
-After you receive tool results, you may either call more tools (same required format) or provide the final answer.
-When providing the final answer, do NOT include any <tool_call> blocks.
-
-## TERMINAL TOOL NOTES
-
-- Commands execute under POSIX `/bin/sh` (not bash).
-- Each tool call runs in a fresh shell: environment changes (like `cd` or venv activation) do not persist across tool calls.
-- Avoid bash-only features like `source`, `[[ ... ]]`, or process substitution.
-- Prefer explicit venv usage:
-  - `python -m venv .venv && . .venv/bin/activate && python -m pip install -e .` (POSIX `.` activation), or
-  - `.venv/bin/python -m pip install -e .` (no activation required).
-
-## ICL (examples)
-
-User: Show the current directory.
-Assistant:
-<think>I should run pwd.</think>
-<tool_call>
-{"name": "terminal", "arguments": {"command": "pwd"}}
-</tool_call>
-User: <tool_response>{"success": true, "output": "/tmp\\n"}</tool_response>
-Assistant: /tmp
-
-User: List files, then count them.
-Assistant:
-<think>I should count files.</think>
-<tool_call>
-{"name": "terminal", "arguments": {"command": "ls -1 | wc -l"}}
-</tool_call>
-User: <tool_response>{"success": true, "output": "3\\n"}</tool_response>
-Assistant: 3
-
-User: Run pwd, then print ok (two tool calls).
-Assistant:
-<think>I should run two commands.</think>
-<tool_call>
-{"name": "terminal", "arguments": {"command": "pwd"}}
-</tool_call>
-<tool_call>
-{"name": "terminal", "arguments": {"command": "echo ok"}}
-</tool_call>
-User: <tool_response>{"success": true, "output": "/tmp\\n"}</tool_response>
-User: <tool_response>{"success": true, "output": "ok\\n"}</tool_response>
-Assistant: ok
-"""
-
-
-@dataclass
-class AgentConfig:
-    """Configuration for the AtroposAgent."""
-    
-    # Generation parameters
-    temperature: Optional[float] = 0.7
-    # Default to "let the backend decide" (important for tool-tag completions that may be longer).
-    max_tokens: Optional[int] = None
-    
-    # Agent behavior
-    max_steps: int = 50
-    system_prompt: Optional[str] = None
-    tool_delay_s: float = 0.0
-    
-    # Working directory for tools
-    working_dir: Optional[str] = None
-
-
-@dataclass
-class SequenceData:
-    """Token/logprob data from a single completion."""
-    
-    full_text: str
-    tokens: List[int]
-    masked_tokens: List[int]  # -100 for prompt, actual IDs for completion
-    logprobs: List[float]  # 1.0 for prompt, actual values for completion
-    metadata: Optional[Dict[str, Any]] = None
-    
-    @classmethod
-    def from_sequence_node(cls, node) -> "SequenceData":
-        """Create from a ManagedServer SequenceNode."""
-        return cls(
-            full_text=node.full_text,
-            tokens=node.tokens,
-            masked_tokens=node.masked_tokens,
-            logprobs=node.logprobs,
-            metadata=getattr(node, "metadata", None),
-        )
-
-
-@dataclass
-class AgentStep:
-    """A single step in the agent's trajectory."""
-    
-    step_number: int
-    assistant_message: str
-    tool_calls: List[ToolCall] = field(default_factory=list)
-    tool_results: List[ToolResult] = field(default_factory=list)
-    sequence_data: Optional[SequenceData] = None  # Token data from this step
-    
-    @property
-    def has_tool_calls(self) -> bool:
-        return len(self.tool_calls) > 0
-
-
-@dataclass
-class AgentResult:
-    """Result of running an agent trajectory."""
-    
-    success: bool
-    final_response: str
-    steps: List[AgentStep] = field(default_factory=list)
-    total_tokens: int = 0
-    error: Optional[str] = None
-    metadata: Dict[str, Any] = field(default_factory=dict)
-    
-    # Full trajectory token data for RL training
-    trajectory_data: Optional[SequenceData] = None
-    
-    @property
-    def num_steps(self) -> int:
-        return len(self.steps)
-    
-    @property
-    def total_tool_calls(self) -> int:
-        return sum(len(step.tool_calls) for step in self.steps)
-    
-    def to_messages(self) -> List[Dict[str, str]]:
-        """Convert trajectory to messages format for logging."""
-        messages = []
-        for step in self.steps:
-            messages.append({"role": "assistant", "content": step.assistant_message})
-            if step.tool_results:
-                # Combine all tool responses
-                responses = "\n".join(r.to_xml() for r in step.tool_results)
-                messages.append({"role": "user", "content": responses})
-        return messages
-    
-    def to_scored_data(self, score: float) -> Optional[Dict[str, Any]]:
-        """
-        Convert to format suitable for ScoredDataGroup.
-        
-        Args:
-            score: The score for this trajectory
-            
-        Returns:
-            Dict with tokens, masks, scores suitable for training, or None if no data
-        """
-        if self.trajectory_data is None:
-            return None
-        
-        return {
-            "tokens": self.trajectory_data.tokens,
-            "masks": self.trajectory_data.masked_tokens,
-            "scores": score,
-            "logprobs": self.trajectory_data.logprobs,
-        }
-
-
-class AtroposAgent:
-    """
-    A ReACT-style agent that uses LLMs with tool calling.
-    
-    This implementation wraps ManagedServer for automatic token/logprob tracking,
-    making trajectories ready for RL training.
-    
-    Example:
-        # `server` may be an Atropos `ServerManager` (recommended) or a single `APIServer`.
-        # In practice, environments usually construct this via `BaseEnv`.
-        server = ...
-        tools = ToolRegistry()
-        tools.register(BashTool())
-        
-        agent = AtroposAgent(server=server, tools=tools)
-        result = await agent.run("List the files in the current directory")
-        
-        # Access token data for training
-        if result.trajectory_data:
-            print(f"Tokens: {result.trajectory_data.tokens}")
-            print(f"Masked: {result.trajectory_data.masked_tokens}")
-    """
-    
-    def __init__(
-        self,
-        server,  # ServerManager or APIServer
-        tools: Optional[ToolRegistry] = None,
-        config: Optional[AgentConfig] = None,
-        tokenizer: Optional[Any] = None,
-        execute_tool: Optional[Callable[[ToolCall], Awaitable[ToolResult]]] = None,
-    ):
-        self.server = server
-        self.tools = tools or ToolRegistry()
-        self.config = config or AgentConfig()
-        self.tokenizer = tokenizer or getattr(server, "tokenizer", None)
-        self.execute_tool = execute_tool or self.tools.execute
-
-    @asynccontextmanager
-    async def _managed(self) -> AsyncGenerator[Any, None]:
-        """
-        Yield a ManagedServer-like object.
-
-        - If `self.server` is a ServerManager, use its `managed_server()` context manager.
-        - If `self.server` is a single APIServer, wrap it in `ManagedServer` directly.
-        """
-        if os.getenv("ATROPOS_BYPASS_MANAGED_SERVER") == "1":
-            yield _DirectChatCompletionClient(server=self.server)
-            return
-        if hasattr(self.server, "managed_server"):
-            async with self.server.managed_server(tokenizer=self.tokenizer) as managed:
-                yield managed
-        else:
-            managed = ManagedServer(server=self.server, tokenizer=self.tokenizer)
-            try:
-                yield managed
-            finally:
-                managed.reset()
-    
-    def _build_system_prompt(self) -> str:
-        """Build the system prompt with tool descriptions."""
-        if self.config.system_prompt:
-            return self.config.system_prompt
-
-        tools_json = self.tools.get_prompt_tool_definitions_json()
-        # Avoid `str.format()` here because the prompt contains many literal `{}` braces
-        # in JSON examples; we only want to substitute the single `{tools_json}` token.
-        return AGENT_SYSTEM_PROMPT.replace("{tools_json}", tools_json)
-
-    def _infer_server_model_for_debug(self) -> Optional[str]:
-        """
-        Best-effort inference of the configured model name for debug payload saving.
-
-        ManagedServer/server_manager typically injects `model` internally, so `chat_kwargs`
-        may not contain it. For replaying saved payloads via curl, it's useful to persist it.
-        """
-        servers = getattr(self.server, "servers", None)
-        if isinstance(servers, list) and servers:
-            s0 = servers[0]
-            cfg = getattr(s0, "config", None)
-            model = getattr(cfg, "model_name", None) or getattr(s0, "model_name", None)
-            if isinstance(model, str) and model:
-                return model
-        model = getattr(self.server, "model_name", None) or getattr(self.server, "model", None)
-        if isinstance(model, str) and model:
-            return model
-        return None
-
-    def _infer_server_base_url_for_debug(self) -> Optional[str]:
-        """
-        Best-effort inference of the configured base_url for debug logging.
-
-        This is helpful when diagnosing hangs / retries at the transport layer.
-        """
-        servers = getattr(self.server, "servers", None)
-        if isinstance(servers, list) and servers:
-            s0 = servers[0]
-            cfg = getattr(s0, "config", None)
-            base_url = getattr(cfg, "base_url", None) or getattr(s0, "base_url", None)
-            if isinstance(base_url, str) and base_url:
-                return base_url
-        base_url = getattr(self.server, "base_url", None)
-        if isinstance(base_url, str) and base_url:
-            return base_url
-        return None
-
-    def _extract_response_metadata(self, response: Any) -> Dict[str, Any]:
-        """
-        Extract lightweight, JSON-serializable metadata from an OpenAI-style response.
-
-        This is useful for debugging training runs, especially when ManagedServer state
-        tracking is unavailable (e.g. OpenAI-compatible chat endpoints).
-        """
-        meta: Dict[str, Any] = {}
-        try:
-            rid = getattr(response, "id", None)
-            if isinstance(rid, str) and rid:
-                meta["id"] = rid
-            model = getattr(response, "model", None)
-            if isinstance(model, str) and model:
-                meta["model"] = model
-            created = getattr(response, "created", None)
-            if isinstance(created, int):
-                meta["created"] = created
-            system_fingerprint = getattr(response, "system_fingerprint", None)
-            if isinstance(system_fingerprint, str) and system_fingerprint:
-                meta["system_fingerprint"] = system_fingerprint
-
-            choices = getattr(response, "choices", None)
-            if isinstance(choices, list) and choices:
-                fr = getattr(choices[0], "finish_reason", None)
-                if isinstance(fr, str) and fr:
-                    meta["finish_reason"] = fr
-
-            usage = getattr(response, "usage", None)
-            if usage is not None:
-                if hasattr(usage, "model_dump"):
-                    meta["usage"] = usage.model_dump()
-                elif isinstance(usage, dict):
-                    meta["usage"] = usage
-        except Exception:
-            pass
-        return meta
-
-    def _debug_dump_request(self, *, step_num: int, chat_kwargs: Dict[str, Any]) -> None:
-        if os.getenv("ATROPOS_DEBUG_AGENT_REQUEST") != "1":
-            return
-        try:
-            # Avoid dumping megabytes by default; messages can be huge.
-            meta = {
-                "step": step_num,
-                "base_url": self._infer_server_base_url_for_debug(),
-                "model": chat_kwargs.get("model") or self._infer_server_model_for_debug(),
-                "chat_kwargs_keys": sorted(list(chat_kwargs.keys())),
-                "n": chat_kwargs.get("n"),
-                "max_tokens": chat_kwargs.get("max_tokens"),
-                "temperature": chat_kwargs.get("temperature"),
-                "num_messages": len(chat_kwargs.get("messages") or []),
-            }
-            print("\n=== ATROPOS_DEBUG_AGENT_REQUEST ===", flush=True)
-            print(meta, flush=True)
-
-            if os.getenv("ATROPOS_DEBUG_AGENT_REQUEST_FULL") == "1":
-                payload = dict(chat_kwargs)
-                # Make the payload more legible and less huge.
-                try:
-                    dumped = json.dumps(payload, ensure_ascii=False, indent=2)
-                except Exception:
-                    dumped = repr(payload)
-                print("\n=== ATROPOS_DEBUG_AGENT_REQUEST_FULL ===", flush=True)
-                print(dumped[:200_000], flush=True)
-
-            # Optional: save the FULL request payload to disk (no truncation).
-            save_dir = os.getenv("ATROPOS_DEBUG_AGENT_REQUEST_SAVE_DIR")
-            if save_dir:
-                os.makedirs(save_dir, exist_ok=True)
-                payload: Dict[str, Any] = dict(chat_kwargs)
-                if "model" not in payload:
-                    model = self._infer_server_model_for_debug()
-                    if model:
-                        payload["model"] = model
-                # Use a unique filename so parallel trajectories don't clobber each other.
-                fname = os.path.join(
-                    save_dir,
-                    f"atropos_agent_request_step{step_num}_{int(time.time()*1000)}_{os.getpid()}_{uuid4().hex}.json",
-                )
-                with open(fname, "w", encoding="utf-8") as f:
-                    json.dump(payload, f, ensure_ascii=False, indent=2)
-                print(f"[AtroposAgent] saved request payload: {fname}", flush=True)
-        except Exception:
-            return
-
-    def _debug_dump_response(self, *, step_num: int, response: Any) -> None:
-        if os.getenv("ATROPOS_DEBUG_AGENT_RESPONSE") != "1":
-            return
-        print("\n=== ATROPOS_DEBUG_AGENT_RESPONSE ===", flush=True)
-        print({"step": step_num, "type": type(response).__name__}, flush=True)
-        try:
-            dumped = response.model_dump()  # openai pydantic model
-        except Exception:
-            dumped = getattr(response, "__dict__", {"repr": repr(response)})
-        # Keep the dump bounded; we only need enough to see the assistant message content.
-        text = str(dumped)
-        print(text[:200_000], flush=True)
-
-    async def _chat_completion_with_debug(
-        self, *, managed: Any, step_num: int, chat_kwargs: Dict[str, Any]
-    ) -> Any:
-        """
-        Call `managed.chat_completion()` with optional timeout + richer failure logging.
-
-        Debug env vars:
-        - `ATROPOS_AGENT_CHAT_TIMEOUT_S`: if set, wraps the await in `asyncio.wait_for`.
-        - `ATROPOS_DEBUG_AGENT_WAIT_EVERY_S`: if set, prints a heartbeat while waiting.
-        """
-        # Hard guardrail: never allow a single chat completion to block for too long.
-        # This is essential for RL data-gen stability; long hangs should be treated as failures (score=0).
-        timeout_s_raw = os.getenv("ATROPOS_AGENT_CHAT_TIMEOUT_S")
-        timeout_s_default = 240.0
-        timeout_s = float(timeout_s_raw) if timeout_s_raw else timeout_s_default
-        timeout_s = min(timeout_s, 240.0)
-
-        wait_every_raw = os.getenv("ATROPOS_DEBUG_AGENT_WAIT_EVERY_S")
-        wait_every_s = float(wait_every_raw) if wait_every_raw else None
-
-        async def _await_call() -> Any:
-            if not wait_every_s or wait_every_s <= 0:
-                return await managed.chat_completion(**chat_kwargs)
-
-            # Heartbeat mode: wait in chunks without cancelling the underlying request.
-            # NOTE: do NOT use `asyncio.wait_for(task, timeout=...)` here, because a timeout
-            # will cancel the task and surface as `CancelledError` on the next loop.
-            task = asyncio.create_task(managed.chat_completion(**chat_kwargs))
-            t0 = time.perf_counter()
-            try:
-                while True:
-                    done, _pending = await asyncio.wait({task}, timeout=wait_every_s)
-                    if task in done:
-                        return task.result()
-
-                    waited = time.perf_counter() - t0
-                    print(
-                        f"[AtroposAgent] step={step_num} still waiting for chat_completion... ({waited:.1f}s)",
-                        flush=True,
-                    )
-            except asyncio.CancelledError:
-                task.cancel()
-                raise
-
-        try:
-            return await asyncio.wait_for(_await_call(), timeout=timeout_s)
-        except asyncio.TimeoutError as e:
-            print("\n=== ATROPOS_DEBUG_AGENT_CHAT_TIMEOUT ===", flush=True)
-            print({"step": step_num, "timeout_s": timeout_s}, flush=True)
-            raise RuntimeError(f"chat_completion timed out after {timeout_s:.1f}s") from e
-        except asyncio.CancelledError:
-            # Treat cancellation as a hard failure rather than crashing the whole env run.
-            # (Atropos/BaseEnv may cancel tasks during shutdown or retries.)
-            raise RuntimeError("chat_completion cancelled") from None
-        except Exception as e:
-            detail: Dict[str, Any] = {
-                "step": step_num,
-                "exc_type": type(e).__name__,
-                "exc_str": str(e),
-            }
-            if isinstance(e, httpx.HTTPStatusError):
-                try:
-                    detail["status_code"] = e.response.status_code
-                    detail["response_text"] = e.response.text[:20_000]
-                except Exception:
-                    pass
-            elif isinstance(e, httpx.RequestError):
-                detail["request"] = repr(getattr(e, "request", None))
-
-            print("\n=== ATROPOS_DEBUG_AGENT_CHAT_FAILURE ===", flush=True)
-            print(detail, flush=True)
-            raise
-
-    async def run(
-        self,
-        task: str,
-        initial_messages: Optional[List[Dict[str, str]]] = None,
-    ) -> AgentResult:
-        """
-        Run the agent on a task using ManagedServer for token tracking.
-        
-        Args:
-            task: The task/prompt for the agent
-            initial_messages: Optional additional context messages
-            
-        Returns:
-            AgentResult with the trajectory, final response, and token data
-        """
-        messages = [
-            {"role": "system", "content": self._build_system_prompt()},
-        ]
-        
-        if initial_messages:
-            messages.extend(initial_messages)
-        
-        messages.append({"role": "user", "content": task})
-        
-        steps = []
-        final_response = ""
-        final_node = None
-        final_prompt_messages: Optional[List[Dict[str, str]]] = None
-        last_node = None
-        last_prompt_messages: Optional[List[Dict[str, str]]] = None
-        last_response_text: str = ""
-        
-        # Use ManagedServer for automatic token tracking
-        async with self._managed() as managed:
-            for step_num in range(self.config.max_steps):
-                # ReACT loop iteration here, just call -> tools -> observe until done (no tools called)
-                try:
-                    # Keep a copy of the prompt messages used for this completion.
-                    # Useful for reconstructing tokens/masks when state tracking is unavailable.
-                    prompt_messages = list(messages)
-                    chat_kwargs: Dict[str, Any] = {"messages": messages, "n": 1}
-                    if self.config.max_tokens is not None:
-                        chat_kwargs["max_tokens"] = self.config.max_tokens
-                    if self.config.temperature is not None:
-                        chat_kwargs["temperature"] = self.config.temperature
-
-                    t_req = time.perf_counter()
-                    print(
-                        f"[AtroposAgent] step={step_num+1} chat_completion start "
-                        f"(messages={len(messages)}, max_tokens={self.config.max_tokens}, temp={self.config.temperature})",
-                        flush=True,
-                    )
-                    self._debug_dump_request(step_num=step_num + 1, chat_kwargs=chat_kwargs)
-                    response = await self._chat_completion_with_debug(
-                        managed=managed, step_num=step_num + 1, chat_kwargs=chat_kwargs
-                    )
-                    self._debug_dump_response(step_num=step_num + 1, response=response)
-                    response_meta = self._extract_response_metadata(response)
-                    print(
-                        f"[AtroposAgent] step={step_num+1} chat_completion done in {time.perf_counter() - t_req:.2f}s",
-                        flush=True,
-                    )
-                    
-                    current_node = None
-                    if hasattr(managed, "get_state"):
-                        state = managed.get_state()
-                        nodes = state.get("nodes", [])
-                        current_node = nodes[-1] if nodes else None
-                    
-                except Exception as e:
-                    return AgentResult(
-                        success=False,
-                        final_response="",
-                        steps=steps,
-                        error=f"Generation error: {str(e)}",
-                    )
-                
-                msg = response.choices[0].message
-                # Some OpenAI-compatible servers populate `message.reasoning` and leave `content=""`.
-                response_text = (msg.content or "") or (getattr(msg, "reasoning", None) or "")
-                tool_calls = ToolCall.parse_from_text(response_text)
-                last_node = current_node
-                last_prompt_messages = prompt_messages
-                last_response_text = response_text
-
-                step_sequence_data = SequenceData.from_sequence_node(current_node) if current_node else None
-                if step_sequence_data is None:
-                    if response_meta:
-                        # We still want metadata for debugging even if token/logprob state tracking is unavailable.
-                        step_sequence_data = SequenceData(
-                            full_text=response_text,
-                            tokens=[],
-                            masked_tokens=[],
-                            logprobs=[],
-                            metadata=response_meta,
-                        )
-                else:
-                    merged = dict(response_meta)
-                    node_meta = step_sequence_data.metadata
-                    if isinstance(node_meta, dict):
-                        merged.update(node_meta)
-                    step_sequence_data.metadata = merged or step_sequence_data.metadata
-                
-                step = AgentStep(
-                    step_number=step_num + 1,
-                    assistant_message=response_text,
-                    tool_calls=tool_calls,
-                    sequence_data=step_sequence_data,
-                )
-                
-                if not tool_calls:
-                    steps.append(step)
-                    final_response = response_text
-                    final_node = current_node
-                    final_prompt_messages = prompt_messages
-                    break
-                
-                messages.append({"role": "assistant", "content": response_text})
-                
-                tool_responses = []
-                for call in tool_calls:
-                    result = await self.execute_tool(call)
-                    step.tool_results.append(result)
-                    tool_responses.append(result.to_xml())
-                    if self.config.tool_delay_s > 0:
-                        await asyncio.sleep(self.config.tool_delay_s)
-                
-                steps.append(step)
-            
-                responses_text = "\n".join(tool_responses)
-                # Tool observations are represented as user content with Hermes-style tags.
-                # This is compatible with most OpenAI-compatible chat APIs and ensures
-                # tokenizers/chat templates include tool outputs during training.
-                messages.append({"role": "user", "content": responses_text})
-            
-            else:
-                # Reached max steps without completing
-                # Return a failure result but include the last observed completion so callers can
-                # record the trajectory (score=0) without triggering retries.
-                final_response = last_response_text or final_response
-                final_node = last_node
-                final_prompt_messages = last_prompt_messages
-                trajectory_data = None
-                if final_node:
-                    trajectory_data = SequenceData.from_sequence_node(final_node)
-                elif final_prompt_messages is not None and self.tokenizer is not None:
-                    if hasattr(self.tokenizer, "apply_chat_template"):
-                        prompt_text = self.tokenizer.apply_chat_template(
-                            final_prompt_messages, tokenize=False, add_generation_prompt=True
-                        )
-                        prompt_tokens = self.tokenizer.encode(prompt_text, add_special_tokens=False)
-                    else:
-                        prompt_text = "\n".join([f"{m['role']}: {m['content']}" for m in final_prompt_messages])
-                        prompt_tokens = self.tokenizer.encode(prompt_text, add_special_tokens=True)
-                    output_tokens = self.tokenizer.encode(final_response, add_special_tokens=False)
-                    tokens = prompt_tokens + output_tokens
-                    masked_tokens = ([-100] * len(prompt_tokens)) + output_tokens
-                    logprobs = ([1.0] * len(prompt_tokens)) + ([0.0] * len(output_tokens))
-                    trajectory_data = SequenceData(
-                        full_text=f"{prompt_text}{final_response}",
-                        tokens=tokens,
-                        masked_tokens=masked_tokens,
-                        logprobs=logprobs,
-                    )
-                # Preserve response metadata (if any) even on failure trajectories.
-                try:
-                    if trajectory_data is not None and steps:
-                        last_step = steps[-1]
-                        if last_step.sequence_data and isinstance(last_step.sequence_data.metadata, dict):
-                            trajectory_data.metadata = dict(last_step.sequence_data.metadata)
-                except Exception:
-                    pass
-                return AgentResult(
-                    success=False,
-                    final_response=final_response,
-                    steps=steps,
-                    error=f"Reached maximum steps ({self.config.max_steps})",
-                    trajectory_data=trajectory_data,
-                )
-        
-        # Build result with trajectory data
-        trajectory_data = None
-        if final_node:
-            trajectory_data = SequenceData.from_sequence_node(final_node)
-        elif final_prompt_messages is not None and self.tokenizer is not None:
-            if hasattr(self.tokenizer, "apply_chat_template"):
-                prompt_text = self.tokenizer.apply_chat_template(
-                    final_prompt_messages, tokenize=False, add_generation_prompt=True
-                )
-                prompt_tokens = self.tokenizer.encode(prompt_text, add_special_tokens=False)
-            else:
-                prompt_text = "\n".join([f"{m['role']}: {m['content']}" for m in final_prompt_messages])
-                prompt_tokens = self.tokenizer.encode(prompt_text, add_special_tokens=True)
-            output_tokens = self.tokenizer.encode(final_response, add_special_tokens=False)
-            tokens = prompt_tokens + output_tokens
-            masked_tokens = ([-100] * len(prompt_tokens)) + output_tokens
-            logprobs = ([1.0] * len(prompt_tokens)) + ([0.0] * len(output_tokens))
-            trajectory_data = SequenceData(
-                full_text=f"{prompt_text}{final_response}",
-                tokens=tokens,
-                masked_tokens=masked_tokens,
-                logprobs=logprobs,
-            )
-
-        # Ensure trajectory_data carries the most recent metadata we observed (if any).
-        try:
-            if trajectory_data is not None and steps:
-                last_step = steps[-1]
-                if last_step.sequence_data and isinstance(last_step.sequence_data.metadata, dict):
-                    trajectory_data.metadata = dict(last_step.sequence_data.metadata)
-        except Exception:
-            pass
-        
-        return AgentResult(
-            success=True,
-            final_response=final_response,
-            steps=steps,
-            trajectory_data=trajectory_data,
-        )
-    
-    async def run_single_turn(
-        self,
-        messages: List[Dict[str, str]],
-        execute_tools: bool = True,
-    ) -> tuple[str, List[ToolResult], Optional[SequenceData]]:
-        """
-        Run a single turn of the agent (one LLM call + tool execution).
-        
-        This is useful for integration with BaseEnv where you want more
-        control over the loop.
-        
-        Args:
-            messages: The conversation history
-            execute_tools: Whether to execute parsed tool calls
-            
-        Returns:
-            Tuple of (response_text, tool_results, sequence_data)
-        """
-        async with self._managed() as managed:
-            chat_kwargs: Dict[str, Any] = {"messages": messages, "n": 1}
-            if self.config.max_tokens is not None:
-                chat_kwargs["max_tokens"] = self.config.max_tokens
-            if self.config.temperature is not None:
-                chat_kwargs["temperature"] = self.config.temperature
-
-            self._debug_dump_request(step_num=1, chat_kwargs=chat_kwargs)
-            response = await self._chat_completion_with_debug(managed=managed, step_num=1, chat_kwargs=chat_kwargs)
-            self._debug_dump_response(step_num=1, response=response)
-            
-            current_node = None
-            if hasattr(managed, "get_state"):
-                state = managed.get_state()
-                nodes = state.get("nodes", [])
-                current_node = nodes[-1] if nodes else None
-        
-        msg = response.choices[0].message
-        response_text = (msg.content or "") or (getattr(msg, "reasoning", None) or "")
-        tool_results = []
-        
-        if execute_tools:
-            tool_calls = ToolCall.parse_from_text(response_text)
-            for call in tool_calls:
-                result = await self.execute_tool(call)
-                tool_results.append(result)
-        
-        sequence_data = SequenceData.from_sequence_node(current_node) if current_node else None
-        
-        return response_text, tool_results, sequence_data
-
-
-class _DirectChatCompletionClient:
-    """
-    Minimal stand-in for ManagedServer that calls the OpenAI-compatible endpoint directly.
-
-    This is for isolating issues where `ManagedServer.chat_completion()` hangs or misbehaves.
-    It intentionally does NOT do token/logprob tracking.
-    """
-
-    def __init__(self, server: Any):
-        self._server = server
-
-    def _server_config(self) -> tuple[str, str, str]:
-        # ServerManager case: first configured server.
-        servers = getattr(self._server, "servers", None)
-        if isinstance(servers, list) and servers:
-            s0 = servers[0]
-            cfg = getattr(s0, "config", None)
-            base_url = getattr(cfg, "base_url", None) or getattr(s0, "base_url", None)
-            api_key = getattr(cfg, "api_key", None) or getattr(s0, "api_key", None)
-            model = getattr(cfg, "model_name", None) or getattr(s0, "model_name", None)
-            if isinstance(base_url, str) and isinstance(api_key, str) and isinstance(model, str):
-                return base_url.rstrip("/"), api_key, model
-
-        # APIServer-like fallback.
-        base_url = getattr(self._server, "base_url", None)
-        api_key = getattr(self._server, "api_key", None)
-        model = getattr(self._server, "model_name", None) or getattr(self._server, "model", None)
-        if isinstance(base_url, str) and isinstance(api_key, str) and isinstance(model, str):
-            return base_url.rstrip("/"), api_key, model
-
-        raise RuntimeError("Unable to resolve server base_url/api_key/model for direct chat completion")
-
-    async def chat_completion(self, *, messages: List[Dict[str, str]], n: int = 1, **kwargs: Any) -> Any:
-        base_url, api_key, model = self._server_config()
-        url = f"{base_url}/chat/completions"
-
-        payload: Dict[str, Any] = {
-            "model": model,
-            "messages": messages,
-            "n": n,
-        }
-        # Pass through common generation kwargs.
-        for k in ("max_tokens", "temperature", "top_p", "presence_penalty", "frequency_penalty", "stop"):
-            if k in kwargs and kwargs[k] is not None:
-                payload[k] = kwargs[k]
-
-        timeout_s = float(os.getenv("ATROPOS_DIRECT_REQUEST_TIMEOUT_S") or "120")
-        print(f"[AtroposAgent] DIRECT chat_completion POST {url} (timeout={timeout_s}s)", flush=True)
-        async with httpx.AsyncClient(timeout=timeout_s) as client:
-            resp = await client.post(
-                url,
-                headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
-                json=payload,
-            )
-            resp.raise_for_status()
-            data = resp.json()
-
-        # Return a very small object compatible with the code paths that read
-        # `response.choices[0].message.content`.
-        class _Msg:
-            def __init__(self, d: Dict[str, Any]):
-                self.content = d.get("content")
-                self.reasoning = d.get("reasoning")
-
-        class _Choice:
-            def __init__(self, d: Dict[str, Any]):
-                self.message = _Msg(d.get("message") or {})
-
-        class _Resp:
-            def __init__(self, d: Dict[str, Any]):
-                self._d = d
-                self.choices = [_Choice(c) for c in (d.get("choices") or [])]
-
-            def model_dump(self) -> Dict[str, Any]:
-                return self._d
-
-        return _Resp(data)
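Review note: the removed agent loop builds its fallback trajectory twice with identical code, once on the max-steps path and once on the success path. A minimal sketch of how that masking step could be factored out is below. The helper name `build_masked_sequence` and the constant `IGNORE_INDEX` are hypothetical and do not appear in the diff; the values (-100 for prompt labels, 1.0/0.0 placeholder logprobs) are copied from the removed lines.

```python
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # label value the removed code uses to mask prompt tokens


def build_masked_sequence(
    prompt_tokens: Sequence[int],
    output_tokens: Sequence[int],
) -> Tuple[List[int], List[int], List[float]]:
    """Return (tokens, masked_tokens, logprobs) for one prompt/completion pair."""
    tokens = list(prompt_tokens) + list(output_tokens)
    # Prompt positions are masked so only completion tokens act as labels.
    masked_tokens = [IGNORE_INDEX] * len(prompt_tokens) + list(output_tokens)
    # Placeholder logprobs mirroring the diff: 1.0 on prompt positions,
    # 0.0 on completion positions, since real logprobs are unavailable here.
    logprobs = [1.0] * len(prompt_tokens) + [0.0] * len(output_tokens)
    return tokens, masked_tokens, logprobs


tokens, masked, logprobs = build_masked_sequence([11, 12, 13], [21, 22])
print(tokens, masked, logprobs)
```

All three lists have the same length, which is the invariant the downstream `SequenceData` constructor appears to rely on.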
--- a/atropos/api/__init__.py
+++ b/atropos/api/__init__.py
@@ -1,6 +0,0 @@
-"""
-FastAPI services for atropos-agent.
-
- tool_executor_server: queued/batched sandbox tool execution (Phase 4)
-"""
-
--- a/atropos/api/tool_executor_server.py
+++ b/atropos/api/tool_executor_server.py
@@ -1,254 +0,0 @@
-"""
-Tool Executor API (Phase 4)
-
-This service provides a queued, batched execution layer on top of a ToolBackend.
-It mirrors the stateful FastAPI + app.state pattern used in:
-  atropos/atroposlib/api/server.py
-
-Run (dev):
-  uv run uvicorn atropos_agent.api.tool_executor_server:app --host 0.0.0.0 --port 9001
-"""
-
-from __future__ import annotations
-
-import os
-from typing import Any, Dict, Optional
-from pathlib import Path
-
-from fastapi import FastAPI, Header, HTTPException, status
-from pydantic import BaseModel, Field
-
-from ..backends.nomad_backend import NomadBackendConfig, NomadToolBackend
-from ..tools import ToolRegistry, build_tool_registry
-from ..tools.base import (
-    ArtifactArchiveRequestPayload,
-    ArtifactArchiveResponsePayload,
-    ArtifactListRequestPayload,
-    ArtifactListResponsePayload,
-    ArtifactReadRequestPayload,
-    ArtifactReadResponsePayload,
-    ToolExecutorExecuteRequest,
-    ToolExecutorReleaseRequest,
-    ToolResultPayload,
-)
-from ..tools.tool_executor import ToolExecutor, ToolExecutorConfig
-
-
-class ToolExecutorServerConfig(BaseModel):
-    nomad_address: str = Field(default="http://localhost:4646")
-    job_id: str = Field(default="atropos-sandbox-tool-executor")
-    image: str = Field(default="atropos-sandbox:local")
-    slots_per_container: int = Field(default=10)
-    min_containers: int = Field(default=1)
-    max_containers: int = Field(default=10)
-    privileged: bool = Field(default=False)
-    acquire_timeout_s: float = Field(default=30.0)
-
-    batch_window_ms: int = Field(default=20)
-    max_batch_size: int = Field(default=200)
-    allow_network: bool = Field(default=True)
-
-    tool_server_url: Optional[str] = Field(default=None)
-    tool_server_token: Optional[str] = Field(default=None)
-
-    token: Optional[str] = Field(default=None, description="Bearer token required for requests (optional in dev).")
-
-    purge_job_on_shutdown: bool = Field(default=True)
-
-    @classmethod
-    def from_env(cls) -> "ToolExecutorServerConfig":
-        # In dev, prefer loading secrets/config from the repo-local `.env` (not committed).
-        try:
-            from dotenv import load_dotenv  # type: ignore
-        except Exception:  # pragma: no cover
-            load_dotenv = None  # type: ignore[assignment]
-        if load_dotenv is not None:
-            env_path = Path(__file__).resolve().parents[2] / ".env"
-            if env_path.exists():
-                load_dotenv(dotenv_path=env_path)
-
-        def _get_bool(name: str, default: bool) -> bool:
-            raw = os.getenv(name)
-            if raw is None:
-                return default
-            return raw.strip().lower() in {"1", "true", "yes", "y", "on"}
-
-        return cls(
-            nomad_address=os.getenv("TOOL_EXECUTOR_NOMAD_ADDRESS", "http://localhost:4646"),
-            job_id=os.getenv("TOOL_EXECUTOR_JOB_ID", "atropos-sandbox-tool-executor"),
-            image=os.getenv("TOOL_EXECUTOR_IMAGE", "atropos-sandbox:local"),
-            slots_per_container=int(os.getenv("TOOL_EXECUTOR_SLOTS", "10")),
-            min_containers=int(os.getenv("TOOL_EXECUTOR_MIN_CONTAINERS", "1")),
-            max_containers=int(os.getenv("TOOL_EXECUTOR_MAX_CONTAINERS", "10")),
-            privileged=_get_bool("TOOL_EXECUTOR_PRIVILEGED", False),
-            acquire_timeout_s=float(os.getenv("TOOL_EXECUTOR_ACQUIRE_TIMEOUT_S", "30.0")),
-            batch_window_ms=int(os.getenv("TOOL_EXECUTOR_BATCH_WINDOW_MS", "20")),
-            max_batch_size=int(os.getenv("TOOL_EXECUTOR_MAX_BATCH_SIZE", "200")),
-            allow_network=_get_bool("TOOL_EXECUTOR_ALLOW_NETWORK", True),
-            tool_server_url=os.getenv("TOOL_EXECUTOR_TOOL_SERVER_URL") or None,
-            tool_server_token=os.getenv("TOOL_EXECUTOR_TOOL_SERVER_TOKEN") or None,
-            token=os.getenv("TOOL_EXECUTOR_TOKEN") or None,
-            purge_job_on_shutdown=_get_bool("TOOL_EXECUTOR_PURGE_JOB_ON_SHUTDOWN", True),
-        )
-
-
-app = FastAPI(title="Atropos-Agent Tool Executor")
-
-
-@app.get("/")
-async def root() -> Dict[str, str]:
-    return {"message": "Atropos-Agent Tool Executor"}
-
-
-def _check_auth(cfg: ToolExecutorServerConfig, authorization: Optional[str]) -> None:
-    if not cfg.token:
-        return
-    if not authorization:
-        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Missing Authorization header")
-    if not authorization.lower().startswith("bearer "):
-        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid Authorization header")
-    token = authorization.split(" ", 1)[1].strip()
-    if token != cfg.token:
-        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invalid token")
-
-
-@app.on_event("startup")
-async def _startup() -> None:
-    cfg = ToolExecutorServerConfig.from_env()
-
-    # Default to Atropos "full" tool surface: sandbox + external (if tool_server_url provided).
-    tools: ToolRegistry = build_tool_registry(
-        enabled_toolsets=["full"],
-        disabled_toolsets=None,
-        tool_server_url=cfg.tool_server_url,
-    )
-
-    backend = NomadToolBackend(
-        NomadBackendConfig(
-            nomad_address=cfg.nomad_address,
-            sandbox_job_id=cfg.job_id,
-            sandbox_image=cfg.image,
-            slots_per_container=cfg.slots_per_container,
-            min_containers=cfg.min_containers,
-            max_containers=cfg.max_containers,
-            privileged=cfg.privileged,
-            acquire_timeout_s=cfg.acquire_timeout_s,
-            purge_job_on_start=False,
-        )
-    )
-    await backend.start()
-
-    executor = ToolExecutor(
-        backend=backend,
-        tools=tools,
-        config=ToolExecutorConfig(
-            batch_window_ms=cfg.batch_window_ms,
-            max_batch_size=cfg.max_batch_size,
-            allow_network=cfg.allow_network,
-            tool_server_url=cfg.tool_server_url,
-            tool_server_token=cfg.tool_server_token,
-        ),
-    )
-    await executor.start()
-
-    app.state.cfg = cfg
-    app.state.backend = backend
-    app.state.executor = executor
-
-
-@app.on_event("shutdown")
-async def _shutdown() -> None:
-    executor: Optional[ToolExecutor] = getattr(app.state, "executor", None)
-    backend: Optional[NomadToolBackend] = getattr(app.state, "backend", None)
-    cfg: Optional[ToolExecutorServerConfig] = getattr(app.state, "cfg", None)
-
-    if executor is not None:
-        await executor.close()
-
-    if backend is not None:
-        await backend.stop(purge=bool(cfg.purge_job_on_shutdown) if cfg else False)
-
-
-@app.get("/health")
-async def health() -> Dict[str, Any]:
-    return {"status": "ok"}
-
-
-@app.get("/status")
-async def status_endpoint() -> Dict[str, Any]:
-    executor: ToolExecutor = app.state.executor
-    backend: NomadToolBackend = app.state.backend
-
-    return {
-        "queue_size": executor.queue_size(),
-        "total_requests": executor.total_requests,
-        "total_errors": executor.total_errors,
-        "pool": backend.get_stats(),
-    }
-
-
-@app.post("/execute", response_model=ToolResultPayload)
-async def execute_tool(
-    req: ToolExecutorExecuteRequest,
-    authorization: Optional[str] = Header(default=None),
-    status_code: int = status.HTTP_200_OK,  # noqa: B008
-) -> ToolResultPayload:
-    cfg: ToolExecutorServerConfig = app.state.cfg
-    _check_auth(cfg, authorization)
-
-    executor: ToolExecutor = app.state.executor
-    result = await executor.execute(
-        trajectory_id=req.trajectory_id,
-        call=req.tool.to_tool_call(),
-        timeout_s=req.timeout_s,
-    )
-    return ToolResultPayload.from_tool_result(result)
-
-
-@app.post("/release")
-async def release_trajectory(
-    req: ToolExecutorReleaseRequest,
-    authorization: Optional[str] = Header(default=None),
-) -> Dict[str, Any]:
-    cfg: ToolExecutorServerConfig = app.state.cfg
-    _check_auth(cfg, authorization)
-
-    executor: ToolExecutor = app.state.executor
-    await executor.release_trajectory(req.trajectory_id, reset_workspace=req.reset_workspace)
-    return {"status": "ok"}
-
-
-@app.post("/artifacts/read", response_model=ArtifactReadResponsePayload)
-async def artifacts_read(
-    req: ArtifactReadRequestPayload,
-    authorization: Optional[str] = Header(default=None),
-) -> ArtifactReadResponsePayload:
-    cfg: ToolExecutorServerConfig = app.state.cfg
-    _check_auth(cfg, authorization)
-
-    executor: ToolExecutor = app.state.executor
-    return await executor.read_artifact(req)
-
-
-@app.post("/artifacts/list", response_model=ArtifactListResponsePayload)
-async def artifacts_list(
-    req: ArtifactListRequestPayload,
-    authorization: Optional[str] = Header(default=None),
-) -> ArtifactListResponsePayload:
-    cfg: ToolExecutorServerConfig = app.state.cfg
-    _check_auth(cfg, authorization)
-
-    executor: ToolExecutor = app.state.executor
-    return await executor.list_artifacts(req)
-
-
-@app.post("/artifacts/archive", response_model=ArtifactArchiveResponsePayload)
-async def artifacts_archive(
-    req: ArtifactArchiveRequestPayload,
-    authorization: Optional[str] = Header(default=None),
-) -> ArtifactArchiveResponsePayload:
-    cfg: ToolExecutorServerConfig = app.state.cfg
-    _check_auth(cfg, authorization)
-
-    executor: ToolExecutor = app.state.executor
-    return await executor.archive_artifacts(req)
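Review note: `_check_auth` in the removed server compares the presented token with `!=`. The sketch below shows the same decision table written as a framework-free function, with `hmac.compare_digest` swapped in for the comparison. That substitution is an assumption made for illustration and is not what the removed code did; the name `check_bearer` and the choice to return a status code rather than raise `HTTPException` are also hypothetical.

```python
import hmac
from typing import Optional


def check_bearer(expected_token: Optional[str], authorization: Optional[str]) -> Optional[int]:
    """Return None when the request is allowed, else the HTTP status to reject with."""
    if not expected_token:
        return None  # no token configured: auth is disabled, as in the diff
    if not authorization:
        return 401  # missing Authorization header
    if not authorization.lower().startswith("bearer "):
        return 401  # wrong scheme
    presented = authorization.split(" ", 1)[1].strip()
    # Constant-time comparison (illustrative hardening, not in the original).
    if not hmac.compare_digest(presented.encode(), expected_token.encode()):
        return 403
    return None


print(check_bearer("s3cret", "Bearer s3cret"))
print(check_bearer("s3cret", "Bearer wrong"))
```

A caller inside a FastAPI handler would translate a non-None result into an `HTTPException` with that status code.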
--- a/atropos/api/tool_server.py
+++ b/atropos/api/tool_server.py
@@ -1,140 +0,0 @@
-"""
-External ToolServer (Phase 4.5+).
-
-This server executes tools that must NOT run inside the sandbox, typically
-because they require credentials or access to external services.
-
-Run (dev):
-  uv run uvicorn atropos_agent.api.tool_server:app --host 0.0.0.0 --port 9002
-"""
-
-from __future__ import annotations
-
-import asyncio
-import os
-import inspect
-from typing import Any, Dict, List, Optional
-from pathlib import Path
-
-from fastapi import FastAPI, Header, HTTPException, status
-from pydantic import BaseModel, Field
-
-from ..tools import ToolRegistry, build_tool_registry
-from ..tools.base import ToolResultPayload, ToolServerExecuteRequest
-
-
-class ToolServerConfig(BaseModel):
-    token: Optional[str] = Field(
-        default=None,
-        description="Bearer token required for requests (optional in dev).",
-    )
-    max_concurrency: int = Field(default=16, ge=1, description="Max concurrent tool executions.")
-
-    @classmethod
-    def from_env(cls) -> "ToolServerConfig":
-        # In dev, prefer loading secrets from the repo-local `.env` (not committed).
-        try:
-            from dotenv import load_dotenv  # type: ignore
-        except Exception:  # pragma: no cover
-            load_dotenv = None  # type: ignore[assignment]
-        if load_dotenv is not None:
-            env_path = Path(__file__).resolve().parents[2] / ".env"
-            if env_path.exists():
-                load_dotenv(dotenv_path=env_path)
-
-        token = os.getenv("TOOL_SERVER_TOKEN") or None
-        max_concurrency = int(os.getenv("TOOL_SERVER_MAX_CONCURRENCY", "16"))
-        return cls(token=token, max_concurrency=max_concurrency)
-
-
-app = FastAPI(title="Atropos-Agent Tool Server")
-
-
-@app.get("/")
-async def root() -> Dict[str, str]:
-    return {"message": "Atropos-Agent Tool Server"}
-
-
-@app.on_event("startup")
-async def _startup() -> None:
-    cfg = ToolServerConfig.from_env()
-
-    # External-only registry. It will only include tools that are enabled by toolsets and
-    # whose Hermes requirements/keys are satisfied in this process.
-    tools: ToolRegistry = build_tool_registry(
-        enabled_toolsets=["all"],
-        disabled_toolsets=["terminal", "sandbox", "filesystem", "terminal_stateful", "default"],
-        tool_server_url="enabled",
-    )
-
-    app.state.cfg = cfg
-    app.state.tools = tools
-    app.state.semaphore = asyncio.Semaphore(cfg.max_concurrency)
-
-
-@app.get("/health")
-async def health() -> Dict[str, Any]:
-    return {"status": "ok"}
-
-
-@app.get("/tools")
-async def list_tools() -> Dict[str, Any]:
-    tools: ToolRegistry = app.state.tools
-    return {"tools": [s.to_dict() for s in tools.get_schemas()]}
-
-
-def _check_auth(cfg: ToolServerConfig, authorization: Optional[str]) -> None:
-    if not cfg.token:
-        return
-    if not authorization:
-        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Missing Authorization header")
-    if not authorization.lower().startswith("bearer "):
-        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid Authorization header")
-    token = authorization.split(" ", 1)[1].strip()
-    if token != cfg.token:
-        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invalid token")
-
-
-@app.post("/execute", response_model=ToolResultPayload)
-async def execute_tool(
-    req: ToolServerExecuteRequest,
-    authorization: Optional[str] = Header(default=None),
-) -> ToolResultPayload:
-    cfg: ToolServerConfig = app.state.cfg
-    _check_auth(cfg, authorization)
-
-    tools: ToolRegistry = app.state.tools
-    sem: asyncio.Semaphore = app.state.semaphore
-
-    tool = tools.get(req.tool.name)
-    if tool is None:
-        return ToolResultPayload(
-            success=False,
-            error=f"Unknown tool: {req.tool.name}",
-            uniq_id=req.tool.uniq_id,
-        )
-
-    async with sem:
-        try:
-            kwargs = dict(req.tool.arguments)
-            sig = inspect.signature(tool.execute).parameters
-            # Some tools can benefit from extra context.
-            if req.trajectory_id and "trajectory_id" in sig:
-                kwargs["trajectory_id"] = req.trajectory_id
-            if req.slot_id and "slot_id" in sig:
-                kwargs["slot_id"] = req.slot_id
-            if req.container_addr and "container_addr" in sig:
-                kwargs["container_addr"] = req.container_addr
-            if "task_id" in sig:
-                kwargs["task_id"] = req.trajectory_id
-            result = await tool.execute(**kwargs)
-        except Exception as e:
-            return ToolResultPayload(
-                success=False,
-                error=f"Tool execution error: {e}",
-                uniq_id=req.tool.uniq_id,
-            )
-
-    if result.uniq_id is None:
-        result.uniq_id = req.tool.uniq_id
-    return ToolResultPayload.from_tool_result(result)
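Review note: the removed `/execute` handler inspects `tool.execute` and passes context values such as `trajectory_id` only when the tool declares a matching parameter. A minimal sketch of that pattern, assuming a plain async function stands in for a tool, follows. `inject_context` and `echo_tool` are hypothetical names, and the `task_id` alias handled in the original is left out for brevity.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Dict, Optional


def inject_context(
    func: Callable[..., Awaitable[Any]],
    arguments: Dict[str, Any],
    context: Dict[str, Optional[str]],
) -> Dict[str, Any]:
    """Copy `arguments` and add only the context keys that `func` accepts."""
    params = inspect.signature(func).parameters
    kwargs = dict(arguments)
    for name, value in context.items():
        # Skip empty values and parameters the tool does not declare.
        if value and name in params:
            kwargs[name] = value
    return kwargs


async def echo_tool(text: str, trajectory_id: str = "") -> str:
    # Stand-in tool that declares trajectory_id but not slot_id.
    return f"{trajectory_id}:{text}"


kwargs = inject_context(echo_tool, {"text": "hi"}, {"trajectory_id": "t-1", "slot_id": "s-9"})
print(asyncio.run(echo_tool(**kwargs)))
```

Here `slot_id` is dropped because `echo_tool` does not declare it, so the call never fails with an unexpected-keyword error.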
--- a/atropos/backends/__init__.py
+++ b/atropos/backends/__init__.py
@@ -1,27 +0,0 @@
-from __future__ import annotations
-
-from typing import Any
-
-from .base import ToolBackend
-from .modal_backend import ModalSandboxConfig, ModalToolBackend
-from .nomad_backend import NomadBackendConfig, NomadToolBackend
-
-
-def create_tool_backend(cfg: Any) -> ToolBackend:
-    mode = str(getattr(cfg, "tool_pool_mode", "nomad")).strip().lower()
-    if mode == "nomad":
-        return NomadToolBackend(NomadBackendConfig.from_agent_env_config(cfg))
-    if mode == "modal":
-        return ModalToolBackend(ModalSandboxConfig.from_agent_env_config(cfg))
-    raise ValueError(f"Unknown tool_pool_mode: {mode}")
-
-
-__all__ = [
-    "ToolBackend",
-    "create_tool_backend",
-    "NomadBackendConfig",
-    "NomadToolBackend",
-    "ModalSandboxConfig",
-    "ModalToolBackend",
-]
-
--- a/atropos/backends/base.py
+++ b/atropos/backends/base.py
@@ -1,89 +0,0 @@
-"""
-Backend interfaces for AgentEnv tool execution.
-
-The goal of this module is to decouple ToolExecutor / AgentEnv from any single
-execution backend (Nomad/Docker today; Modal later).
-"""
-
-from __future__ import annotations
-
-from typing import Any, Dict, List, Optional, Protocol, Tuple
-
-from ..slots.executor import ExecutionResult
-from ..slots.slot import Slot
-
-
-class ToolBackend(Protocol):
-    """
-    Minimal interface required by ToolExecutor.
-
-    Backends provide:
-    - lifecycle (start/stop)
-    - slot acquisition/release (workspace affinity)
-    - batched tool execution across slots
-    - optional artifact helpers (for env verification / demos)
-    """
-
-    @property
-    def default_timeout_s(self) -> Optional[float]:
-        """Default sandbox execution timeout in seconds (if any)."""
-
-    async def start(self) -> None:
-        """Start the backend (provision workers/containers, health checks, etc.)."""
-
-    async def stop(self, *, purge: bool = False) -> None:
-        """Stop the backend and optionally purge remote resources."""
-
-    async def acquire(self, trajectory_id: Optional[str] = None) -> Slot:
-        """Acquire a slot for a trajectory (workspace affinity)."""
-
-    async def release(self, slot: Slot, *, reset_workspace: bool = False) -> None:
-        """Release a slot back to the pool."""
-
-    async def execute_batch(
-        self,
-        requests: List[Tuple[Slot, str, Dict[str, Any]]],
-        *,
-        timeout_s: Optional[float] = None,
-    ) -> List[ExecutionResult]:
-        """Execute a batch of sandbox tool calls and return results in order."""
-
-    # ---------------------------------------------------------------------
-    # Optional artifact helpers (supported by the Nomad sandbox-server today)
-    # ---------------------------------------------------------------------
-
-    async def read_artifact(
-        self,
-        slot: Slot,
-        path: str,
-        *,
-        encoding: str = "text",
-        max_bytes: Optional[int] = None,
-        include_sha256: bool = False,
-        timeout_s: Optional[float] = None,
-    ) -> Dict[str, Any]:
-        raise NotImplementedError
-
-    async def list_artifacts(
-        self,
-        slot: Slot,
-        path: str = ".",
-        *,
-        recursive: bool = False,
-        max_entries: Optional[int] = None,
-        timeout_s: Optional[float] = None,
-    ) -> Dict[str, Any]:
-        raise NotImplementedError
-
-    async def archive_artifacts(
-        self,
-        slot: Slot,
-        path: str = ".",
-        *,
-        archive_format: str = "tar.gz",
-        max_bytes: Optional[int] = None,
-        max_entries: Optional[int] = None,
-        timeout_s: Optional[float] = None,
-    ) -> Dict[str, Any]:
-        raise NotImplementedError
-
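Review note: `ToolBackend` is declared as a `typing.Protocol`, so a backend does not have to inherit from it to satisfy the interface. The sketch below illustrates that structural-typing idea on a reduced, hypothetical protocol. `Lifecycle` and `InMemoryBackend` are invented for this example, and `runtime_checkable` is an addition that the removed code does not use.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class Lifecycle(Protocol):
    """Reduced stand-in for the start/stop part of a backend interface."""

    async def start(self) -> None: ...

    async def stop(self, *, purge: bool = False) -> None: ...


class InMemoryBackend:
    """Hypothetical backend; note that it does not inherit from Lifecycle."""

    def __init__(self) -> None:
        self.running = False

    async def start(self) -> None:
        self.running = True

    async def stop(self, *, purge: bool = False) -> None:
        self.running = False


backend = InMemoryBackend()
# isinstance only checks that the methods exist, not their signatures.
print(isinstance(backend, Lifecycle))
asyncio.run(backend.start())
print(backend.running)
```

Static type checkers verify the full signatures; the runtime check is a coarse guard only.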
--- a/atropos/backends/modal_backend.py
+++ b/atropos/backends/modal_backend.py
--- a/atropos/backends/nomad_backend.py
+++ b/atropos/backends/nomad_backend.py
@@ -1,156 +0,0 @@
-"""
-Nomad/Docker tool backend.
-
-This backend is the current default for AgentEnv: it provisions a Nomad job
-running `sandbox_server.py` and multiplexes stateless slots inside each container.
-"""
-
-from __future__ import annotations
-
-from dataclasses import dataclass
-from typing import Any, Dict, List, Optional, Tuple
-
-from ..slots import Slot, SlotPool, SlotPoolConfig
-from ..slots.executor import ExecutionResult
-from .base import ToolBackend
-
-
-@dataclass(frozen=True)
-class NomadBackendConfig:
-    nomad_address: str
-    sandbox_job_id: str
-    sandbox_image: str
-    slots_per_container: int
-    min_containers: int
-    max_containers: int
-    privileged: bool
-    acquire_timeout_s: float
-    purge_job_on_start: bool
-    # Driver selection: "docker" or "singularity"
-    driver: str = "docker"
-    # Path to .sif file for singularity driver (required if driver="singularity")
-    singularity_image: Optional[str] = None
-
-    @classmethod
-    def from_agent_env_config(cls, cfg: Any) -> "NomadBackendConfig":
-        return cls(
-            nomad_address=str(getattr(cfg, "nomad_address")),
-            sandbox_job_id=str(getattr(cfg, "sandbox_job_id")),
-            sandbox_image=str(getattr(cfg, "sandbox_image")),
-            slots_per_container=int(getattr(cfg, "slots_per_container")),
-            min_containers=int(getattr(cfg, "min_containers")),
-            max_containers=int(getattr(cfg, "max_containers")),
-            privileged=bool(getattr(cfg, "privileged")),
-            acquire_timeout_s=float(getattr(cfg, "acquire_timeout_s")),
-            purge_job_on_start=bool(getattr(cfg, "purge_job_on_start", False)),
-            driver=str(getattr(cfg, "driver", "docker")),
-            singularity_image=getattr(cfg, "singularity_image", None),
-        )
-
-
-class NomadToolBackend(ToolBackend):
-    def __init__(self, config: NomadBackendConfig):
-        self.config = config
-        self.pool = SlotPool(
-            SlotPoolConfig(
-                nomad_address=config.nomad_address,
-                job_id=config.sandbox_job_id,
-                image=config.sandbox_image,
-                slots_per_container=config.slots_per_container,
-                min_containers=config.min_containers,
-                max_containers=config.max_containers,
-                privileged=config.privileged,
-                acquire_timeout=config.acquire_timeout_s,
-                purge_job_on_start=bool(config.purge_job_on_start),
-                driver=config.driver,
-                singularity_image=config.singularity_image,
-            )
-        )
-
-    @property
-    def default_timeout_s(self) -> Optional[float]:
-        t = getattr(self.pool.executor, "timeout", None)
-        total = getattr(t, "total", None)
-        try:
-            return float(total) if total is not None else None
-        except Exception:
-            return None
-
-    async def start(self) -> None:
-        await self.pool.start()
-
-    async def stop(self, *, purge: bool = False) -> None:
-        await self.pool.stop(purge_job=purge)
-
-    async def acquire(self, trajectory_id: Optional[str] = None) -> Slot:
-        return await self.pool.acquire(trajectory_id)
-
-    async def release(self, slot: Slot, *, reset_workspace: bool = False) -> None:
-        await self.pool.release(slot, reset_workspace=reset_workspace)
-
-    async def execute_batch(
-        self,
-        requests: List[Tuple[Slot, str, Dict[str, Any]]],
-        *,
-        timeout_s: Optional[float] = None,
-    ) -> List[ExecutionResult]:
-        return await self.pool.execute_batch(requests, timeout=timeout_s)
-
-    async def read_artifact(
-        self,
-        slot: Slot,
-        path: str,
-        *,
-        encoding: str = "text",
-        max_bytes: Optional[int] = None,
-        include_sha256: bool = False,
-        timeout_s: Optional[float] = None,
-    ) -> Dict[str, Any]:
-        return await self.pool.executor.read_artifact(
-            slot,
-            path,
-            encoding=encoding,
-            max_bytes=max_bytes,
-            include_sha256=include_sha256,
-            timeout=timeout_s,
-        )
-
-    async def list_artifacts(
-        self,
-        slot: Slot,
-        path: str = ".",
-        *,
-        recursive: bool = False,
-        max_entries: Optional[int] = None,
-        timeout_s: Optional[float] = None,
-    ) -> Dict[str, Any]:
-        return await self.pool.executor.list_artifacts(
-            slot,
-            path,
-            recursive=recursive,
-            max_entries=max_entries,
-            timeout=timeout_s,
-        )
-
-    async def archive_artifacts(
-        self,
-        slot: Slot,
-        path: str = ".",
-        *,
-        archive_format: str = "tar.gz",
-        max_bytes: Optional[int] = None,
-        max_entries: Optional[int] = None,
-        timeout_s: Optional[float] = None,
-    ) -> Dict[str, Any]:
-        return await self.pool.executor.archive_artifacts(
-            slot,
-            path,
-            archive_format=archive_format,
-            max_bytes=max_bytes,
-            max_entries=max_entries,
-            timeout=timeout_s,
-        )
-
-    def get_stats(self) -> Dict[str, Any]:
-        return self.pool.get_stats()
-
--- a/atropos/envs/__init__.py
+++ b/atropos/envs/__init__.py
@@ -1,10 +0,0 @@
-"""
-Environment implementations for atropos-agent.
-"""
-
-from .agent_env import AgentEnv, AgentEnvConfig
-
-# NOTE: Additional example envs exist as modules (e.g. `test_env`, `swe_smith_oracle_env`),
-# but are intentionally not imported here to avoid pulling heavy optional deps at import time.
-
-__all__ = ["AgentEnv", "AgentEnvConfig"]
--- a/atropos/envs/agent_env.py
+++ b/atropos/envs/agent_env.py
@@ -1,526 +0,0 @@
-"""
-AgentEnv - Atropos BaseEnv extension for agent/tool-call workloads.
-
-AgentEnv is responsible for starting the sandbox tool execution backend and
-providing helpers for running agent trajectories with queued/batched tool calls.
-"""
-
-from __future__ import annotations
-import os
-import asyncio
-import time
-import uuid
-from abc import ABC, abstractmethod
-from typing import Any, Awaitable, Callable, Dict, Generic, List, Optional, Tuple, TypeVar
-
-from pydantic import Field
-
-from atroposlib.envs.base import APIServerConfig, BaseEnv, BaseEnvConfig, Item, ScoredDataGroup, ScoredDataItem
-from atroposlib.envs.server_handling.server_baseline import AsyncSemWithAdaptiveWeight
-
-from ..agent import AgentConfig, AgentResult, AtroposAgent
-from ..backends import ToolBackend, create_tool_backend
-from ..tools import ToolRegistry, build_tool_registry
-from ..tools.tool_executor import ToolExecutor, ToolExecutorConfig
-
-# Main BaseEnv child classes. Subclass these to get agent and tooling functionality easily.
-
-class AgentEnvConfig(BaseEnvConfig):
-    tool_pool_mode: str = Field(default="nomad", description="Tool execution backend ('nomad' or 'modal')")
-
-    allow_network: bool = Field(
-        default=True,
-        description="Whether sandbox bash commands may access the network (env policy).",
-    )
-    require_sandbox: bool = Field(
-        default=False,
-        description="Fail closed if bubblewrap sandboxing is unavailable/unusable for stateless sandbox tools.",
-    )
-    require_stateful_sandbox: bool = Field(
-        default=False,
-        description="Fail closed if bubblewrap/PID isolation is unavailable for stateful terminal tools (tmux).",
-    )
-    tool_batch_window_ms: int = Field(default=20, description="ToolExecutor batching window (ms)")
-    tool_max_batch_size: int = Field(default=200, description="ToolExecutor maximum batch size")
-
-    # Nomad mode settings. TODO: add Modal support and split these into their own config.
-    nomad_address: str = Field(default="http://localhost:4646", description="Nomad API address")
-    sandbox_job_id: str = Field(default="atropos-sandbox-agent-env", description="Nomad job id for sandbox containers")
-    sandbox_image: str = Field(default="atropos-sandbox:local", description="Docker image for sandbox containers")
-    slots_per_container: int = Field(default=10, description="Nomad mode: slots per container")
-    min_containers: int = Field(default=1, description="Nomad mode: minimum containers")
-    max_containers: int = Field(default=10, description="Nomad mode: maximum containers")
-    privileged: bool = Field(default=False, description="Nomad mode: run container privileged")
-    acquire_timeout_s: float = Field(default=30.0, description="Slot acquisition timeout (seconds)")
-    purge_job_on_start: bool = Field(
-        default=False,
-        description=(
-            "Nomad mode: stop/purge the sandbox job on startup. This is helpful in local dev and training runs "
-            "to recover from previous crashes that leave the job in a restart backoff state."
-        ),
-    )
-    purge_job_on_shutdown: bool = Field(default=True, description="Nomad mode: stop/purge job on shutdown")
-
-    # Nomad driver selection (docker or singularity)
-    driver: str = Field(
-        default="docker",
-        description="Nomad task driver: 'docker' (default) or 'singularity' (for HPC without sudo Docker)",
-    )
-    singularity_image: Optional[str] = Field(
-        default=None,
-        description="Path to .sif file for Singularity driver (required if driver='singularity')",
-    )
-
-    # modal mode settings (stub; implementation pending)
-    modal_app_name: str = Field(default="atropos-sandbox", description="Modal app name (stub)")
-    modal_function_name: str = Field(default="sandbox_server", description="Modal function/actor name (stub)")
-    modal_volume_name: Optional[str] = Field(default=None, description="Modal Volume name for persistent storage (stub)")
-    modal_volume_mount_path: str = Field(default="/data", description="Modal Volume mount path (stub)")
-
-    # basic agent defaults
-    agent_max_steps: int = Field(default=50, description="Max ReAct steps per trajectory")
-    agent_temperature: float = Field(default=0.7, description="Sampling temperature")
-    agent_max_tokens: Optional[int] = Field(
-        default=None,
-        description="Max tokens per model response (default: let backend decide)",
-    )
-    agent_tool_delay_s: float = Field(default=0.0, description="Delay between tool calls (seconds)")
-
-    # tool selection
-    enabled_toolsets: List[str] = Field(
-        default_factory=lambda: ["default"],
-        description="Toolsets to enable (Hermes-style grouping).",
-    )
-    disabled_toolsets: List[str] = Field(
-        default_factory=list,
-        description="Toolsets to disable (applied after enabled_toolsets).",
-    )
-
-    # external ToolServer routing (Phase 4.5+)
-    tool_server_url: Optional[str] = Field(
-        default=None,
-        description="Base URL for external ToolServer (enables external tools).",
-    )
-    tool_server_token: Optional[str] = Field(
-        default=None,
-        description="Bearer token for ToolServer auth (optional in dev).",
-    )
-
-AgentEnvConfigT = TypeVar("AgentEnvConfigT", bound="AgentEnvConfig")
-
-
-class AgentEnv(BaseEnv, ABC, Generic[AgentEnvConfigT]):
-    env_config_cls = AgentEnvConfig
-
-    def __init__(
-        self,
-        config: AgentEnvConfigT,
-        server_configs: List[APIServerConfig],
-        slurm: bool = False,
-        testing: bool = False,
-    ):
-        super().__init__(config, server_configs, slurm, testing)
-        self.config: AgentEnvConfigT = config
-
-        self.tools: ToolRegistry = self.build_tools()
-
-        self._backend: Optional[ToolBackend] = None
-        self._tool_executor: Optional[ToolExecutor] = None
-        self._tool_server_inprocess: bool = False
-        self._trajectory_workspace_meta: Dict[str, Dict[str, Any]] = {}
-
-    def build_tools(self) -> ToolRegistry:
-        """Wrap the original Hermes-Agent ToolRegistry for use by the Atropos AgentEnv.
-        See the Hermes-Agent docs for the available toolsets and tools.
-        """
-        return build_tool_registry(
-            enabled_toolsets=self.config.enabled_toolsets or ["default"],
-            disabled_toolsets=self.config.disabled_toolsets or None,
-            tool_server_url=self.config.tool_server_url,
-        )
-
-    @abstractmethod
-    def build_task(self, item: Item) -> str:
-        """Return the user-facing task string for the agent."""
-
-    @abstractmethod
-    async def score_trajectory(self, item: Item, final_response: str) -> float:
-        """Return a scalar score for this trajectory."""
-
-    async def setup_trajectory_workspace(
-        self,
-        item: Item,
-        *,
-        trajectory_id: str,
-        exec_tool: Callable[["ToolCall"], Awaitable["ToolResult"]],
-    ) -> Dict[str, Any]:
-        """
-        Optional hook: prepare the sandbox workspace before the agent starts.
-
-        Examples:
-        - clone a repo and checkout a commit
-        - write fixture files (e.g. images) for external-tool demos
-        - pre-install dependencies
-
-        Default: no-op.
-        """
-        _ = (item, trajectory_id, exec_tool)
-        return {}
-
-    async def verify_and_score_trajectory(
-        self,
-        item: Item,
-        final_response: str,
-        *,
-        trajectory_id: str,
-        exec_tool: Callable[["ToolCall"], Awaitable["ToolResult"]],
-        agent_result: Optional[AgentResult] = None,
-        workspace_meta: Optional[Dict[str, Any]] = None,
-    ) -> tuple[float, Dict[str, Any]]:
-        """
-        Optional hook: run in-sandbox verification before scoring.
-
-        Many agent envs need to execute verification inside the same trajectory
-        workspace (e.g. pytest) before releasing/resetting the slot.
-
-        Default: calls `score_trajectory()` and returns empty metadata.
-        """
-        _ = (trajectory_id, exec_tool, agent_result, workspace_meta)  # default ignores in-workspace verification
-        score = await self.score_trajectory(item, final_response)
-        return score, {}
-
-    def build_agent_config(self, item: Item) -> AgentConfig:  # noqa: ARG002
-        return AgentConfig(
-            max_steps=self.config.agent_max_steps,
-            temperature=self.config.agent_temperature,
-            max_tokens=self.config.agent_max_tokens,
-            tool_delay_s=self.config.agent_tool_delay_s,
-        )
-
-    async def setup(self) -> None:
-        print(f"[AgentEnv] setup(): starting tool backend ({self.config.tool_pool_mode})", flush=True)
-        await self._start_tool_backend()
-        print("[AgentEnv] setup(): configuring server concurrency", flush=True)
-        self._configure_server_concurrency()
-        print("[AgentEnv] setup(): running env-specific setup_agent_env()", flush=True)
-        await self.setup_agent_env()
-        print("[AgentEnv] setup(): done", flush=True)
-
-    def _configure_server_concurrency(self) -> None:
-        """
-        Ensure the LLM server concurrency isn't accidentally capped below `group_size`.
-
-        In `BaseEnv process` mode, groups are collected concurrently and if the underlying
-        ServerManager/OpenAIServer semaphore is left at 1, we serialize inference even
-        when `--env.group_size` is > 1.
-        """
-        desired = int(getattr(self.config, "group_size", 1) or 1)
-        if desired <= 1:
-            return
-
-        servers = getattr(self.server, "servers", None)
-        if not isinstance(servers, list) or not servers:
-            return
-
-        for s in servers:
-            sem = getattr(s, "sem", None)
-            eval_sem = getattr(s, "eval_sem", None)
-            # Only increase; never shrink.
-            if sem is not None and getattr(sem, "max_val", 0) < desired:
-                s.sem = AsyncSemWithAdaptiveWeight(desired)
-                if hasattr(s, "config") and hasattr(s.config, "num_max_requests_at_once"):
-                    s.config.num_max_requests_at_once = desired
-            if eval_sem is not None and getattr(eval_sem, "max_val", 0) < desired:
-                s.eval_sem = AsyncSemWithAdaptiveWeight(desired)
-                if hasattr(s, "config") and hasattr(s.config, "num_requests_for_eval"):
-                    s.config.num_requests_for_eval = desired
-
-    @abstractmethod
-    async def setup_agent_env(self) -> None:
-        """Subclass hook for env-specific setup."""
-
-    async def evaluate(self, *args, **kwargs):  # noqa: ARG002
-        """
-        Default eval hook (no-op).
-
-        Atropos BaseEnv requires an `evaluate()` implementation. Many agent envs
-        won't have a meaningful evaluation path during early PoC work; they can
-        override this when needed.
-        """
-        return {}
-
-    async def env_manager(self):
-        try:
-            return await super().env_manager()
-        finally:
-            await self.shutdown_tool_backend()
-
-    async def process_manager(self):
-        try:
-            return await super().process_manager()
-        finally:
-            await self.shutdown_tool_backend()
-
-    async def _start_tool_backend(self) -> None:
-        if self._tool_executor is not None:
-            return
-
-        tool_server_url = self.config.tool_server_url
-        tool_server_client = None
-        if tool_server_url == "inprocess":
-            import httpx
-            from ..api.tool_server import app as tool_server_app
-
-            await tool_server_app.router.startup()
-            tool_server_client = httpx.AsyncClient(
-                transport=httpx.ASGITransport(app=tool_server_app),
-                base_url="http://toolserver",
-            )
-            tool_server_url = "http://toolserver"
-            self._tool_server_inprocess = True
-
-        backend = create_tool_backend(self.config)
-        await backend.start()
-
-        executor = ToolExecutor(
-            backend=backend,
-            tools=self.tools,
-            config=ToolExecutorConfig(
-                batch_window_ms=self.config.tool_batch_window_ms,
-                max_batch_size=self.config.tool_max_batch_size,
-                allow_network=self.config.allow_network,
-                require_sandbox=self.config.require_sandbox,
-                require_stateful_sandbox=self.config.require_stateful_sandbox,
-                tool_server_url=tool_server_url,
-                tool_server_token=self.config.tool_server_token,
-            ),
-        )
-        await executor.start()
-        if tool_server_client is not None:
-            executor._tool_server_client = tool_server_client  # type: ignore[attr-defined]
-
-        self._backend = backend
-        self._tool_executor = executor
-
-    async def shutdown_tool_backend(self) -> None:
-        executor = self._tool_executor
-        backend = self._backend
-        inprocess_tool_server = self._tool_server_inprocess
-        self._tool_executor = None
-        self._backend = None
-        self._tool_server_inprocess = False
-
-        if executor is not None:
-            await executor.close()
-        if backend is not None:
-            await backend.stop(purge=bool(self.config.purge_job_on_shutdown))
-        if inprocess_tool_server:
-            from ..api.tool_server import app as tool_server_app
-
-            await tool_server_app.router.shutdown()
-
-    async def collect_trajectory(
-        self, item: Item
-    ) -> Tuple[Optional[ScoredDataItem], List[Item]]:
-        if self._tool_executor is None:
-            raise RuntimeError("Tool backend not started")
-
-        trajectory_id = str(uuid.uuid4())
-        t0 = time.perf_counter()
-        print(f"[AgentEnv] collect_trajectory(): tid={trajectory_id} start", flush=True)
-        task = self.build_task(item)
-        agent_config = self.build_agent_config(item)
-        if os.getenv("ATROPOS_DEBUG_PRINT_TASK") == "1":
-            print(f"Starting trajectory {trajectory_id} with task: {task}", flush=True)
-        else:
-            # Avoid printing the full task prompt by default (can be huge/noisy).
-            one_line = " ".join(str(task).splitlines()).strip()
-            preview = one_line[:240] + ("…" if len(one_line) > 240 else "")
-            print(f"Starting trajectory {trajectory_id} (task preview): {preview}", flush=True)
-
-        async def _exec(call):
-            return await self._tool_executor.execute(trajectory_id, call)
-
-        agent = AtroposAgent(
-            server=self.server,
-            tokenizer=self.tokenizer,
-            tools=self.tools,
-            config=agent_config,
-            execute_tool=_exec,
-        )
-
-        try:
-            print(f"[AgentEnv] tid={trajectory_id} setup_trajectory_workspace() start", flush=True)
-            workspace_meta = await self.setup_trajectory_workspace(item, trajectory_id=trajectory_id, exec_tool=_exec)
-            if not isinstance(workspace_meta, dict):
-                workspace_meta = {}
-            self._trajectory_workspace_meta[trajectory_id] = workspace_meta
-            print(
-                f"[AgentEnv] tid={trajectory_id} setup_trajectory_workspace() done in {time.perf_counter() - t0:.2f}s",
-                flush=True,
-            )
-
-            print(f"[AgentEnv] tid={trajectory_id} agent.run() start", flush=True)
-            result = await agent.run(task)
-            print(
-                f"[AgentEnv] tid={trajectory_id} agent.run() done in {time.perf_counter() - t0:.2f}s "
-                f"success={result.success} tool_calls={result.total_tool_calls}",
-                flush=True,
-            )
-            if not result.success or result.trajectory_data is None:
-                # Do not trigger BaseEnv retries for agent failures.
-                # Record the trajectory with score 0.0 so training/eval can see the failure mode.
-                messages = [{"role": "system", "content": agent._build_system_prompt()}]  # noqa: SLF001
-                messages.append({"role": "user", "content": task})
-                for step in result.steps:
-                    messages.append({"role": "assistant", "content": step.assistant_message})
-                    if step.tool_results:
-                        tool_text = "\n".join(r.to_xml() for r in step.tool_results)
-                        messages.append({"role": "user", "content": tool_text})
-
-                scored: ScoredDataItem = {
-                    "tokens": (result.trajectory_data.tokens if result.trajectory_data else []),
-                    "masks": (result.trajectory_data.masked_tokens if result.trajectory_data else []),
-                    "scores": 0.0,
-                }
-                if result.trajectory_data is not None:
-                    scored["inference_logprobs"] = result.trajectory_data.logprobs  # type: ignore[typeddict-unknown-key]
-                    if getattr(result.trajectory_data, "metadata", None):
-                        scored["overrides"] = {"managed_metadata": result.trajectory_data.metadata}
-                if self.config.include_messages:
-                    # Record a final failure marker as a user-side tool_response-like block so it survives templates.
-                    import json
-
-                    err = result.error or "agent_failed"
-                    messages.append(
-                        {
-                            "role": "user",
-                            "content": f"<tool_response>{json.dumps({'success': False, 'error': err})}</tool_response>",
-                        }
-                    )
-                    scored["messages"] = messages
-                return scored, []
-
-            print(f"[AgentEnv] tid={trajectory_id} verify_and_score_trajectory() start", flush=True)
-            score, score_metadata = await self.verify_and_score_trajectory(
-                item,
-                result.final_response,
-                trajectory_id=trajectory_id,
-                exec_tool=_exec,
-                agent_result=result,
-                workspace_meta=workspace_meta,
-            )
-            print(
-                f"[AgentEnv] tid={trajectory_id} verify_and_score_trajectory() done in {time.perf_counter() - t0:.2f}s "
-                f"score={score}",
-                flush=True,
-            )
-
-            messages = [{"role": "system", "content": agent._build_system_prompt()}]  # noqa: SLF001
-            messages.append({"role": "user", "content": task})
-            for step in result.steps:
-                messages.append({"role": "assistant", "content": step.assistant_message})
-                if step.tool_results:
-                    tool_text = "\n".join(r.to_xml() for r in step.tool_results)
-                    messages.append({"role": "user", "content": tool_text})
-
-            # Optional: allow env verification to attach additional messages (e.g. install logs).
-            if self.config.include_messages and isinstance(score_metadata, dict):
-                extra = score_metadata.get("verification_messages")
-                if isinstance(extra, list):
-                    for m in extra:
-                        if isinstance(m, dict) and isinstance(m.get("role"), str) and isinstance(m.get("content"), str):
-                            messages.append({"role": m["role"], "content": m["content"]})
-
-            scored: ScoredDataItem = {
-                "tokens": result.trajectory_data.tokens,
-                "masks": result.trajectory_data.masked_tokens,
-                "scores": score,
-            }
-            # Atroposlib expects policy logprobs at the *group* level under `inference_logprobs`.
-            # We stash per-item values here and lift them into the group in `collect_trajectories()`.
-            scored["inference_logprobs"] = result.trajectory_data.logprobs  # type: ignore[typeddict-unknown-key]
-            if getattr(result.trajectory_data, "metadata", None):
-                scored["overrides"] = {"managed_metadata": result.trajectory_data.metadata}
-            if self.config.include_messages:
-                scored["messages"] = messages
-
-            return scored, []
-        finally:
-            self._trajectory_workspace_meta.pop(trajectory_id, None)
-            print(f"[AgentEnv] tid={trajectory_id} release_trajectory(reset_workspace=True)", flush=True)
-            await self._tool_executor.release_trajectory(trajectory_id, reset_workspace=True)
-            print(f"[AgentEnv] collect_trajectory(): tid={trajectory_id} done in {time.perf_counter() - t0:.2f}s", flush=True)
-
-    async def collect_trajectories(
-        self, item: Item
-    ) -> Tuple[Optional[ScoredDataGroup], List[Item]]:
-        tasks = [self.collect_trajectory(item) for _ in range(self.config.group_size)]
-        results = await asyncio.gather(*tasks)
-
-        backlog: List[Item] = []
-        items: List[ScoredDataItem] = []
-        for scored, b in results:
-            backlog.extend(b)
-            if scored is not None:
-                items.append(scored)
-
-        if len(items) != self.config.group_size:
-            return None, backlog
-
-        group: ScoredDataGroup = ScoredDataGroup(
-            tokens=[],
-            masks=[],
-            scores=[],
-            advantages=[],
-            ref_logprobs=[],
-            messages=[] if self.config.include_messages else None,
-            inference_logprobs=[],
-            group_overrides={},
-            overrides=[],
-            images=[],
-            generation_params=None,
-        )
-
-        for it in items:
-            group["tokens"].append(it["tokens"])
-            group["masks"].append(it["masks"])
-            group["scores"].append(it["scores"])
-            # policy logprobs (for PPO/GRPO training) if present
-            lp = it.get("inference_logprobs")  # type: ignore[typeddict-item]
-            if lp is not None:
-                group["inference_logprobs"].append(lp)
-            group["overrides"].append(it.get("overrides") or {})  # type: ignore[typeddict-item]
-            if group.get("messages") is not None and it.get("messages") is not None:
-                group["messages"].append(it["messages"])
-
-        return group, backlog
-
-    async def run_agent(self, task: str, *, trajectory_id: Optional[str] = None) -> Tuple[str, Dict[str, Any]]:
-        """
-        Run the AtroposAgent on a single task and return (final_response, debug).
-
-        This is a helper intended for simple environments and tests.
-        """
-        if self._tool_executor is None:
-            raise RuntimeError("Tool backend not started")
-
-        tid = trajectory_id or str(uuid.uuid4())
-
-        async def _exec(call):
-            return await self._tool_executor.execute(tid, call)
-
-        agent = AtroposAgent(
-            server=self.server,
-            tokenizer=self.tokenizer,
-            tools=self.tools,
-            config=AgentConfig(
-                max_steps=self.config.agent_max_steps,
-                temperature=self.config.agent_temperature,
-                max_tokens=self.config.agent_max_tokens,
-            ),
-            execute_tool=_exec,
-        )
-        result = await agent.run(task)
-        await self._tool_executor.release_trajectory(tid, reset_workspace=True)
-        return result.final_response, {"success": result.success, "error": result.error, "tool_calls": result.total_tool_calls}
--- a/atropos/envs/hermes_compat_test_env.py
+++ b/atropos/envs/hermes_compat_test_env.py
@@ -1,171 +0,0 @@
-"""
-Hermes-Agent + Atropos (Nomad sandbox) compatibility smoke environment.
-
-This environment is intended to validate, end-to-end:
-  BaseEnv.process -> AgentEnv -> ToolExecutor (batched) -> Nomad SlotPool -> sandbox_server
-
-It forces the model to use a sandbox tool by asking it to run a command that
-generates a high-entropy token inside the sandbox, then repeat it exactly.
-
-Run (process mode):
-  uv run python -m atropos.envs.hermes_compat_test_env process --env.use_wandb false --env.total_steps 2 --env.group_size 1
-"""
-
-from __future__ import annotations
-
-import os
-from typing import Any, Dict, List, Tuple
-
-from dotenv import load_dotenv
-from pydantic import Field
-
-from atroposlib.envs.base import APIServerConfig, Item
-
-from ..agent import AgentConfig, AgentResult
-from ..tools import ToolCall
-from .agent_env import AgentEnv, AgentEnvConfig
-
-load_dotenv()
-
-
-def _forced_tool_item() -> Item:
-    # Use double quotes in the shell command and show JSON escaping explicitly.
-    # This avoids invalid JSON escapes like `\\'` (not valid JSON) that some models produce.
-    cmd = 'python -c "import secrets; print(secrets.token_hex(16))"'
-    return {
-        "command": cmd,
-        "prompt": (
-            "You are acting as an agent inside a sandboxed environment.\n"
-            "You MUST use the terminal tool to execute commands.\n"
-            "Run this exact command:\n"
-            f"{cmd}\n"
-            "When you call the tool, use valid JSON inside <tool_call>. Example:\n"
-            '<tool_call>{"name": "terminal", "arguments": {"command": '
-            '"python -c \\\\"import secrets; print(secrets.token_hex(16))\\\\""}}'
-            "</tool_call>\n"
-            "Then respond with EXACTLY what it printed (the hex token) and nothing else.\n"
-            "Do not guess. Do not explain."
-        ),
-    }
-
-
-class HermesCompatTestEnvConfig(AgentEnvConfig):
-    server_base_url: str = Field(
-        default="http://127.0.0.1:8080",
-        description="Base URL for an OpenAI-compatible chat server (without /v1).",
-    )
-    server_model: str = Field(default="hermes-4-36b", description="Model name")
-    tokenizer_name: str = Field(default="NousResearch/Hermes-4.3-36B", description="Tokenizer name for RL tokenization")
-
-
-class HermesCompatTestEnv(AgentEnv[HermesCompatTestEnvConfig]):
-    name = "hermes_compat_test_env"
-    env_config_cls = HermesCompatTestEnvConfig
-
-    def __init__(
-        self,
-        config: HermesCompatTestEnvConfig,
-        server_configs: List[APIServerConfig],
-        slurm: bool = False,
-        testing: bool = False,
-    ):
-        super().__init__(config, server_configs, slurm, testing)
-        self._iter = 0
-
-    @classmethod
-    def config_init(cls) -> Tuple[HermesCompatTestEnvConfig, List[APIServerConfig]]:
-        base_url = (
-            os.getenv("ATROPOS_SERVER_BASE_URL")
-            or os.getenv("OPENAI_BASE_URL")
-            or os.getenv("LLM_BASE_URL")
-            or "http://127.0.0.1:8080"
-        )
-        model = os.getenv("ATROPOS_SERVER_MODEL") or os.getenv("LLM_MODEL") or "hermes-4-36b"
-        api_key = os.getenv("ATROPOS_SERVER_API_KEY") or os.getenv("NOUS_API_KEY") or os.getenv("OPENAI_API_KEY") or "local"
-
-        env_config = HermesCompatTestEnvConfig(
-            tokenizer_name=os.getenv("ATROPOS_TOKENIZER_NAME") or "NousResearch/Hermes-4.3-36B",
-            group_size=1,
-            use_wandb=False,
-            include_messages=True,
-            ensure_scores_are_not_same=False,
-            total_steps=2,
-            batch_size=1,
-            server_base_url=base_url,
-            server_model=model,
-            # Tooling: sandbox-only terminal.
-            enabled_toolsets=["terminal"],
-            disabled_toolsets=[],
-            # Default to Nomad sandboxing; users can override via --env.* args.
-            sandbox_image=os.getenv("ATROPOS_SANDBOX_IMAGE") or "atropos-sandbox:local",
-            # In local dev it's common for a previous crash to leave the job in backoff.
-            purge_job_on_start=True,
-            purge_job_on_shutdown=True,
-        )
-
-        server_configs = [
-            APIServerConfig(
-                model_name=model,
-                base_url=f"{base_url.rstrip('/')}/v1",
-                api_key=api_key,
-                num_max_requests_at_once=1,
-                num_requests_for_eval=1,
-                timeout=120,
-            )
-        ]
-        return env_config, server_configs
-
-    async def setup_agent_env(self) -> None:
-        return None
-
-    async def get_next_item(self) -> Item:
-        self._iter += 1
-        return _forced_tool_item()
-
-    def build_task(self, item: Item) -> str:
-        return str(item.get("prompt") or "")
-
-    def build_agent_config(self, item: Item) -> AgentConfig:  # noqa: ARG002
-        # Avoid imposing max_tokens by default; tool-tag responses can be long for some models.
-        return AgentConfig(
-            max_steps=min(8, int(self.config.agent_max_steps)),
-            temperature=0.2,
-            max_tokens=None,
-        )
-
-    async def score_trajectory(self, item: Item, final_response: str) -> float:
-        # Scoring happens in verify_and_score_trajectory so we can inspect tool results.
-        _ = (item, final_response)
-        return 0.0
-
-    async def verify_and_score_trajectory(
-        self,
-        item: Item,
-        final_response: str,
-        *,
-        trajectory_id: str,  # noqa: ARG002
-        exec_tool,  # noqa: ARG002
-        agent_result: AgentResult | None = None,
-        workspace_meta: Dict[str, Any] | None = None,  # noqa: ARG002
-    ) -> tuple[float, Dict[str, Any]]:
-        if agent_result is None:
-            return 0.0, {"error": "Missing agent_result"}
-
-        observed: str = ""
-        tool_ok = False
-        for step in agent_result.steps:
-            for res in step.tool_results:
-                if not res.success:
-                    return 0.0, {"error": res.error, "output": res.output}
-                out = (res.output or "").strip()
-                if out:
-                    observed = out.splitlines()[-1].strip()
-                    tool_ok = True
-
-        final = (final_response or "").strip()
-        score = 1.0 if tool_ok and agent_result.total_tool_calls > 0 and observed and final == observed else 0.0
-        return score, {"observed": observed, "tool_calls": agent_result.total_tool_calls, "command": item.get("command")}
-
-
-if __name__ == "__main__":
-    HermesCompatTestEnv.cli()
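Review note: the scoring rule in `verify_and_score_trajectory` of the removed file above can be summarized in a standalone sketch. The score is 1.0 only when every tool call succeeded, at least one produced output, and the final response equals the last line of the last non-empty output. `ToolResult` and `score_exact_echo` here are hypothetical stand-ins for illustration, not the project's real types, and the sketch omits the separate `total_tool_calls > 0` check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToolResult:
    """Hypothetical stand-in for the real tool result type."""
    success: bool
    output: str = ""
    error: Optional[str] = None


def score_exact_echo(results: List[ToolResult], final_response: str) -> float:
    """Return 1.0 only if the final response equals the last tool output line."""
    observed = ""
    for res in results:
        if not res.success:
            # Any failed tool call zeroes the score immediately.
            return 0.0
        out = (res.output or "").strip()
        if out:
            # Keep the last line of the most recent non-empty output.
            observed = out.splitlines()[-1].strip()
    final = (final_response or "").strip()
    return 1.0 if observed and final == observed else 0.0
```

Because the comparison is an exact match after stripping whitespace, a response that wraps the token in any explanation scores 0.0, which is what the prompt's "nothing else" requirement relies on.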
--- a/atropos/envs/sandbox_terminal_smoke_env.py
+++ b/atropos/envs/sandbox_terminal_smoke_env.py
@@ -1,172 +0,0 @@
-"""
-Nomad sandbox terminal smoke environment (training-oriented).
-
-Validates, end-to-end:
-  BaseEnv.process -> AgentEnv -> ToolExecutor (batched) -> Nomad SlotPool -> sandbox_server
-
-It forces the model to use a sandbox tool by asking it to run a command that
-generates a high-entropy token inside the sandbox, then repeat it exactly.
-
-Run (process mode):
-  uv run python -m atropos.envs.sandbox_terminal_smoke_env process --env.use_wandb false --env.total_steps 2 --env.group_size 1
-"""
-
-from __future__ import annotations
-
-import os
-from typing import Any, Dict, List, Tuple
-
-from dotenv import load_dotenv
-from pydantic import Field
-
-from atroposlib.envs.base import APIServerConfig, Item
-
-from ..agent import AgentConfig, AgentResult
-from ..tools import ToolCall
-from .agent_env import AgentEnv, AgentEnvConfig
-
-load_dotenv()
-
-STRICT_TOOLCALL_SYSTEM_PROMPT = None
-
-
-def _forced_tool_item() -> Item:
-    # Use double quotes in the shell command and show JSON escaping explicitly.
-    # This avoids invalid JSON escapes like `\\'` (not valid JSON) that some models produce.
-    cmd = 'python -c "import secrets; print(secrets.token_hex(16))"'
-    return {
-        "command": cmd,
-        "prompt": (
-            "You MUST use the terminal tool.\n"
-            "Run this exact command:\n"
-            f"{cmd}\n"
-            "When you call the tool, use valid JSON inside <tool_call>. Example:\n"
-            '<tool_call>{"name": "terminal", "arguments": {"command": '
-            '"python -c \\\\"import secrets; print(secrets.token_hex(16))\\\\""}}'
-            "</tool_call>\n"
-            "Then respond with EXACTLY what it printed (the hex token) and nothing else.\n"
-            "Do not guess. Do not explain."
-        ),
-    }
-
-
-class SandboxTerminalSmokeEnvConfig(AgentEnvConfig):
-    server_base_url: str = Field(
-        default="http://127.0.0.1:8080",
-        description="Base URL for an OpenAI-compatible chat server (without /v1).",
-    )
-    server_model: str = Field(default="hermes-4-36b", description="Model name")
-    tokenizer_name: str = Field(default="NousResearch/Hermes-4.3-36B", description="Tokenizer name for RL tokenization")
-
-
-class SandboxTerminalSmokeEnv(AgentEnv[SandboxTerminalSmokeEnvConfig]):
-    name = "sandbox_terminal_smoke_env"
-    env_config_cls = SandboxTerminalSmokeEnvConfig
-
-    def __init__(
-        self,
-        config: SandboxTerminalSmokeEnvConfig,
-        server_configs: List[APIServerConfig],
-        slurm: bool = False,
-        testing: bool = False,
-    ):
-        super().__init__(config, server_configs, slurm, testing)
-        self._iter = 0
-
-    @classmethod
-    def config_init(cls) -> Tuple[SandboxTerminalSmokeEnvConfig, List[APIServerConfig]]:
-        base_url = (
-            os.getenv("ATROPOS_SERVER_BASE_URL")
-            or os.getenv("OPENAI_BASE_URL")
-            or os.getenv("LLM_BASE_URL")
-            or "http://127.0.0.1:8080"
-        )
-        model = os.getenv("ATROPOS_SERVER_MODEL") or os.getenv("LLM_MODEL") or "hermes-4-36b"
-        api_key = os.getenv("ATROPOS_SERVER_API_KEY") or os.getenv("NOUS_API_KEY") or os.getenv("OPENAI_API_KEY") or "local"
-
-        env_config = SandboxTerminalSmokeEnvConfig(
-            tokenizer_name=os.getenv("ATROPOS_TOKENIZER_NAME") or "NousResearch/Hermes-4.3-36B",
-            group_size=1,
-            use_wandb=False,
-            include_messages=True,
-            ensure_scores_are_not_same=False,
-            total_steps=2,
-            batch_size=1,
-            server_base_url=base_url,
-            server_model=model,
-            # Tooling: sandbox-only terminal.
-            enabled_toolsets=["terminal"],
-            disabled_toolsets=[],
-            # Default to Nomad sandboxing; users can override via --env.* args.
-            sandbox_image=os.getenv("ATROPOS_SANDBOX_IMAGE") or "atropos-sandbox:local",
-            purge_job_on_start=True,
-            purge_job_on_shutdown=True,
-        )
-
-        server_configs = [
-            APIServerConfig(
-                model_name=model,
-                base_url=f"{base_url.rstrip('/')}/v1",
-                api_key=api_key,
-                num_max_requests_at_once=1,
-                num_requests_for_eval=1,
-                timeout=120,
-            )
-        ]
-        return env_config, server_configs
-
-    async def setup_agent_env(self) -> None:
-        return None
-
-    async def get_next_item(self) -> Item:
-        self._iter += 1
-        return _forced_tool_item()
-
-    def build_task(self, item: Item) -> str:
-        return str(item.get("prompt") or "")
-
-    def build_agent_config(self, item: Item) -> AgentConfig:  # noqa: ARG002
-        # Avoid imposing max_tokens by default; tool-tag responses can be long for some models.
-        return AgentConfig(
-            max_steps=min(8, int(self.config.agent_max_steps)),
-            temperature=0.2,
-            max_tokens=None,
-            system_prompt=STRICT_TOOLCALL_SYSTEM_PROMPT,
-        )
-
-    async def score_trajectory(self, item: Item, final_response: str) -> float:
-        # Scoring happens in verify_and_score_trajectory so we can inspect tool results.
-        _ = (item, final_response)
-        return 0.0
-
-    async def verify_and_score_trajectory(
-        self,
-        item: Item,
-        final_response: str,
-        *,
-        trajectory_id: str,  # noqa: ARG002
-        exec_tool,  # noqa: ARG002
-        agent_result: AgentResult | None = None,
-        workspace_meta: Dict[str, Any] | None = None,  # noqa: ARG002
-    ) -> tuple[float, Dict[str, Any]]:
-        if agent_result is None:
-            return 0.0, {"error": "Missing agent_result"}
-
-        observed: str = ""
-        tool_ok = False
-        for step in agent_result.steps:
-            for res in step.tool_results:
-                if not res.success:
-                    return 0.0, {"error": res.error, "output": res.output}
-                out = (res.output or "").strip()
-                if out:
-                    observed = out.splitlines()[-1].strip()
-                    tool_ok = True
-
-        final = (final_response or "").strip()
-        score = 1.0 if tool_ok and agent_result.total_tool_calls > 0 and observed and final == observed else 0.0
-        return score, {"observed": observed, "tool_calls": agent_result.total_tool_calls, "command": item.get("command")}
-
-
-if __name__ == "__main__":
-    SandboxTerminalSmokeEnv.cli()
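Review note: the removed environments in this diff resolve the server URL through the same chain of environment variables and then append `/v1`. A minimal sketch of that resolution follows, using the `/v1` guard that `swe_smith_oracle_env.py` applies; the helper name `resolve_base_url` and its `env` parameter are assumptions introduced only for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the first non-empty URL variable and normalize it to end in /v1."""
    env = os.environ if env is None else env
    raw = (
        env.get("ATROPOS_SERVER_BASE_URL")
        or env.get("OPENAI_BASE_URL")
        or env.get("LLM_BASE_URL")
        or "http://127.0.0.1:8080"
    )
    base = raw.rstrip("/")
    # Guard against a doubled suffix when the variable already includes /v1.
    return base if base.endswith("/v1") else f"{base}/v1"
```

Using `or` rather than a default argument means an empty string in a higher-priority variable falls through to the next one, which matches the `os.getenv(...) or ...` pattern in the removed code.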
--- a/atropos/envs/swe_smith_oracle_env.py
+++ b/atropos/envs/swe_smith_oracle_env.py
@@ -1,418 +0,0 @@
-"""
-SWE-smith-oracle environment.
-
-This environment is intentionally minimal:
- prepares a sandbox workspace by cloning a public GitHub repo at `base_commit`
- runs an AtroposAgent tool loop to apply a fix
- verifies by running pytest nodeids from the dataset (reward = pass/fail)
- Python only (no multi-language support currently; need to properly build & add to dropbox)
- TODO: Get the other non-Python sandboxes up and running, then add a config knob to switch between them per row,
- and add them to Docker Hub
-
-Dataset: NousResearch/SWE-smith-oracle (train; does NOT use SWE-bench eval set).
-"""
-
-from __future__ import annotations
-
-import os
-import random
-import time
-from typing import Any, Dict, List, Optional, Tuple
-
-from pydantic import Field
-
-from atroposlib.envs.base import APIServerConfig, Item
-
-from ..agent import AgentConfig
-from ..tools import ToolCall
-from .agent_env import AgentEnv, AgentEnvConfig
-
-
-class SweSmithOracleEnvConfig(AgentEnvConfig):
-    dataset_name: str = Field(default="NousResearch/SWE-smith-oracle")
-    dataset_split: str = Field(default="train")
-    max_items: int = Field(default=0, description="0 = no limit")
-    shuffle: bool = Field(default=True)
-    seed: int = Field(default=0)
-
-    python_only: bool = Field(default=True, description="Filter to Python-evaluable rows")
-    score_include_fail_to_pass: bool = Field(
-        default=True,
-        description=(
-            "If true (default), score tests on PASS_TO_PASS ∪ FAIL_TO_PASS. "
-            "Disable to only run PASS_TO_PASS (faster but weaker signal)."
-        ),
-    )
-
-    prompt_mode: str = Field(
-        default="problem_statement",
-        description="Task prompt content: 'problem_statement' (fast) or 'problem_statement+text' (slower, includes dataset 'text').",
-    )
-
-    repo_base_url: str = Field(default="https://github.com", description="Base URL for repo cloning")
-    install_timeout_s: float = Field(default=600.0)
-    test_timeout_s: float = Field(default=600.0)
-
-    tokenizer_name: str = Field(default="NousResearch/Hermes-4.3-36B", description="Tokenizer name for RL tokenization")
-
-
-class SweSmithOracleEnv(AgentEnv[SweSmithOracleEnvConfig]):
-    """
-    SWE-smith-oracle AgentEnv.
-
-    This is designed for benchmarking multiplexed slot execution vs naive container-per-trajectory.
-    """
-
-    name = "swe_smith_oracle_env"
-    env_config_cls = SweSmithOracleEnvConfig
-
-    def __init__(
-        self,
-        config: SweSmithOracleEnvConfig,
-        server_configs: List[APIServerConfig],
-        slurm: bool = False,
-        testing: bool = False,
-    ):
-        super().__init__(config, server_configs, slurm, testing)
-        self._dataset = None
-        self._indices: List[int] = []
-        self._cursor = 0
-
-    @classmethod
-    def config_init(cls) -> Tuple[SweSmithOracleEnvConfig, List[APIServerConfig]]:
-        # Defaults for running the env via CLI in offline `process` mode.
-        # Override via env vars or `--env.*` flags as needed.
-        base_url_raw = (
-            os.getenv("ATROPOS_SERVER_BASE_URL")
-            or os.getenv("OPENAI_BASE_URL")
-            or os.getenv("LLM_BASE_URL")
-            or "http://127.0.0.1:8080"
-        )
-        base_url = base_url_raw.rstrip("/")
-        if not base_url.endswith("/v1"):
-            base_url = f"{base_url}/v1"
-        model = os.getenv("ATROPOS_SERVER_MODEL") or os.getenv("LLM_MODEL") or "hermes-4-36b"
-        api_key = os.getenv("ATROPOS_SERVER_API_KEY") or os.getenv("NOUS_API_KEY") or os.getenv("OPENAI_API_KEY") or "local"
-
-        env_config = SweSmithOracleEnvConfig(
-            tokenizer_name=os.getenv("ATROPOS_TOKENIZER_NAME") or "NousResearch/Hermes-4.3-36B",
-            group_size=1,
-            use_wandb=False,
-            rollout_server_url="http://localhost:8000",
-            total_steps=1,
-            batch_size=1,
-            steps_per_eval=1,
-            max_token_length=8192,
-            inference_weight=1.0,
-            wandb_name="swe_smith_oracle",
-            enabled_toolsets=["terminal"],
-            disabled_toolsets=[],
-            sandbox_image=os.getenv("ATROPOS_SANDBOX_IMAGE") or "atropos-sandbox:local",
-            purge_job_on_start=True,
-            purge_job_on_shutdown=True,
-        )
-
-        server_configs = [
-            APIServerConfig(
-                model_name=model,
-                base_url=base_url,
-                api_key=api_key,
-                num_max_requests_at_once=1,
-                num_requests_for_eval=1,
-                timeout=int(os.getenv("ATROPOS_SERVER_TIMEOUT_S") or "300"),
-            ),
-        ]
-
-        return env_config, server_configs
-
-    async def setup_agent_env(self) -> None:
-        from datasets import load_dataset
-
-        t0 = time.perf_counter()
-        print(
-            f"[SweSmithOracleEnv] loading dataset {self.config.dataset_name}:{self.config.dataset_split} "
-            f"(python_only={self.config.python_only}, max_items={self.config.max_items or 'all'})",
-            flush=True,
-        )
-        ds = load_dataset(self.config.dataset_name, split=self.config.dataset_split)
-        self._dataset = ds
-
-        indices: List[int] = []
-        for idx in range(len(ds)):
-            row = ds[idx]
-            if self.config.python_only and not self._is_python_row(row):
-                continue
-            indices.append(idx)
-
-        if self.config.shuffle:
-            rnd = random.Random(self.config.seed)
-            rnd.shuffle(indices)
-
-        if self.config.max_items and self.config.max_items > 0:
-            indices = indices[: self.config.max_items]
-
-        self._indices = indices
-        self._cursor = 0
-
-        print(
-            f"[SweSmithOracleEnv] loaded {len(self._indices)} items from {self.config.dataset_name}:{self.config.dataset_split} "
-            f"in {time.perf_counter() - t0:.2f}s",
-            flush=True,
-        )
-
-    def _is_python_row(self, row: Dict[str, Any]) -> bool:
-        nodeids = row.get("PASS_TO_PASS")
-        if not isinstance(nodeids, list) or not nodeids:
-            return False
-        for nid in nodeids:
-            if not isinstance(nid, str) or ".py::" not in nid:
-                return False
-        return True
-
-    async def get_next_item(self) -> Item:
-        print(f"[SweSmithOracleEnv] get_next_item() cursor={self._cursor}/{len(self._indices)}", flush=True)
-        if not self._dataset or not self._indices:
-            raise RuntimeError("Dataset not initialized (did setup() run?)")
-        if self._cursor >= len(self._indices):
-            self._cursor = 0
-        idx = self._indices[self._cursor]
-        self._cursor += 1
-        return dict(self._dataset[idx])
-
-    def _repo_name(self, item: Item) -> str:
-        repo = item.get("repo") or ""
-        if isinstance(repo, str) and "/" in repo:
-            return repo.split("/")[-1]
-        return "repo"
-
-    def build_task(self, item: Item) -> str:
-        repo = item.get("repo") or ""
-        base_commit = item.get("base_commit") or ""
-        problem = str(item.get("problem_statement") or "")
-        context = str(item.get("text") or "")
-
-        nodeids = self._tests_for_item(item)
-        tests_list = "\n".join(f"- {t}" for t in nodeids)
-
-        repo_dir = self._repo_name(item)
-
-        tests_block = (
-            "Run these tests to verify:\n"
-            f"{tests_list}\n\n"
-            "When done, briefly describe what you changed and confirm tests pass."
-        )
-
-        prompt_mode = (self.config.prompt_mode or "problem_statement").strip().lower()
-        if prompt_mode not in {"problem_statement", "problem_statement+text"}:
-            raise ValueError(
-                f"Invalid prompt_mode={self.config.prompt_mode!r}. "
-                "Expected 'problem_statement' or 'problem_statement+text'."
-            )
-
-        context_block = ""
-        if prompt_mode == "problem_statement+text" and context:
-            # Note: We intentionally do NOT truncate/cap here. This mode is for debugging / richer prompts and can be slow.
-            context_block = f"\nAdditional context:\n{context}\n"
-
-        return (
-            "You are a senior software engineer. Fix the repository so the specified tests pass.\n\n"
-            f"Repository: {repo} (checked out at base_commit={base_commit})\n"
-            f"Workspace path: ./{repo_dir}\n\n"
-            "Constraints:\n"
-            "- You MUST use the terminal tool to inspect, edit, and verify the repository. Do not respond with a patch file.\n"
-            f"- Start by inspecting the repo (e.g. `ls`, `cd ./{repo_dir}`, `git status`).\n"
-            "- Use a workspace-local virtualenv (e.g. inside the repo at ./.venv) to avoid cross-run contamination.\n"
-            "- Use non-interactive commands only.\n\n"
-            "- Terminal commands run under POSIX /bin/sh and each tool call runs in a fresh shell (no persisted env vars).\n"
-            "  Avoid bash-only `source`; prefer `. .venv/bin/activate` or `.venv/bin/python ...`.\n\n"
-            "Problem statement:\n"
-            f"{problem}\n\n"
-            f"{context_block}\n"
-            f"{tests_block}"
-        )
-
-    def build_agent_config(self, item: Item) -> AgentConfig:  # noqa: ARG002
-        # SWE tasks are longer than the simple test env.
-        return AgentConfig(
-            max_steps=self.config.agent_max_steps,
-            temperature=self.config.agent_temperature,
-            max_tokens=self.config.agent_max_tokens,
-            tool_delay_s=self.config.agent_tool_delay_s,
-        )
-
-    async def setup_trajectory_workspace(self, item: Item, *, trajectory_id: str, exec_tool) -> Dict[str, Any]:
-        t0 = time.perf_counter()
-        repo = item.get("repo")
-        base_commit = item.get("base_commit")
-        instance_id = item.get("instance_id") or item.get("id") or item.get("problem_id")
-        if not isinstance(repo, str) or not isinstance(base_commit, str):
-            raise RuntimeError("Invalid dataset row: missing repo/base_commit")
-
-        repo_dir = self._repo_name(item)
-        clone_url = f"{self.config.repo_base_url.rstrip('/')}/{repo}.git"
-        print(
-            f"[SweSmithOracleEnv] tid={trajectory_id} setup_trajectory_workspace(): "
-            f"repo={repo} base_commit={base_commit} instance_id={instance_id} dir=./{repo_dir}",
-            flush=True,
-        )
-
-        # Repo setup strategy:
-        # - Maintain a shared, per-container bare repo cache under /data/repo_cache
-        # - For each trajectory, create an isolated git worktree under the slot workspace
-        # This avoids cloning/fetching full repos per trajectory and is crucial for multiplexing.
-
-        def _repo_cache_slug(repo_name: str) -> str:
-            return repo_name.replace("/", "__")
-
-        repo_slug = _repo_cache_slug(repo)
-        cache_root = "/data/repo_cache"
-        bare_repo = f"{cache_root}/{repo_slug}.git"
-        lock_file = f"{cache_root}/.locks/{repo_slug}.lock"
-
-        # Use flock to serialize operations that mutate the shared bare repo (fetch/worktree).
-        # util-linux (flock) is included in the sandbox image.
-        worktree_cmd = (
-            "set -e; "
-            f"rm -rf {repo_dir}; "
-            f"mkdir -p {cache_root}/.locks; "
-            f": > {lock_file}; "
-            f"flock -x {lock_file} sh -lc '"
-            f"set -e; "
-            "export GIT_TERMINAL_PROMPT=0; "
-            "export GIT_LFS_SKIP_SMUDGE=1; "
-            f"if [ ! -d \"{bare_repo}\" ]; then "
-            f"  git init --bare \"{bare_repo}\"; "
-            f"  git -C \"{bare_repo}\" remote add origin \"{clone_url}\"; "
-            "fi; "
-            f"git -C \"{bare_repo}\" remote set-url origin \"{clone_url}\"; "
-            f"git -C \"{bare_repo}\" worktree prune || true; "
-            f"if ! git -C \"{bare_repo}\" cat-file -e \"{base_commit}^{{commit}}\" 2>/dev/null; then "
-            f"  git -C \"{bare_repo}\" fetch --depth 1 origin \"{base_commit}\" || true; "
-            "fi; "
-            f"if ! git -C \"{bare_repo}\" cat-file -e \"{base_commit}^{{commit}}\" 2>/dev/null; then "
-            f"  git -C \"{bare_repo}\" fetch --prune origin; "
-            "fi; "
-            f"git --git-dir=\"{bare_repo}\" worktree add --detach \"{repo_dir}\" \"{base_commit}\"; "
-            "'"
-        )
-
-        print(f"[SweSmithOracleEnv] tid={trajectory_id} preparing worktree from repo cache", flush=True)
-        res = await exec_tool(
-            ToolCall(
-                name="terminal",
-                arguments={"command": worktree_cmd, "timeout": self.config.install_timeout_s},
-            )
-        )
-        if not res.success:
-            raise RuntimeError(
-                "git worktree setup failed "
-                f"(repo={repo}, base_commit={base_commit}, instance_id={instance_id}): {res.error}\n{res.output}"
-            )
-
-        print(
-            f"[SweSmithOracleEnv] tid={trajectory_id} setup_trajectory_workspace(): worktree ready in {time.perf_counter() - t0:.2f}s",
-            flush=True,
-        )
-        return {"repo_dir": repo_dir, "base_commit": base_commit}
-
-    def _tests_for_item(self, item: Item) -> List[str]:
-        tests: List[str] = []
-        if self.config.score_include_fail_to_pass:
-            for key in ("PASS_TO_PASS", "FAIL_TO_PASS"):
-                nodeids = item.get(key)
-                if isinstance(nodeids, list):
-                    tests.extend([n for n in nodeids if isinstance(n, str)])
-        else:
-            nodeids = item.get("PASS_TO_PASS")
-            if isinstance(nodeids, list):
-                tests.extend([n for n in nodeids if isinstance(n, str)])
-        # Stable order for reproducibility.
-        return sorted(dict.fromkeys(tests))
-
-    def _chunk_nodeids(self, nodeids: List[str], max_per_chunk: int = 50) -> List[List[str]]:
-        chunks: List[List[str]] = []
-        for i in range(0, len(nodeids), max_per_chunk):
-            chunks.append(nodeids[i : i + max_per_chunk])
-        return chunks
-
-    async def verify_and_score_trajectory(
-        self,
-        item: Item,
-        final_response: str,  # noqa: ARG002
-        *,
-        trajectory_id: str,
-        exec_tool,
-        agent_result=None,
-        workspace_meta: Optional[Dict[str, Any]] = None,
-    ) -> tuple[float, Dict[str, Any]]:
-        _ = trajectory_id
-        repo_dir = self._repo_name(item)
-
-        # Training correctness: do not reward trajectories that never actually used tools.
-        if agent_result is not None and getattr(agent_result, "total_tool_calls", 0) <= 0:
-            print(
-                f"[SweSmithOracleEnv] tid={trajectory_id} verify (dataset_tests): no tool calls; score=0.0",
-                flush=True,
-            )
-            return 0.0, {
-                "verification_mode": "dataset_tests",
-                "error": "No tool calls were made by the agent",
-            }
-
-        nodeids = self._tests_for_item(item)
-        if not nodeids:
-            return 0.0, {"error": "No tests provided"}
-
-        print(f"[SweSmithOracleEnv] tid={trajectory_id} verify (dataset_tests): ensuring venv + deps", flush=True)
-        setup_cmd = (
-            f"cd {repo_dir} && "
-            "python -m venv .venv && "
-            ". .venv/bin/activate && "
-            "python -m pip install -U pip setuptools wheel && "
-            "python -m pip install -e . && "
-            "python -m pip install pytest"
-        )
-        setup_res = await exec_tool(
-            ToolCall(name="terminal", arguments={"command": setup_cmd, "timeout": self.config.install_timeout_s})
-        )
-        verification_messages = [{"role": "user", "content": setup_res.to_xml()}]
-        if not setup_res.success:
-            return 0.0, {
-                "verification_mode": "dataset_tests",
-                "phase": "install",
-                "error": setup_res.error,
-                "output": setup_res.output,
-                "verification_messages": verification_messages,
-            }
-
-        chunks = self._chunk_nodeids(nodeids, max_per_chunk=50)
-        for chunk_idx, chunk in enumerate(chunks):
-            joined = " ".join(chunk)
-            cmd = f"cd {repo_dir} && . .venv/bin/activate && python -m pytest -q {joined}"
-            res = await exec_tool(
-                ToolCall(
-                    name="terminal",
-                    arguments={"command": cmd, "timeout": self.config.test_timeout_s},
-                )
-            )
-            verification_messages.append({"role": "user", "content": res.to_xml()})
-            if not res.success:
-                return 0.0, {
-                    "verification_mode": "dataset_tests",
-                    "phase": "pytest",
-                    "failed_chunk": chunk_idx,
-                    "error": res.error,
-                    "output": res.output,
-                    "verification_messages": verification_messages,
-                }
-
-        return 1.0, {"verification_mode": "dataset_tests", "passed": True, "verification_messages": verification_messages}
-
-    async def score_trajectory(self, item: Item, final_response: str) -> float:
-        # Not used; scoring happens in verify_and_score_trajectory.
-        _ = (item, final_response)
-        return 0.0
-
-
-if __name__ == "__main__":
-    SweSmithOracleEnv.cli()
--- a/atropos/envs/test_env.py
+++ b/atropos/envs/test_env.py
@@ -1,217 +0,0 @@
-"""
-Simple test environment for validating the atropos-agent setup.
-
-This environment uses a local OpenAI-compatible server for LLM testing to verify:
- BaseEnv extension works correctly
- API communication via OpenAI-compatible endpoint
- Basic trajectory collection
-
-This is a minimal environment for testing, not production use.
-"""
-
-import os
-from typing import Dict, List, Optional, Tuple
-
-from dotenv import load_dotenv
-from pydantic import Field
-
-from atroposlib.envs.base import (
-    APIServerConfig,
-    Item,
-)
-
-from ..agent import AgentConfig
-from .agent_env import AgentEnv, AgentEnvConfig
-
-# Load environment variables from .env file
-load_dotenv()
-
-
-# Simple test prompts for validation
-TEST_PROMPTS = [
-    {
-        "prompt": "What is 2 + 2? Answer with just the number.",
-        "expected": "4",
-    },
-    {
-        "prompt": "What is the capital of France? Answer with just the city name.",
-        "expected": "Paris",
-    },
-    {
-        "prompt": "What color is the sky on a clear day? Answer with just the color.",
-        "expected": "Blue",
-    },
-    {
-        "prompt": "How many days are in a week? Answer with just the number.",
-        "expected": "7",
-    },
-    {
-        "prompt": "What is 10 * 5? Answer with just the number.",
-        "expected": "50",
-    },
-]
-
-SYSTEM_PROMPT = (
-    "You are a helpful assistant. Answer questions concisely and directly. "
-    "When asked for a simple answer, provide just that answer without explanation."
-)
-
-
-class SimpleTestEnvConfig(AgentEnvConfig):
-    """Configuration for the simple test environment."""
-
-    server_base_url: str = Field(
-        default="http://127.0.0.1:8080",
-        description="Base URL for an OpenAI-compatible server (without /v1)",
-    )
-    server_model: str = Field(
-        default="hermes-4-36b",
-        description="Model name",
-    )
-    tokenizer_name: str = Field(default="NousResearch/Hermes-4.3-36B", description="Tokenizer name for RL tokenization")
-
-
-class SimpleTestEnv(AgentEnv[SimpleTestEnvConfig]):
-    """
-    A simple test environment to validate the atropos-agent setup.
-    
-    Uses a local OpenAI-compatible LLM endpoint with basic question-answering tasks.
-    Scoring is based on whether the response contains the expected answer.
-    """
-
-    name = "simple_test_env"
-    env_config_cls = SimpleTestEnvConfig
-
-    def __init__(
-        self,
-        config: SimpleTestEnvConfig,
-        server_configs: List[APIServerConfig],
-        slurm: bool = False,
-        testing: bool = False,
-    ):
-        super().__init__(config, server_configs, slurm, testing)
-        self.iter = 0
-        self.test_prompts = TEST_PROMPTS
-        self.percent_correct_buffer: List[float] = []
-
-    @classmethod
-    def config_init(cls) -> Tuple[SimpleTestEnvConfig, List[APIServerConfig]]:
-        """
-        Initialize configuration with local server settings from environment variables.
-        """
-        base_url = (
-            os.getenv("ATROPOS_SERVER_BASE_URL")
-            or os.getenv("OPENAI_BASE_URL")
-            or os.getenv("LLM_BASE_URL")
-            or "http://127.0.0.1:8080"
-        )
-        model = os.getenv("ATROPOS_SERVER_MODEL") or os.getenv("LLM_MODEL") or "hermes-4-36b"
-        api_key = os.getenv("ATROPOS_SERVER_API_KEY") or os.getenv("NOUS_API_KEY") or os.getenv("OPENAI_API_KEY") or "local"
-
-        env_config = SimpleTestEnvConfig(
-            tokenizer_name=os.getenv("ATROPOS_TOKENIZER_NAME") or "NousResearch/Hermes-4.3-36B",
-            group_size=4,
-            use_wandb=False,  # Disable wandb for simple testing
-            rollout_server_url="http://localhost:8000",
-            total_steps=10,
-            batch_size=16,
-            steps_per_eval=5,
-            max_token_length=2048,
-            inference_weight=1.0,
-            wandb_name="simple_test",
-            server_base_url=base_url,
-            server_model=model,
-        )
-
-        # OpenAI-compatible servers typically expose chat completions at /v1.
-        server_configs = [
-            APIServerConfig(
-                model_name=model,
-                base_url=f"{base_url.rstrip('/')}/v1",
-                api_key=api_key,
-                num_max_requests_at_once=4,
-                num_requests_for_eval=8,
-                timeout=120,  # Local models may be slower
-            ),
-        ]
-
-        return env_config, server_configs
-
-    async def setup_agent_env(self):
-        """Setup the environment - load test data."""
-        print(f"SimpleTestEnv setup complete. {len(self.test_prompts)} test prompts loaded.")
-        print(f"Using server at: {self.config.server_base_url}")
-        print(f"Model: {self.config.server_model}")
-
-    async def get_next_item(self) -> Item:
-        """Get the next test prompt."""
-        item = self.test_prompts[self.iter % len(self.test_prompts)]
-        self.iter += 1
-        return item
-
-    def build_task(self, item: Item) -> str:
-        return item["prompt"]
-
-    def build_agent_config(self, item: Item) -> AgentConfig:  # noqa: ARG002
-        return AgentConfig(
-            max_steps=5,
-            temperature=0.7,
-            max_tokens=256,
-            system_prompt=SYSTEM_PROMPT,
-        )
-
-    async def score_trajectory(self, item: Item, final_response: str) -> float:
-        expected = item["expected"].lower()
-        response_lower = (final_response or "").lower()
-        score = 1.0 if expected in response_lower else 0.0
-        self.percent_correct_buffer.append(score)
-        return score
-
-    async def evaluate(self, *args, **kwargs):
-        """
-        Simple evaluation - run through all test prompts once.
-        """
-        correct = 0
-        total = len(self.test_prompts)
-
-        for item in self.test_prompts:
-            messages = [
-                {"role": "system", "content": SYSTEM_PROMPT},
-                {"role": "user", "content": item["prompt"]},
-            ]
-
-            response = await self.server.chat_completion(
-                messages=messages,
-                n=1,
-                max_tokens=256,
-                temperature=0.0,  # Greedy for eval
-                split="eval",
-            )
-
-            response_text = response.choices[0].message.content or ""
-            expected = item["expected"].lower()
-
-            if expected in response_text.lower():
-                correct += 1
-
-        accuracy = correct / total if total else 0.0
-        print(f"Evaluation: {correct}/{total} = {accuracy:.2%} accuracy")
-        return {"eval_accuracy": accuracy}
-
-    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
-        """Log metrics (simplified for testing)."""
-        if wandb_metrics is None:
-            wandb_metrics = {}
-
-        if self.percent_correct_buffer:
-            avg_correct = sum(self.percent_correct_buffer) / len(self.percent_correct_buffer)
-            wandb_metrics["train/percent_correct"] = avg_correct
-            print(f"Train accuracy: {avg_correct:.2%}")
-            self.percent_correct_buffer = []
-
-        await super().wandb_log(wandb_metrics)
-
-
-if __name__ == "__main__":
-    # Allow running as CLI
-    SimpleTestEnv.cli()
--- a/atropos/envs/toolserver_smoke_env.py
+++ b/atropos/envs/toolserver_smoke_env.py
@@ -1,165 +0,0 @@
-"""
-ToolServer routing smoke environment.
-
-Validates that:
-  - sandbox tools run through Nomad SlotPool (terminal -> bash in sandbox)
-  - external tools run through ToolServer (skills_list)
-
-This env uses ToolServer in-process by default (`tool_server_url="inprocess"`),
-so it is self-contained for local testing.
-
-Run:
-  uv run python -m atropos.envs.toolserver_smoke_env process --env.use_wandb false --env.total_steps 1 --env.group_size 1
-"""
-
-from __future__ import annotations
-
-import os
-from typing import Any, Dict, List, Tuple
-
-from dotenv import load_dotenv
-from pydantic import Field
-
-from atroposlib.envs.base import APIServerConfig, Item
-
-from ..agent import AgentConfig, AgentResult
-from .agent_env import AgentEnv, AgentEnvConfig
-
-load_dotenv()
-
-
-class ToolServerSmokeEnvConfig(AgentEnvConfig):
-    server_base_url: str = Field(
-        default="http://127.0.0.1:8080",
-        description="Base URL for an OpenAI-compatible chat server (without /v1).",
-    )
-    server_model: str = Field(default="hermes-4-36b", description="Model name")
-    tokenizer_name: str = Field(default="NousResearch/Hermes-4.3-36B", description="Tokenizer name for RL tokenization")
-
-
-class ToolServerSmokeEnv(AgentEnv[ToolServerSmokeEnvConfig]):
-    name = "toolserver_smoke_env"
-    env_config_cls = ToolServerSmokeEnvConfig
-
-    def __init__(
-        self,
-        config: ToolServerSmokeEnvConfig,
-        server_configs: List[APIServerConfig],
-        slurm: bool = False,
-        testing: bool = False,
-    ):
-        super().__init__(config, server_configs, slurm, testing)
-        self._iter = 0
-
-    @classmethod
-    def config_init(cls) -> Tuple[ToolServerSmokeEnvConfig, List[APIServerConfig]]:
-        base_url = (
-            os.getenv("ATROPOS_SERVER_BASE_URL")
-            or os.getenv("OPENAI_BASE_URL")
-            or os.getenv("LLM_BASE_URL")
-            or "http://127.0.0.1:8080"
-        )
-        model = os.getenv("ATROPOS_SERVER_MODEL") or os.getenv("LLM_MODEL") or "hermes-4-36b"
-        api_key = os.getenv("ATROPOS_SERVER_API_KEY") or os.getenv("NOUS_API_KEY") or os.getenv("OPENAI_API_KEY") or "local"
-
-        env_config = ToolServerSmokeEnvConfig(
-            tokenizer_name=os.getenv("ATROPOS_TOKENIZER_NAME") or "NousResearch/Hermes-4.3-36B",
-            group_size=1,
-            use_wandb=False,
-            include_messages=True,
-            ensure_scores_are_not_same=False,
-            total_steps=1,
-            batch_size=1,
-            server_base_url=base_url,
-            server_model=model,
-            enabled_toolsets=["terminal", "skills"],
-            disabled_toolsets=[],
-            # Self-contained ToolServer for local smoke.
-            tool_server_url="inprocess",
-            sandbox_image=os.getenv("ATROPOS_SANDBOX_IMAGE") or "atropos-sandbox:local",
-            purge_job_on_start=True,
-            purge_job_on_shutdown=True,
-        )
-
-        server_configs = [
-            APIServerConfig(
-                model_name=model,
-                base_url=f"{base_url.rstrip('/')}/v1",
-                api_key=api_key,
-                num_max_requests_at_once=1,
-                num_requests_for_eval=1,
-                timeout=120,
-            )
-        ]
-        return env_config, server_configs
-
-    async def setup_agent_env(self) -> None:
-        return None
-
-    async def get_next_item(self) -> Item:
-        self._iter += 1
-        return {
-            "prompt": (
-                "You MUST call exactly one tool per assistant message.\n"
-                "\n"
-                "Step 1) Call the skills_list tool (no arguments), then stop.\n"
-                "Step 2) After you receive the tool response, call the terminal tool to run:\n"
-                "python -c \"print('ok')\"\n"
-                "Step 3) After you receive the terminal tool response, answer with just: ok\n"
-                "\n"
-                "Tool call format requirements:\n"
-                "- Every tool call MUST be a complete XML block with a closing tag.\n"
-                "- Do NOT emit a second <tool_call> in the same assistant message.\n"
-                "\n"
-                "Example:\n"
-                "<tool_call>{\"name\": \"skills_list\", \"arguments\": {}}</tool_call>\n"
-                "Do not include anything else in your final answer."
-            )
-        }
-
-    def build_task(self, item: Item) -> str:
-        return str(item.get("prompt") or "")
-
-    def build_agent_config(self, item: Item) -> AgentConfig:  # noqa: ARG002
-        return AgentConfig(
-            max_steps=min(10, int(self.config.agent_max_steps)),
-            temperature=0.2,
-            max_tokens=None,
-        )
-
-    async def score_trajectory(self, item: Item, final_response: str) -> float:
-        _ = (item, final_response)
-        return 0.0
-
-    async def verify_and_score_trajectory(
-        self,
-        item: Item,
-        final_response: str,
-        *,
-        trajectory_id: str,  # noqa: ARG002
-        exec_tool,  # noqa: ARG002
-        agent_result: AgentResult | None = None,
-        workspace_meta: Dict[str, Any] | None = None,  # noqa: ARG002
-    ) -> tuple[float, Dict[str, Any]]:
-        if agent_result is None:
-            return 0.0, {"error": "Missing agent_result"}
-
-        called = {c.name for s in agent_result.steps for c in s.tool_calls}
-        need = {"skills_list", "terminal"}
-        if not need.issubset(called):
-            return 0.0, {"error": f"Missing tool calls: {sorted(need - called)}", "called": sorted(called)}
-
-        terminal_ok = False
-        for step in agent_result.steps:
-            for call, res in zip(step.tool_calls, step.tool_results):
-                if call.name != "terminal":
-                    continue
-                if res.success and ((res.output or "").strip().splitlines() or [""])[-1].strip() == "ok":
-                    terminal_ok = True
-
-        score = 1.0 if terminal_ok and (final_response or "").strip() == "ok" else 0.0
-        return score, {"called": sorted(called), "final": (final_response or "").strip()}
-
-
-if __name__ == "__main__":
-    ToolServerSmokeEnv.cli()
--- a/atropos/nomad/__init__.py
+++ b/atropos/nomad/__init__.py
@@ -1,11 +0,0 @@
-"""
-Nomad integration for atropos-agent.
-
-Provides:
-- NomadClient: Client for Nomad HTTP API
-- Job templates for sandbox containers
-"""
-
-from .client import NomadClient
-
-__all__ = ["NomadClient"]
--- a/atropos/nomad/client.py
+++ b/atropos/nomad/client.py
@@ -1,500 +0,0 @@
-"""
-Nomad API Client for atropos-agent.
-
-Provides a simple async client for interacting with the Nomad HTTP API:
-- Submit/stop jobs
-- Query allocations
-- Get allocation addresses
-- Scale jobs up/down
-"""
-
-import asyncio
-import json
-import os
-from dataclasses import dataclass, field
-from enum import Enum
-from pathlib import Path
-from typing import Any, Dict, List, Optional
-
-import aiohttp
-
-
-class AllocationStatus(Enum):
-    """Nomad allocation status."""
-    PENDING = "pending"
-    RUNNING = "running"
-    COMPLETE = "complete"
-    FAILED = "failed"
-    LOST = "lost"
-
-
-@dataclass
-class Allocation:
-    """Information about a Nomad allocation."""
-    id: str
-    job_id: str
-    task_group: str
-    node_id: str
-    status: AllocationStatus
-    # Network info for reaching the allocation
-    address: Optional[str] = None
-    port: Optional[int] = None
-    
-    @property
-    def http_address(self) -> Optional[str]:
-        """Get full HTTP address for the allocation."""
-        if self.address and self.port:
-            return f"http://{self.address}:{self.port}"
-        return None
-
-
-@dataclass
-class JobStatus:
-    """Status of a Nomad job."""
-    id: str
-    name: str
-    status: str
-    allocations: List[Allocation] = field(default_factory=list)
-    count: int = 0  # Sum of Count across all task groups
-
-
-class NomadClient:
-    """
-    Async client for Nomad HTTP API.
-    
-    Usage:
-        client = NomadClient(address="http://localhost:4646")
-        
-        # Submit a job
-        await client.submit_job(job_spec)
-        
-        # Get allocations
-        allocs = await client.get_job_allocations("sandbox-python")
-        
-        # Scale job
-        await client.scale_job("sandbox-python", count=5)
-    """
-    
-    def __init__(
-        self,
-        address: str = "http://localhost:4646",
-        token: Optional[str] = None,
-        timeout: float = 30.0,
-    ):
-        self.address = address.rstrip("/")
-        self.token = token or os.environ.get("NOMAD_TOKEN")
-        self.timeout = aiohttp.ClientTimeout(total=timeout)
-        self._session: Optional[aiohttp.ClientSession] = None
-    
-    async def _get_session(self) -> aiohttp.ClientSession:
-        """Get or create HTTP session."""
-        if self._session is None or self._session.closed:
-            headers = {}
-            if self.token:
-                headers["X-Nomad-Token"] = self.token
-            self._session = aiohttp.ClientSession(
-                timeout=self.timeout,
-                headers=headers,
-            )
-        return self._session
-    
-    async def close(self):
-        """Close the HTTP session."""
-        if self._session and not self._session.closed:
-            await self._session.close()
-    
-    async def __aenter__(self):
-        return self
-    
-    async def __aexit__(self, exc_type, exc_val, exc_tb):
-        await self.close()
-    
-    async def _request(
-        self,
-        method: str,
-        path: str,
-        data: Optional[Dict[str, Any]] = None,
-    ) -> Dict[str, Any]:
-        """Make an HTTP request to Nomad API."""
-        session = await self._get_session()
-        url = f"{self.address}{path}"
-        
-        try:
-            async with session.request(method, url, json=data) as response:
-                if response.status == 404:
-                    return {"error": "not_found", "status": 404}
-                
-                text = await response.text()
-                if not text:
-                    return {"status": response.status}
-                
-                try:
-                    result = json.loads(text)
-                except json.JSONDecodeError:
-                    return {"text": text, "status": response.status}
-                
-                if response.status >= 400:
-                    return {"error": result, "status": response.status}
-                
-                return result if isinstance(result, dict) else {"data": result, "status": response.status}
-                
-        except aiohttp.ClientError as e:
-            return {"error": str(e), "status": 0}
-    
-    # Job Operations
-    
-    async def submit_job(self, job_spec: Dict[str, Any]) -> Dict[str, Any]:
-        """
-        Submit a job to Nomad.
-        
-        Args:
-            job_spec: Job specification dict (HCL converted to JSON)
-            
-        Returns:
-            Response with EvalID if successful
-        """
-        return await self._request("POST", "/v1/jobs", {"Job": job_spec})
-    
-    async def stop_job(self, job_id: str, purge: bool = False) -> Dict[str, Any]:
-        """
-        Stop (and optionally purge) a job.
-        
-        Args:
-            job_id: Job identifier
-            purge: If True, completely remove the job
-        """
-        path = f"/v1/job/{job_id}"
-        if purge:
-            path += "?purge=true"
-        return await self._request("DELETE", path)
-    
-    async def get_job(self, job_id: str) -> Optional[Dict[str, Any]]:
-        """Get job details."""
-        result = await self._request("GET", f"/v1/job/{job_id}")
-        if "error" in result and result.get("status") == 404:
-            return None
-        return result
-    
-    async def get_job_status(self, job_id: str) -> Optional[JobStatus]:
-        """Get job status with allocations."""
-        job = await self.get_job(job_id)
-        if not job:
-            return None
-        
-        allocs = await self.get_job_allocations(job_id)
-        
-        # Get count from task groups
-        count = 0
-        task_groups = job.get("TaskGroups", [])
-        for tg in task_groups:
-            count += tg.get("Count", 1)
-        
-        return JobStatus(
-            id=job_id,
-            name=job.get("Name", job_id),
-            status=job.get("Status", "unknown"),
-            allocations=allocs,
-            count=count,
-        )
-    
-    # Allocation Operations
-    
-    async def get_job_allocations(self, job_id: str) -> List[Allocation]:
-        """Get all allocations for a job."""
-        result = await self._request("GET", f"/v1/job/{job_id}/allocations")
-        
-        if "error" in result:
-            return []
-        
-        allocs_data = result.get("data", result) if isinstance(result, dict) else result
-        if not isinstance(allocs_data, list):
-            return []
-        
-        allocations = []
-        for alloc_data in allocs_data:
-            # Parse allocation info
-            alloc_id = alloc_data.get("ID", "")
-            status_str = alloc_data.get("ClientStatus", "unknown")
-            
-            try:
-                status = AllocationStatus(status_str)
-            except ValueError:
-                status = AllocationStatus.PENDING
-            
-            # Get network info - need to fetch detailed allocation for this
-            address = None
-            port = None
-            
-            # First try the summary data
-            resources = alloc_data.get("AllocatedResources") or {}
-            shared = resources.get("Shared") or {}
-            networks = shared.get("Networks") or []
-            
-            # If no networks in summary, fetch detailed allocation
-            if not networks and alloc_id:
-                detailed = await self.get_allocation(alloc_id)
-                if detailed:
-                    resources = detailed.get("AllocatedResources") or {}
-                    shared = resources.get("Shared") or {}
-                    networks = shared.get("Networks") or []
-            
-            if networks:
-                network = networks[0]
-                address = network.get("IP")
-                # Look for dynamic ports OR reserved ports (Singularity/raw_exec uses reserved)
-                dyn_ports = network.get("DynamicPorts") or []
-                reserved_ports = network.get("ReservedPorts") or []
-                for dp in dyn_ports + reserved_ports:
-                    if dp.get("Label") == "http":
-                        port = dp.get("Value")
-                        break
-            
-            allocations.append(Allocation(
-                id=alloc_id,
-                job_id=job_id,
-                task_group=alloc_data.get("TaskGroup", ""),
-                node_id=alloc_data.get("NodeID", ""),
-                status=status,
-                address=address,
-                port=port,
-            ))
-        
-        return allocations
-    
-    async def get_allocation(self, alloc_id: str) -> Optional[Dict[str, Any]]:
-        """Get detailed allocation info."""
-        result = await self._request("GET", f"/v1/allocation/{alloc_id}")
-        if "error" in result and result.get("status") == 404:
-            return None
-        return result
-    
-    # Scaling Operations
-    
-    async def scale_job(self, job_id: str, count: int, task_group: str = "sandbox") -> Dict[str, Any]:
-        """
-        Scale a job's task group to specified count.
-        
-        Args:
-            job_id: Job identifier
-            count: Desired number of allocations
-            task_group: Name of task group to scale
-        """
-        payload = {
-            "Count": count,
-            "Target": {
-                "Group": task_group,
-            },
-        }
-        return await self._request("POST", f"/v1/job/{job_id}/scale", payload)
-    
-    async def get_job_scale_status(self, job_id: str) -> Dict[str, int]:
-        """
-        Get current scale status for a job.
-        
-        Returns:
-            Dict mapping task group name to its running allocation count
-        """
-        result = await self._request("GET", f"/v1/job/{job_id}/scale")
-        
-        if "error" in result:
-            return {}
-        
-        task_groups = result.get("TaskGroups", {})
-        return {
-            name: info.get("Running", 0)
-            for name, info in task_groups.items()
-        }
-    
-    # Health Check
-    
-    async def is_healthy(self) -> bool:
-        """Check if Nomad is reachable and healthy."""
-        try:
-            result = await self._request("GET", "/v1/status/leader")
-            return "error" not in result
-        except Exception:
-            return False
-    
-    async def get_leader(self) -> Optional[str]:
-        """Get current Nomad leader address."""
-        result = await self._request("GET", "/v1/status/leader")
-        if isinstance(result, dict) and "data" in result:
-            return result["data"]
-        return None
-
-
-def load_job_template(
-    template_name: str = "sandbox",
-    **kwargs,
-) -> Dict[str, Any]:
-    """
-    Load and configure a job template.
-    
-    Args:
-        template_name: Name of template (e.g., "sandbox")
-        **kwargs: Template variables to substitute
-        
-    Returns:
-        Job specification dict ready for Nomad API
-    """
-    # Default job template for sandbox container
-    if template_name == "sandbox":
-        return create_sandbox_job(**kwargs)
-    else:
-        raise ValueError(f"Unknown template: {template_name}")
-
-
-def create_sandbox_job(
-    job_id: str = "atropos-sandbox",
-    image: str = "atropos-sandbox:local",  # Use :local tag to avoid registry pull
-    count: int = 1,
-    slots_per_container: int = 10,
-    privileged: bool = False,
-    cpu: int = 500,
-    memory: int = 512,
-    port: int = 8080,
-    datacenter: str = "dc1",
-    driver: str = "docker",  # "docker" or "singularity"
-    singularity_image: Optional[str] = None,  # Path to .sif file for singularity driver
-) -> Dict[str, Any]:
-    """
-    Create a sandbox job specification.
-    
-    This job runs the sandbox_server.py inside a container,
-    with the specified number of slots for agent workspaces.
-    
-    Args:
-        job_id: Unique job identifier
-        image: Docker image to use (for docker driver)
-        count: Number of container instances
-        slots_per_container: Number of slots per container
-        privileged: Run container in privileged mode (recommended for bubblewrap)
-        cpu: CPU allocation in MHz
-        memory: Memory allocation in MB
-        port: HTTP port for sandbox server
-        datacenter: Nomad datacenter
-        driver: Container driver - "docker" or "singularity"
-        singularity_image: Path to .sif file (required if driver="singularity")
-        
-    Returns:
-        Job specification dict
-    """
-    # Build task config based on driver
-    if driver == "singularity":
-        if not singularity_image:
-            raise ValueError("singularity_image path required when driver='singularity'")
-        
-        # Use raw_exec driver to run apptainer via shell for variable expansion
-        # The container binds the allocation directory for workspace persistence
-        # For raw_exec, we use static port since Nomad's dynamic port mapping doesn't
-        # work the same as Docker - the process runs directly on the host.
-        shell_cmd = (
-            f'apptainer run '
-            f'--bind "$NOMAD_ALLOC_DIR/data:/data" '
-            f'--pwd /app '
-            f'--env PYTHONUNBUFFERED=1 '
-            f'{singularity_image} '
-            f'python sandbox_server.py '
-            f'--port {port} '
-            f'--slots {slots_per_container} '
-            f'--data-dir /data'
-        )
-        task_config = {
-            "command": "/bin/sh",
-            "args": ["-c", shell_cmd],
-        }
-        task_driver = "raw_exec"
-    else:
-        # Docker driver (default)
-        task_config = {
-            "image": image,
-            "force_pull": False,  # Use local image, don't try to pull
-            "ports": ["http"],
-            "privileged": privileged,
-            "command": "python",
-            "args": [
-                "sandbox_server.py",
-                "--port", str(port),
-                "--slots", str(slots_per_container),
-                "--data-dir", "/data",
-            ],
-            # Note: On Linux, you can mount persistent storage:
-            # "volumes": ["${NOMAD_ALLOC_DIR}/data:/data"],
-            # On macOS/Docker Desktop, skip volumes for PoC
-            # (container /data is ephemeral but works for testing)
-        }
-        task_driver = "docker"
-    
-    # For Singularity/raw_exec, use static ports since the process runs directly on host.
-    # For Docker, use dynamic ports with port mapping.
-    if driver == "singularity":
-        network_config = {
-            "Mode": "host",
-            "ReservedPorts": [
-                {
-                    "Label": "http",
-                    "Value": port,
-                }
-            ],
-        }
-    else:
-        network_config = {
-            "Mode": "host",
-            "DynamicPorts": [
-                {
-                    "Label": "http",
-                    "To": port,
-                }
-            ],
-        }
-    
-    return {
-        "ID": job_id,
-        "Name": job_id,
-        "Type": "service",
-        "Datacenters": [datacenter],
-        "TaskGroups": [
-            {
-                "Name": "sandbox",
-                "Count": count,
-                # Speed up deployments and avoid Consul checks. Without this, Nomad may
-                # keep an "active deployment" around for the default MinHealthyTime,
-                # which blocks immediate scaling under load.
-                "Update": {
-                    "HealthCheck": "task_states",
-                    "MinHealthyTime": 0,
-                },
-                "Networks": [network_config],
-                "Tasks": [
-                    {
-                        "Name": "sandbox-server",
-                        "Driver": task_driver,
-                        "Config": task_config,
-                        "Env": {
-                            "PYTHONUNBUFFERED": "1",
-                            "NOMAD_ALLOC_DIR": "${NOMAD_ALLOC_DIR}",
-                        },
-                        "Resources": {
-                            "CPU": cpu,
-                            "MemoryMB": memory,
-                        },
-                        # Note: Services with Checks require Consul, which we skip for the PoC
-                    }
-                ],
-                "RestartPolicy": {
-                    "Attempts": 3,
-                    "Interval": 300_000_000_000,  # 5 minutes
-                    "Delay": 10_000_000_000,     # 10 seconds
-                    "Mode": "delay",
-                },
-                "ReschedulePolicy": {
-                    "Attempts": 5,
-                    "Interval": 3600_000_000_000,  # 1 hour
-                    "Delay": 30_000_000_000,      # 30 seconds
-                    "DelayFunction": "exponential",
-                    "MaxDelay": 300_000_000_000,  # 5 minutes
-                    "Unlimited": False,
-                },
-            }
-        ],
-    }
--- a/atropos/sandbox_server.py
+++ b/atropos/sandbox_server.py
--- a/atropos/slots/__init__.py
+++ b/atropos/slots/__init__.py
@@ -1,20 +0,0 @@
-"""
-Slot-based multiplexing for atropos-agent.
-
-Provides:
-- Slot: Isolated workspace for a single trajectory
-- SlotPool: Manages slots across Nomad allocations
-- SandboxExecutor: Executes tools in sandbox containers
-"""
-
-from .executor import SandboxExecutor
-from .pool import SlotPool, SlotPoolConfig
-from .slot import Slot, SlotState
-
-__all__ = [
-    "Slot",
-    "SlotState",
-    "SlotPool",
-    "SlotPoolConfig",
-    "SandboxExecutor",
-]
--- a/atropos/slots/executor.py
+++ b/atropos/slots/executor.py
@@ -1,457 +0,0 @@
-"""
-SandboxExecutor - HTTP client for sandbox container communication.
-
-Sends tool execution requests to sandbox_server.py running inside Nomad containers.
-Supports single and batch execution for efficiency.
-"""
-
-import asyncio
-import uuid
-from dataclasses import dataclass, field
-from typing import Any, Dict, List, Optional, Tuple
-
-import aiohttp
-
-from .slot import Slot, SlotState
-from ..tools.base import ToolCall, ToolResult
-
-
-@dataclass
-class ExecutionRequest:
-    """Request to execute a tool in a slot."""
-    slot: Slot
-    tool_name: str
-    args: Dict[str, Any]
-    execution_id: str = field(default_factory=lambda: str(uuid.uuid4()))
-    timeout: float = 30.0
-
-
-@dataclass
-class ExecutionResult:
-    """Result from sandbox execution."""
-    success: bool
-    output: str = ""
-    error: str = ""
-    execution_id: str = ""
-    slot_id: str = ""
-    metadata: Dict[str, Any] = field(default_factory=dict)
-    
-    def to_tool_result(self) -> ToolResult:
-        """Convert to ToolResult for agent consumption."""
-        return ToolResult(
-            success=self.success,
-            output=self.output,
-            error=self.error,
-            metadata=self.metadata,
-            uniq_id=self.execution_id,
-        )
-
-
-class SandboxExecutor:
-    """
-    HTTP client for executing tools in sandbox containers.
-    
-    Communicates with sandbox_server.py running inside Nomad allocations.
-    Supports both single execution and batched parallel execution.
-    
-    Usage:
-        executor = SandboxExecutor()
-        
-        # Single execution
-        result = await executor.execute(slot, "bash", {"command": "ls"})
-        
-        # Batch execution
-        results = await executor.execute_batch([
-            (slot1, "bash", {"command": "ls"}),
-            (slot2, "write_file", {"path": "test.txt", "content": "hello"}),
-        ])
-    """
-    
-    def __init__(
-        self,
-        timeout: float = 30.0,
-        max_retries: int = 3,
-        retry_delay: float = 1.0,
-    ):
-        self.timeout = aiohttp.ClientTimeout(total=timeout)
-        self.max_retries = max_retries
-        self.retry_delay = retry_delay
-        self._session: Optional[aiohttp.ClientSession] = None
-    
-    async def _get_session(self) -> aiohttp.ClientSession:
-        """Get or create HTTP session."""
-        if self._session is None or self._session.closed:
-            self._session = aiohttp.ClientSession(timeout=self.timeout)
-        return self._session
-    
-    async def close(self):
-        """Close HTTP session."""
-        if self._session and not self._session.closed:
-            await self._session.close()
-    
-    async def __aenter__(self):
-        return self
-    
-    async def __aexit__(self, exc_type, exc_val, exc_tb):
-        await self.close()
-    
-    async def execute(
-        self,
-        slot: Slot,
-        tool_name: str,
-        args: Dict[str, Any],
-        timeout: Optional[float] = None,
-    ) -> ExecutionResult:
-        """
-        Execute a tool in a slot's workspace.
-        
-        Args:
-            slot: Slot to execute in
-            tool_name: Name of tool (bash, read_file, write_file)
-            args: Tool arguments
-            timeout: Optional timeout override
-            
-        Returns:
-            ExecutionResult with output or error
-        """
-        execution_id = str(uuid.uuid4())
-        exec_timeout = timeout or self.timeout.total or 30.0
-        
-        # Mark slot as executing
-        original_state = slot.state
-        try:
-            if slot.state == SlotState.ACQUIRED:
-                slot.start_execution(execution_id)
-            
-            result = await self._send_execute_request(
-                container_addr=slot.container_addr,
-                slot_id=slot.slot_id,
-                tool_name=tool_name,
-                args=args,
-                execution_id=execution_id,
-                timeout=exec_timeout,
-            )
-            result.slot_id = slot.slot_id
-            return result
-            
-        finally:
-            # Restore slot state
-            if slot.state == SlotState.EXECUTING:
-                slot.end_execution()
-    
-    async def _send_execute_request(
-        self,
-        container_addr: str,
-        slot_id: str,
-        tool_name: str,
-        args: Dict[str, Any],
-        execution_id: str,
-        timeout: float,
-    ) -> ExecutionResult:
-        """Send execution request to sandbox server with retry logic."""
-        session = await self._get_session()
-        url = f"{container_addr}/execute"
-        
-        payload = {
-            "slot_id": slot_id,
-            "tool": tool_name,
-            "args": args,
-            "execution_id": execution_id,
-            "timeout": timeout,
-        }
-        
-        last_error = None
-        for attempt in range(self.max_retries):
-            try:
-                async with session.post(url, json=payload) as response:
-                    data = await response.json()
-                    
-                    return ExecutionResult(
-                        success=data.get("success", False),
-                        output=data.get("output", ""),
-                        error=data.get("error", ""),
-                        execution_id=data.get("execution_id", execution_id),
-                        metadata=data.get("metadata", {}),
-                    )
-                    
-            except aiohttp.ClientError as e:
-                last_error = str(e)
-                if attempt < self.max_retries - 1:
-                    await asyncio.sleep(self.retry_delay * (attempt + 1))
-                continue
-            except asyncio.TimeoutError:
-                last_error = f"Request timed out after {timeout}s"
-                break
-            except Exception as e:
-                last_error = str(e)
-                break
-        
-        return ExecutionResult(
-            success=False,
-            error=f"Failed after up to {self.max_retries} attempts: {last_error}",
-            execution_id=execution_id,
-        )
-    
-    async def execute_batch(
-        self,
-        requests: List[Tuple[Slot, str, Dict[str, Any]]],
-        timeout: Optional[float] = None,
-    ) -> List[ExecutionResult]:
-        """
-        Execute multiple tools in parallel across slots.
-        
-        This is the key optimization - we batch tool calls to maximize
-        container utilization while agents are waiting for LLM responses.
-        
-        Args:
-            requests: List of (slot, tool_name, args) tuples
-            timeout: Optional timeout override
-            
-        Returns:
-            List of ExecutionResults in same order as requests
-        """
-        if not requests:
-            return []
-        
-        # Group requests by container address for batch API
-        by_container: Dict[str, List[Tuple[int, Slot, str, Dict[str, Any], str]]] = {}
-        
-        for idx, (slot, tool_name, args) in enumerate(requests):
-            execution_id = str(uuid.uuid4())
-            container = slot.container_addr
-            
-            if container not in by_container:
-                by_container[container] = []
-            by_container[container].append((idx, slot, tool_name, args, execution_id))
-            
-            # Mark slots as executing
-            if slot.state == SlotState.ACQUIRED:
-                slot.start_execution(execution_id)
-        
-        # Execute batches in parallel
-        exec_timeout = timeout or self.timeout.total or 30.0
-        batch_tasks = []
-        
-        for container_addr, batch_requests in by_container.items():
-            task = self._send_batch_request(
-                container_addr=container_addr,
-                batch_requests=batch_requests,
-                timeout=exec_timeout,
-            )
-            batch_tasks.append(task)
-        
-        # Gather all batch results
-        batch_results = await asyncio.gather(*batch_tasks, return_exceptions=True)
-        
-        # Collect results in original order
-        results: List[Optional[ExecutionResult]] = [None] * len(requests)
-        
-        for batch_result in batch_results:
-            if isinstance(batch_result, Exception):
-                # Leave these entries as None; they are filled with failure results below
-                continue
-            
-            for idx, result in batch_result:
-                results[idx] = result
-        
-        # Fill in any missing results
-        for idx, result in enumerate(results):
-            if result is None:
-                slot, tool_name, args = requests[idx]
-                results[idx] = ExecutionResult(
-                    success=False,
-                    error="Batch execution failed",
-                    slot_id=slot.slot_id,
-                )
-        
-        # End execution on all slots
-        for slot, _, _ in requests:
-            if slot.state == SlotState.EXECUTING:
-                slot.end_execution()
-        
-        return results  # type: ignore
-    
-    async def _send_batch_request(
-        self,
-        container_addr: str,
-        batch_requests: List[Tuple[int, Slot, str, Dict[str, Any], str]],
-        timeout: float,
-    ) -> List[Tuple[int, ExecutionResult]]:
-        """Send batch execution request to a single container."""
-        session = await self._get_session()
-        url = f"{container_addr}/batch"
-        
-        # Build batch payload
-        payload = [
-            {
-                "slot_id": slot.slot_id,
-                "tool": tool_name,
-                "args": args,
-                "execution_id": execution_id,
-                "timeout": timeout,
-            }
-            for _, slot, tool_name, args, execution_id in batch_requests
-        ]
-        
-        try:
-            async with session.post(url, json=payload) as response:
-                data = await response.json()
-                
-                if not isinstance(data, list):
-                    raise ValueError(f"Expected list response, got {type(data)}")
-                
-                results = []
-                for i, (idx, slot, _, _, execution_id) in enumerate(batch_requests):
-                    if i < len(data):
-                        item = data[i]
-                        result = ExecutionResult(
-                            success=item.get("success", False),
-                            output=item.get("output", ""),
-                            error=item.get("error", ""),
-                            execution_id=item.get("execution_id", execution_id),
-                            slot_id=slot.slot_id,
-                            metadata=item.get("metadata", {}),
-                        )
-                    else:
-                        result = ExecutionResult(
-                            success=False,
-                            error="Missing result in batch response",
-                            execution_id=execution_id,
-                            slot_id=slot.slot_id,
-                        )
-                    results.append((idx, result))
-                
-                return results
-                
-        except Exception as e:
-            # Return error for all requests in batch
-            return [
-                (idx, ExecutionResult(
-                    success=False,
-                    error=str(e),
-                    execution_id=execution_id,
-                    slot_id=slot.slot_id,
-                ))
-                for idx, slot, _, _, execution_id in batch_requests
-            ]
-    
-    async def reset_slot(self, slot: Slot) -> ExecutionResult:
-        """
-        Reset a slot's workspace (delete all files).
-        
-        Useful when reusing a slot for a new trajectory.
-        """
-        session = await self._get_session()
-        url = f"{slot.container_addr}/reset"
-        
-        try:
-            async with session.post(url, json={"slot_id": slot.slot_id}) as response:
-                data = await response.json()
-                return ExecutionResult(
-                    success=data.get("success", False),
-                    output=data.get("output", ""),
-                    error=data.get("error", ""),
-                    slot_id=slot.slot_id,
-                )
-        except Exception as e:
-            return ExecutionResult(
-                success=False,
-                error=str(e),
-                slot_id=slot.slot_id,
-            )
-    
-    async def health_check(self, container_addr: str) -> bool:
-        """Check if a sandbox container is healthy."""
-        session = await self._get_session()
-        url = f"{container_addr}/health"
-        
-        try:
-            async with session.get(url) as response:
-                data = await response.json()
-                return data.get("status") == "ok"
-        except Exception:
-            return False
-    
-    async def get_container_status(
-        self, 
-        container_addr: str
-    ) -> Optional[Dict[str, Any]]:
-        """Get status info from a sandbox container."""
-        session = await self._get_session()
-        url = f"{container_addr}/health"
-        
-        try:
-            async with session.get(url) as response:
-                return await response.json()
-        except Exception:
-            return None
-
-    # -------------------------------------------------------------------------
-    # Artifact helpers (optional)
-    # -------------------------------------------------------------------------
-
-    async def _post_json(
-        self,
-        url: str,
-        payload: Dict[str, Any],
-        timeout: Optional[float] = None,
-    ) -> Dict[str, Any]:
-        session = await self._get_session()
-        try:
-            async with session.post(url, json=payload, timeout=timeout) as response:
-                data = await response.json()
-                if isinstance(data, dict):
-                    data.setdefault("http_status", response.status)
-                    return data
-                return {"success": False, "error": f"Unexpected response type: {type(data)}", "http_status": response.status}
-        except Exception as e:
-            return {"success": False, "error": str(e)}
-
-    async def read_artifact(
-        self,
-        slot: Slot,
-        path: str,
-        *,
-        encoding: str = "text",
-        max_bytes: Optional[int] = None,
-        include_sha256: bool = False,
-        timeout: Optional[float] = None,
-    ) -> Dict[str, Any]:
-        url = f"{slot.container_addr}/artifacts/read"
-        payload: Dict[str, Any] = {"slot_id": slot.slot_id, "path": path, "encoding": encoding, "include_sha256": include_sha256}
-        if max_bytes is not None:
-            payload["max_bytes"] = max_bytes
-        return await self._post_json(url, payload, timeout=timeout)
-
-    async def list_artifacts(
-        self,
-        slot: Slot,
-        path: str = ".",
-        *,
-        recursive: bool = False,
-        max_entries: Optional[int] = None,
-        timeout: Optional[float] = None,
-    ) -> Dict[str, Any]:
-        url = f"{slot.container_addr}/artifacts/list"
-        payload: Dict[str, Any] = {"slot_id": slot.slot_id, "path": path, "recursive": recursive}
-        if max_entries is not None:
-            payload["max_entries"] = max_entries
-        return await self._post_json(url, payload, timeout=timeout)
-
-    async def archive_artifacts(
-        self,
-        slot: Slot,
-        path: str = ".",
-        *,
-        archive_format: str = "tar.gz",
-        max_bytes: Optional[int] = None,
-        max_entries: Optional[int] = None,
-        timeout: Optional[float] = None,
-    ) -> Dict[str, Any]:
-        url = f"{slot.container_addr}/artifacts/archive"
-        payload: Dict[str, Any] = {"slot_id": slot.slot_id, "path": path, "format": archive_format}
-        if max_bytes is not None:
-            payload["max_bytes"] = max_bytes
-        if max_entries is not None:
-            payload["max_entries"] = max_entries
-        return await self._post_json(url, payload, timeout=timeout)
--- a/atropos/slots/pool.py
+++ b/atropos/slots/pool.py
@@ -1,659 +0,0 @@
-"""
-SlotPool - Manages slots across Nomad allocations.
-
-The SlotPool is the core abstraction for slot-based multiplexing:
-- Tracks available/acquired slots across containers
-- Handles slot acquisition and release
-- Auto-scales Nomad job count based on demand
-- Provides batched tool execution
-"""
-
-import asyncio
-import logging
-import os
-import subprocess
-from dataclasses import dataclass, field
-from pathlib import Path
-from typing import Any, Dict, List, Optional, Tuple
-
-from ..nomad.client import (
-    Allocation,
-    AllocationStatus,
-    NomadClient,
-    create_sandbox_job,
-)
-from .executor import ExecutionResult, SandboxExecutor
-from .slot import Slot, SlotState, create_slots_for_allocation
-
-logger = logging.getLogger(__name__)
-
-
-@dataclass
-class SlotPoolConfig:
-    """Configuration for SlotPool."""
-    
-    # Nomad settings
-    nomad_address: str = "http://localhost:4646"
-    job_id: str = "atropos-sandbox"
-    datacenter: str = "dc1"
-    
-    # Container settings
-    image: str = "atropos-sandbox:local"  # Use :local tag to avoid registry pull
-    slots_per_container: int = 10
-    privileged: bool = False
-    cpu: int = 500  # MHz
-    memory: int = 512  # MB
-    
-    # Driver selection: "docker" or "singularity"
-    driver: str = "docker"
-    # Path to .sif file for singularity driver (required if driver="singularity")
-    singularity_image: Optional[str] = None
-    
-    # Scaling settings
-    min_containers: int = 1
-    max_containers: int = 10
-    
-    # Timeouts
-    acquire_timeout: float = 30.0  # Seconds between acquire polls (also triggers scale-up attempts)
-    health_check_interval: float = 30.0  # Seconds between health checks
-    scale_cooldown: float = 60.0  # Seconds between scale operations
-
-    # Job lifecycle
-    purge_job_on_start: bool = False  # Purge any pre-existing job before starting (local dev/training friendly)
-
-    # Local Docker image convenience (macOS/Nomad dev mode)
-    auto_build_local_image: bool = True  # If the image tag ends with :local and the image is missing, build it from the bundled Dockerfile.
-    dockerfile_path: Optional[str] = None  # Override Dockerfile path (default: Hermes-Agent/atropos/Dockerfile).
-    docker_build_context: Optional[str] = None  # Override build context (default: Hermes-Agent/atropos).
-
-
-class SlotPool:
-    """
-    Manages a pool of slots across Nomad allocations.
-    
-    The SlotPool:
-    - Deploys sandbox containers to Nomad
-    - Tracks slots across all running containers
-    - Handles slot acquisition/release
-    - Auto-scales based on demand
-    - Provides batched execution via SandboxExecutor
-    
-    Usage:
-        config = SlotPoolConfig(
-            nomad_address="http://localhost:4646",
-            job_id="my-sandbox",
-            slots_per_container=10,
-        )
-        
-        pool = SlotPool(config)
-        await pool.start()
-        
-        # Acquire a slot
-        slot = await pool.acquire()
-        
-        # Execute tool
-        result = await pool.execute(slot, "bash", {"command": "ls"})
-        
-        # Release slot
-        await pool.release(slot)
-        
-        # Shutdown
-        await pool.stop()
-    """
-    
-    def __init__(self, config: Optional[SlotPoolConfig] = None):
-        self.config = config or SlotPoolConfig()
-        
-        # Nomad client
-        self.nomad = NomadClient(address=self.config.nomad_address)
-        
-        # Sandbox executor for tool execution
-        self.executor = SandboxExecutor()
-        
-        # Slot tracking
-        self._slots: Dict[str, Slot] = {}  # slot_key -> Slot
-        self._available_queue: asyncio.Queue[str] = asyncio.Queue()
-        self._lock = asyncio.Lock()
-        self._scale_lock = asyncio.Lock()
-        
-        # State
-        self._started = False
-        self._health_task: Optional[asyncio.Task] = None
-        self._scale_task: Optional[asyncio.Task] = None
-        self._last_scale_time = 0.0
-
-    def _default_dockerfile_path(self) -> Path:
-        # Hermes-Agent/atropos/Dockerfile lives next to this module in source checkouts.
-        return Path(__file__).resolve().parents[1] / "Dockerfile"
-
-    def _default_build_context(self) -> Path:
-        return Path(__file__).resolve().parents[1]
-
-    def _docker_image_exists(self, image: str) -> bool:
-        try:
-            proc = subprocess.run(
-                ["docker", "image", "inspect", image],
-                stdout=subprocess.DEVNULL,
-                stderr=subprocess.DEVNULL,
-                check=False,
-                env={**os.environ, "DOCKER_CLI_HINTS": "false"},
-            )
-            return proc.returncode == 0
-        except FileNotFoundError:
-            return False
-
-    def _try_build_local_image(self, image: str) -> None:
-        dockerfile = Path(self.config.dockerfile_path) if self.config.dockerfile_path else self._default_dockerfile_path()
-        context = Path(self.config.docker_build_context) if self.config.docker_build_context else self._default_build_context()
-
-        if not dockerfile.exists():
-            raise RuntimeError(
-                f"Sandbox Dockerfile not found at {dockerfile}. "
-                "Build the sandbox image manually or set --env.auto_build_local_image false and provide a non-local image."
-            )
-        if not context.exists():
-            raise RuntimeError(f"Docker build context not found at {context}")
-
-        # Prefer buildx+--load to ensure the image ends up in the local daemon (required by Nomad's docker driver).
-        buildx_cmd = [
-            "docker",
-            "buildx",
-            "build",
-            "--load",
-            "-t",
-            image,
-            "-f",
-            str(dockerfile),
-            str(context),
-        ]
-        proc = subprocess.run(buildx_cmd, check=False, env={**os.environ, "DOCKER_CLI_HINTS": "false"})
-        if proc.returncode == 0:
-            return
-
-        # Fallback to classic docker build if buildx isn't available.
-        build_cmd = ["docker", "build", "-t", image, "-f", str(dockerfile), str(context)]
-        proc2 = subprocess.run(build_cmd, check=False, env={**os.environ, "DOCKER_CLI_HINTS": "false"})
-        if proc2.returncode != 0:
-            raise RuntimeError(
-                f"Failed to build local sandbox image {image}. "
-                f"Tried: {' '.join(buildx_cmd)} and {' '.join(build_cmd)}"
-            )
-
-    def _ensure_local_image(self) -> None:
-        image = (self.config.image or "").strip()
-        if not image.endswith(":local"):
-            return
-        if not self.config.auto_build_local_image:
-            return
-
-        if self._docker_image_exists(image):
-            return
-
-        logger.info(f"Local sandbox image {image} not found; building it now...")
-        self._try_build_local_image(image)
-
-    def _slot_key(self, alloc_id: str, slot_id: str) -> str:
-        """Generate unique key for a slot."""
-        return f"{alloc_id}:{slot_id}"
-    
-    @property
-    def total_slots(self) -> int:
-        """Total number of slots in pool."""
-        return len(self._slots)
-    
-    @property
-    def available_slots(self) -> int:
-        """Number of available slots."""
-        return sum(1 for s in self._slots.values() if s.is_available)
-    
-    @property
-    def acquired_slots(self) -> int:
-        """Number of acquired slots."""
-        return sum(1 for s in self._slots.values() if s.is_acquired)
-    
-    async def start(self) -> None:
-        """
-        Start the slot pool.
-        
-        - Checks if Nomad is healthy
-        - Deploys sandbox job if not running
-        - Discovers existing allocations
-        - Starts health check background task
-        """
-        if self._started:
-            return
-        
-        logger.info(f"Starting SlotPool (job_id={self.config.job_id})")
-
-        try:
-            # Make sure local sandbox images exist before Nomad tries to pull them.
-            # This is a common footgun in macOS dev mode with :local tags.
-            self._ensure_local_image()
-
-            # Check Nomad health
-            if not await self.nomad.is_healthy():
-                raise RuntimeError(f"Nomad is not reachable at {self.config.nomad_address}")
-
-            if self.config.purge_job_on_start:
-                logger.info(f"Purging any existing Nomad job: {self.config.job_id}")
-                await self.nomad.stop_job(self.config.job_id, purge=True)
-
-            # Check if job exists (after optional purge)
-            job = await self.nomad.get_job(self.config.job_id)
-
-            if job is None:
-                # Deploy new job
-                logger.info(f"Deploying sandbox job: {self.config.job_id} (driver={self.config.driver})")
-                job_spec = create_sandbox_job(
-                    job_id=self.config.job_id,
-                    image=self.config.image,
-                    count=self.config.min_containers,
-                    slots_per_container=self.config.slots_per_container,
-                    privileged=self.config.privileged,
-                    cpu=self.config.cpu,
-                    memory=self.config.memory,
-                    datacenter=self.config.datacenter,
-                    driver=self.config.driver,
-                    singularity_image=self.config.singularity_image,
-                )
-                result = await self.nomad.submit_job(job_spec)
-                if "error" in result:
-                    raise RuntimeError(f"Failed to submit job: {result}")
-
-            # Wait for allocations to be running (even if the job already existed).
-            await self._wait_for_healthy_allocations(self.config.min_containers)
-
-            # Discover existing allocations and slots
-            await self._refresh_slots()
-
-            # Start health check task
-            self._health_task = asyncio.create_task(self._health_check_loop())
-
-            self._started = True
-            logger.info(f"SlotPool started: {self.total_slots} slots available")
-        except Exception:
-            # Ensure aiohttp sessions are not leaked if we fail to start.
-            await self.stop(purge_job=False)
-            raise
-    
-    async def stop(self, purge_job: bool = False) -> None:
-        """
-        Stop the slot pool.
-        
-        Args:
-            purge_job: If True, also stop and purge the Nomad job
-        """
-        logger.info("Stopping SlotPool")
-
-        # Cancel health check task
-        if self._health_task:
-            self._health_task.cancel()
-            try:
-                await self._health_task
-            except asyncio.CancelledError:
-                pass
-            finally:
-                self._health_task = None
-
-        if self._scale_task:
-            self._scale_task.cancel()
-            try:
-                await self._scale_task
-            except asyncio.CancelledError:
-                pass
-            finally:
-                self._scale_task = None
-
-        # Optionally stop the job (do this even if start() never completed).
-        if purge_job:
-            logger.info(f"Stopping Nomad job: {self.config.job_id}")
-            await self.nomad.stop_job(self.config.job_id, purge=True)
-
-        # Close connections
-        await self.executor.close()
-        await self.nomad.close()
-
-        self._started = False
-        self._slots.clear()
-
-        # Clear the queue
-        while not self._available_queue.empty():
-            try:
-                self._available_queue.get_nowait()
-            except asyncio.QueueEmpty:
-                break
-    
-    async def acquire(self, trajectory_id: Optional[str] = None) -> Slot:
-        """
-        Acquire an available slot.
-        
-        If no slot is available, waits in intervals of acquire_timeout seconds.
-        After each interval without a slot, attempts to scale up and keeps waiting.
-        
-        Args:
-            trajectory_id: Optional ID of trajectory acquiring the slot
-            
-        Returns:
-            Acquired Slot
-            
-        Raises:
-            RuntimeError: If the pool has not been started
-        """
-        if not self._started:
-            raise RuntimeError("SlotPool not started")
-
-        while True:
-            try:
-                # Try to get an available slot
-                slot_key = await asyncio.wait_for(
-                    self._available_queue.get(),
-                    timeout=self.config.acquire_timeout,
-                )
-            except asyncio.TimeoutError:
-                # Try to scale up, but keep waiting even if scaling isn't possible.
-                # In practice, slots may become available shortly (e.g. contention),
-                # and scaling may be temporarily blocked by Nomad deployments.
-                await self._try_scale_up()
-                continue
-
-            slot = self._slots.get(slot_key)
-            if slot is None:
-                # Slot was removed; discard stale queue entry and retry.
-                continue
-
-            try:
-                slot.acquire(trajectory_id)
-            except RuntimeError:
-                # Slot isn't actually available (e.g. duplicate queue entry); retry.
-                continue
-
-            logger.debug(f"Acquired slot {slot.slot_id} (alloc={slot.alloc_id[:8]})")
-            return slot
-    
-    async def release(self, slot: Slot, reset_workspace: bool = False) -> None:
-        """
-        Release a slot back to the pool.
-        
-        Args:
-            slot: Slot to release
-            reset_workspace: If True, clear the workspace files
-        """
-        slot_key = self._slot_key(slot.alloc_id, slot.slot_id)
-        
-        if slot_key not in self._slots:
-            logger.warning(f"Releasing unknown slot: {slot_key}")
-            return
-        
-        # Optionally reset workspace
-        if reset_workspace:
-            await self.executor.reset_slot(slot)
-        
-        slot.release()
-        await self._available_queue.put(slot_key)
-        
-        logger.debug(f"Released slot {slot.slot_id}")
-    
-    async def execute(
-        self,
-        slot: Slot,
-        tool_name: str,
-        args: Dict[str, Any],
-        timeout: Optional[float] = None,
-    ) -> ExecutionResult:
-        """
-        Execute a tool in a slot's workspace.
-        
-        Args:
-            slot: Slot to execute in
-            tool_name: Name of tool (bash, read_file, write_file)
-            args: Tool arguments
-            timeout: Optional timeout override
-            
-        Returns:
-            ExecutionResult
-        """
-        return await self.executor.execute(slot, tool_name, args, timeout)
-    
-    async def execute_batch(
-        self,
-        requests: List[Tuple[Slot, str, Dict[str, Any]]],
-        timeout: Optional[float] = None,
-    ) -> List[ExecutionResult]:
-        """
-        Execute multiple tools in parallel.
-        
-        This is the key optimization - batch execution across multiple slots
-        maximizes container utilization.
-        
-        Args:
-            requests: List of (slot, tool_name, args) tuples
-            timeout: Optional timeout override
-            
-        Returns:
-            List of ExecutionResults in same order
-        """
-        return await self.executor.execute_batch(requests, timeout)
-    
-    async def _refresh_slots(self) -> None:
-        """Refresh slot inventory from Nomad allocations."""
-        async with self._lock:
-            allocs = await self.nomad.get_job_allocations(self.config.job_id)
-            
-            # Track which slots we've seen
-            seen_keys = set()
-            
-            for alloc in allocs:
-                if alloc.status != AllocationStatus.RUNNING:
-                    continue
-                
-                if not alloc.http_address:
-                    continue
-                
-                # Check container health
-                healthy = await self.executor.health_check(alloc.http_address)
-                if not healthy:
-                    continue
-                
-                # Create slots for this allocation
-                for i in range(self.config.slots_per_container):
-                    slot_id = f"slot_{i}"
-                    slot_key = self._slot_key(alloc.id, slot_id)
-                    seen_keys.add(slot_key)
-                    
-                    if slot_key not in self._slots:
-                        # New slot
-                        slot = Slot(
-                            slot_id=slot_id,
-                            alloc_id=alloc.id,
-                            container_addr=alloc.http_address,
-                        )
-                        self._slots[slot_key] = slot
-                        await self._available_queue.put(slot_key)
-                        logger.debug(f"Added slot: {slot_key}")
-            
-            # Remove slots from dead allocations
-            for slot_key in list(self._slots.keys()):
-                if slot_key not in seen_keys:
-                    slot = self._slots.pop(slot_key)
-                    logger.debug(f"Removed slot: {slot_key}")
-    
-    async def _wait_for_healthy_allocations(
-        self, 
-        min_count: int, 
-        timeout: float = 120.0
-    ) -> None:
-        """Wait for allocations to become healthy."""
-        import time
-        start = time.time()
-
-        def _summarize_alloc_detail(detail: Dict[str, Any]) -> str:
-            task_states = detail.get("TaskStates") or {}
-            parts: List[str] = []
-            if isinstance(task_states, dict):
-                for task_name, st in task_states.items():
-                    events = (st or {}).get("Events") or []
-                    if isinstance(events, list) and events:
-                        # Include a few recent events; the latest can be a generic restart message
-                        # while the true root cause is slightly earlier (e.g. image pull failure).
-                        recent = events[-3:]
-                        msgs: List[str] = []
-                        for ev in recent:
-                            desc = ev.get("DisplayMessage") or ev.get("Message") or ev.get("Type") or ""
-                            if desc:
-                                msgs.append(desc)
-                        if msgs:
-                            parts.append(f"{task_name}: " + " | ".join(msgs))
-            return "; ".join(parts)
-
-        def _alloc_events_lower(detail: Dict[str, Any]) -> str:
-            task_states = detail.get("TaskStates") or {}
-            texts: List[str] = []
-            if isinstance(task_states, dict):
-                for _task_name, st in task_states.items():
-                    events = (st or {}).get("Events") or []
-                    if isinstance(events, list):
-                        for ev in events[-10:]:
-                            desc = ev.get("DisplayMessage") or ev.get("Message") or ev.get("Type") or ""
-                            if desc:
-                                texts.append(desc)
-            return " ".join(texts).lower()
-        
-        while time.time() - start < timeout:
-            allocs = await self.nomad.get_job_allocations(self.config.job_id)
-            
-            healthy_count = 0
-            for alloc in allocs:
-                if alloc.status == AllocationStatus.RUNNING and alloc.http_address:
-                    if await self.executor.health_check(alloc.http_address):
-                        healthy_count += 1
-
-                # Fast-fail on obvious driver/image errors to avoid waiting out the full timeout.
-                if alloc.id:
-                    detail = await self.nomad.get_allocation(alloc.id)
-                    if isinstance(detail, dict):
-                        summary = _summarize_alloc_detail(detail)
-                        lowered = _alloc_events_lower(detail) or summary.lower()
-                        if "failed to pull" in lowered or "pull access denied" in lowered:
-                            raise RuntimeError(
-                                "Nomad allocation failed to start due to a Docker image pull error. "
-                                f"Allocation {alloc.id[:8]}: {summary}\n"
-                                "If you're using a local image tag (e.g. `atropos-sandbox:local`) on macOS, "
-                                "make sure the image is loaded into Docker, e.g.:\n"
-                                "  docker buildx build --load -t atropos-sandbox:local -f Hermes-Agent/atropos/Dockerfile Hermes-Agent/atropos"
-                            )
-                        if "exceeded allowed attempts" in lowered:
-                            raise RuntimeError(
-                                "Nomad allocation is crash-looping and has entered restart backoff. "
-                                f"Allocation {alloc.id[:8]}: {summary}\n"
-                                "Inspect logs with:\n"
-                                f"  nomad alloc logs -stderr -task sandbox-server {alloc.id}\n"
-                                "Common causes include: missing local Docker image tag, container entrypoint error, "
-                                "or sandbox-server startup failure."
-                            )
-            
-            if healthy_count >= min_count:
-                return
-            
-            await asyncio.sleep(2.0)
-
-        # Timed out: include allocation status detail to help debugging.
-        allocs = await self.nomad.get_job_allocations(self.config.job_id)
-        alloc_lines: List[str] = []
-        for alloc in allocs[:10]:
-            addr = alloc.http_address or "-"
-            line = f"{alloc.id[:8]} status={alloc.status.value} http={addr}"
-            detail = await self.nomad.get_allocation(alloc.id)
-            if isinstance(detail, dict):
-                summary = _summarize_alloc_detail(detail)
-                if summary:
-                    line += f" detail={summary}"
-            alloc_lines.append(line)
-
-        hint = (
-            "Timed out waiting for healthy sandbox allocations.\n"
-            f"Job: {self.config.job_id}, desired_healthy: {min_count}\n"
-            "Allocations:\n  - " + "\n  - ".join(alloc_lines)
-        )
-        raise RuntimeError(hint)
-    
-    async def _try_scale_up(self) -> bool:
-        """Attempt to scale up the job."""
-        import time
-
-        async with self._scale_lock:
-            # Check cooldown
-            if time.time() - self._last_scale_time < self.config.scale_cooldown:
-                return False
-
-            # Check max containers
-            status = await self.nomad.get_job_status(self.config.job_id)
-            if status is None:
-                return False
-
-            current_count = status.count
-            if current_count >= self.config.max_containers:
-                logger.warning(f"Cannot scale up: already at max ({self.config.max_containers})")
-                return False
-
-            # Scale up
-            new_count = min(current_count + 1, self.config.max_containers)
-            logger.info(f"Scaling up from {current_count} to {new_count} containers")
-
-            scale_resp = await self.nomad.scale_job(
-                self.config.job_id,
-                count=new_count,
-                task_group="sandbox",
-            )
-
-            # Nomad may return non-JSON errors (e.g. plain text) with a status field.
-            if isinstance(scale_resp, dict) and scale_resp.get("status", 200) >= 400:
-                logger.warning(f"Scale request rejected: {scale_resp}")
-                self._last_scale_time = time.time()
-                return False
-
-            self._last_scale_time = time.time()
-
-            # Wait for new allocation in the background so contended acquires can still
-            # make progress (e.g. by grabbing slots released by other trajectories).
-            if self._scale_task is None or self._scale_task.done():
-                self._scale_task = asyncio.create_task(self._wait_for_scale(new_count))
-
-            return True
-
-    async def _wait_for_scale(self, desired_count: int) -> None:
-        try:
-            await self._wait_for_healthy_allocations(desired_count, timeout=60.0)
-            await self._refresh_slots()
-        except asyncio.CancelledError:
-            raise
-        except Exception as e:
-            logger.error(f"Failed to scale up: {e}")
-    
-    async def _health_check_loop(self) -> None:
-        """Background task to monitor container health."""
-        while True:
-            try:
-                await asyncio.sleep(self.config.health_check_interval)
-                await self._refresh_slots()
-            except asyncio.CancelledError:
-                break
-            except Exception as e:
-                logger.error(f"Health check error: {e}")
-    
-    def get_stats(self) -> Dict[str, Any]:
-        """Get pool statistics."""
-        slots_by_state = {}
-        for slot in self._slots.values():
-            state = slot.state.value
-            slots_by_state[state] = slots_by_state.get(state, 0) + 1
-
-        container_count = len({s.alloc_id for s in self._slots.values()}) if self._slots else 0
-        
-        return {
-            "total_slots": self.total_slots,
-            "available_slots": self.available_slots,
-            "acquired_slots": self.acquired_slots,
-            "containers": container_count,
-            "slots_by_state": slots_by_state,
-            "started": self._started,
-        }
--- a/atropos/slots/slot.py
+++ b/atropos/slots/slot.py
@@ -1,159 +0,0 @@
-"""
-Slot abstraction for atropos-agent.
-
-A Slot represents an isolated workspace for a single agent trajectory.
-Slots are hosted on Nomad allocations and provide workspace isolation
-via filesystem directories.
-"""
-
-from dataclasses import dataclass, field
-from enum import Enum
-from typing import Any, Dict, Optional
-import uuid
-
-
-class SlotState(Enum):
-    """State of a slot in the pool."""
-    AVAILABLE = "available"      # Ready to be acquired
-    ACQUIRED = "acquired"        # Assigned to a trajectory
-    EXECUTING = "executing"      # Currently executing a tool
-    RELEASING = "releasing"      # Being released back to pool
-    ERROR = "error"              # In error state
-
-
-@dataclass
-class Slot:
-    """
-    An isolated workspace for a single agent trajectory.
-    
-    Slots are the unit of scheduling - each trajectory runs in its own slot,
-    with an isolated workspace directory. Multiple slots share a container.
-    
-    Attributes:
-        slot_id: Unique identifier for this slot (e.g., "slot_0")
-        alloc_id: Nomad allocation ID hosting this slot
-        container_addr: HTTP address of the sandbox server (e.g., "http://10.0.0.1:8080")
-        workspace_dir: Path to workspace in container (e.g., "/data/slot_0")
-        state: Current state of the slot
-        trajectory_id: ID of trajectory currently using this slot (if acquired)
-        metadata: Additional metadata
-    """
-    slot_id: str
-    alloc_id: str
-    container_addr: str
-    workspace_dir: str = ""
-    state: SlotState = SlotState.AVAILABLE
-    trajectory_id: Optional[str] = None
-    metadata: Dict[str, Any] = field(default_factory=dict)
-    
-    def __post_init__(self):
-        """Set default workspace_dir if not provided."""
-        if not self.workspace_dir:
-            self.workspace_dir = f"/data/{self.slot_id}"
-    
-    @property
-    def is_available(self) -> bool:
-        """Check if slot is available for acquisition."""
-        return self.state == SlotState.AVAILABLE
-    
-    @property
-    def is_acquired(self) -> bool:
-        """Check if slot is currently acquired."""
-        return self.state in (SlotState.ACQUIRED, SlotState.EXECUTING)
-    
-    def acquire(self, trajectory_id: Optional[str] = None) -> None:
-        """
-        Mark slot as acquired by a trajectory.
-        
-        Args:
-            trajectory_id: Optional ID of acquiring trajectory
-        """
-        if not self.is_available:
-            raise RuntimeError(f"Cannot acquire slot {self.slot_id}: state is {self.state}")
-        
-        self.state = SlotState.ACQUIRED
-        self.trajectory_id = trajectory_id or str(uuid.uuid4())
-    
-    def start_execution(self, execution_id: Optional[str] = None) -> None:
-        """Mark slot as executing."""
-        if self.state != SlotState.ACQUIRED:
-            raise RuntimeError(f"Cannot start execution on slot {self.slot_id}: state is {self.state}")
-        
-        self.state = SlotState.EXECUTING
-        if execution_id:
-            self.metadata["current_execution_id"] = execution_id
-    
-    def end_execution(self) -> None:
-        """Mark execution as complete, return to acquired state."""
-        if self.state != SlotState.EXECUTING:
-            raise RuntimeError(f"Cannot end execution on slot {self.slot_id}: state is {self.state}")
-        
-        self.state = SlotState.ACQUIRED
-        self.metadata.pop("current_execution_id", None)
-    
-    def release(self) -> None:
-        """Release slot back to available state."""
-        self.state = SlotState.AVAILABLE
-        self.trajectory_id = None
-        self.metadata.pop("current_execution_id", None)
-    
-    def mark_error(self, error: str) -> None:
-        """Mark slot as in error state."""
-        self.state = SlotState.ERROR
-        self.metadata["error"] = error
-    
-    def to_dict(self) -> Dict[str, Any]:
-        """Convert to dictionary for serialization."""
-        return {
-            "slot_id": self.slot_id,
-            "alloc_id": self.alloc_id,
-            "container_addr": self.container_addr,
-            "workspace_dir": self.workspace_dir,
-            "state": self.state.value,
-            "trajectory_id": self.trajectory_id,
-            "metadata": self.metadata,
-        }
-    
-    @classmethod
-    def from_dict(cls, data: Dict[str, Any]) -> "Slot":
-        """Create from dictionary."""
-        return cls(
-            slot_id=data["slot_id"],
-            alloc_id=data["alloc_id"],
-            container_addr=data["container_addr"],
-            workspace_dir=data.get("workspace_dir", ""),
-            state=SlotState(data.get("state", "available")),
-            trajectory_id=data.get("trajectory_id"),
-            metadata=data.get("metadata", {}),
-        )
-    
-    def __repr__(self) -> str:
-        return f"Slot({self.slot_id}, state={self.state.value}, alloc={self.alloc_id[:8]}...)"
-
-
-def create_slots_for_allocation(
-    alloc_id: str,
-    container_addr: str,
-    num_slots: int = 10,
-) -> list["Slot"]:
-    """
-    Create slots for a Nomad allocation.
-    
-    Args:
-        alloc_id: Nomad allocation ID
-        container_addr: HTTP address of sandbox server
-        num_slots: Number of slots to create
-        
-    Returns:
-        List of Slot objects
-    """
-    slots = []
-    for i in range(num_slots):
-        slot_id = f"slot_{i}"
-        slots.append(Slot(
-            slot_id=slot_id,
-            alloc_id=alloc_id,
-            container_addr=container_addr,
-            workspace_dir=f"/data/{slot_id}",
-        ))
-    return slots
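Review note on the removed `slot.py`: the lifecycle it enforced was AVAILABLE -> ACQUIRED -> EXECUTING -> ACQUIRED -> AVAILABLE, with a `RuntimeError` on any out-of-order transition. The sketch below is a trimmed-down stand-in, not the deleted class itself; `MiniSlot` is a hypothetical name, and the error state, serialization helpers, and Nomad fields are left out.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class SlotState(Enum):
    AVAILABLE = "available"
    ACQUIRED = "acquired"
    EXECUTING = "executing"


@dataclass
class MiniSlot:
    """Hypothetical stand-in that mirrors only the state transitions."""

    slot_id: str
    state: SlotState = SlotState.AVAILABLE
    trajectory_id: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    def acquire(self, trajectory_id: Optional[str] = None) -> None:
        # Only an AVAILABLE slot can be handed to a trajectory.
        if self.state != SlotState.AVAILABLE:
            raise RuntimeError(f"Cannot acquire {self.slot_id}: state is {self.state}")
        self.state = SlotState.ACQUIRED
        self.trajectory_id = trajectory_id or str(uuid.uuid4())

    def start_execution(self, execution_id: Optional[str] = None) -> None:
        # A tool call may only start on a slot that is acquired and idle.
        if self.state != SlotState.ACQUIRED:
            raise RuntimeError(f"Cannot start execution on {self.slot_id}: state is {self.state}")
        self.state = SlotState.EXECUTING
        if execution_id:
            self.metadata["current_execution_id"] = execution_id

    def end_execution(self) -> None:
        # Finishing a tool call returns the slot to ACQUIRED, not AVAILABLE.
        if self.state != SlotState.EXECUTING:
            raise RuntimeError(f"Cannot end execution on {self.slot_id}: state is {self.state}")
        self.state = SlotState.ACQUIRED
        self.metadata.pop("current_execution_id", None)

    def release(self) -> None:
        # Release is unconditional, as in the removed code.
        self.state = SlotState.AVAILABLE
        self.trajectory_id = None
        self.metadata.pop("current_execution_id", None)


slot = MiniSlot("slot_0")
slot.acquire("traj-1")
slot.start_execution("exec-1")
slot.end_execution()
slot.release()
```

Because `release()` performs no state check, a caller could release a slot in the middle of an execution; any replacement for this module may want to decide whether that is intended.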
--- a/atropos/terminal/__init__.py
+++ b/atropos/terminal/__init__.py
@@ -1,2 +0,0 @@
-"""Terminal helpers for stateful sandbox interactions."""
-
--- a/atropos/terminal/asciinema_stream.py
+++ b/atropos/terminal/asciinema_stream.py
@@ -1,115 +0,0 @@
-from __future__ import annotations
-
-import json
-from typing import Any
-
-import pyte
-
-
-class AsciinemaStreamDecoder:
-    def __init__(self, *, default_width: int = 80, default_height: int = 24) -> None:
-        self._default_width = max(1, int(default_width))
-        self._default_height = max(1, int(default_height))
-        self._buffer = ""
-        self._has_header = False
-        self.width = self._default_width
-        self.height = self._default_height
-        self._screen = pyte.Screen(self.width, self.height)
-        self._stream = pyte.Stream(self._screen)
-
-    def reset(self) -> None:
-        self._buffer = ""
-        self._has_header = False
-        self.width = self._default_width
-        self.height = self._default_height
-        self._screen = pyte.Screen(self.width, self.height)
-        self._stream = pyte.Stream(self._screen)
-
-    def feed(self, chunk: str | bytes) -> None:
-        if not chunk:
-            return
-        if isinstance(chunk, bytes):
-            chunk = chunk.decode("utf-8", errors="replace")
-        self._buffer += chunk
-        while True:
-            line, sep, rest = self._buffer.partition("\n")
-            if not sep:
-                break
-            self._buffer = rest
-            line = line.strip()
-            if not line:
-                continue
-            parsed = self._parse_json_line(line)
-            if parsed is None:
-                continue
-            if not self._has_header:
-                if isinstance(parsed, dict):
-                    self._init_from_header(parsed)
-                    continue
-                if isinstance(parsed, list):
-                    self._has_header = True
-                    self._apply_event(parsed)
-                    continue
-                continue
-            if isinstance(parsed, list):
-                self._apply_event(parsed)
-
-    def render(self) -> str:
-        return "\n".join(self._screen.display)
-
-    def _parse_json_line(self, line: str) -> Any | None:
-        try:
-            return json.loads(line)
-        except json.JSONDecodeError:
-            return None
-
-    def _init_from_header(self, header: dict[str, Any]) -> None:
-        width = _coerce_int(
-            header.get("width") or header.get("columns") or header.get("cols"),
-            self._default_width,
-        )
-        height = _coerce_int(
-            header.get("height") or header.get("rows") or header.get("lines"),
-            self._default_height,
-        )
-        self.width = max(1, width)
-        self.height = max(1, height)
-        self._screen = pyte.Screen(self.width, self.height)
-        self._stream = pyte.Stream(self._screen)
-        self._has_header = True
-
-    def _apply_event(self, event: list[Any]) -> None:
-        if len(event) < 2:
-            return
-        event_type = event[1]
-        payload = event[2] if len(event) > 2 else ""
-        if event_type == "o":
-            if isinstance(payload, str):
-                self._stream.feed(payload)
-        elif event_type == "r":
-            width, height = _parse_resize(payload)
-            if width and height:
-                self.width = width
-                self.height = height
-                self._screen.resize(width, height)
-
-
-def _coerce_int(value: Any, default: int) -> int:
-    try:
-        return int(value)
-    except (TypeError, ValueError):
-        return int(default)
-
-
-def _parse_resize(payload: Any) -> tuple[int, int]:
-    if isinstance(payload, str) and "x" in payload:
-        left, right = payload.lower().split("x", 1)
-        return _coerce_int(left, 0), _coerce_int(right, 0)
-    if isinstance(payload, dict):
-        width = _coerce_int(payload.get("width") or payload.get("columns") or payload.get("cols"), 0)
-        height = _coerce_int(payload.get("height") or payload.get("rows") or payload.get("lines"), 0)
-        return width, height
-    if isinstance(payload, list) and len(payload) >= 2:
-        return _coerce_int(payload[0], 0), _coerce_int(payload[1], 0)
-    return 0, 0
-
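Review note on the removed `asciinema_stream.py`: `feed()` buffered incoming chunks and only parsed complete newline-terminated JSON lines, so a recording could arrive in arbitrary pieces. The sketch below isolates that buffering step using only the standard library. `split_cast_lines` is a hypothetical helper name, and the `pyte` screen emulation from the deleted file is deliberately left out.

```python
import json
from typing import Any, List, Tuple


def split_cast_lines(buffer: str) -> Tuple[List[Any], str]:
    """Parse every complete JSON line in `buffer`; return (values, leftover)."""
    values: List[Any] = []
    while True:
        line, sep, rest = buffer.partition("\n")
        if not sep:
            # No newline left: whatever remains is an incomplete line.
            break
        buffer = rest
        line = line.strip()
        if not line:
            continue
        try:
            values.append(json.loads(line))
        except json.JSONDecodeError:
            # Malformed lines are skipped, matching the removed decoder.
            continue
    return values, buffer


# A header object, one complete output event, and one event cut off mid-line.
chunk = '{"version": 2, "width": 100, "height": 30}\n[0.1, "o", "hello"]\n[0.2, "o", " wor'
first, leftover = split_cast_lines(chunk)
second, remaining = split_cast_lines(leftover + 'ld"]\n')
```

Here `first` holds the header dict and the first event, and `second` holds the event that was completed by the next chunk.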
--- a/atropos/tools/__init__.py
+++ b/atropos/tools/__init__.py
@@ -1,26 +0,0 @@
-"""
-Tool abstractions for atropos-agent.
-
-Provides base Tool class and common tool implementations.
-"""
-
-from .base import Tool, ToolCall, ToolRegistry, ToolResult, ToolSchema
-from .build_registry import build_tool_registry
-from .sandbox_stubs import BashTool, ReadFileTool, TerminalTool, WriteFileTool
-from .terminal_stateful_tool import TerminalStatefulTool
-from .tmux_tool import TmuxTool
-
-__all__ = [
-    "Tool",
-    "ToolCall",
-    "ToolRegistry",
-    "ToolResult",
-    "ToolSchema",
-    "BashTool",
-    "ReadFileTool",
-    "WriteFileTool",
-    "TerminalTool",
-    "TerminalStatefulTool",
-    "TmuxTool",
-    "build_tool_registry",
-]
--- a/atropos/tools/base.py
+++ b/atropos/tools/base.py
@@ -1,423 +0,0 @@
-"""
-Base Tool abstraction for atropos-agent.
-
-Tools follow a simple pattern:
-1. Define schema (name, description, parameters)
-2. Implement execute() method
-3. Return ToolResult with output/error
-
-Tool calls use Hermes-style XML tags:
-<tool_call>{"name": "bash", "arguments": {"command": "ls"}}</tool_call>
-"""
-
-import json
-import re
-import uuid
-from abc import ABC, abstractmethod
-from dataclasses import dataclass, field
-from typing import Any, Dict, List, Literal, Optional
-
-from pydantic import BaseModel, Field
-
-
-@dataclass
-class ToolSchema:
-    """JSON Schema for a tool's parameters."""
-    
-    name: str
-    description: str
-    parameters: Dict[str, Any] = field(default_factory=dict)
-    required: List[str] = field(default_factory=list)
-    external: bool = False  # Whether the tool must be executed via an external ToolServer (secret proxy) and not inside the sandbox.
-    
-    def to_dict(self) -> Dict[str, Any]:
-        """Convert to OpenAI-compatible function schema."""
-        return {
-            "type": "function",
-            "function": {
-                "name": self.name,
-                "description": self.description,
-                "parameters": {
-                    "type": "object",
-                    "properties": self.parameters,
-                    "required": self.required,
-                },
-            },
-        }
-    
-    def to_prompt_description(self) -> str:
-        """Convert to human-readable description for system prompt."""
-        params_desc = []
-        for name, spec in self.parameters.items():
-            req = "(required)" if name in self.required else "(optional)"
-            desc = spec.get("description", "")
-            param_type = spec.get("type", "string")
-            params_desc.append(f"  - {name} ({param_type}) {req}: {desc}")
-        
-        params_str = "\n".join(params_desc) if params_desc else "  (no parameters)"
-        return f"**{self.name}**: {self.description}\nParameters:\n{params_str}"
-
-
-@dataclass
-class ToolCall:
-    """A parsed tool call from model output."""
-    
-    name: str
-    arguments: Dict[str, Any]
-    raw_text: str = ""  # Original XML/JSON text
-    uniq_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # Unique tool-call id for traceability/reconstruction.
-    
-    @classmethod
-    def parse_from_text(cls, text: str) -> List["ToolCall"]:
-        """
-        Extract tool calls from text using Hermes-style XML tags.
-        
-        Supported formats (STRICT: requires well-formed closing tags):
-        - Hermes JSON wrapper:
-          <tool_call>{"name": "...", "arguments": {...}}</tool_call>
-        - GLM/llama.cpp style:
-          <tool_call>terminal{"command":"ls -la"}</tool_call>
-        """
-        calls: List["ToolCall"] = []
-
-        if not text:
-            return calls
-
-        def _append_from_payload(*, name: str, arguments: Dict[str, Any], raw: str, uniq_id: Optional[str] = None) -> None:
-            if not isinstance(name, str) or not name:
-                return
-            if not isinstance(arguments, dict):
-                return
-            calls.append(
-                cls(
-                    name=name,
-                    arguments=arguments,
-                    raw_text=raw,
-                    uniq_id=uniq_id or str(uuid.uuid4()),
-                )
-            )
-
-        # STRICT parsing: only accept well-formed <tool_call>...</tool_call> blocks.
-        pattern = r"<tool_call>\s*(.*?)\s*</tool_call>"
-        for inner in re.findall(pattern, text, re.DOTALL):
-            cleaned = (inner or "").strip()
-            if not cleaned:
-                continue
-
-            # Hermes JSON wrapper.
-            if cleaned.startswith("{"):
-                try:
-                    data = json.loads(cleaned)
-                except json.JSONDecodeError:
-                    continue
-                uniq_id = data.get("uniq_id") or data.get("id") or None
-                _append_from_payload(
-                    name=data.get("name", ""),
-                    arguments=data.get("arguments", {}),
-                    raw=inner,
-                    uniq_id=uniq_id,
-                )
-                continue
-
-            # GLM/llama.cpp style: terminal{...}
-            m = re.match(r"^\s*([A-Za-z0-9_.:\\-]+)\s*(\{.*\})\s*$", cleaned, re.DOTALL)
-            if not m:
-                continue
-            name = m.group(1)
-            args_text = m.group(2)
-            try:
-                args = json.loads(args_text)
-            except json.JSONDecodeError:
-                continue
-            _append_from_payload(name=name, arguments=args, raw=inner)
-
-        return calls
-    
-    @classmethod
-    def has_tool_call(cls, text: str) -> bool:
-        """Check if text contains any tool calls."""
-        return bool(re.search(r"<tool_call>", text))
-
-
-@dataclass
-class ToolResult:
-    """Result from executing a tool."""
-    
-    success: bool
-    output: str = ""
-    error: str = ""
-    metadata: Dict[str, Any] = field(default_factory=dict)
-    uniq_id: Optional[str] = None  # Should match ToolCall.uniq_id for async execution tracking.
-    
-    def to_xml(self) -> str:
-        """Format as XML for including in conversation."""
-        data = {
-            "success": self.success,
-            "output": self.output,
-        }
-        if self.uniq_id:
-            data["uniq_id"] = self.uniq_id
-        if self.error:
-            data["error"] = self.error
-        if self.metadata:
-            data["metadata"] = self.metadata
-        return f"<tool_response>{json.dumps(data)}</tool_response>"
-    
-    def to_dict(self) -> Dict[str, Any]:
-        """Convert to dictionary."""
-        return {
-            "success": self.success,
-            "output": self.output,
-            "error": self.error,
-            "metadata": self.metadata,
-            "uniq_id": self.uniq_id,
-        }
-
-
-class Tool(ABC):
-    """
-    Abstract base class for tools.
-    
-    Subclasses must implement:
-    - schema: ToolSchema describing the tool
-    - execute(): async method that performs the tool action
-    """
-    
-    @property
-    @abstractmethod
-    def schema(self) -> ToolSchema:
-        """Return the tool's schema."""
-        pass
-    
-    @property
-    def name(self) -> str:
-        """Tool name (from schema)."""
-        return self.schema.name
-    
-    @abstractmethod
-    async def execute(self, **kwargs) -> ToolResult:
-        """
-        Execute the tool with given arguments.
-        
-        Args:
-            **kwargs: Tool-specific arguments
-            
-        Returns:
-            ToolResult with success/failure and output
-        """
-        pass
-    
-    def is_available(self) -> tuple[bool, str | None]:
-        """
-        Return whether this tool should be exposed/executable in the current process.
-
-        Tools that depend on optional binaries/services/env vars can override this
-        to avoid advertising a tool that will fail at runtime.
-        """
-        return True, None
-
-    async def __call__(self, **kwargs) -> ToolResult:
-        """Allow calling tool instance directly."""
-        return await self.execute(**kwargs)
-
-# Note: This is only wrapping declarations for the external ToolServer (for execution on external process tools), and tools preinstalled in envs
-class ToolRegistry:
-    """Registry of available tools."""
-    
-    def __init__(self):
-        self._tools: Dict[str, Tool] = {}
-    
-    def register(self, tool: Tool) -> None:
-        """Register a tool."""
-        self._tools[tool.name] = tool
-    
-    def get(self, name: str) -> Optional[Tool]:
-        """Get a tool by name."""
-        return self._tools.get(name)
-    
-    def list_tools(self) -> List[Tool]:
-        """List all registered tools."""
-        return list(self._tools.values())
-    
-    def get_schemas(self) -> List[ToolSchema]:
-        """Get schemas for all registered tools."""
-        return [tool.schema for tool in self._tools.values()]
-    
-    def get_prompt_description(self) -> str:
-        """Generate tool descriptions for system prompt."""
-        descriptions = [tool.schema.to_prompt_description() for tool in self._tools.values()]
-        return "\n\n".join(descriptions)
-
-    def get_prompt_tool_definitions_json(self) -> str:
-        """
-        Return a Hermes-style JSON list of tool definitions for use inside a `<tools>...</tools>` block.
-
-        Hermes trajectories historically use a simplified schema list:
-          [{"name": ..., "description": ..., "parameters": {...}, "required": null}, ...]
-        """
-        formatted: List[Dict[str, Any]] = []
-        for tool in self._tools.values():
-            fn = tool.schema.to_dict().get("function", {})
-            formatted.append(
-                {
-                    "name": fn.get("name", tool.name),
-                    "description": fn.get("description", ""),
-                    "parameters": fn.get("parameters", {}),
-                    # Keep parity with Hermes saved trajectories (required is typically null there).
-                    "required": None,
-                }
-            )
-        return json.dumps(formatted, ensure_ascii=False)
-    
-    async def execute(self, call: ToolCall) -> ToolResult:
-        """Execute a tool call."""
-        tool = self.get(call.name)
-        if tool is None:
-            return ToolResult(
-                success=False,
-                error=f"Unknown tool: {call.name}",
-                uniq_id=call.uniq_id,
-            )
-        
-        try:
-            result = await tool.execute(**call.arguments)
-            if result.uniq_id is None:
-                result.uniq_id = call.uniq_id
-            return result
-        except Exception as e:
-            return ToolResult(
-                success=False,
-                error=f"Tool execution error: {str(e)}",
-                uniq_id=call.uniq_id,
-            )
-
-
-# =============================================================================
-# FastAPI / transport models
-# =============================================================================
-
-
-class ToolCallPayload(BaseModel):
-    name: str
-    arguments: Dict[str, Any] = Field(default_factory=dict)
-    uniq_id: str
-
-    @classmethod
-    def from_tool_call(cls, call: ToolCall) -> "ToolCallPayload":
-        return cls(name=call.name, arguments=call.arguments, uniq_id=call.uniq_id)
-
-    def to_tool_call(self) -> ToolCall:
-        return ToolCall(name=self.name, arguments=self.arguments, uniq_id=self.uniq_id)
-
-
-class ToolResultPayload(BaseModel):
-    success: bool
-    output: str = ""
-    error: str = ""
-    metadata: Dict[str, Any] = Field(default_factory=dict)
-    uniq_id: Optional[str] = None
-
-    @classmethod
-    def from_tool_result(cls, result: ToolResult) -> "ToolResultPayload":
-        return cls(
-            success=result.success,
-            output=result.output,
-            error=result.error,
-            metadata=result.metadata,
-            uniq_id=result.uniq_id,
-        )
-
-    def to_tool_result(self) -> ToolResult:
-        return ToolResult(
-            success=self.success,
-            output=self.output,
-            error=self.error,
-            metadata=self.metadata,
-            uniq_id=self.uniq_id,
-        )
-
-
-class ToolExecutorExecuteRequest(BaseModel):
-    trajectory_id: str
-    tool: ToolCallPayload
-    timeout_s: Optional[float] = None
-
-
-class ToolExecutorReleaseRequest(BaseModel):
-    trajectory_id: str
-    reset_workspace: bool = False
-
-
-class ToolServerExecuteRequest(BaseModel):
-    trajectory_id: Optional[str] = None
-    tool: ToolCallPayload
-    timeout_s: Optional[float] = None
-    # Optional sandbox context for tools that need workspace artifacts.
-    # This is set by ToolExecutor and is NOT model-controlled.
-    slot_id: Optional[str] = None
-    container_addr: Optional[str] = None
-
-
-# =============================================================================
-# Artifact transport models
-# =============================================================================
-
-
-class ArtifactReadRequestPayload(BaseModel):
-    trajectory_id: str
-    path: str
-    encoding: Literal["text", "base64"] = "text"
-    max_bytes: Optional[int] = None
-    include_sha256: bool = False
-
-
-class ArtifactReadResponsePayload(BaseModel):
-    success: bool
-    content: str = ""
-    error: str = ""
-    encoding: str = "text"
-    truncated: bool = False
-    bytes: int = 0
-    file_size: Optional[int] = None
-    path: str = ""
-    mime: Optional[str] = None
-    sha256: Optional[str] = None
-
-
-class ArtifactListRequestPayload(BaseModel):
-    trajectory_id: str
-    path: str = "."
-    recursive: bool = False
-    max_entries: Optional[int] = None
-
-
-class ArtifactListEntryPayload(BaseModel):
-    path: str
-    is_dir: bool
-    size: int
-    mtime: float
-
-
-class ArtifactListResponsePayload(BaseModel):
-    success: bool
-    entries: List[ArtifactListEntryPayload] = Field(default_factory=list)
-    truncated: bool = False
-    error: str = ""
-
-
-class ArtifactArchiveRequestPayload(BaseModel):
-    trajectory_id: str
-    path: str = "."
-    format: Literal["tar.gz", "tgz"] = "tar.gz"
-    max_bytes: Optional[int] = None
-    max_entries: Optional[int] = None
-
-
-class ArtifactArchiveResponsePayload(BaseModel):
-    success: bool
-    content: str = ""
-    error: str = ""
-    encoding: str = "base64"
-    format: str = "tar.gz"
-    bytes: int = 0
-    entry_count: int = 0
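Review note on the removed `ToolCall.parse_from_text`: it accepted only well-formed `<tool_call>...</tool_call>` blocks, in either the Hermes JSON wrapper form or the GLM/llama.cpp `name{...}` form. The sketch below reproduces that strict parsing as a standalone function. `parse_tool_calls` is a hypothetical name, it returns plain `(name, arguments)` tuples instead of `ToolCall` objects, and its name pattern is slightly narrower than the removed one.

```python
import json
import re
from typing import Any, Dict, List, Tuple

BLOCK_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
NAMED_RE = re.compile(r"^([A-Za-z0-9_.:-]+)\s*(\{.*\})$", re.DOTALL)


def parse_tool_calls(text: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Extract (name, arguments) pairs from well-formed tool_call blocks."""
    calls: List[Tuple[str, Dict[str, Any]]] = []
    for inner in BLOCK_RE.findall(text or ""):
        cleaned = inner.strip()
        if not cleaned:
            continue
        if cleaned.startswith("{"):
            # Hermes JSON wrapper: {"name": ..., "arguments": {...}}
            try:
                data = json.loads(cleaned)
            except json.JSONDecodeError:
                continue
            name, args = data.get("name", ""), data.get("arguments", {})
        else:
            # GLM/llama.cpp style: name{...}
            m = NAMED_RE.match(cleaned)
            if not m:
                continue
            name = m.group(1)
            try:
                args = json.loads(m.group(2))
            except json.JSONDecodeError:
                continue
        if isinstance(name, str) and name and isinstance(args, dict):
            calls.append((name, args))
    return calls


sample = (
    'I will list files. <tool_call>{"name": "bash", "arguments": {"command": "ls"}}</tool_call>\n'
    '<tool_call>terminal{"command":"ls -la"}</tool_call>'
)
parsed = parse_tool_calls(sample)
unclosed = parse_tool_calls('<tool_call>{"name": "bash", "arguments": {}}')
```

An unclosed `<tool_call>` yields no calls, whereas the removed `has_tool_call` checked only for the opening tag; callers that relied on both would see them disagree on truncated model output.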
--- a/atropos/tools/build_registry.py
+++ b/atropos/tools/build_registry.py
@@ -1,64 +0,0 @@
-"""
-Unified tool registry builder for Hermes-Agent Atropos integration.
-
-This composes:
-- sandbox tool stubs (terminal/bash/read_file/write_file + stateful terminal/tmux)
-- Hermes external tools (web/vision/image/moa/skills/browser), executed via ToolServer
-
-ToolExecutor only needs the schema + `external` routing bit; ToolServer executes
-the external tools via Hermes' existing implementations.
-"""
-
-from __future__ import annotations
-
-from typing import List, Optional
-
-from .base import ToolRegistry
-from .hermes_external_tools import build_external_tools
-from .sandbox_stubs import BashTool, ReadFileTool, TerminalTool, WriteFileTool
-from .terminal_stateful_tool import TerminalStatefulTool
-from .tmux_tool import TmuxTool
-from .toolset_resolver import resolve_multiple_toolsets
-
-
-def build_tool_registry(
-    *,
-    enabled_toolsets: Optional[List[str]] = None,
-    disabled_toolsets: Optional[List[str]] = None,
-    tool_server_url: Optional[str] = None,
-) -> ToolRegistry:
-    """
-    Build a ToolRegistry for AgentEnv / ToolExecutor / ToolServer.
-
-    If `tool_server_url` is not provided, external tools will be omitted so we do
-    not advertise tools that cannot execute.
-    """
-    enabled_toolsets = enabled_toolsets or ["default"]
-
-    # Resolve tool names using Hermes toolsets plus Atropos additions.
-    selected = set(resolve_multiple_toolsets(enabled_toolsets))
-    if disabled_toolsets:
-        selected -= set(resolve_multiple_toolsets(disabled_toolsets))
-
-    reg = ToolRegistry()
-
-    # Always register sandbox tools if selected.
-    sandbox_by_name = {
-        "terminal": TerminalTool(),
-        "bash": BashTool(),
-        "read_file": ReadFileTool(),
-        "write_file": WriteFileTool(),
-        "terminal_stateful": TerminalStatefulTool(),
-        "tmux": TmuxTool(),
-    }
-    for name, tool in sandbox_by_name.items():
-        if name in selected:
-            reg.register(tool)
-
-    # External tools: only include when ToolServer is configured.
-    if tool_server_url:
-        for tool in build_external_tools(selected_tool_names=selected):
-            if tool.name in selected:
-                reg.register(tool)
-
-    return reg
--- a/atropos/tools/hermes_external_tools.py
+++ b/atropos/tools/hermes_external_tools.py
@@ -1,90 +0,0 @@
-"""
-Hermes external tool adapter for Atropos ToolServer.
-
-These tools reuse Hermes-Agent's existing tool runner (`model_tools.handle_function_call`)
-so we don't duplicate external tool implementations.
-
-Important:
- These are marked `external=True` and should be executed ONLY by ToolServer.
- We run `handle_function_call` in a worker thread because the Hermes implementation
-  uses `asyncio.run()` internally for some async tools (web_extract, vision, MoA, etc).
-"""
-
-from __future__ import annotations
-
-import asyncio
-import json
-from typing import Any, Dict, List, Optional
-
-import model_tools
-
-from .base import Tool, ToolResult, ToolSchema
-
-
-def _schema_from_openai_tool_dict(tool: Dict[str, Any], *, external: bool) -> ToolSchema:
-    fn = tool.get("function") or {}
-    name = str(fn.get("name") or "")
-    description = str(fn.get("description") or "")
-    params = fn.get("parameters") or {}
-    properties = params.get("properties") or {}
-    required = params.get("required") or []
-    if not isinstance(required, list):
-        required = []
-    return ToolSchema(
-        name=name,
-        description=description,
-        parameters=dict(properties),
-        required=[str(x) for x in required if isinstance(x, (str, int))],
-        external=external,
-    )
-
-
-class HermesExternalTool(Tool):
-    def __init__(self, schema: ToolSchema):
-        self._schema = schema
-
-    @property
-    def schema(self) -> ToolSchema:
-        return self._schema
-
-    async def execute(self, task_id: Optional[str] = None, **kwargs: Any) -> ToolResult:
-        # `model_tools.handle_function_call` returns a JSON string (success or error).
-        # Run in a thread because some Hermes tool handlers call `asyncio.run()`.
-        raw = await asyncio.to_thread(model_tools.handle_function_call, self.name, kwargs, task_id)
-
-        try:
-            parsed = json.loads(raw)
-        except Exception:
-            # Keep as plain string.
-            return ToolResult(success=True, output=str(raw))
-
-        if isinstance(parsed, dict) and parsed.get("error"):
-            return ToolResult(success=False, error=str(parsed.get("error")), output="")
-
-        return ToolResult(success=True, output=json.dumps(parsed, ensure_ascii=False))
-
-
-def build_external_tools(
-    *,
-    selected_tool_names: Optional[set[str]] = None,
-) -> List[HermesExternalTool]:
-    """
-    Build external tool wrappers from Hermes tool declarations.
-
-    Filters out sandbox-oriented tools (e.g. `terminal`) since those should run
-    inside the sandbox via ToolExecutor.
-    """
-    # IMPORTANT: Hermes' `model_tools.get_tool_definitions()` only understands Hermes toolsets.
-    # Atropos envs add extra toolsets (filesystem/sandbox/stateful). To avoid noisy "Unknown toolset"
-    # prints and accidental filtering, we fetch ALL Hermes tool definitions here and filter by name.
-    tools = model_tools.get_tool_definitions(enabled_toolsets=None, disabled_toolsets=None, quiet_mode=True)
-
-    wrappers: List[HermesExternalTool] = []
-    for t in tools:
-        schema = _schema_from_openai_tool_dict(t, external=True)
-        if schema.name in {"terminal"}:
-            continue
-        if selected_tool_names is not None and schema.name not in selected_tool_names:
-            continue
-        wrappers.append(HermesExternalTool(schema))
-    return wrappers
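
The docstring of the removed `hermes_external_tools.py` explains that `handle_function_call` runs in a worker thread because some handlers call `asyncio.run()` internally. A minimal sketch of that pattern follows. `sync_handler` and `_fetch` are hypothetical stand-ins for illustration, not Hermes functions.

```python
import asyncio


async def _fetch() -> str:
    # Stand-in for an async tool implementation (hypothetical).
    await asyncio.sleep(0)
    return "ok"


def sync_handler() -> str:
    # Hypothetical synchronous entry point that starts its own event loop,
    # the way a handler that calls asyncio.run() internally would.
    return asyncio.run(_fetch())


async def call_from_async() -> str:
    # Calling sync_handler() directly here would raise RuntimeError, because
    # asyncio.run() cannot start while a loop is already running in this thread.
    # A worker thread has no running loop, so the nested asyncio.run() succeeds.
    return await asyncio.to_thread(sync_handler)


if __name__ == "__main__":
    print(asyncio.run(call_from_async()))
```

The trade-off is one thread per in-flight call, which is usually acceptable for I/O-bound tools.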
--- a/atropos/tools/sandbox_stubs.py
+++ b/atropos/tools/sandbox_stubs.py
@@ -1,99 +0,0 @@
-"""
-Sandbox tool stubs for Atropos ToolExecutor.
-
-These tools are executed inside the sandbox containers via:
-ToolExecutor -> SlotPool -> sandbox_server.py
-
-They intentionally do NOT execute anything on the host process. If they are
-called directly (outside ToolExecutor), they return a clear error.
-"""
-
-from __future__ import annotations
-
-from typing import Optional
-
-from .base import Tool, ToolResult, ToolSchema
-
-
-class TerminalTool(Tool):
-    @property
-    def schema(self) -> ToolSchema:
-        return ToolSchema(
-            name="terminal",
-            description=(
-                "Execute a command inside the sandbox slot workspace and return stdout/stderr. "
-                "Filesystem persists within a trajectory slot. Background processes are not supported "
-                "in stateless mode. Commands run under POSIX /bin/sh and each tool call runs in a fresh "
-                "shell (no persisted env vars). Avoid bash-only syntax like `source`; prefer `. .venv/bin/activate` "
-                "or invoke `.venv/bin/python ...` directly."
-            ),
-            parameters={
-                "command": {"type": "string", "description": "The command to execute"},
-                "timeout": {
-                    "type": "integer",
-                    "description": "Command timeout in seconds (optional).",
-                    "minimum": 1,
-                },
-                "background": {
-                    "type": "boolean",
-                    "description": "Not supported in sandbox terminal (always false).",
-                    "default": False,
-                },
-            },
-            required=["command"],
-            external=False,
-        )
-
-    async def execute(self, **_kwargs) -> ToolResult:
-        return ToolResult(
-            success=False,
-            error="terminal must be executed via ToolExecutor inside the sandbox",
-        )
-
-
-class BashTool(Tool):
-    @property
-    def schema(self) -> ToolSchema:
-        return ToolSchema(
-            name="bash",
-            description="Execute a bash command inside the sandbox slot workspace.",
-            parameters={"command": {"type": "string", "description": "The bash command to execute"}},
-            required=["command"],
-            external=False,
-        )
-
-    async def execute(self, **_kwargs) -> ToolResult:
-        return ToolResult(success=False, error="bash must be executed via ToolExecutor inside the sandbox")
-
-
-class ReadFileTool(Tool):
-    @property
-    def schema(self) -> ToolSchema:
-        return ToolSchema(
-            name="read_file",
-            description="Read a file from the sandbox slot workspace.",
-            parameters={"path": {"type": "string", "description": "Path to the file"}},
-            required=["path"],
-            external=False,
-        )
-
-    async def execute(self, **_kwargs) -> ToolResult:
-        return ToolResult(success=False, error="read_file must be executed via ToolExecutor inside the sandbox")
-
-
-class WriteFileTool(Tool):
-    @property
-    def schema(self) -> ToolSchema:
-        return ToolSchema(
-            name="write_file",
-            description="Write a file into the sandbox slot workspace.",
-            parameters={
-                "path": {"type": "string", "description": "Path to the file"},
-                "content": {"type": "string", "description": "File content"},
-            },
-            required=["path", "content"],
-            external=False,
-        )
-
-    async def execute(self, **_kwargs) -> ToolResult:
-        return ToolResult(success=False, error="write_file must be executed via ToolExecutor inside the sandbox")
--- a/atropos/tools/terminal_stateful_tool.py
+++ b/atropos/tools/terminal_stateful_tool.py
@@ -1,45 +0,0 @@
-"""
-Stateful terminal tool schema.
-
-This is a sandbox tool that routes to the sandbox server as `bash_stateful`
-via ToolExecutor mapping. It exists to expose an explicit, opt-in terminal
-primitive suitable for stateful workflows (e.g. tmux sessions / TUIs).
-"""
-
-from __future__ import annotations
-
-from typing import Optional
-
-from .base import Tool, ToolResult, ToolSchema
-
-
-class TerminalStatefulTool(Tool):
-    @property
-    def schema(self) -> ToolSchema:
-        return ToolSchema(
-            name="terminal_stateful",
-            description=(
-                "Execute a command in the sandbox, allowing stateful/background processes to persist "
-                "across tool calls within the same trajectory slot (e.g. tmux sessions). "
-                "Use sparingly; output is still non-interactive."
-            ),
-            parameters={
-                "command": {"type": "string", "description": "The command to execute"},
-                "timeout": {
-                    "type": "integer",
-                    "description": "Command timeout in seconds (optional).",
-                    "minimum": 1,
-                },
-            },
-            required=["command"],
-        )
-
-    def is_available(self) -> tuple[bool, str | None]:
-        return True, None
-
-    async def execute(self, command: str, timeout: Optional[int] = None) -> ToolResult:
-        _ = (command, timeout)
-        return ToolResult(
-            success=False,
-            error="terminal_stateful must be executed via ToolExecutor inside the sandbox",
-        )
--- a/atropos/tools/tmux_tool.py
+++ b/atropos/tools/tmux_tool.py
@@ -1,89 +0,0 @@
-"""
-tmux tool schema (sandbox).
-
-This is a sandbox tool that provides basic tmux session control suitable for
-TUI-style terminal interactions:
- send keys (arrow keys, enter, etc.)
- capture the current screen buffer
-
-Execution is routed by ToolExecutor to the sandbox server's `tmux` backend.
-"""
-
-from __future__ import annotations
-
-from typing import Any, Dict, Optional
-
-from .base import Tool, ToolResult, ToolSchema
-
-
-class TmuxTool(Tool):
-    @property
-    def schema(self) -> ToolSchema:
-        return ToolSchema(
-            name="tmux",
-            description=(
-                "Control a per-trajectory tmux session inside the sandbox (stateful terminal). "
-                "Use this for TUI-style interactions: send keys and capture the current screen."
-            ),
-            parameters={
-                "action": {
-                    "type": "string",
-                    "description": "Action to perform: start | send_keys | stream | capture | stop.",
-                    "enum": ["start", "send_keys", "stream", "stop", "capture"],
-                },
-                "keys": {
-                    "description": "Keys to send (string or list of strings) when action=send_keys.",
-                },
-                "block": {
-                    "type": "boolean",
-                    "description": "If true, wait for shell command completion (only valid at a shell prompt).",
-                    "default": False,
-                },
-                "min_wait_s": {
-                    "type": "number",
-                    "description": "For non-blocking send_keys, sleep this long after sending keys (seconds).",
-                    "default": 0.0,
-                },
-                "max_wait_s": {
-                    "type": "number",
-                    "description": "For blocking send_keys, max time to wait for completion (seconds).",
-                },
-                "capture_entire": {
-                    "type": "boolean",
-                    "description": "Deprecated. Streaming is preferred.",
-                    "default": False,
-                },
-                "max_bytes": {
-                    "type": "integer",
-                    "description": "Max bytes to return per stream call.",
-                },
-                "reset": {
-                    "type": "boolean",
-                    "description": "If true, reset stream offset to the beginning of the asciinema recording.",
-                    "default": False,
-                },
-                "pane_width": {
-                    "type": "integer",
-                    "description": "Pane width for action=start (columns).",
-                    "minimum": 20,
-                },
-                "pane_height": {
-                    "type": "integer",
-                    "description": "Pane height for action=start (rows).",
-                    "minimum": 10,
-                },
-            },
-            required=["action"],
-        )
-
-    def is_available(self) -> tuple[bool, str | None]:
-        return True, None
-
-    async def execute(self, **kwargs: Any) -> ToolResult:
-        # This tool is intended to be executed via ToolExecutor -> sandbox server.
-        # We keep a safe fallback for non-sandbox contexts.
-        action = str(kwargs.get("action") or "").strip()
-        return ToolResult(
-            success=False,
-            error=f"tmux tool must be executed in the sandbox (got action={action!r})",
-        )
--- a/atropos/tools/tool_executor.py
+++ b/atropos/tools/tool_executor.py
@@ -1,500 +0,0 @@
-"""
-ToolExecutor - queued, batched tool dispatch for multiplexed agent trajectories.
-
-This component is responsible for:
- Maintaining trajectory -> Slot affinity (workspace continuity)
- Batching sandbox tool calls across trajectories to maximize container utilization
- Routing external tools (ToolSchema.external=True) to a ToolServer (Phase 4.5)
-
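-Sandbox tools are dispatched to the backend under these names:
- bash (also reached through the `terminal` alias)
- bash_stateful (reached through `terminal_stateful`)
- read_file, write_file, tmux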
-"""
-
-from __future__ import annotations
-
-import asyncio
-import time
-from dataclasses import dataclass
-from typing import Any, Dict, List, Optional
-
-import httpx
-
-from .base import (
-    ArtifactArchiveRequestPayload,
-    ArtifactArchiveResponsePayload,
-    ArtifactListRequestPayload,
-    ArtifactListResponsePayload,
-    ArtifactReadRequestPayload,
-    ArtifactReadResponsePayload,
-    ToolCall,
-    ToolCallPayload,
-    ToolRegistry,
-    ToolResult,
-    ToolResultPayload,
-    ToolServerExecuteRequest,
-)
-from ..backends.base import ToolBackend
-from ..slots import Slot
-
-
-@dataclass
-class ToolExecutorConfig:
-    batch_window_ms: int = 20
-    max_batch_size: int = 200
-    allow_network: bool = True
-    require_sandbox: bool = False
-    require_stateful_sandbox: bool = False
-    tool_server_url: Optional[str] = None
-    tool_server_token: Optional[str] = None
-
-
-@dataclass
-class _QueuedToolRequest:
-    trajectory_id: str
-    call: ToolCall
-    timeout_s: Optional[float]
-    future: asyncio.Future
-
-
-class ToolExecutor:
-    def __init__(
-        self,
-        backend: ToolBackend,
-        tools: ToolRegistry,
-        config: Optional[ToolExecutorConfig] = None,
-    ) -> None:
-        self.backend = backend
-        self.tools = tools
-        self.config = config or ToolExecutorConfig()
-
-        self._queue: asyncio.Queue[Optional[_QueuedToolRequest]] = asyncio.Queue()
-        self._task: Optional[asyncio.Task] = None
-        self._stopping = asyncio.Event()
-
-        self._slots_lock = asyncio.Lock()
-        self._slot_by_trajectory: Dict[str, Slot] = {}
-
-        self._tool_server_client: Optional[httpx.AsyncClient] = None
-        self._tool_server_lock = asyncio.Lock()
-
-        # lightweight stats for status endpoints
-        self.total_requests: int = 0
-        self.total_errors: int = 0
-        self.latencies_s: List[float] = []
-
-    async def start(self) -> None:
-        if self._task is None:
-            self._task = asyncio.create_task(self._run_loop())
-
-    def queue_size(self) -> int:
-        return self._queue.qsize()
-
-    async def close(self) -> None:
-        self._stopping.set()
-        await self._queue.put(None)
-        if self._task:
-            await self._task
-            self._task = None
-
-        client = self._tool_server_client
-        self._tool_server_client = None
-        if client is not None:
-            await client.aclose()
-
-        # Best-effort release any remaining slots.
-        async with self._slots_lock:
-            slots = list(self._slot_by_trajectory.items())
-            self._slot_by_trajectory.clear()
-
-        for _, slot in slots:
-            try:
-                await self.backend.release(slot, reset_workspace=False)
-            except Exception:
-                pass
-
-    async def execute(
-        self,
-        trajectory_id: str,
-        call: ToolCall,
-        timeout_s: Optional[float] = None,
-    ) -> ToolResult:
-        if self._task is None:
-            raise RuntimeError("ToolExecutor not started (call start() first)")
-
-        # Allow tool args to suggest a timeout (Hermes-compatible terminal tool),
-        # but never let the model choose "infinite" timeouts.
-        if timeout_s is None:
-            raw_timeout = call.arguments.get("timeout")
-            if isinstance(raw_timeout, (int, float)):
-                timeout_s = float(raw_timeout)
-        if timeout_s is not None:
-            timeout_s = max(1.0, min(float(timeout_s), 600.0))
-
-        loop = asyncio.get_running_loop()
-        fut: asyncio.Future = loop.create_future()
-        started = time.perf_counter()
-        await self._queue.put(_QueuedToolRequest(trajectory_id=trajectory_id, call=call, timeout_s=timeout_s, future=fut))
-        try:
-            result: ToolResult = await fut
-            return result
-        finally:
-            self.latencies_s.append(time.perf_counter() - started)
-
-    async def release_trajectory(self, trajectory_id: str, reset_workspace: bool = False) -> None:
-        async with self._slots_lock:
-            slot = self._slot_by_trajectory.pop(trajectory_id, None)
-
-        if slot is not None:
-            await self.backend.release(slot, reset_workspace=reset_workspace)
-
-    async def _get_slot_if_present(self, trajectory_id: str) -> Optional[Slot]:
-        async with self._slots_lock:
-            return self._slot_by_trajectory.get(trajectory_id)
-
-    # ---------------------------------------------------------------------
-    # Artifact helpers (optional)
-    # ---------------------------------------------------------------------
-
-    async def read_artifact(self, req: ArtifactReadRequestPayload) -> ArtifactReadResponsePayload:
-        slot = await self._get_slot_if_present(req.trajectory_id)
-        if slot is None:
-            return ArtifactReadResponsePayload(success=False, error="No active slot for trajectory (run a sandbox tool first)")
-        data = await self.backend.read_artifact(
-            slot,
-            req.path,
-            encoding=req.encoding,
-            max_bytes=req.max_bytes,
-            include_sha256=req.include_sha256,
-        )
-        if isinstance(data, dict):
-            data = dict(data)
-            data.pop("http_status", None)
-        try:
-            return ArtifactReadResponsePayload(**(data or {}))
-        except Exception as e:
-            return ArtifactReadResponsePayload(success=False, error=f"Invalid artifact read response: {e}")
-
-    async def list_artifacts(self, req: ArtifactListRequestPayload) -> ArtifactListResponsePayload:
-        slot = await self._get_slot_if_present(req.trajectory_id)
-        if slot is None:
-            return ArtifactListResponsePayload(success=False, error="No active slot for trajectory (run a sandbox tool first)")
-        data = await self.backend.list_artifacts(
-            slot,
-            req.path,
-            recursive=req.recursive,
-            max_entries=req.max_entries,
-        )
-        if isinstance(data, dict):
-            data = dict(data)
-            data.pop("http_status", None)
-        try:
-            return ArtifactListResponsePayload(**(data or {}))
-        except Exception as e:
-            return ArtifactListResponsePayload(success=False, error=f"Invalid artifact list response: {e}")
-
-    async def archive_artifacts(self, req: ArtifactArchiveRequestPayload) -> ArtifactArchiveResponsePayload:
-        slot = await self._get_slot_if_present(req.trajectory_id)
-        if slot is None:
-            return ArtifactArchiveResponsePayload(success=False, error="No active slot for trajectory (run a sandbox tool first)")
-        data = await self.backend.archive_artifacts(
-            slot,
-            req.path,
-            archive_format=req.format,
-            max_bytes=req.max_bytes,
-            max_entries=req.max_entries,
-        )
-        if isinstance(data, dict):
-            data = dict(data)
-            data.pop("http_status", None)
-        try:
-            return ArtifactArchiveResponsePayload(**(data or {}))
-        except Exception as e:
-            return ArtifactArchiveResponsePayload(success=False, error=f"Invalid artifact archive response: {e}")
-
-    async def _get_or_acquire_slot(self, trajectory_id: str) -> Slot:
-        async with self._slots_lock:
-            existing = self._slot_by_trajectory.get(trajectory_id)
-            if existing is not None:
-                return existing
-
-        slot = await self.backend.acquire(trajectory_id)
-
-        async with self._slots_lock:
-            existing = self._slot_by_trajectory.get(trajectory_id)
-            if existing is not None:
-                # Another coroutine won the race; return its slot.
-                await self.backend.release(slot, reset_workspace=False)
-                return existing
-            self._slot_by_trajectory[trajectory_id] = slot
-            return slot
-
-    async def _run_loop(self) -> None:
-        pending: List[_QueuedToolRequest] = []
-        deadline: Optional[float] = None
-
-        batch_window_s = max(0.0, self.config.batch_window_ms / 1000.0)
-        max_batch = max(1, self.config.max_batch_size)
-
-        while True:
-            if self._stopping.is_set() and self._queue.empty() and not pending:
-                break
-
-            timeout = None
-            if pending and deadline is not None:
-                timeout = max(0.0, deadline - time.perf_counter())
-
-            try:
-                item = await asyncio.wait_for(self._queue.get(), timeout=timeout)
-                if item is None:
-                    continue
-                pending.append(item)
-                if len(pending) == 1:
-                    deadline = time.perf_counter() + batch_window_s
-                if len(pending) < max_batch:
-                    continue
-            except asyncio.TimeoutError:
-                # batch window elapsed
-                pass
-
-            if not pending:
-                deadline = None
-                continue
-
-            batch = pending
-            pending = []
-            deadline = None
-
-            await self._execute_batch(batch)
-
-    async def _get_tool_server_client(self) -> httpx.AsyncClient:
-        url = self.config.tool_server_url
-        if not url:
-            raise RuntimeError("ToolServer not configured")
-
-        if self._tool_server_client is not None:
-            return self._tool_server_client
-
-        async with self._tool_server_lock:
-            if self._tool_server_client is None:
-                self._tool_server_client = httpx.AsyncClient(base_url=url.rstrip("/"))
-            return self._tool_server_client
-
-    def _tool_server_headers(self) -> Dict[str, str]:
-        token = self.config.tool_server_token
-        if not token:
-            return {}
-        return {"Authorization": f"Bearer {token}"}
-
-    async def _execute_external(self, req: _QueuedToolRequest) -> ToolResult:
-        client = await self._get_tool_server_client()
-        slot_id: Optional[str] = None
-        container_addr: Optional[str] = None
-        slot = await self._get_slot_if_present(req.trajectory_id)
-        if slot is not None:
-            slot_id = slot.slot_id
-            container_addr = slot.container_addr
-
-        payload = ToolServerExecuteRequest(
-            trajectory_id=req.trajectory_id,
-            tool=ToolCallPayload.from_tool_call(req.call),
-            timeout_s=req.timeout_s,
-            slot_id=slot_id,
-            container_addr=container_addr,
-        )
-
-        try:
-            resp = await client.post(
-                "/execute",
-                json=payload.model_dump(),
-                headers=self._tool_server_headers(),
-                timeout=req.timeout_s,
-            )
-            resp.raise_for_status()
-            data = resp.json()
-            parsed = ToolResultPayload(**data)
-            result = parsed.to_tool_result()
-            if result.uniq_id is None:
-                result.uniq_id = req.call.uniq_id
-            return result
-        except Exception as e:
-            return ToolResult(
-                success=False,
-                error=f"External tool failed: {e}",
-                uniq_id=req.call.uniq_id,
-            )
-
-    async def _execute_batch(self, batch: List[_QueuedToolRequest]) -> None:
-        # Resolve tool schemas once per request and separate sandbox/external/unknown.
-        sandbox_items: List[_QueuedToolRequest] = []
-        external_items: List[_QueuedToolRequest] = []
-        unknown_items: List[_QueuedToolRequest] = []
-
-        for it in batch:
-            tool = self.tools.get(it.call.name)
-            if tool is None:
-                unknown_items.append(it)
-                continue
-
-            schema = tool.schema
-            if not schema.external:
-                sandbox_items.append(it)
-            else:
-                external_items.append(it)
-
-        for it in unknown_items:
-            self.total_requests += 1
-            self.total_errors += 1
-            if not it.future.done():
-                it.future.set_result(
-                    ToolResult(
-                        success=False,
-                        error=f"Unknown tool: {it.call.name}",
-                        uniq_id=it.call.uniq_id,
-                    )
-                )
-
-        if external_items:
-            if not self.config.tool_server_url:
-                for it in external_items:
-                    self.total_requests += 1
-                    self.total_errors += 1
-                    if not it.future.done():
-                        it.future.set_result(
-                            ToolResult(
-                                success=False,
-                                error=f"External tool not available (ToolServer not configured): {it.call.name}",
-                                uniq_id=it.call.uniq_id,
-                            )
-                        )
-            else:
-                results = await asyncio.gather(*[self._execute_external(it) for it in external_items])
-                for it, res in zip(external_items, results):
-                    self.total_requests += 1
-                    if not getattr(res, "success", False):
-                        self.total_errors += 1
-                    if not it.future.done():
-                        it.future.set_result(res)
-
-        if not sandbox_items:
-            return
-
-        # Acquire slots for the distinct trajectories in this batch.
-        try:
-            traj_ids = list({it.trajectory_id for it in sandbox_items})
-            slots = await asyncio.gather(*[self._get_or_acquire_slot(tid) for tid in traj_ids])
-            slot_by_traj = dict(zip(traj_ids, slots))
-        except Exception as e:
-            for it in sandbox_items:
-                self.total_requests += 1
-                self.total_errors += 1
-                if not it.future.done():
-                    it.future.set_result(
-                        ToolResult(
-                            success=False,
-                            error=f"Failed to acquire slot: {e}",
-                            uniq_id=it.call.uniq_id,
-                        )
-                    )
-            return
-
-        # Group by timeout so we don't accidentally make short timeouts wait on long ones.
-        by_timeout: Dict[float, List[_QueuedToolRequest]] = {}
-        default_timeout = self.backend.default_timeout_s
-
-        for it in sandbox_items:
-            t = it.timeout_s
-            if t is None:
-                t = default_timeout
-            if t is None:
-                t = 30.0
-            by_timeout.setdefault(float(t), []).append(it)
-
-        for timeout_s, items in by_timeout.items():
-            requests = []
-            dispatched: List[_QueuedToolRequest] = []
-            for it in items:
-                slot = slot_by_traj[it.trajectory_id]
-                tool_name = it.call.name
-                args = dict(it.call.arguments)
-
-                # Hermes compatibility: treat `terminal` as an alias of sandbox `bash`.
-                if tool_name == "terminal":
-                    if args.get("background"):
-                        self.total_requests += 1
-                        self.total_errors += 1
-                        if not it.future.done():
-                            it.future.set_result(
-                                ToolResult(
-                                    success=False,
-                                    error="terminal background execution is not supported in sandbox",
-                                    uniq_id=it.call.uniq_id,
-                                )
-                            )
-                        continue
-                    tool_name = "bash"
-                    # `timeout` is handled at the ToolExecutor level, not passed to the sandbox tool args.
-                    args.pop("timeout", None)
-                elif tool_name == "terminal_stateful":
-                    tool_name = "bash_stateful"
-                    args.pop("timeout", None)
-                elif tool_name == "tmux":
-                    # `tmux` is a sandbox tool backed by the stateful session manager.
-                    # Network policy is env-controlled.
-                    args.pop("allow_network", None)
-
-                if tool_name == "bash":
-                    # Network policy is set by the environment/executor, not by the model.
-                    args.pop("allow_network", None)
-                    args.pop("require_sandbox", None)
-                    args["allow_network"] = bool(self.config.allow_network)
-                    args["require_sandbox"] = bool(self.config.require_sandbox)
-                    # `timeout` is handled at the ToolExecutor level, not passed to the sandbox tool args.
-                    args.pop("timeout", None)
-                elif tool_name == "bash_stateful":
-                    # Network policy is set by the environment/executor, not by the model.
-                    args.pop("allow_network", None)
-                    args.pop("require_sandbox", None)
-                    args.pop("require_stateful_sandbox", None)
-                    args["allow_network"] = bool(self.config.allow_network)
-                    args["require_stateful_sandbox"] = bool(self.config.require_stateful_sandbox)
-                    args.pop("timeout", None)
-                elif tool_name == "tmux":
-                    # Network policy applies to the underlying stateful session.
-                    args.pop("allow_network", None)
-                    args.pop("require_sandbox", None)
-                    args.pop("require_stateful_sandbox", None)
-                    args["allow_network"] = bool(self.config.allow_network)
-                    args["require_stateful_sandbox"] = bool(self.config.require_stateful_sandbox)
-
-                requests.append((slot, tool_name, args))
-                dispatched.append(it)
-
-            results = None
-            try:
-                if not dispatched:
-                    continue
-                results = await self.backend.execute_batch(requests, timeout_s=timeout_s)
-            except Exception as e:
-                for it in dispatched:
-                    self.total_requests += 1
-                    self.total_errors += 1
-                    if not it.future.done():
-                        it.future.set_result(
-                            ToolResult(
-                                success=False,
-                                error=f"Batch execution failed: {e}",
-                                uniq_id=it.call.uniq_id,
-                            )
-                        )
-                continue
-
-            for it, res in zip(dispatched, results):
-                self.total_requests += 1
-                if not getattr(res, "success", False):
-                    self.total_errors += 1
-                tool_result = res.to_tool_result()
-                tool_result.uniq_id = it.call.uniq_id
-                if not it.future.done():
-                    it.future.set_result(tool_result)
--- a/atropos/tools/toolset_resolver.py
+++ b/atropos/tools/toolset_resolver.py
@@ -1,88 +0,0 @@
-"""
-Toolset resolution for Hermes-Agent Atropos integration.
-
-We primarily reuse Hermes-Agent toolsets (`toolsets.py`), but Atropos training/envs
-need a few extra sandbox-oriented toolsets that Hermes doesn't expose by default
-(e.g. filesystem + stateful terminal).
-"""
-
-from __future__ import annotations
-
-from typing import Any, Dict, List, Optional, Set
-
-import toolsets as hermes_toolsets
-
-
-ATROPOS_TOOLSETS: Dict[str, Dict[str, Any]] = {
-    "filesystem": {
-        "description": "Read/write files in the sandbox workspace.",
-        "tools": ["read_file", "write_file"],
-        "includes": [],
-    },
-    "terminal_stateful": {
-        "description": "Stateful terminal execution (tmux/TUI support) inside the sandbox.",
-        "tools": ["terminal_stateful", "tmux"],
-        "includes": [],
-    },
-    "sandbox": {
-        "description": "Sandbox tools (terminal + filesystem).",
-        "tools": [],
-        "includes": ["terminal", "filesystem"],
-    },
-    "default": {
-        "description": "Default toolset for Atropos AgentEnv tasks.",
-        "tools": [],
-        "includes": ["sandbox"],
-    },
-    "full": {
-        "description": "All Hermes tools plus Atropos sandbox additions.",
-        "tools": [],
-        "includes": ["all", "filesystem", "sandbox", "terminal_stateful"],
-    },
-}
-
-
-def validate_toolset(name: str) -> bool:
-    if name in {"all", "*"}:
-        return True
-    return hermes_toolsets.validate_toolset(name) or name in ATROPOS_TOOLSETS
-
-
-def resolve_toolset(name: str, visited: Optional[Set[str]] = None) -> List[str]:
-    if visited is None:
-        visited = set()
-
-    if name in {"all", "*"}:
-        # Union Hermes + Atropos toolsets.
-        all_tools: Set[str] = set()
-        for tname in hermes_toolsets.get_toolset_names():
-            all_tools.update(resolve_toolset(tname, visited=set()))
-        for tname, spec in ATROPOS_TOOLSETS.items():
-            # Avoid recursion: some Atropos toolsets (e.g. "full") include "all".
-            if tname == "full" or "all" in (spec.get("includes") or []):
-                continue
-            all_tools.update(resolve_toolset(tname, visited=set()))
-        return sorted(all_tools)
-
-    if name in ATROPOS_TOOLSETS:
-        if name in visited:
-            return []
-        visited.add(name)
-        spec = ATROPOS_TOOLSETS[name]
-        tools: Set[str] = set(spec.get("tools", []))
-        for inc in spec.get("includes", []):
-            tools.update(resolve_toolset(inc, visited=set(visited)))
-        return sorted(tools)
-
-    # Fall back to Hermes toolsets.
-    # IMPORTANT: do not pre-add `name` to `visited` here; Hermes' resolver uses
-    # `visited` for its own cycle detection and will treat the presence of `name`
-    # as a circular dependency.
-    return sorted(hermes_toolsets.resolve_toolset(name, visited=set(visited)))
-
-
-def resolve_multiple_toolsets(names: List[str]) -> List[str]:
-    tools: Set[str] = set()
-    for name in names:
-        tools.update(resolve_toolset(name, visited=set()))
-    return sorted(tools)
--- a/atropos_compatible_agent.py
+++ b/atropos_compatible_agent.py
@@ -1,415 +0,0 @@
-#!/usr/bin/env python3
-"""
-Atropos-compatible Hermes agent runner.
-
-This is a minimal subclass of Hermes-Agent's `AIAgent` that swaps the OpenAI
-function-calling backend for Atroposlib's `ManagedServer`/`ServerManager` backend
-and uses Hermes-style XML tool tags:
-
- <tool_call>{"name": "...", "arguments": {...}}</tool_call>
- <tool_response>{...}</tool_response>
-
-Tool observations are appended as `role="user"` messages containing one or more
-`<tool_response>` blocks so they survive common chat templates during tokenization.
-"""
-
-from __future__ import annotations
-
-import asyncio
-import json
-import re
-import time
-import warnings
-import os
-from contextlib import asynccontextmanager
-from typing import Any, AsyncGenerator, Dict, List, Optional, Tuple
-
-from model_tools import cleanup_vm, handle_function_call
-from run_agent import AIAgent
-
-_TOOL_CALL_RE = re.compile(r"<tool_call>\\s*(.*?)\\s*</tool_call>", re.DOTALL)
-
-
-ATROPOS_TOOL_SYSTEM_PROMPT = """You are a helpful AI assistant with access to tools.
-
-## Available Tools
-<tools>
-{tool_descriptions}
-</tools>
-
-## How to Use Tools
-To call a tool, output:
-<tool_call>{{"name": "tool_name", "arguments": {{"arg1": "value1"}}}}</tool_call>
-
-You may include optional reasoning in <think>...</think> before tool calls.
-
-After each tool call, you will receive tool results as:
-<tool_response>{{...}}</tool_response>
-
-Continue until finished, then provide a final response with no <tool_call> blocks.
-"""
-
-
-class AtroposAIAgent(AIAgent):
-    """
-    Hermes `AIAgent` variant that uses Atroposlib ServerManager/ManagedServer.
-
-    Notes:
-    - The default Hermes `AIAgent` remains unchanged; this class is opt-in.
-    - The underlying server must expose `managed_server(tokenizer=...)` OR be a single
-      APIServer-compatible object usable by Atroposlib's `ManagedServer`.
-    """
-
-    def __init__(
-        self,
-        *,
-        server: Any,
-        tokenizer: Any = None,
-        model: str = "local",
-        max_iterations: int = 10,
-        tool_delay: float = 0.0,
-        enabled_toolsets: Optional[List[str]] = None,
-        disabled_toolsets: Optional[List[str]] = None,
-        save_trajectories: bool = False,
-        verbose_logging: bool = False,
-        quiet_mode: bool = False,
-        ephemeral_system_prompt: Optional[str] = None,
-        log_prefix_chars: int = 100,
-        log_prefix: str = "",
-        session_id: Optional[str] = None,
-        temperature: Optional[float] = None,
-        max_tokens: Optional[int] = None,
-    ):
-        # Call parent init mainly to reuse tool selection + trajectory saving utilities.
-        super().__init__(
-            base_url="http://unused",
-            api_key="dummy-key",
-            model=model,
-            max_iterations=max_iterations,
-            tool_delay=tool_delay,
-            enabled_toolsets=enabled_toolsets,
-            disabled_toolsets=disabled_toolsets,
-            save_trajectories=save_trajectories,
-            verbose_logging=verbose_logging,
-            quiet_mode=quiet_mode,
-            ephemeral_system_prompt=ephemeral_system_prompt,
-            log_prefix_chars=log_prefix_chars,
-            log_prefix=log_prefix,
-            session_id=session_id,
-        )
-
-        self.server = server
-        self.tokenizer = tokenizer
-        self.temperature = temperature
-        self.max_tokens = max_tokens
-
-    @asynccontextmanager
-    async def _managed(self) -> AsyncGenerator[Any, None]:
-        if hasattr(self.server, "managed_server"):
-            with warnings.catch_warnings():
-                warnings.filterwarnings(
-                    "ignore",
-                    message=r"Using OpenAIServer with managed_server does not allow for state tracking",
-                    category=UserWarning,
-                )
-                async with self.server.managed_server(tokenizer=self.tokenizer) as managed:
-                    yield managed
-            return
-
-        # Fall back to directly wrapping a single server object.
-        from atroposlib.envs.server_handling.managed_server import ManagedServer
-
-        managed = ManagedServer(server=self.server, tokenizer=self.tokenizer)
-        try:
-            yield managed
-        finally:
-            managed.reset()
-
-    def _tool_descriptions_text(self) -> str:
-        if not self.tools:
-            return "(no tools available)"
-
-        parts: List[str] = []
-        for tool in self.tools:
-            fn = (tool or {}).get("function", {})
-            name = fn.get("name", "")
-            desc = (fn.get("description") or "").strip()
-            if not name:
-                continue
-            if desc:
-                parts.append(f"- {name}: {desc}")
-            else:
-                parts.append(f"- {name}")
-        return "\n".join(parts) if parts else "(no tools available)"
-
-    def _build_system_prompt(self, system_message: Optional[str]) -> Optional[str]:
-        tool_prompt = ATROPOS_TOOL_SYSTEM_PROMPT.format(
-            tool_descriptions=self._tool_descriptions_text()
-        )
-
-        parts: List[str] = []
-        if system_message:
-            parts.append(system_message)
-        if self.ephemeral_system_prompt:
-            parts.append(self.ephemeral_system_prompt)
-        parts.append(tool_prompt)
-
-        return "\n\n".join(parts)
-
-    def _parse_tool_calls(self, content: str) -> Tuple[List[Tuple[str, Dict[str, Any]]], List[str]]:
-        """
-        Returns:
-          (calls, errors)
-        """
-        calls: List[Tuple[str, Dict[str, Any]]] = []
-        errors: List[str] = []
-
-        for raw in _TOOL_CALL_RE.findall(content or ""):
-            try:
-                payload = json.loads(raw)
-            except json.JSONDecodeError as exc:
-                errors.append(f"Invalid JSON inside <tool_call>: {exc}")
-                continue
-
-            name = payload.get("name")
-            args = payload.get("arguments", {})
-            if not isinstance(name, str) or not name:
-                errors.append("Tool call missing 'name' string")
-                continue
-            if not isinstance(args, dict):
-                errors.append("Tool call 'arguments' must be an object")
-                continue
-
-            calls.append((name, args))
-
-        return calls, errors
-
-    async def run_conversation_async(
-        self,
-        user_message: str,
-        system_message: Optional[str] = None,
-        conversation_history: Optional[List[Dict[str, Any]]] = None,
-        task_id: Optional[str] = None,
-    ) -> Dict[str, Any]:
-        import uuid
-
-        effective_task_id = task_id or str(uuid.uuid4())
-
-        messages: List[Dict[str, Any]] = conversation_history.copy() if conversation_history else []
-        messages.append({"role": "user", "content": user_message})
-
-        active_system_prompt = self._build_system_prompt(system_message)
-
-        api_call_count = 0
-        final_response: Optional[str] = None
-        managed_state: Optional[Dict[str, Any]] = None
-        completed = False
-
-        try:
-            async with self._managed() as managed:
-                while api_call_count < self.max_iterations:
-                    api_call_count += 1
-
-                    api_messages = messages.copy()
-                    if active_system_prompt:
-                        api_messages = [{"role": "system", "content": active_system_prompt}] + api_messages
-
-                    chat_kwargs: Dict[str, Any] = {"messages": api_messages, "n": 1}
-                    if self.max_tokens is not None:
-                        chat_kwargs["max_tokens"] = self.max_tokens
-                    if self.temperature is not None:
-                        chat_kwargs["temperature"] = self.temperature
-
-                    # Prefer OpenAI tool calling when supported by the backend:
-                    # - Many providers normalize Hermes-style <tool_call> tags into tool_calls when `tools` is provided.
-                    # - ManagedServer (atroposlib) does prompt->completion conversion and does not support `tools`.
-                    #   Only pass `tools` when we're calling an OpenAI-compatible chat endpoint directly.
-                    tool_schemas = self.tools if self.tools else None
-                    managed_cls = type(managed).__name__
-                    if tool_schemas and managed_cls != "ManagedServer":
-                        chat_kwargs["tools"] = tool_schemas
-
-                    if os.getenv("HERMES_DEBUG_ATROPOS_REQUEST") == "1":
-                        meta = {
-                            "managed_type": managed_cls,
-                            "model": getattr(getattr(managed, "config", None), "model_name", self.model),
-                            "base_url": getattr(getattr(managed, "config", None), "base_url", None),
-                            "kwargs": chat_kwargs,
-                        }
-                        # Avoid dumping megabytes of data accidentally.
-                        # (Messages can be large; this is still "full" but bounded.)
-                        print("\n=== HERMES_DEBUG_ATROPOS_REQUEST ===", flush=True)
-                        print(json.dumps(meta, ensure_ascii=False, indent=2)[:200_000], flush=True)
-
-                    response = await managed.chat_completion(**chat_kwargs)
-
-                    if os.getenv("HERMES_DEBUG_ATROPOS_RESPONSE") == "1":
-                        try:
-                            dumped = response.model_dump()  # openai pydantic model
-                        except Exception:
-                            dumped = getattr(response, "__dict__", {"repr": repr(response)})
-                        print("\n=== HERMES_DEBUG_ATROPOS_RESPONSE: ChatCompletion (raw) ===", flush=True)
-                        print(json.dumps(dumped, ensure_ascii=False, indent=2), flush=True)
-
-                    if hasattr(managed, "get_state"):
-                        managed_state = managed.get_state()
-
-                    msg = response.choices[0].message
-                    assistant_content = (msg.content or "")
-                    msg_reasoning = getattr(msg, "reasoning", None)
-
-                    # Use tool_calls if the backend provides them (preferred).
-                    structured_tool_calls = getattr(msg, "tool_calls", None)
-
-                    # If the backend emits content="" but includes useful text in reasoning,
-                    # use it for parsing *only if needed* (e.g. tool tags).
-                    if assistant_content == "" and isinstance(msg_reasoning, str) and msg_reasoning:
-                        if os.getenv("HERMES_DEBUG_ATROPOS_RESPONSE") == "1":
-                            print("\n=== HERMES_DEBUG_ATROPOS_RESPONSE: message.reasoning present (content empty) ===", flush=True)
-                            print(msg_reasoning, flush=True)
-
-                    assistant_msg: Dict[str, Any] = {"role": "assistant", "content": assistant_content}
-                    if structured_tool_calls:
-                        # Preserve tool_calls so the next request is consistent with OpenAI protocol.
-                        try:
-                            assistant_msg["tool_calls"] = [
-                                {
-                                    "id": tc.id,
-                                    "type": tc.type,
-                                    "function": {"name": tc.function.name, "arguments": tc.function.arguments},
-                                }
-                                for tc in structured_tool_calls
-                            ]
-                        except Exception:
-                            # Best-effort; keep conversation moving.
-                            pass
-                    messages.append(assistant_msg)
-
-                    # Mode A: OpenAI tool calling (preferred when supported)
-                    if structured_tool_calls:
-                        for tc in structured_tool_calls:
-                            tool_start = time.time()
-                            try:
-                                tool_args = json.loads(tc.function.arguments or "{}")
-                            except Exception:
-                                tool_args = {}
-                            tool_result = handle_function_call(tc.function.name, tool_args, effective_task_id)
-                            tool_duration = time.time() - tool_start
-
-                            # Keep the raw tool result as tool content (OpenAI protocol expects role=tool).
-                            messages.append(
-                                {
-                                    "role": "tool",
-                                    "tool_call_id": tc.id,
-                                    "content": tool_result,
-                                }
-                            )
-
-                            if self.tool_delay and self.tool_delay > 0:
-                                await asyncio.sleep(self.tool_delay)
-
-                        # Continue loop after tool execution.
-                        continue
-
-                    # Mode B: Hermes XML tool tags in assistant text (fallback).
-                    parse_source = assistant_content or (msg_reasoning or "")
-                    tool_calls, parse_errors = self._parse_tool_calls(parse_source)
-
-                    if parse_errors and not tool_calls:
-                        # Ask the model to retry with valid tool JSON.
-                        err_text = "; ".join(parse_errors[:3])
-                        messages.append(
-                            {
-                                "role": "user",
-                                "content": (
-                                    f"<tool_response>{json.dumps({'error': err_text}, ensure_ascii=False)}</tool_response>\n"
-                                    "The previous <tool_call> blocks were invalid. Please output valid JSON inside <tool_call>."
-                                ),
-                            }
-                        )
-                        continue
-
-                    if not tool_calls:
-                        # No tool calls: treat as final answer.
-                        final_response = (assistant_content or "").strip()
-                        completed = True
-                        break
-
-                    tool_responses: List[str] = []
-                    for tool_name, tool_args in tool_calls:
-                        tool_start = time.time()
-                        tool_result = handle_function_call(tool_name, tool_args, effective_task_id)
-                        tool_duration = time.time() - tool_start
-
-                        try:
-                            parsed = json.loads(tool_result)
-                            payload: Any = parsed
-                        except Exception:
-                            payload = tool_result
-
-                        tool_payload = {
-                            "name": tool_name,
-                            "duration_s": round(tool_duration, 3),
-                            "result": payload,
-                        }
-                        tool_responses.append(
-                            f"<tool_response>{json.dumps(tool_payload, ensure_ascii=False)}</tool_response>"
-                        )
-
-                        if self.tool_delay and self.tool_delay > 0:
-                            await asyncio.sleep(self.tool_delay)
-
-                    messages.append({"role": "user", "content": "\n".join(tool_responses)})
-
-                if final_response is None:
-                    final_response = "I've reached the maximum number of iterations."
-
-        finally:
-            try:
-                cleanup_vm(effective_task_id)
-            except Exception:
-                pass
-
-        # Save trajectory using Hermes formatting (optional).
-        self._save_trajectory(messages, user_message, completed=completed)
-
-        return {
-            "final_response": final_response,
-            "messages": messages,
-            "api_calls": api_call_count,
-            "completed": completed,
-            "managed_state": managed_state,
-            "system_prompt": active_system_prompt,
-            "task_id": effective_task_id,
-        }
-
-    def run_conversation(self, *args: Any, **kwargs: Any) -> Dict[str, Any]:
-        """
-        Sync wrapper for convenience.
-
-        If called from within a running event loop (e.g. prompt_toolkit), this
-        runs the async conversation in a dedicated thread to avoid nested loops.
-        """
-        try:
-            asyncio.get_running_loop()
-        except RuntimeError:
-            return asyncio.run(self.run_conversation_async(*args, **kwargs))
-
-        import queue
-        import threading
-
-        out: "queue.Queue[object]" = queue.Queue(maxsize=1)
-
-        def runner() -> None:
-            try:
-                out.put(asyncio.run(self.run_conversation_async(*args, **kwargs)))
-            except BaseException as exc:  # noqa: BLE001
-                out.put(exc)
-
-        thread = threading.Thread(target=runner, daemon=True)
-        thread.start()
-
-        result = out.get()
-        if isinstance(result, BaseException):
-            raise result
-        return result  # type: ignore[return-value]
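Reviewer note: the removed agent's fallback path parsed Hermes-style `<tool_call>` tags out of the assistant text. The sketch below is a standalone illustration of that parsing step, not the removed code itself. The names `TOOL_CALL_RE` and `parse_tool_calls` are chosen for this example, and the `isinstance(payload, dict)` guard is an added assumption for payloads that are valid JSON but not objects. Note the single-backslash `\s` inside the raw string: a doubled `\\s` in a raw string would match a literal backslash rather than whitespace.

```python
import json
import re
from typing import Any, Dict, List, Tuple

# Single-backslash \s in a raw string matches whitespace around the JSON payload.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(content: str) -> Tuple[List[Tuple[str, Dict[str, Any]]], List[str]]:
    """Return (calls, errors) extracted from <tool_call> tags in `content`."""
    calls: List[Tuple[str, Dict[str, Any]]] = []
    errors: List[str] = []

    for raw in TOOL_CALL_RE.findall(content or ""):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"Invalid JSON inside <tool_call>: {exc}")
            continue

        # Added guard (assumption): valid JSON that is not an object is rejected.
        if not isinstance(payload, dict):
            errors.append("Tool call payload must be a JSON object")
            continue

        name = payload.get("name")
        args = payload.get("arguments", {})
        if not isinstance(name, str) or not name:
            errors.append("Tool call missing 'name' string")
            continue
        if not isinstance(args, dict):
            errors.append("Tool call 'arguments' must be an object")
            continue

        calls.append((name, args))

    return calls, errors
```

Malformed blocks are reported in `errors` rather than raised, so a caller can ask the model to retry, as the removed loop did.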
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -23,9 +23,12 @@ model:
 # OPTION 1: Local execution (default)
 # Commands run directly on your machine in the current directory
 # -----------------------------------------------------------------------------
+# Working directory behavior:
+#   - CLI (`hermes` command): Uses "." (current directory where you run hermes)
+#   - Messaging (Telegram/Discord): Uses MESSAGING_CWD from .env (default: home)
 terminal:
  env_type: "local"
-  cwd: "."  # Use "." for current directory, or specify absolute path
+  cwd: "."  # CLI working directory - "." means current directory
  timeout: 180
  lifetime_seconds: 300
  # sudo_password: ""  # Enable sudo commands (pipes via sudo -S) - SECURITY WARNING: plaintext!
@@ -55,7 +58,7 @@ terminal:
 #   cwd: "/workspace"
 #   timeout: 180
 #   lifetime_seconds: 300
-#   docker_image: "python:3.11"
+#   docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"

 # -----------------------------------------------------------------------------
 # OPTION 4: Singularity/Apptainer container
@@ -67,7 +70,7 @@ terminal:
 #   cwd: "/workspace"
 #   timeout: 180
 #   lifetime_seconds: 300
-#   singularity_image: "docker://python:3.11"
+#   singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"

 # -----------------------------------------------------------------------------
 # OPTION 5: Modal cloud execution
@@ -79,7 +82,7 @@ terminal:
 #   cwd: "/workspace"
 #   timeout: 180
 #   lifetime_seconds: 300
-#   modal_image: "python:3.11"
+#   modal_image: "nikolaik/python-nodejs:python3.11-nodejs20"

 # -----------------------------------------------------------------------------
 # SUDO SUPPORT (works with ALL backends above)
@@ -112,12 +115,41 @@ browser:
  # after this period of no activity between agent loops (default: 120 = 2 minutes)
  inactivity_timeout: 120

+# =============================================================================
+# Context Compression (Auto-shrinks long conversations)
+# =============================================================================
+# When a conversation approaches the model's context limit, the middle turns are
+# automatically summarized to free up space while preserving important context.
+#
+# HOW IT WORKS:
+# 1. Tracks actual token usage from API responses (not estimates)
+# 2. When prompt_tokens >= threshold * the model's context_length, triggers compression
+# 3. Protects the first 3 turns (system prompt, initial request, first response)
+# 4. Protects the last 4 turns (recent context is most relevant)
+# 5. Summarizes the middle turns using a fast/cheap model
+# 6. Inserts the summary as a user message and continues the conversation seamlessly
+#
+compression:
+  # Enable automatic context compression (default: true)
+  # Set to false if you prefer to manage context manually or want errors on overflow
+  enabled: true
+  
+  # Trigger compression at this fraction of the model's context limit (default: 0.85 = 85%)
+  # Lower values = compress earlier (more aggressive), higher values = compress later
+  threshold: 0.85
+  
+  # Model to use for generating summaries (fast/cheap recommended)
+  # This model compresses the middle turns into a concise summary
+  summary_model: "google/gemini-2.0-flash-001"
+
 # =============================================================================
 # Agent Behavior
 # =============================================================================
 agent:
-  # Maximum conversation turns before stopping
-  max_turns: 20
+  # Maximum tool-calling iterations per conversation
+  # Higher = more room for complex tasks, but costs more tokens
+  # Recommended: 20-30 for focused tasks, 50-100 for open exploration
+  max_turns: 60
  
  # Enable verbose logging
  verbose: false
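Reviewer note: the compression comments in this config describe a threshold check and a split that protects the first 3 and last 4 turns. A minimal sketch of that logic follows, assuming turns are plain message dicts. The function names `should_compress` and `split_turns` are hypothetical and are not taken from the Hermes code; the summarization call itself is left out.

```python
from typing import List, Tuple

PROTECT_FIRST = 3  # system prompt, initial request, first response
PROTECT_LAST = 4   # most recent turns


def should_compress(prompt_tokens: int, context_length: int, threshold: float = 0.85) -> bool:
    """True once reported prompt tokens reach the given fraction of the context window."""
    return prompt_tokens >= threshold * context_length


def split_turns(turns: List[dict]) -> Tuple[List[dict], List[dict], List[dict]]:
    """Split turns into (protected head, middle to summarize, protected tail)."""
    if len(turns) <= PROTECT_FIRST + PROTECT_LAST:
        # Nothing sits between the protected regions, so there is nothing to summarize.
        return list(turns), [], []
    head = turns[:PROTECT_FIRST]
    middle = turns[PROTECT_FIRST:-PROTECT_LAST]
    tail = turns[-PROTECT_LAST:]
    return head, middle, tail
```

Under these assumptions, a caller would summarize `middle` with the configured `summary_model` and rebuild the history as head, one summary message, then tail.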
--- a/cli.py
+++ b/cli.py
--- a/cron/__init__.py
+++ b/cron/__init__.py
@@ -0,0 +1,36 @@
+"""
+Cron job scheduling system for Hermes Agent.
+
+This module provides scheduled task execution, allowing the agent to:
+- Run automated tasks on schedules (cron expressions, intervals, one-shot)
+- Self-schedule reminders and follow-up tasks
+- Execute tasks in isolated sessions (no prior context)
+
+Usage:
+    # Run due jobs (for system cron integration)
+    python -c "from cron import tick; tick()"
+    
+    # Or via CLI
+    python cli.py --cron-daemon
+"""
+
+from cron.jobs import (
+    create_job,
+    get_job,
+    list_jobs,
+    remove_job,
+    update_job,
+    JOBS_FILE,
+)
+from cron.scheduler import tick, run_daemon
+
+__all__ = [
+    "create_job",
+    "get_job", 
+    "list_jobs",
+    "remove_job",
+    "update_job",
+    "tick",
+    "run_daemon",
+    "JOBS_FILE",
+]
--- a/cron/jobs.py
+++ b/cron/jobs.py
@@ -0,0 +1,383 @@
+"""
+Cron job storage and management.
+
+Jobs are stored in ~/.hermes/cron/jobs.json
+Output is saved to ~/.hermes/cron/output/{job_id}/{timestamp}.md
+"""
+
+import json
+import os
+import re
+import uuid
+from datetime import datetime, timedelta
+from pathlib import Path
+from typing import Optional, Dict, List, Any
+
+try:
+    from croniter import croniter
+    HAS_CRONITER = True
+except ImportError:
+    HAS_CRONITER = False
+
+# =============================================================================
+# Configuration
+# =============================================================================
+
+HERMES_DIR = Path.home() / ".hermes"
+CRON_DIR = HERMES_DIR / "cron"
+JOBS_FILE = CRON_DIR / "jobs.json"
+OUTPUT_DIR = CRON_DIR / "output"
+
+
+def ensure_dirs():
+    """Ensure cron directories exist."""
+    CRON_DIR.mkdir(parents=True, exist_ok=True)
+    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
+
+
+# =============================================================================
+# Schedule Parsing
+# =============================================================================
+
+def parse_duration(s: str) -> int:
+    """
+    Parse duration string into minutes.
+    
+    Examples:
+        "30m" → 30
+        "2h" → 120
+        "1d" → 1440
+    """
+    s = s.strip().lower()
+    match = re.match(r'^(\d+)\s*(m|min|mins|minute|minutes|h|hr|hrs|hour|hours|d|day|days)$', s)
+    if not match:
+        raise ValueError(f"Invalid duration: '{s}'. Use format like '30m', '2h', or '1d'")
+    
+    value = int(match.group(1))
+    unit = match.group(2)[0]  # First char: m, h, or d
+    
+    multipliers = {'m': 1, 'h': 60, 'd': 1440}
+    return value * multipliers[unit]
+
+
+def parse_schedule(schedule: str) -> Dict[str, Any]:
+    """
+    Parse schedule string into structured format.
+    
+    Returns dict with:
+        - kind: "once" | "interval" | "cron"
+        - For "once": "run_at" (ISO timestamp)
+        - For "interval": "minutes" (int)
+        - For "cron": "expr" (cron expression)
+    
+    Examples:
+        "30m"              → once in 30 minutes
+        "2h"               → once in 2 hours
+        "every 30m"        → recurring every 30 minutes
+        "every 2h"         → recurring every 2 hours
+        "0 9 * * *"        → cron expression
+        "2026-02-03T14:00" → once at timestamp
+    """
+    schedule = schedule.strip()
+    original = schedule
+    schedule_lower = schedule.lower()
+    
+    # "every X" pattern → recurring interval
+    if schedule_lower.startswith("every "):
+        duration_str = schedule[6:].strip()
+        minutes = parse_duration(duration_str)
+        return {
+            "kind": "interval",
+            "minutes": minutes,
+            "display": f"every {minutes}m"
+        }
+    
+    # Check for cron expression (5 or 6 space-separated fields)
+    # Cron fields: minute hour day month weekday [year]
+    parts = schedule.split()
+    if len(parts) >= 5 and all(
+        re.match(r'^[\d\*\-,/]+$', p) for p in parts[:5]
+    ):
+        if not HAS_CRONITER:
+            raise ValueError("Cron expressions require 'croniter' package. Install with: pip install croniter")
+        # Validate cron expression
+        try:
+            croniter(schedule)
+        except Exception as e:
+            raise ValueError(f"Invalid cron expression '{schedule}': {e}")
+        return {
+            "kind": "cron",
+            "expr": schedule,
+            "display": schedule
+        }
+    
+    # ISO timestamp (contains T or looks like date)
+    if 'T' in schedule or re.match(r'^\d{4}-\d{2}-\d{2}', schedule):
+        try:
+            # Parse and validate
+            dt = datetime.fromisoformat(schedule.replace('Z', '+00:00'))
+            return {
+                "kind": "once",
+                "run_at": dt.isoformat(),
+                "display": f"once at {dt.strftime('%Y-%m-%d %H:%M')}"
+            }
+        except ValueError as e:
+            raise ValueError(f"Invalid timestamp '{schedule}': {e}")
+    
+    # Duration like "30m", "2h", "1d" → one-shot from now
+    try:
+        minutes = parse_duration(schedule)
+        run_at = datetime.now() + timedelta(minutes=minutes)
+        return {
+            "kind": "once",
+            "run_at": run_at.isoformat(),
+            "display": f"once in {original}"
+        }
+    except ValueError:
+        pass
+    
+    raise ValueError(
+        f"Invalid schedule '{original}'. Use:\n"
+        f"  - Duration: '30m', '2h', '1d' (one-shot)\n"
+        f"  - Interval: 'every 30m', 'every 2h' (recurring)\n"
+        f"  - Cron: '0 9 * * *' (cron expression)\n"
+        f"  - Timestamp: '2026-02-03T14:00:00' (one-shot at time)"
+    )
+
+
+def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None) -> Optional[str]:
+    """
+    Compute the next run time for a schedule.
+    
+    Returns ISO timestamp string, or None if no more runs.
+    """
+    now = datetime.now()
+    
+    if schedule["kind"] == "once":
+        run_at = datetime.fromisoformat(schedule["run_at"])
+        # If in the future, return it; if in the past, no more runs
+        return schedule["run_at"] if run_at > now else None
+    
+    elif schedule["kind"] == "interval":
+        minutes = schedule["minutes"]
+        if last_run_at:
+            # Next run is last_run + interval
+            last = datetime.fromisoformat(last_run_at)
+            next_run = last + timedelta(minutes=minutes)
+        else:
+            # First run is now + interval
+            next_run = now + timedelta(minutes=minutes)
+        return next_run.isoformat()
+    
+    elif schedule["kind"] == "cron":
+        if not HAS_CRONITER:
+            return None
+        cron = croniter(schedule["expr"], now)
+        next_run = cron.get_next(datetime)
+        return next_run.isoformat()
+    
+    return None
+
+
+# =============================================================================
+# Job CRUD Operations
+# =============================================================================
+
+def load_jobs() -> List[Dict[str, Any]]:
+    """Load all jobs from storage."""
+    ensure_dirs()
+    if not JOBS_FILE.exists():
+        return []
+    
+    try:
+        with open(JOBS_FILE, 'r', encoding='utf-8') as f:
+            data = json.load(f)
+            return data.get("jobs", [])
+    except (json.JSONDecodeError, IOError):
+        return []
+
+
+def save_jobs(jobs: List[Dict[str, Any]]):
+    """Save all jobs to storage."""
+    ensure_dirs()
+    with open(JOBS_FILE, 'w', encoding='utf-8') as f:
+        json.dump({"jobs": jobs, "updated_at": datetime.now().isoformat()}, f, indent=2)
+
+
+def create_job(
+    prompt: str,
+    schedule: str,
+    name: Optional[str] = None,
+    repeat: Optional[int] = None,
+    deliver: Optional[str] = None,
+    origin: Optional[Dict[str, Any]] = None
+) -> Dict[str, Any]:
+    """
+    Create a new cron job.
+    
+    Args:
+        prompt: The prompt to run (must be self-contained)
+        schedule: Schedule string (see parse_schedule)
+        name: Optional friendly name
+        repeat: How many times to run (None = forever, 1 = once)
+        deliver: Where to deliver output ("origin", "local", "telegram", etc.)
+        origin: Source info where job was created (for "origin" delivery)
+    
+    Returns:
+        The created job dict
+    """
+    parsed_schedule = parse_schedule(schedule)
+    
+    # Auto-set repeat=1 for one-shot schedules if not specified
+    if parsed_schedule["kind"] == "once" and repeat is None:
+        repeat = 1
+    
+    # Default delivery to origin if available, otherwise local
+    if deliver is None:
+        deliver = "origin" if origin else "local"
+    
+    job_id = uuid.uuid4().hex[:12]
+    now = datetime.now().isoformat()
+    
+    job = {
+        "id": job_id,
+        "name": name or prompt[:50].strip(),
+        "prompt": prompt,
+        "schedule": parsed_schedule,
+        "schedule_display": parsed_schedule.get("display", schedule),
+        "repeat": {
+            "times": repeat,  # None = forever
+            "completed": 0
+        },
+        "enabled": True,
+        "created_at": now,
+        "next_run_at": compute_next_run(parsed_schedule),
+        "last_run_at": None,
+        "last_status": None,
+        "last_error": None,
+        # Delivery configuration
+        "deliver": deliver,
+        "origin": origin,  # Tracks where job was created for "origin" delivery
+    }
+    
+    jobs = load_jobs()
+    jobs.append(job)
+    save_jobs(jobs)
+    
+    return job
+
+
+def get_job(job_id: str) -> Optional[Dict[str, Any]]:
+    """Get a job by ID."""
+    jobs = load_jobs()
+    for job in jobs:
+        if job["id"] == job_id:
+            return job
+    return None
+
+
+def list_jobs(include_disabled: bool = False) -> List[Dict[str, Any]]:
+    """List all jobs, optionally including disabled ones."""
+    jobs = load_jobs()
+    if not include_disabled:
+        jobs = [j for j in jobs if j.get("enabled", True)]
+    return jobs
+
+
+def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]:
+    """Update a job by ID."""
+    jobs = load_jobs()
+    for i, job in enumerate(jobs):
+        if job["id"] == job_id:
+            jobs[i] = {**job, **updates}
+            save_jobs(jobs)
+            return jobs[i]
+    return None
+
+
+def remove_job(job_id: str) -> bool:
+    """Remove a job by ID."""
+    jobs = load_jobs()
+    original_len = len(jobs)
+    jobs = [j for j in jobs if j["id"] != job_id]
+    if len(jobs) < original_len:
+        save_jobs(jobs)
+        return True
+    return False
+
+
+def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
+    """
+    Mark a job as having been run.
+    
+    Updates last_run_at, last_status, increments completed count,
+    computes next_run_at, and auto-deletes if repeat limit reached.
+    """
+    jobs = load_jobs()
+    for i, job in enumerate(jobs):
+        if job["id"] == job_id:
+            now = datetime.now().isoformat()
+            job["last_run_at"] = now
+            job["last_status"] = "ok" if success else "error"
+            job["last_error"] = error if not success else None
+            
+            # Increment completed count
+            if job.get("repeat"):
+                job["repeat"]["completed"] = job["repeat"].get("completed", 0) + 1
+                
+                # Check if we've hit the repeat limit
+                times = job["repeat"].get("times")
+                completed = job["repeat"]["completed"]
+                if times is not None and completed >= times:
+                    # Remove the job (limit reached)
+                    jobs.pop(i)
+                    save_jobs(jobs)
+                    return
+            
+            # Compute next run
+            job["next_run_at"] = compute_next_run(job["schedule"], now)
+            
+            # If no next run (one-shot completed), disable
+            if job["next_run_at"] is None:
+                job["enabled"] = False
+            
+            save_jobs(jobs)
+            return
+    
+    save_jobs(jobs)
+
+
+def get_due_jobs() -> List[Dict[str, Any]]:
+    """Get all jobs that are due to run now."""
+    now = datetime.now()
+    jobs = load_jobs()
+    due = []
+    
+    for job in jobs:
+        if not job.get("enabled", True):
+            continue
+        
+        next_run = job.get("next_run_at")
+        if not next_run:
+            continue
+        
+        next_run_dt = datetime.fromisoformat(next_run)
+        if next_run_dt <= now:
+            due.append(job)
+    
+    return due
+
+
+def save_job_output(job_id: str, output: str):
+    """Save job output to file."""
+    ensure_dirs()
+    job_output_dir = OUTPUT_DIR / job_id
+    job_output_dir.mkdir(parents=True, exist_ok=True)
+    
+    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
+    output_file = job_output_dir / f"{timestamp}.md"
+    
+    with open(output_file, 'w', encoding='utf-8') as f:
+        f.write(output)
+    
+    return output_file
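The "once" and "interval" branches of `compute_next_run` above can be exercised in isolation. The sketch below is a simplified mirror, not the module itself: `next_run` is a hypothetical name, and `now` is passed in as a parameter (an assumption made here so the result is deterministic). The cron branch is left out because it depends on `croniter`.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def next_run(schedule: Dict[str, Any], now: datetime,
             last_run_at: Optional[str] = None) -> Optional[str]:
    """Simplified mirror of the 'once' and 'interval' branches (hypothetical helper)."""
    if schedule["kind"] == "once":
        run_at = datetime.fromisoformat(schedule["run_at"])
        # A one-shot job only has a next run while its time is still in the future
        return schedule["run_at"] if run_at > now else None
    if schedule["kind"] == "interval":
        # After a run, count from the last run; before any run, count from now
        base = datetime.fromisoformat(last_run_at) if last_run_at else now
        return (base + timedelta(minutes=schedule["minutes"])).isoformat()
    return None


now = datetime(2026, 2, 3, 14, 0, 0)
once = {"kind": "once", "run_at": "2026-02-03T14:30:00"}
every_30m = {"kind": "interval", "minutes": 30}

print(next_run(once, now))                       # still pending
print(next_run(once, now + timedelta(hours=1)))  # already past -> None
print(next_run(every_30m, now))                  # first run: now + 30m
print(next_run(every_30m, now, last_run_at="2026-02-03T14:10:00"))
```

Because a past one-shot yields `None`, `mark_job_run` can treat a missing next run as the signal to disable the job.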
--- a/cron/scheduler.py
+++ b/cron/scheduler.py
@@ -0,0 +1,188 @@
+"""
+Cron job scheduler - executes due jobs.
+
+This module provides:
+- tick(): Run all due jobs once (for system cron integration)
+- run_daemon(): Run continuously, checking every 60 seconds
+"""
+
+import os
+import sys
+import time
+import traceback
+from datetime import datetime
+from pathlib import Path
+from typing import Optional
+
+# Add parent directory to path for imports
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from cron.jobs import get_due_jobs, mark_job_run, save_job_output
+
+
+def run_job(job: dict) -> tuple[bool, str, Optional[str]]:
+    """
+    Execute a single cron job.
+    
+    Returns:
+        Tuple of (success, output, error_message)
+    """
+    from run_agent import AIAgent
+    
+    job_id = job["id"]
+    job_name = job["name"]
+    prompt = job["prompt"]
+    
+    print(f"[cron] Running job '{job_name}' (ID: {job_id})")
+    print(f"[cron] Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}")
+    
+    try:
+        # Create agent with default settings
+        # Jobs run in isolated sessions (no prior context)
+        agent = AIAgent(
+            model=os.getenv("HERMES_MODEL", "anthropic/claude-sonnet-4"),
+            quiet_mode=True,
+            session_id=f"cron_{job_id}_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+        )
+        
+        # Run the conversation
+        result = agent.run_conversation(prompt)
+        
+        # Extract final response
+        final_response = result.get("final_response", "")
+        if not final_response:
+            final_response = "(No response generated)"
+        
+        # Build output document
+        output = f"""# Cron Job: {job_name}
+
+**Job ID:** {job_id}
+**Run Time:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
+**Schedule:** {job.get('schedule_display', 'N/A')}
+
+## Prompt
+
+{prompt}
+
+## Response
+
+{final_response}
+"""
+        
+        print(f"[cron] Job '{job_name}' completed successfully")
+        return True, output, None
+        
+    except Exception as e:
+        error_msg = f"{type(e).__name__}: {str(e)}"
+        print(f"[cron] Job '{job_name}' failed: {error_msg}")
+        
+        # Build error output
+        output = f"""# Cron Job: {job_name} (FAILED)
+
+**Job ID:** {job_id}
+**Run Time:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
+**Schedule:** {job.get('schedule_display', 'N/A')}
+
+## Prompt
+
+{prompt}
+
+## Error
+
+```
+{error_msg}
+
+{traceback.format_exc()}
+```
+"""
+        return False, output, error_msg
+
+
+def tick(verbose: bool = True) -> int:
+    """
+    Check and run all due jobs.
+    
+    This is designed to be called by system cron every minute:
+        */1 * * * * cd ~/hermes-agent && python -c "from cron import tick; tick()"
+    
+    Args:
+        verbose: Whether to print status messages
+    
+    Returns:
+        Number of jobs executed
+    """
+    due_jobs = get_due_jobs()
+    
+    if verbose and not due_jobs:
+        print(f"[cron] {datetime.now().strftime('%H:%M:%S')} - No jobs due")
+        return 0
+    
+    if verbose:
+        print(f"[cron] {datetime.now().strftime('%H:%M:%S')} - {len(due_jobs)} job(s) due")
+    
+    executed = 0
+    for job in due_jobs:
+        try:
+            success, output, error = run_job(job)
+            
+            # Save output to file
+            output_file = save_job_output(job["id"], output)
+            if verbose:
+                print(f"[cron] Output saved to: {output_file}")
+            
+            # Mark job as run (handles repeat counting, next_run computation)
+            mark_job_run(job["id"], success, error)
+            executed += 1
+            
+        except Exception as e:
+            print(f"[cron] Error processing job {job['id']}: {e}")
+            mark_job_run(job["id"], False, str(e))
+    
+    return executed
+
+
+def run_daemon(check_interval: int = 60, verbose: bool = True):
+    """
+    Run the cron daemon continuously.
+    
+    Checks for due jobs every `check_interval` seconds.
+    
+    Args:
+        check_interval: Seconds between checks (default: 60)
+        verbose: Whether to print status messages
+    """
+    print(f"[cron] Starting daemon (checking every {check_interval}s)")
+    print("[cron] Press Ctrl+C to stop")
+    print()
+    
+    try:
+        while True:
+            try:
+                tick(verbose=verbose)
+            except Exception as e:
+                print(f"[cron] Tick error: {e}")
+            
+            time.sleep(check_interval)
+            
+    except KeyboardInterrupt:
+        print("\n[cron] Daemon stopped")
+
+
+if __name__ == "__main__":
+    # Allow running directly: python cron/scheduler.py [daemon|tick]
+    import argparse
+    
+    parser = argparse.ArgumentParser(description="Hermes Cron Scheduler")
+    parser.add_argument("mode", choices=["daemon", "tick"], default="tick", nargs="?",
+                        help="Mode: 'tick' to run once, 'daemon' to run continuously")
+    parser.add_argument("--interval", type=int, default=60,
+                        help="Check interval in seconds for daemon mode")
+    parser.add_argument("--quiet", "-q", action="store_true",
+                        help="Suppress status messages")
+    
+    args = parser.parse_args()
+    
+    if args.mode == "daemon":
+        run_daemon(check_interval=args.interval, verbose=not args.quiet)
+    else:
+        tick(verbose=not args.quiet)
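The error isolation in `run_daemon` above (a failing tick is logged and the loop carries on) can be illustrated with a bounded loop. This is a sketch under assumptions, not the scheduler's code: `run_ticks` and `flaky_tick` are hypothetical names, and the `tick` and `sleep` callables are injected so the example finishes immediately instead of sleeping for real.

```python
from typing import Callable, List


def run_ticks(tick: Callable[[], int], iterations: int,
              sleep: Callable[[float], None],
              check_interval: float = 60) -> List[str]:
    """Bounded stand-in for the daemon loop: one failing tick does not stop it."""
    log: List[str] = []
    for _ in range(iterations):
        try:
            log.append(f"ran {tick()} job(s)")
        except Exception as e:  # same broad catch as the daemon loop
            log.append(f"tick error: {e}")
        sleep(check_interval)
    return log


calls = {"n": 0}


def flaky_tick() -> int:
    """Hypothetical tick that fails on its second call only."""
    calls["n"] += 1
    if calls["n"] == 2:
        raise RuntimeError("storage unavailable")
    return 1


slept: List[float] = []  # records requested sleeps instead of actually sleeping
log = run_ticks(flaky_tick, iterations=3, sleep=slept.append, check_interval=60)
print(log)
```

The third iteration still runs after the second one fails, which is the property the nested `try` in the daemon is there to provide.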
--- a/docs/MODAL_BACKEND.md
+++ b/docs/MODAL_BACKEND.md
@@ -1,224 +0,0 @@
-# Modal Backend
-
-Hermes Agent uses [Modal](https://modal.com) for scalable, isolated cloud execution environments. There are two Modal integrations:
-
-1. **Terminal Tool** (`tools/terminal_tool.py`) - For CLI/agent command execution
-2. **Atropos Backend** (`atropos/backends/modal_backend.py`) - For batch RL training workloads
-
-
-
----
-
-## Terminal Tool (CLI/Agent)
-
-The terminal tool provides a simple interface for executing commands in Modal sandboxes.
-
-### Configuration
-
-Set environment variables:
-
-```bash
-export TERMINAL_ENV=modal
-export TERMINAL_MODAL_IMAGE=python:3.11
-export TERMINAL_MODAL_APP_NAME=hermes-sandbox
-```
-
-Or use a YAML config file (`modal_profiles.yaml`):
-
-```yaml
-profiles:
-  default:
-    image: python:3.11
-    cpu: 1.0
-    memory: 2048
-    min_pool: 1
-    max_pool: 5
-    idle_timeout: 120
-
-  gpu:
-    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
-    gpu: T4
-    memory: 16384
-    min_pool: 0
-    max_pool: 2
-```
-
-### Features
-
-| Feature | Description |
-|---------|-------------|
-| **Sandbox Pool** | Pre-warmed sandboxes for low latency |
-| **Auto-scaling** | Grows/shrinks pool based on demand |
-| **Idle Timeout** | Sandboxes auto-terminate when unused |
-| **Profile Selection** | Different configs for different workloads |
-| **Credential Injection** | `modal.Secret` integration |
-
-### Usage
-
-```python
-from tools.terminal_tool import terminal_tool
-
-# Simple command
-output = terminal_tool("echo hello", task_id="my-task")
-
-# With profile selection
-output = terminal_tool("python train.py", task_id="training", profile="gpu")
-
-# Cleanup when done
-from tools.terminal_tool import cleanup_vm
-cleanup_vm("my-task")
-```
-
-### Architecture
-
-```
-_ModalPoolManager (singleton)
-    ├── "default" pool → [sandbox-0, sandbox-1, ...]
-    └── "gpu" pool     → [sandbox-0, ...]
-
-Each pool:
-  - Maintains min_pool warm sandboxes
-  - Scales up to max_pool on demand  
-  - Background thread scales down idle sandboxes
-```
-
----
-
-## Atropos Backend (RL Training)
-
-The Atropos backend is designed for high-throughput batch execution during reinforcement learning training.
-
-### Key Concept: Slot-based Multiplexing
-
-Instead of one sandbox per trajectory, multiple trajectories share sandboxes via **slots**:
-
-```
-Sandbox (1 container)
-    ├── Slot 0 → Trajectory A (workspace: /data/slot_0)
-    ├── Slot 1 → Trajectory B (workspace: /data/slot_1)
-    └── Slot 2 → Trajectory C (workspace: /data/slot_2)
-```
-
-**Benefits**:
-- Fewer containers = lower cost
-- Shared warm-up time
-- Better GPU utilization
-
-### Configuration
-
-```python
-from atropos.backends.modal_backend import ModalSandboxConfig, ModalToolBackend
-
-config = ModalSandboxConfig(
-    name="default",
-    image="python:3.11",
-    cpu=1.0,
-    memory=2048,
-    slots_per_sandbox=10,  # 10 trajectories per container
-    min_sandboxes=1,
-    max_sandboxes=5,
-)
-
-backend = ModalToolBackend(config.with_app_name("my-training"))
-```
-
-### Multi-Profile Support
-
-Different trajectory types can request different resources:
-
-```python
-backend = ModalToolBackend.with_profiles(
-    app_name="rl-training",
-    profiles={
-        "default": ModalSandboxConfig(
-            name="default",
-            cpu=1.0,
-            memory=2048,
-        ),
-        "pytorch-gpu": ModalSandboxConfig(
-            name="pytorch-gpu",
-            image="pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime",
-            gpu="T4",
-            memory=16384,
-        ),
-    }
-)
-
-# CPU task
-slot1 = await backend.acquire("traj-1", profile="default")
-
-# GPU task
-slot2 = await backend.acquire("traj-2", profile="pytorch-gpu")
-```
-
-### Batched Execution
-
-The key optimization - execute many commands in parallel:
-
-```python
-# Acquire slots for multiple trajectories
-slots = [await backend.acquire(f"traj-{i}") for i in range(50)]
-
-# Execute batch across all slots in parallel
-results = await backend.execute_batch([
-    (slot, "bash", {"command": "python step.py"})
-    for slot in slots
-])
-
-# Release slots
-for slot in slots:
-    await backend.release(slot)
-```
-
-### Architecture
-
-```
-ModalToolBackend
-    └── _ModalMultiProfileManager
-            ├── "default" → _ModalSandboxPool
-            │                   ├── Sandbox 0 (slots 0-9)
-            │                   └── Sandbox 1 (slots 0-9)
-            │
-            └── "pytorch-gpu" → _ModalSandboxPool
-                                    └── Sandbox 0 (slots 0-9)
-```
-
----
-
-## Credentials
-
-Inject secrets securely using Modal's secret management:
-
-```bash
-# Create secret in Modal dashboard or CLI
-modal secret create my-api-key API_KEY=sk-xxx
-```
-
-```python
-# Reference in config
-config = ModalSandboxConfig(
-    secrets=["my-api-key"],  # Modal secret names
-    env_vars={"DEBUG": "1"},  # Additional env vars
-)
-```
-
-## Troubleshooting
-
-### "Modal package not installed"
-```bash
-pip install modal
-modal token new  # Authenticate
-```
-
-### "Sandbox creation failed"
-- Check Modal dashboard for quota limits
-- Verify image exists and is accessible
-- Check secret names are correct
-
-### Shutdown errors
-These are harmless warnings during Python interpreter shutdown:
-```
-[Modal] Error terminating ...: cannot schedule new futures after interpreter shutdown
-```
-
-The sandboxes will auto-terminate via Modal's idle_timeout anyway.
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -250,6 +250,38 @@ This is useful for:
 - Replaying conversations
 - Training data inspection

+### Context Compression
+
+Long conversations can exceed model context limits. The CLI automatically compresses context when approaching the limit:
+
+```yaml
+# In cli-config.yaml
+compression:
+  enabled: true                    # Enable auto-compression
+  threshold: 0.85                  # Compress at 85% of context limit  
+  summary_model: "google/gemini-2.0-flash-001"
+```
+
+**How it works:**
+1. Tracks actual token usage from each API response
+2. When tokens reach threshold, middle turns are summarized
+3. First 3 and last 4 turns are always protected
+4. Conversation continues seamlessly after compression
+
+**When compression triggers:**
+```
+📦 Context compression triggered (170,000 tokens ≥ 170,000 threshold)
+   📊 Model context limit: 200,000 tokens (85% = 170,000)
+   🗜️  Summarizing turns 4-15 (12 turns)
+   ✅ Compressed: 20 → 9 messages (~45,000 tokens saved)
+```
+
+To disable compression:
+```yaml
+compression:
+  enabled: false
+```
+
 ## Quiet Mode

 The CLI runs in "quiet mode" (`HERMES_QUIET=1`), which:
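The threshold arithmetic in the Context Compression section added above (85% of a 200,000-token limit gives 170,000) can be checked with a few lines. This is a minimal sketch; `compression_threshold` and `should_compress` are hypothetical names, not functions from the CLI, and the inclusive `>=` comparison is inferred from the `≥` shown in the sample log line.

```python
def compression_threshold(context_limit: int, threshold: float = 0.85) -> int:
    """Token count at which compression would trigger (hypothetical helper)."""
    return round(context_limit * threshold)


def should_compress(tokens_used: int, context_limit: int,
                    threshold: float = 0.85) -> bool:
    """True once usage reaches the threshold (inclusive, per the sample log)."""
    return tokens_used >= compression_threshold(context_limit, threshold)


print(compression_threshold(200_000))     # 170000
print(should_compress(169_999, 200_000))  # False
print(should_compress(170_000, 200_000))  # True
```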
--- a/docs/messaging.md
+++ b/docs/messaging.md
@@ -0,0 +1,515 @@
+# Messaging Platform Integrations (Gateway)
+
+Hermes Agent can connect to messaging platforms like Telegram, Discord, and WhatsApp to serve as a conversational AI assistant.
+
+## Quick Start
+
+```bash
+# 1. Set your bot token(s) in .env file
+echo 'TELEGRAM_BOT_TOKEN="your_telegram_bot_token"' >> .env
+echo 'DISCORD_BOT_TOKEN="your_discord_bot_token"' >> .env
+
+# 2. Test the gateway (foreground)
+./scripts/hermes-gateway run
+
+# 3. Install as a system service (runs in background)
+./scripts/hermes-gateway install
+
+# 4. Manage the service
+./scripts/hermes-gateway start
+./scripts/hermes-gateway stop
+./scripts/hermes-gateway restart
+./scripts/hermes-gateway status
+```
+
+**Quick test (without service install):**
+```bash
+python cli.py --gateway  # Runs in foreground, useful for debugging
+```
+
+## Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                      Hermes Gateway                             │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
+│  │   Telegram   │  │   Discord    │  │   WhatsApp   │          │
+│  │   Adapter    │  │   Adapter    │  │   Adapter    │          │
+│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘          │
+│         │                 │                 │                   │
+│         └─────────────────┼─────────────────┘                   │
+│                           │                                     │
+│                  ┌────────▼────────┐                            │
+│                  │  Session Store  │                            │
+│                  │  (per-chat)     │                            │
+│                  └────────┬────────┘                            │
+│                           │                                     │
+│                  ┌────────▼────────┐                            │
+│                  │   AIAgent       │                            │
+│                  │   (run_agent)   │                            │
+│                  └─────────────────┘                            │
+│                                                                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Session Management
+
+### Session Persistence
+
+Sessions persist across messages until they reset. The agent remembers your conversation context.
+
+### Reset Policies
+
+Sessions reset based on configurable policies:
+
+| Policy | Default | Description |
+|--------|---------|-------------|
+| Daily | 4:00 AM | Reset at a specific hour each day |
+| Idle | 120 min | Reset after N minutes of inactivity |
+| Both | (combined) | Whichever triggers first |
+
+### Manual Reset
+
+Send `/new` or `/reset` as a message to start fresh.
+
+### Per-Platform Overrides
+
+Configure different reset policies per platform:
+
+```json
+{
+  "reset_by_platform": {
+    "telegram": { "mode": "idle", "idle_minutes": 240 },
+    "discord": { "mode": "idle", "idle_minutes": 60 }
+  }
+}
+```
+
+## Platform Setup
+
+### Telegram
+
+1. **Create a bot** via [@BotFather](https://t.me/BotFather)
+2. **Get your token** (looks like `123456789:ABCdefGHIjklMNOpqrsTUVwxyz`)
+3. **Set environment variable:**
+   ```bash
+   export TELEGRAM_BOT_TOKEN="your_token_here"
+   ```
+4. **Optional: Set home channel** for cron job delivery:
+   ```bash
+   export TELEGRAM_HOME_CHANNEL="-1001234567890"
+   export TELEGRAM_HOME_CHANNEL_NAME="My Notes"
+   ```
+
+**Requirements:**
+```bash
+pip install "python-telegram-bot>=20.0"
+```
+
+### Discord
+
+1. **Create an application** at [Discord Developer Portal](https://discord.com/developers/applications)
+2. **Create a bot** under your application
+3. **Get the bot token**
+4. **Enable required intents:**
+   - Message Content Intent
+   - Server Members Intent (optional)
+5. **Invite to your server** using OAuth2 URL generator (scopes: `bot`, `applications.commands`)
+6. **Set environment variable:**
+   ```bash
+   export DISCORD_BOT_TOKEN="your_token_here"
+   ```
+7. **Optional: Set home channel:**
+   ```bash
+   export DISCORD_HOME_CHANNEL="123456789012345678"
+   export DISCORD_HOME_CHANNEL_NAME="#bot-updates"
+   ```
+
+**Requirements:**
+```bash
+pip install "discord.py>=2.0"
+```
+
+### WhatsApp
+
+WhatsApp integration is more complex due to the lack of a simple bot API.
+
+**Options:**
+1. **WhatsApp Business API** (requires Meta verification)
+2. **whatsapp-web.js** via Node.js bridge (for personal accounts)
+
+**Bridge Setup:**
+1. Install Node.js
+2. Set up the bridge script (see `scripts/whatsapp-bridge/` for reference)
+3. Configure in gateway:
+   ```json
+   {
+     "platforms": {
+       "whatsapp": {
+         "enabled": true,
+         "extra": {
+           "bridge_script": "/path/to/bridge.js",
+           "bridge_port": 3000
+         }
+       }
+     }
+   }
+   ```
+
+## Configuration
+
+There are **two ways** to configure the gateway (in order of precedence):
+
+### 1. Environment Variables (`.env` file) - Recommended for Quick Setup
+
+Add to your `~/.hermes/.env` file:
+
+```bash
+# =============================================================================
+# MESSAGING PLATFORM TOKENS
+# =============================================================================
+
+# Telegram - get from @BotFather on Telegram
+TELEGRAM_BOT_TOKEN=your_telegram_bot_token
+TELEGRAM_ALLOWED_USERS=123456789,987654321    # Security: restrict to these user IDs
+
+# Optional: Default channel for cron job delivery
+TELEGRAM_HOME_CHANNEL=-1001234567890
+TELEGRAM_HOME_CHANNEL_NAME="My Notes"
+
+# Discord - get from Discord Developer Portal
+DISCORD_BOT_TOKEN=your_discord_bot_token
+DISCORD_ALLOWED_USERS=123456789012345678      # Security: restrict to these user IDs
+
+# Optional: Default channel for cron job delivery
+DISCORD_HOME_CHANNEL=123456789012345678
+DISCORD_HOME_CHANNEL_NAME="#bot-updates"
+
+# WhatsApp - requires Node.js bridge setup
+WHATSAPP_ENABLED=true
+
+# =============================================================================
+# AGENT SETTINGS
+# =============================================================================
+
+# Max tool-calling iterations per conversation (default: 60)
+HERMES_MAX_ITERATIONS=60
+
+# Working directory for terminal commands (default: home ~)
+MESSAGING_CWD=/home/myuser
+
+# =============================================================================
+# TOOL PROGRESS NOTIFICATIONS
+# =============================================================================
+
+# Show progress messages as agent uses tools
+HERMES_TOOL_PROGRESS=true
+
+# Mode: "new" (only when tool changes) or "all" (every tool call)
+HERMES_TOOL_PROGRESS_MODE=new
+
+# =============================================================================
+# SESSION SETTINGS
+# =============================================================================
+
+# Reset sessions after N minutes of inactivity (default: 120)
+SESSION_IDLE_MINUTES=120
+
+# Daily reset hour in 24h format (default: 4 = 4am)
+SESSION_RESET_HOUR=4
+```
+
+### 2. Gateway Config File (`~/.hermes/gateway.json`) - Full Control
+
+For advanced configuration, create `~/.hermes/gateway.json`:
+
+```json
+{
+  "platforms": {
+    "telegram": {
+      "enabled": true,
+      "token": "your_telegram_token",
+      "home_channel": {
+        "platform": "telegram",
+        "chat_id": "-1001234567890",
+        "name": "My Notes"
+      }
+    },
+    "discord": {
+      "enabled": true,
+      "token": "your_discord_token",
+      "home_channel": {
+        "platform": "discord",
+        "chat_id": "123456789012345678",
+        "name": "#bot-updates"
+      }
+    }
+  },
+  "default_reset_policy": {
+    "mode": "both",
+    "at_hour": 4,
+    "idle_minutes": 120
+  },
+  "reset_by_platform": {
+    "discord": {
+      "mode": "idle",
+      "idle_minutes": 60
+    }
+  },
+  "always_log_local": true
+}
+```
+
+## Platform-Specific Toolsets
+
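+Each platform has its own toolset so that capabilities can be restricted per platform. At present every platform toolset includes full tool access, terminal included: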
+
+| Platform | Toolset | Capabilities |
+|----------|---------|--------------|
+| CLI | `hermes-cli` | Full access (terminal, browser, etc.) |
+| Telegram | `hermes-telegram` | Full tools including terminal |
+| Discord | `hermes-discord` | Full tools including terminal |
+| WhatsApp | `hermes-whatsapp` | Full tools including terminal |
+
+## User Experience Features
+
+### Typing Indicator
+
+The gateway keeps the "typing..." indicator active throughout processing, refreshing every 4 seconds. This lets users know the bot is working even during long tool-calling sequences.
+
+### Tool Progress Notifications
+
+When `HERMES_TOOL_PROGRESS=true`, the bot sends status messages as it works:
+
+```
+💻 `ls -la`...
+🔍 web_search...
+📄 web_extract...
+🎨 image_generate...
+```
+
+Terminal commands show the actual command (truncated to 50 chars). Other tools just show the tool name.
+
+**Modes:**
+- `new`: Only sends message when switching to a different tool (less spam)
+- `all`: Sends message for every single tool call
+
+### Working Directory
+
+- **CLI (`hermes` command)**: Uses current directory where you run the command
+- **Messaging**: Uses `MESSAGING_CWD` (default: home directory `~`)
+
+This is intentional: CLI users are in a terminal and expect the agent to work in their current directory, while messaging users need a consistent starting location.
+
+### Max Iterations
+
+If the agent hits the max iteration limit while working, instead of a generic error, it asks the model to summarize what it found so far. This gives you a useful response even when the task couldn't be fully completed.
+
+## Cron Job Delivery
+
+When scheduling cron jobs, you can specify where the output should be delivered:
+
+```
+User: "Remind me to check the server in 30 minutes"
+
+Agent uses: schedule_cronjob(
+  prompt="Check server status...",
+  schedule="30m",
+  deliver="origin"  # Back to this chat
+)
+```
+
+### Delivery Options
+
+| Option | Description |
+|--------|-------------|
+| `"origin"` | Back to where the job was created |
+| `"local"` | Save to local files only |
+| `"telegram"` | Telegram home channel |
+| `"discord"` | Discord home channel |
+| `"telegram:123456"` | Specific Telegram chat |
+
+## Dynamic Context Injection
+
+The agent knows where it is via injected context:
+
+```
+## Current Session Context
+
+**Source:** Telegram (group: Dev Team, ID: -1001234567890)
+**Connected Platforms:** local, telegram, discord
+
+**Home Channels:**
+  - telegram: My Notes (ID: -1001234567890)
+  - discord: #bot-updates (ID: 123456789012345678)
+
+**Delivery options for scheduled tasks:**
+- "origin" → Back to this chat (Dev Team)
+- "local" → Save to local files only
+- "telegram" → Home channel (My Notes)
+- "discord" → Home channel (#bot-updates)
+```
+
+## CLI Commands
+
+| Command | Description |
+|---------|-------------|
+| `/platforms` | Show gateway configuration and status |
+| `--gateway` | Start the gateway (CLI flag) |
+
+## Troubleshooting
+
+### "python-telegram-bot not installed"
+
+```bash
+pip install "python-telegram-bot>=20.0"
+```
+
+### "discord.py not installed"
+
+```bash
+pip install "discord.py>=2.0"
+```
+
+### "No platforms connected"
+
+1. Check your environment variables are set
+2. Check your tokens are valid
+3. Try `/platforms` to see configuration status
+
+### Session not persisting
+
+1. Check `~/.hermes/sessions/` exists
+2. Check session policies aren't too aggressive
+3. Verify no errors in gateway logs
+
+## Adding a New Platform
+
+To add a new messaging platform:
+
+### 1. Create the adapter
+
+Create `gateway/platforms/your_platform.py`:
+
+```python
+from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
+from gateway.config import Platform, PlatformConfig
+
+class YourPlatformAdapter(BasePlatformAdapter):
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.YOUR_PLATFORM)
+    
+    async def connect(self) -> bool:
+        # Connect to the platform
+        ...
+    
+    async def disconnect(self) -> None:
+        # Disconnect
+        ...
+    
+    async def send(self, chat_id: str, content: str, **kwargs) -> SendResult:
+        # Send a message
+        ...
+    
+    async def get_chat_info(self, chat_id: str) -> dict:
+        # Get chat information
+        ...
+```
+
+### 2. Register the platform
+
+Add to `gateway/config.py`:
+
+```python
+class Platform(Enum):
+    # ... existing ...
+    YOUR_PLATFORM = "your_platform"
+```
+
+### 3. Add to gateway runner
+
+Update `gateway/run.py` `_create_adapter()`:
+
+```python
+elif platform == Platform.YOUR_PLATFORM:
+    from gateway.platforms.your_platform import YourPlatformAdapter
+    return YourPlatformAdapter(config)
+```
+
+### 4. Create a toolset (optional)
+
+Add to `toolsets.py`:
+
+```python
+"hermes-your-platform": {
+    "description": "Your platform toolset",
+    "tools": [...],
+    "includes": []
+}
+```
+
+### 5. Configure
+
+Add environment variables to `.env`:
+
+```bash
+YOUR_PLATFORM_TOKEN=...
+YOUR_PLATFORM_HOME_CHANNEL=...
+```
+
+## Service Management
+
+### Linux (systemd)
+
+```bash
+# Install as user service
+./scripts/hermes-gateway install
+
+# Manage
+systemctl --user start hermes-gateway
+systemctl --user stop hermes-gateway
+systemctl --user restart hermes-gateway
+systemctl --user status hermes-gateway
+
+# View logs
+journalctl --user -u hermes-gateway -f
+
+# Enable lingering (keeps running after logout)
+sudo loginctl enable-linger $USER
+```
+
+### macOS (launchd)
+
+```bash
+# Install
+./scripts/hermes-gateway install
+
+# Manage
+launchctl start ai.hermes.gateway
+launchctl stop ai.hermes.gateway
+
+# View logs
+tail -f ~/.hermes/logs/gateway.log
+```
+
+### Manual (any platform)
+
+```bash
+# Run in foreground (for testing/debugging)
+./scripts/hermes-gateway run
+
+# Or via CLI (also foreground)
+python cli.py --gateway
+```
+
+## Storage Locations
+
+| Path | Purpose |
+|------|---------|
+| `~/.hermes/gateway.json` | Gateway configuration |
+| `~/.hermes/sessions/sessions.json` | Session index |
+| `~/.hermes/sessions/{id}.jsonl` | Conversation transcripts |
+| `~/.hermes/cron/output/` | Cron job outputs |
+| `~/.hermes/logs/gateway.log` | Gateway logs (macOS launchd) |
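
The adapter steps in "Adding a New Platform" can be sketched end to end without the real package. This is a minimal illustration, not the project's implementation: `BaseAdapter` and `InMemoryAdapter` are hypothetical stand-ins, `SendResult` only mirrors three fields of the real class, and messages are recorded in a list instead of being sent to a messaging API.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class SendResult:
    """Stand-in mirroring three fields of gateway.platforms.base.SendResult."""
    success: bool
    message_id: Optional[str] = None
    error: Optional[str] = None


class BaseAdapter(ABC):
    """Hypothetical, trimmed-down stand-in for BasePlatformAdapter."""

    @abstractmethod
    async def connect(self) -> bool: ...

    @abstractmethod
    async def disconnect(self) -> None: ...

    @abstractmethod
    async def send(
        self,
        chat_id: str,
        content: str,
        reply_to: Optional[str] = None,
        metadata: Optional[Dict[str, Any]] = None,
    ) -> SendResult: ...


class InMemoryAdapter(BaseAdapter):
    """Hypothetical adapter that records messages instead of calling a real API."""

    def __init__(self) -> None:
        self.connected = False
        self.outbox: List[Tuple[str, str]] = []

    async def connect(self) -> bool:
        self.connected = True
        return True

    async def disconnect(self) -> None:
        self.connected = False

    async def send(
        self,
        chat_id: str,
        content: str,
        reply_to: Optional[str] = None,
        metadata: Optional[Dict[str, Any]] = None,
    ) -> SendResult:
        # Report failure through the result object rather than raising
        if not self.connected:
            return SendResult(success=False, error="not connected")
        self.outbox.append((chat_id, content))
        return SendResult(success=True, message_id=str(len(self.outbox)))


async def demo() -> SendResult:
    adapter = InMemoryAdapter()
    await adapter.connect()
    result = await adapter.send("chat-1", "hello")
    await adapter.disconnect()
    return result
```

Running `asyncio.run(demo())` exercises the connect, send, disconnect sequence. A real adapter would also register itself in the `Platform` enum and in `_create_adapter()`, as the steps describe.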
--- a/gateway/__init__.py
+++ b/gateway/__init__.py
@@ -0,0 +1,35 @@
+"""
+Hermes Gateway - Multi-platform messaging integration.
+
+This module provides a unified gateway for connecting the Hermes agent
+to various messaging platforms (Telegram, Discord, WhatsApp) with:
+- Session management (persistent conversations with reset policies)
+- Dynamic context injection (agent knows where messages come from)
+- Delivery routing (cron job outputs to appropriate channels)
+- Platform-specific toolsets (different capabilities per platform)
+"""
+
+from .config import GatewayConfig, PlatformConfig, HomeChannel, load_gateway_config
+from .session import (
+    SessionContext,
+    SessionStore,
+    SessionResetPolicy,
+    build_session_context_prompt,
+)
+from .delivery import DeliveryRouter, DeliveryTarget
+
+__all__ = [
+    # Config
+    "GatewayConfig",
+    "PlatformConfig", 
+    "HomeChannel",
+    "load_gateway_config",
+    # Session
+    "SessionContext",
+    "SessionStore",
+    "SessionResetPolicy",
+    "build_session_context_prompt",
+    # Delivery
+    "DeliveryRouter",
+    "DeliveryTarget",
+]
--- a/gateway/config.py
+++ b/gateway/config.py
@@ -0,0 +1,333 @@
+"""
+Gateway configuration management.
+
+Handles loading and validating configuration for:
+- Connected platforms (Telegram, Discord, WhatsApp)
+- Home channels for each platform
+- Session reset policies
+- Delivery preferences
+"""
+
+import os
+import json
+from pathlib import Path
+from dataclasses import dataclass, field
+from typing import Dict, List, Optional, Any
+from enum import Enum
+
+
+class Platform(Enum):
+    """Supported messaging platforms."""
+    LOCAL = "local"
+    TELEGRAM = "telegram"
+    DISCORD = "discord"
+    WHATSAPP = "whatsapp"
+
+
+@dataclass
+class HomeChannel:
+    """
+    Default destination for a platform.
+    
+    When a cron job specifies deliver="telegram" without a specific chat ID,
+    messages are sent to this home channel.
+    """
+    platform: Platform
+    chat_id: str
+    name: str  # Human-readable name for display
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "platform": self.platform.value,
+            "chat_id": self.chat_id,
+            "name": self.name,
+        }
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "HomeChannel":
+        return cls(
+            platform=Platform(data["platform"]),
+            chat_id=str(data["chat_id"]),
+            name=data.get("name", "Home"),
+        )
+
+
+@dataclass
+class SessionResetPolicy:
+    """
+    Controls when sessions reset (lose context).
+    
+    Modes:
+    - "daily": Reset at a specific hour each day
+    - "idle": Reset after N minutes of inactivity
+    - "both": Whichever triggers first (daily boundary OR idle timeout)
+    """
+    mode: str = "both"  # "daily", "idle", or "both"
+    at_hour: int = 4  # Hour for daily reset (0-23, local time)
+    idle_minutes: int = 120  # Minutes of inactivity before reset
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "mode": self.mode,
+            "at_hour": self.at_hour,
+            "idle_minutes": self.idle_minutes,
+        }
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "SessionResetPolicy":
+        return cls(
+            mode=data.get("mode", "both"),
+            at_hour=data.get("at_hour", 4),
+            idle_minutes=data.get("idle_minutes", 120),
+        )
+
+
+@dataclass
+class PlatformConfig:
+    """Configuration for a single messaging platform."""
+    enabled: bool = False
+    token: Optional[str] = None  # Bot token (Telegram, Discord)
+    api_key: Optional[str] = None  # API key if different from token
+    home_channel: Optional[HomeChannel] = None
+    
+    # Platform-specific settings
+    extra: Dict[str, Any] = field(default_factory=dict)
+    
+    def to_dict(self) -> Dict[str, Any]:
+        result = {
+            "enabled": self.enabled,
+            "extra": self.extra,
+        }
+        if self.token:
+            result["token"] = self.token
+        if self.api_key:
+            result["api_key"] = self.api_key
+        if self.home_channel:
+            result["home_channel"] = self.home_channel.to_dict()
+        return result
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "PlatformConfig":
+        home_channel = None
+        if "home_channel" in data:
+            home_channel = HomeChannel.from_dict(data["home_channel"])
+        
+        return cls(
+            enabled=data.get("enabled", False),
+            token=data.get("token"),
+            api_key=data.get("api_key"),
+            home_channel=home_channel,
+            extra=data.get("extra", {}),
+        )
+
+
+@dataclass
+class GatewayConfig:
+    """
+    Main gateway configuration.
+    
+    Manages all platform connections, session policies, and delivery settings.
+    """
+    # Platform configurations
+    platforms: Dict[Platform, PlatformConfig] = field(default_factory=dict)
+    
+    # Session reset policies by type
+    default_reset_policy: SessionResetPolicy = field(default_factory=SessionResetPolicy)
+    reset_by_type: Dict[str, SessionResetPolicy] = field(default_factory=dict)
+    reset_by_platform: Dict[Platform, SessionResetPolicy] = field(default_factory=dict)
+    
+    # Reset trigger commands
+    reset_triggers: List[str] = field(default_factory=lambda: ["/new", "/reset"])
+    
+    # Storage paths
+    sessions_dir: Path = field(default_factory=lambda: Path.home() / ".hermes" / "sessions")
+    
+    # Delivery settings
+    always_log_local: bool = True  # Always save cron outputs to local files
+    
+    def get_connected_platforms(self) -> List[Platform]:
+        """Return list of platforms that are enabled and configured."""
+        connected = []
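+        for platform, config in self.platforms.items():
+            if not config.enabled:
+                continue
+            # WhatsApp is enabled via WHATSAPP_ENABLED and uses a different auth
+            # mechanism, so it has no token or api_key to check.
+            if platform == Platform.WHATSAPP or config.token or config.api_key:
+                connected.append(platform)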
+        return connected
+    
+    def get_home_channel(self, platform: Platform) -> Optional[HomeChannel]:
+        """Get the home channel for a platform."""
+        config = self.platforms.get(platform)
+        if config:
+            return config.home_channel
+        return None
+    
+    def get_reset_policy(
+        self, 
+        platform: Optional[Platform] = None,
+        session_type: Optional[str] = None
+    ) -> SessionResetPolicy:
+        """
+        Get the appropriate reset policy for a session.
+        
+        Priority: platform override > type override > default
+        """
+        # Platform-specific override takes precedence
+        if platform and platform in self.reset_by_platform:
+            return self.reset_by_platform[platform]
+        
+        # Type-specific override (dm, group, thread)
+        if session_type and session_type in self.reset_by_type:
+            return self.reset_by_type[session_type]
+        
+        return self.default_reset_policy
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "platforms": {
+                p.value: c.to_dict() for p, c in self.platforms.items()
+            },
+            "default_reset_policy": self.default_reset_policy.to_dict(),
+            "reset_by_type": {
+                k: v.to_dict() for k, v in self.reset_by_type.items()
+            },
+            "reset_by_platform": {
+                p.value: v.to_dict() for p, v in self.reset_by_platform.items()
+            },
+            "reset_triggers": self.reset_triggers,
+            "sessions_dir": str(self.sessions_dir),
+            "always_log_local": self.always_log_local,
+        }
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "GatewayConfig":
+        platforms = {}
+        for platform_name, platform_data in data.get("platforms", {}).items():
+            try:
+                platform = Platform(platform_name)
+                platforms[platform] = PlatformConfig.from_dict(platform_data)
+            except ValueError:
+                pass  # Skip unknown platforms
+        
+        reset_by_type = {}
+        for type_name, policy_data in data.get("reset_by_type", {}).items():
+            reset_by_type[type_name] = SessionResetPolicy.from_dict(policy_data)
+        
+        reset_by_platform = {}
+        for platform_name, policy_data in data.get("reset_by_platform", {}).items():
+            try:
+                platform = Platform(platform_name)
+                reset_by_platform[platform] = SessionResetPolicy.from_dict(policy_data)
+            except ValueError:
+                pass
+        
+        default_policy = SessionResetPolicy()
+        if "default_reset_policy" in data:
+            default_policy = SessionResetPolicy.from_dict(data["default_reset_policy"])
+        
+        sessions_dir = Path.home() / ".hermes" / "sessions"
+        if "sessions_dir" in data:
+            sessions_dir = Path(data["sessions_dir"])
+        
+        return cls(
+            platforms=platforms,
+            default_reset_policy=default_policy,
+            reset_by_type=reset_by_type,
+            reset_by_platform=reset_by_platform,
+            reset_triggers=data.get("reset_triggers", ["/new", "/reset"]),
+            sessions_dir=sessions_dir,
+            always_log_local=data.get("always_log_local", True),
+        )
+
+
+def load_gateway_config() -> GatewayConfig:
+    """
+    Load gateway configuration from multiple sources.
+    
+    Priority (highest to lowest):
+    1. Environment variables
+    2. ~/.hermes/gateway.json
+    3. Defaults
+    
+    A cli-config.yaml gateway section is not read by this function.
+    """
+    config = GatewayConfig()
+    
+    # Try loading from ~/.hermes/gateway.json
+    gateway_config_path = Path.home() / ".hermes" / "gateway.json"
+    if gateway_config_path.exists():
+        try:
+            with open(gateway_config_path, "r", encoding="utf-8") as f:
+                data = json.load(f)
+                config = GatewayConfig.from_dict(data)
+        except Exception as e:
+            print(f"[gateway] Warning: Failed to load {gateway_config_path}: {e}")
+    
+    # Override with environment variables
+    _apply_env_overrides(config)
+    
+    return config
+
+
+def _apply_env_overrides(config: GatewayConfig) -> None:
+    """Apply environment variable overrides to config."""
+    
+    # Telegram
+    telegram_token = os.getenv("TELEGRAM_BOT_TOKEN")
+    if telegram_token:
+        if Platform.TELEGRAM not in config.platforms:
+            config.platforms[Platform.TELEGRAM] = PlatformConfig()
+        config.platforms[Platform.TELEGRAM].enabled = True
+        config.platforms[Platform.TELEGRAM].token = telegram_token
+    
+    telegram_home = os.getenv("TELEGRAM_HOME_CHANNEL")
+    if telegram_home and Platform.TELEGRAM in config.platforms:
+        config.platforms[Platform.TELEGRAM].home_channel = HomeChannel(
+            platform=Platform.TELEGRAM,
+            chat_id=telegram_home,
+            name=os.getenv("TELEGRAM_HOME_CHANNEL_NAME", "Home"),
+        )
+    
+    # Discord
+    discord_token = os.getenv("DISCORD_BOT_TOKEN")
+    if discord_token:
+        if Platform.DISCORD not in config.platforms:
+            config.platforms[Platform.DISCORD] = PlatformConfig()
+        config.platforms[Platform.DISCORD].enabled = True
+        config.platforms[Platform.DISCORD].token = discord_token
+    
+    discord_home = os.getenv("DISCORD_HOME_CHANNEL")
+    if discord_home and Platform.DISCORD in config.platforms:
+        config.platforms[Platform.DISCORD].home_channel = HomeChannel(
+            platform=Platform.DISCORD,
+            chat_id=discord_home,
+            name=os.getenv("DISCORD_HOME_CHANNEL_NAME", "Home"),
+        )
+    
+    # WhatsApp (typically uses different auth mechanism)
+    whatsapp_enabled = os.getenv("WHATSAPP_ENABLED", "").lower() in ("true", "1", "yes")
+    if whatsapp_enabled:
+        if Platform.WHATSAPP not in config.platforms:
+            config.platforms[Platform.WHATSAPP] = PlatformConfig()
+        config.platforms[Platform.WHATSAPP].enabled = True
+    
+    # Session settings
+    idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
+    if idle_minutes:
+        try:
+            config.default_reset_policy.idle_minutes = int(idle_minutes)
+        except ValueError:
+            pass
+    
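+    reset_hour = os.getenv("SESSION_RESET_HOUR")
+    if reset_hour:
+        try:
+            hour = int(reset_hour)
+        except ValueError:
+            hour = None
+        # Ignore values outside the valid 0-23 range
+        if hour is not None and 0 <= hour <= 23:
+            config.default_reset_policy.at_hour = hour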
+
+
+def save_gateway_config(config: GatewayConfig) -> None:
+    """Save gateway configuration to ~/.hermes/gateway.json."""
+    gateway_config_path = Path.home() / ".hermes" / "gateway.json"
+    gateway_config_path.parent.mkdir(parents=True, exist_ok=True)
+    
+    with open(gateway_config_path, "w") as f:
+        json.dump(config.to_dict(), f, indent=2)
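
The precedence that `get_reset_policy()` documents (platform override, then session-type override, then default) can be checked in isolation. The sketch below is illustrative only: `Policy` and `Overrides` are hypothetical stand-ins for `SessionResetPolicy` and `GatewayConfig`, and platforms are plain strings rather than `Platform` enum members.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Policy:
    """Stand-in for SessionResetPolicy with the same default values."""
    mode: str = "both"
    at_hour: int = 4
    idle_minutes: int = 120


@dataclass
class Overrides:
    """Hypothetical container for the three policy levels."""
    default: Policy = field(default_factory=Policy)
    by_type: Dict[str, Policy] = field(default_factory=dict)
    by_platform: Dict[str, Policy] = field(default_factory=dict)

    def resolve(
        self,
        platform: Optional[str] = None,
        session_type: Optional[str] = None,
    ) -> Policy:
        # A platform-specific override wins over everything else
        if platform and platform in self.by_platform:
            return self.by_platform[platform]
        # Then a session-type override (dm, group, thread)
        if session_type and session_type in self.by_type:
            return self.by_type[session_type]
        return self.default


overrides = Overrides(
    by_type={"group": Policy(mode="idle", idle_minutes=30)},
    by_platform={"telegram": Policy(mode="daily", at_hour=6)},
)
```

With these overrides, a Telegram group session resolves to the Telegram policy, a Discord group session to the group policy, and a Discord DM to the default.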
--- a/gateway/delivery.py
+++ b/gateway/delivery.py
@@ -0,0 +1,318 @@
+"""
+Delivery routing for cron job outputs and agent responses.
+
+Routes messages to the appropriate destination based on:
+- Explicit targets (e.g., "telegram:123456789")
+- Platform home channels (e.g., "telegram" → home channel)
+- Origin (back to where the job was created)
+- Local (always saved to files)
+"""
+
+import json
+from pathlib import Path
+from datetime import datetime
+from dataclasses import dataclass
+from typing import Dict, List, Optional, Any, Union
+from enum import Enum
+
+from .config import Platform, GatewayConfig, HomeChannel
+from .session import SessionSource
+
+
+@dataclass
+class DeliveryTarget:
+    """
+    A single delivery target.
+    
+    Represents where a message should be sent:
+    - "origin" → back to source
+    - "local" → save to local files
+    - "telegram" → Telegram home channel
+    - "telegram:123456" → specific Telegram chat
+    """
+    platform: Platform
+    chat_id: Optional[str] = None  # None means use home channel
+    is_origin: bool = False
+    is_explicit: bool = False  # True if chat_id was explicitly specified
+    
+    @classmethod
+    def parse(cls, target: str, origin: Optional[SessionSource] = None) -> "DeliveryTarget":
+        """
+        Parse a delivery target string.
+        
+        Formats:
+        - "origin" → back to source
+        - "local" → local files only
+        - "telegram" → Telegram home channel
+        - "telegram:123456" → specific Telegram chat
+        """
+        # Normalize only the keyword/platform part so chat IDs keep their original case
+        target = target.strip()
+        keyword = target.lower()
+        
+        if keyword == "origin":
+            if origin:
+                return cls(
+                    platform=origin.platform,
+                    chat_id=origin.chat_id,
+                    is_origin=True,
+                )
+            else:
+                # Fallback to local if no origin
+                return cls(platform=Platform.LOCAL, is_origin=True)
+        
+        if keyword == "local":
+            return cls(platform=Platform.LOCAL)
+        
+        # Check for platform:chat_id format
+        if ":" in target:
+            platform_str, chat_id = target.split(":", 1)
+            try:
+                platform = Platform(platform_str.lower())
+                return cls(platform=platform, chat_id=chat_id, is_explicit=True)
+            except ValueError:
+                # Unknown platform, treat as local
+                return cls(platform=Platform.LOCAL)
+        
+        # Just a platform name (use home channel)
+        try:
+            platform = Platform(keyword)
+            return cls(platform=platform)
+        except ValueError:
+            # Unknown platform, treat as local
+            return cls(platform=Platform.LOCAL)
+    
+    def to_string(self) -> str:
+        """Convert back to string format."""
+        if self.is_origin:
+            return "origin"
+        if self.platform == Platform.LOCAL:
+            return "local"
+        if self.chat_id:
+            return f"{self.platform.value}:{self.chat_id}"
+        return self.platform.value
+
+
+class DeliveryRouter:
+    """
+    Routes messages to appropriate destinations.
+    
+    Handles the logic of resolving delivery targets and dispatching
+    messages to the right platform adapters.
+    """
+    
+    def __init__(self, config: GatewayConfig, adapters: Optional[Dict[Platform, Any]] = None):
+        """
+        Initialize the delivery router.
+        
+        Args:
+            config: Gateway configuration
+            adapters: Dict mapping platforms to their adapter instances
+        """
+        self.config = config
+        self.adapters = adapters or {}
+        self.output_dir = Path.home() / ".hermes" / "cron" / "output"
+    
+    def resolve_targets(
+        self,
+        deliver: Union[str, List[str]],
+        origin: Optional[SessionSource] = None
+    ) -> List[DeliveryTarget]:
+        """
+        Resolve delivery specification to concrete targets.
+        
+        Args:
+            deliver: Delivery spec - "origin", "telegram", ["local", "discord"], etc.
+            origin: The source where the request originated (for "origin" target)
+        
+        Returns:
+            List of resolved delivery targets
+        """
+        if isinstance(deliver, str):
+            deliver = [deliver]
+        
+        targets = []
+        seen_platforms = set()
+        
+        for target_str in deliver:
+            target = DeliveryTarget.parse(target_str, origin)
+            
+            # Resolve home channel if needed
+            if target.chat_id is None and target.platform != Platform.LOCAL:
+                home = self.config.get_home_channel(target.platform)
+                if home:
+                    target.chat_id = home.chat_id
+                else:
+                    # No home channel configured, skip this platform
+                    continue
+            
+            # Deduplicate
+            key = (target.platform, target.chat_id)
+            if key not in seen_platforms:
+                seen_platforms.add(key)
+                targets.append(target)
+        
+        # Always include local if configured
+        if self.config.always_log_local:
+            local_key = (Platform.LOCAL, None)
+            if local_key not in seen_platforms:
+                targets.append(DeliveryTarget(platform=Platform.LOCAL))
+        
+        return targets
+    
+    async def deliver(
+        self,
+        content: str,
+        targets: List[DeliveryTarget],
+        job_id: Optional[str] = None,
+        job_name: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> Dict[str, Any]:
+        """
+        Deliver content to all specified targets.
+        
+        Args:
+            content: The message/output to deliver
+            targets: List of delivery targets
+            job_id: Optional job ID (for cron jobs)
+            job_name: Optional job name
+            metadata: Additional metadata to include
+        
+        Returns:
+            Dict with delivery results per target
+        """
+        results = {}
+        
+        for target in targets:
+            try:
+                if target.platform == Platform.LOCAL:
+                    result = self._deliver_local(content, job_id, job_name, metadata)
+                else:
+                    result = await self._deliver_to_platform(target, content, metadata)
+                
+                results[target.to_string()] = {
+                    "success": True,
+                    "result": result
+                }
+            except Exception as e:
+                results[target.to_string()] = {
+                    "success": False,
+                    "error": str(e)
+                }
+        
+        return results
+    
+    def _deliver_local(
+        self,
+        content: str,
+        job_id: Optional[str],
+        job_name: Optional[str],
+        metadata: Optional[Dict[str, Any]]
+    ) -> Dict[str, Any]:
+        """Save content to local files."""
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        
+        if job_id:
+            output_path = self.output_dir / job_id / f"{timestamp}.md"
+        else:
+            output_path = self.output_dir / "misc" / f"{timestamp}.md"
+        
+        output_path.parent.mkdir(parents=True, exist_ok=True)
+        
+        # Build the output document
+        lines = []
+        if job_name:
+            lines.append(f"# {job_name}")
+        else:
+            lines.append("# Delivery Output")
+        
+        lines.append("")
+        lines.append(f"**Timestamp:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+        
+        if job_id:
+            lines.append(f"**Job ID:** {job_id}")
+        
+        if metadata:
+            for key, value in metadata.items():
+                lines.append(f"**{key}:** {value}")
+        
+        lines.append("")
+        lines.append("---")
+        lines.append("")
+        lines.append(content)
+        
+        output_path.write_text("\n".join(lines), encoding="utf-8")
+        
+        return {
+            "path": str(output_path),
+            "timestamp": timestamp
+        }
+    
+    async def _deliver_to_platform(
+        self,
+        target: DeliveryTarget,
+        content: str,
+        metadata: Optional[Dict[str, Any]]
+    ) -> Dict[str, Any]:
+        """Deliver content to a messaging platform."""
+        adapter = self.adapters.get(target.platform)
+        
+        if not adapter:
+            raise ValueError(f"No adapter configured for {target.platform.value}")
+        
+        if not target.chat_id:
+            raise ValueError(f"No chat ID for {target.platform.value} delivery")
+        
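+        # Adapters implement BasePlatformAdapter.send(), which reports failure through
+        # SendResult.success instead of raising, so convert a failed send into an error.
+        result = await adapter.send(target.chat_id, content, metadata=metadata)
+        if not getattr(result, "success", True):
+            raise RuntimeError(getattr(result, "error", None) or "send failed")
+        return result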
+
+
+def parse_deliver_spec(
+    deliver: Optional[Union[str, List[str]]],
+    origin: Optional[SessionSource] = None,
+    default: str = "origin"
+) -> Union[str, List[str]]:
+    """
+    Normalize a delivery specification.
+    
+    If None or empty, returns the default.
+    """
+    if not deliver:
+        return default
+    return deliver
+
+
+def build_delivery_context_for_tool(
+    config: GatewayConfig,
+    origin: Optional[SessionSource] = None
+) -> Dict[str, Any]:
+    """
+    Build context for the schedule_cronjob tool to understand delivery options.
+    
+    This is passed to the tool so it can validate and explain delivery targets.
+    """
+    connected = config.get_connected_platforms()
+    
+    options = {
+        "origin": {
+            "description": "Back to where this job was created",
+            "available": origin is not None,
+        },
+        "local": {
+            "description": "Save to local files only",
+            "available": True,
+        }
+    }
+    
+    for platform in connected:
+        home = config.get_home_channel(platform)
+        options[platform.value] = {
+            "description": f"{platform.value.title()} home channel",
+            "available": True,
+            "home_channel": home.to_dict() if home else None,
+        }
+    
+    return {
+        "origin": origin.to_dict() if origin else None,
+        "options": options,
+        "always_log_local": config.always_log_local,
+    }
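
The resolution rules in `DeliveryRouter.resolve_targets()` (parse each spec, fall back to the home channel, skip platforms without one, deduplicate, then append local) can be sketched with plain tuples. This is an illustration under stated assumptions, not the module's code: `parse_target`, `resolve`, and `KNOWN_PLATFORMS` are hypothetical names, platforms are strings, and the `origin` keyword is left out, so it falls through to local here.

```python
from typing import Dict, List, Optional, Tuple

# Assumed platform names, taken from the Platform enum values
KNOWN_PLATFORMS = {"local", "telegram", "discord", "whatsapp"}

Target = Tuple[str, Optional[str]]


def parse_target(spec: str) -> Target:
    """Split 'platform' or 'platform:chat_id'; unknown platforms map to local."""
    platform, _, chat_id = spec.strip().partition(":")
    platform = platform.lower()
    if platform not in KNOWN_PLATFORMS:
        return ("local", None)
    return (platform, chat_id or None)


def resolve(
    specs: List[str],
    home_channels: Dict[str, str],
    always_log_local: bool = True,
) -> List[Target]:
    resolved: List[Target] = []
    for spec in specs:
        platform, chat_id = parse_target(spec)
        if platform != "local" and chat_id is None:
            # Bare platform name: use its home channel, or skip if none is set
            chat_id = home_channels.get(platform)
            if chat_id is None:
                continue
        if (platform, chat_id) not in resolved:
            resolved.append((platform, chat_id))
    if always_log_local and ("local", None) not in resolved:
        resolved.append(("local", None))
    return resolved
```

For example, `resolve(["telegram", "telegram:42", "discord"], {"telegram": "42"})` collapses the two Telegram specs into one target, drops Discord because it has no home channel, and appends the local target.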
--- a/gateway/platforms/__init__.py
+++ b/gateway/platforms/__init__.py
@@ -0,0 +1,17 @@
+"""
+Platform adapters for messaging integrations.
+
+Each adapter handles:
+- Receiving messages from a platform
+- Sending messages/responses back
+- Platform-specific authentication
+- Message formatting and media handling
+"""
+
+from .base import BasePlatformAdapter, MessageEvent, SendResult
+
+__all__ = [
+    "BasePlatformAdapter",
+    "MessageEvent",
+    "SendResult",
+]
--- a/gateway/platforms/base.py
+++ b/gateway/platforms/base.py
@@ -0,0 +1,365 @@
+"""
+Base platform adapter interface.
+
+All platform adapters (Telegram, Discord, WhatsApp) inherit from this
+and implement the required methods.
+"""
+
+import asyncio
+from abc import ABC, abstractmethod
+from dataclasses import dataclass, field
+from datetime import datetime
+from typing import Dict, List, Optional, Any, Callable, Awaitable
+from enum import Enum
+
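+import sys
+from pathlib import Path
+
+# Make the repository root importable; pathlib keeps this portable across path separators
+sys.path.insert(0, str(Path(__file__).resolve().parents[2]))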
+
+from gateway.config import Platform, PlatformConfig
+from gateway.session import SessionSource
+
+
+class MessageType(Enum):
+    """Types of incoming messages."""
+    TEXT = "text"
+    PHOTO = "photo"
+    VIDEO = "video"
+    AUDIO = "audio"
+    VOICE = "voice"
+    DOCUMENT = "document"
+    STICKER = "sticker"
+    COMMAND = "command"  # /command style
+
+
+@dataclass
+class MessageEvent:
+    """
+    Incoming message from a platform.
+    
+    Normalized representation that all adapters produce.
+    """
+    # Message content
+    text: str
+    message_type: MessageType = MessageType.TEXT
+    
+    # Source information
+    source: Optional[SessionSource] = None
+    
+    # Original platform data
+    raw_message: Any = None
+    message_id: Optional[str] = None
+    
+    # Media attachments
+    media_urls: List[str] = field(default_factory=list)
+    media_types: List[str] = field(default_factory=list)
+    
+    # Reply context
+    reply_to_message_id: Optional[str] = None
+    
+    # Timestamps
+    timestamp: datetime = field(default_factory=datetime.now)
+    
+    def is_command(self) -> bool:
+        """Check if this is a command message (e.g., /new, /reset)."""
+        return self.text.startswith("/")
+    
+    def get_command(self) -> Optional[str]:
+        """Extract command name if this is a command message."""
+        if not self.is_command():
+            return None
+        # Split on space and get first word, strip the /
+        parts = self.text.split(maxsplit=1)
+        return parts[0][1:].lower() if parts else None
+    
+    def get_command_args(self) -> str:
+        """Get the arguments after a command."""
+        if not self.is_command():
+            return self.text
+        parts = self.text.split(maxsplit=1)
+        return parts[1] if len(parts) > 1 else ""
+
+
+@dataclass 
+class SendResult:
+    """Result of sending a message."""
+    success: bool
+    message_id: Optional[str] = None
+    error: Optional[str] = None
+    raw_response: Any = None
+
+
+# Type for message handlers
+MessageHandler = Callable[[MessageEvent], Awaitable[Optional[str]]]
+
+
+class BasePlatformAdapter(ABC):
+    """
+    Base class for platform adapters.
+    
+    Subclasses implement platform-specific logic for:
+    - Connecting and authenticating
+    - Receiving messages
+    - Sending messages/responses
+    - Handling media
+    """
+    
+    def __init__(self, config: PlatformConfig, platform: Platform):
+        self.config = config
+        self.platform = platform
+        self._message_handler: Optional[MessageHandler] = None
+        self._running = False
+        
+        # Track the active handler per session for interrupt support
+        # Key: session_key (the chat_id), Value: asyncio.Event that is set to signal an interrupt
+        self._active_sessions: Dict[str, asyncio.Event] = {}
+        self._pending_messages: Dict[str, MessageEvent] = {}
+    
+    @property
+    def name(self) -> str:
+        """Human-readable name for this adapter."""
+        return self.platform.value.title()
+    
+    @property
+    def is_connected(self) -> bool:
+        """Check if adapter is currently connected."""
+        return self._running
+    
+    def set_message_handler(self, handler: MessageHandler) -> None:
+        """
+        Set the handler for incoming messages.
+        
+        The handler receives a MessageEvent and should return
+        an optional response string.
+        """
+        self._message_handler = handler
+    
+    @abstractmethod
+    async def connect(self) -> bool:
+        """
+        Connect to the platform and start receiving messages.
+        
+        Returns True if connection was successful.
+        """
+        pass
+    
+    @abstractmethod
+    async def disconnect(self) -> None:
+        """Disconnect from the platform."""
+        pass
+    
+    @abstractmethod
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> SendResult:
+        """
+        Send a message to a chat.
+        
+        Args:
+            chat_id: The chat/channel ID to send to
+            content: Message content (may be markdown)
+            reply_to: Optional message ID to reply to
+            metadata: Additional platform-specific options
+        
+        Returns:
+            SendResult with success status and message ID
+        """
+        pass
+    
+    async def send_typing(self, chat_id: str) -> None:
+        """
+        Send a typing indicator.
+        
+        Override in subclasses if the platform supports it.
+        """
+        pass
+    
+    async def _keep_typing(self, chat_id: str, interval: float = 2.0) -> None:
+        """
+        Continuously send typing indicator until cancelled.
+        
+        Telegram/Discord typing status expires after ~5 seconds, so we refresh every 2
+        to recover quickly after progress messages interrupt it.
+        """
+        try:
+            while True:
+                await self.send_typing(chat_id)
+                await asyncio.sleep(interval)
+        except asyncio.CancelledError:
+            pass  # Normal cancellation when handler completes
+    
+    async def handle_message(self, event: MessageEvent) -> None:
+        """
+        Process an incoming message.
+        
+        This method returns quickly by spawning background tasks.
+        This allows new messages to be processed even while an agent is running,
+        enabling interruption support.
+        """
+        if not self._message_handler:
+            return
+        
+        session_key = event.source.chat_id
+        
+        # Check if there's already an active handler for this session
+        if session_key in self._active_sessions:
+            # Store this as a pending message - it will interrupt the running agent
+            print(f"[{self.name}] ⚡ New message while session {session_key} is active - triggering interrupt")
+            self._pending_messages[session_key] = event
+            # Signal the interrupt (the processing task checks this)
+            self._active_sessions[session_key].set()
+            return  # Don't process now - will be handled after current task finishes
+        
+        # Spawn background task to process this message
+        asyncio.create_task(self._process_message_background(event, session_key))
+    
+    async def _process_message_background(self, event: MessageEvent, session_key: str) -> None:
+        """Background task that actually processes the message."""
+        # Create interrupt event for this session
+        interrupt_event = asyncio.Event()
+        self._active_sessions[session_key] = interrupt_event
+        
+        # Start continuous typing indicator (refreshes every 2 seconds)
+        typing_task = asyncio.create_task(self._keep_typing(event.source.chat_id))
+        
+        try:
+            # Call the handler (this can take a while with tool calls)
+            response = await self._message_handler(event)
+            
+            # Send response if any
+            if response:
+                result = await self.send(
+                    chat_id=event.source.chat_id,
+                    content=response,
+                    reply_to=event.message_id
+                )
+                
+                # Log send failures (don't raise - user already saw tool progress)
+                if not result.success:
+                    print(f"[{self.name}] Failed to send response: {result.error}")
+                    # Retry once with a truncated, prefixed copy of the response
+                    fallback_result = await self.send(
+                        chat_id=event.source.chat_id,
+                        content=f"(Response formatting failed, plain text:)\n\n{response[:3500]}",
+                        reply_to=event.message_id
+                    )
+                    if not fallback_result.success:
+                        print(f"[{self.name}] Fallback send also failed: {fallback_result.error}")
+            
+            # Check if there's a pending message that was queued during our processing
+            if session_key in self._pending_messages:
+                pending_event = self._pending_messages.pop(session_key)
+                print(f"[{self.name}] 📨 Processing queued message from interrupt")
+                # Clean up current session before processing pending
+                if session_key in self._active_sessions:
+                    del self._active_sessions[session_key]
+                typing_task.cancel()
+                try:
+                    await typing_task
+                except asyncio.CancelledError:
+                    pass
+                # Process the pending message inline; the recursive call does its own cleanup
+                await self._process_message_background(pending_event, session_key)
+                return  # Already cleaned up
+                
+        except Exception as e:
+            print(f"[{self.name}] Error handling message: {e}")
+            import traceback
+            traceback.print_exc()
+        finally:
+            # Stop typing indicator
+            typing_task.cancel()
+            try:
+                await typing_task
+            except asyncio.CancelledError:
+                pass
+            # Clean up session tracking
+            if session_key in self._active_sessions:
+                del self._active_sessions[session_key]
+    
+    def has_pending_interrupt(self, session_key: str) -> bool:
+        """Check if there's a pending interrupt for a session."""
+        return session_key in self._active_sessions and self._active_sessions[session_key].is_set()
+    
+    def get_pending_message(self, session_key: str) -> Optional[MessageEvent]:
+        """Return the pending message for a session, if any, without removing it."""
+        return self._pending_messages.get(session_key)
+    
+    def build_source(
+        self,
+        chat_id: str,
+        chat_name: Optional[str] = None,
+        chat_type: str = "dm",
+        user_id: Optional[str] = None,
+        user_name: Optional[str] = None,
+        thread_id: Optional[str] = None
+    ) -> SessionSource:
+        """Helper to build a SessionSource for this platform."""
+        return SessionSource(
+            platform=self.platform,
+            chat_id=str(chat_id),
+            chat_name=chat_name,
+            chat_type=chat_type,
+            user_id=str(user_id) if user_id else None,
+            user_name=user_name,
+            thread_id=str(thread_id) if thread_id else None,
+        )
+    
+    @abstractmethod
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """
+        Get information about a chat/channel.
+        
+        Returns dict with at least:
+        - name: Chat name
+        - type: "dm", "group", "channel"
+        """
+        pass
+    
+    def format_message(self, content: str) -> str:
+        """
+        Format a message for this platform.
+        
+        Override in subclasses to handle platform-specific formatting
+        (e.g., Telegram MarkdownV2, Discord markdown).
+        
+        Default implementation returns content as-is.
+        """
+        return content
+    
+    def truncate_message(self, content: str, max_length: int = 4096) -> List[str]:
+        """
+        Split a long message into chunks.
+        
+        Args:
+            content: The full message content
+            max_length: Maximum length per chunk (platform-specific)
+        
+        Returns:
+            List of message chunks
+        """
+        if len(content) <= max_length:
+            return [content]
+        
+        chunks = []
+        while content:
+            if len(content) <= max_length:
+                chunks.append(content)
+                break
+            
+            # Try to split at a newline
+            split_idx = content.rfind("\n", 0, max_length)
+            if split_idx <= 0:
+                # No usable newline (index 0 would yield an empty chunk), split at space
+                split_idx = content.rfind(" ", 0, max_length)
+            if split_idx <= 0:
+                # No usable space either, hard split
+                split_idx = max_length
+            
+            chunks.append(content[:split_idx])
+            content = content[split_idx:].lstrip()
+        
+        return chunks
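Review note on `truncate_message`: the splitting rule above (last newline in the window, else last space, else a hard cut) can be exercised on its own. The following is a minimal standalone sketch, not the adapter's method; the name `split_message` is hypothetical, and it treats a break point at index 0 as unusable so that no empty chunk is produced.

```python
from typing import List


def split_message(content: str, max_length: int = 4096) -> List[str]:
    """Split content into chunks of at most max_length characters."""
    chunks: List[str] = []
    while content:
        if len(content) <= max_length:
            chunks.append(content)
            break
        # Prefer the last newline inside the window, then the last space.
        split_idx = content.rfind("\n", 0, max_length)
        if split_idx <= 0:
            split_idx = content.rfind(" ", 0, max_length)
        if split_idx <= 0:
            # No usable break point: cut exactly at the limit.
            split_idx = max_length
        chunks.append(content[:split_idx])
        # Drop the whitespace that the split happened on.
        content = content[split_idx:].lstrip()
    return chunks


if __name__ == "__main__":
    text = "first line\nsecond line that is longer\n" + "x" * 30
    for chunk in split_message(text, max_length=25):
        print(len(chunk), repr(chunk))
```

Every chunk stays within `max_length`, which is what lets each adapter pass its own platform limit (2000 for Discord, 4096 for Telegram) to the same helper.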
--- a/gateway/platforms/discord.py
+++ b/gateway/platforms/discord.py
@@ -0,0 +1,297 @@
+"""
+Discord platform adapter.
+
+Uses discord.py library for:
+- Receiving messages from servers and DMs
+- Sending responses back
+- Handling threads and channels
+"""
+
+import asyncio
+from typing import Dict, List, Optional, Any
+
+try:
+    import discord
+    from discord import Message as DiscordMessage, Intents
+    from discord.ext import commands
+    DISCORD_AVAILABLE = True
+except ImportError:
+    DISCORD_AVAILABLE = False
+    discord = None
+    DiscordMessage = Any
+    Intents = Any
+    commands = None
+
+import sys
+sys.path.insert(0, str(__file__).rsplit("/", 3)[0])
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+)
+
+
+def check_discord_requirements() -> bool:
+    """Check if Discord dependencies are available."""
+    return DISCORD_AVAILABLE
+
+
+class DiscordAdapter(BasePlatformAdapter):
+    """
+    Discord bot adapter.
+    
+    Handles:
+    - Receiving messages from servers and DMs
+    - Sending responses with Discord markdown
+    - Thread support
+    - Slash commands (future)
+    """
+    
+    # Discord message limits
+    MAX_MESSAGE_LENGTH = 2000
+    
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.DISCORD)
+        self._client: Optional[commands.Bot] = None
+        self._ready_event = asyncio.Event()
+    
+    async def connect(self) -> bool:
+        """Connect to Discord and start receiving events."""
+        if not DISCORD_AVAILABLE:
+            print(f"[{self.name}] discord.py not installed. Run: pip install discord.py")
+            return False
+        
+        if not self.config.token:
+            print(f"[{self.name}] No bot token configured")
+            return False
+        
+        try:
+            # Set up intents
+            intents = Intents.default()
+            intents.message_content = True
+            intents.dm_messages = True
+            intents.guild_messages = True
+            
+            # Create bot
+            self._client = commands.Bot(
+                command_prefix="!",  # Not really used, we handle raw messages
+                intents=intents,
+            )
+            
+            # Register event handlers
+            @self._client.event
+            async def on_ready():
+                print(f"[{self.name}] Connected as {self._client.user}")
+                self._ready_event.set()
+            
+            @self._client.event
+            async def on_message(message: DiscordMessage):
+                # Ignore bot's own messages
+                if message.author == self._client.user:
+                    return
+                await self._handle_message(message)
+            
+            # Start the bot in background
+            asyncio.create_task(self._client.start(self.config.token))
+            
+            # Wait for ready
+            await asyncio.wait_for(self._ready_event.wait(), timeout=30)
+            
+            self._running = True
+            return True
+            
+        except asyncio.TimeoutError:
+            print(f"[{self.name}] Timeout waiting for connection")
+            return False
+        except Exception as e:
+            print(f"[{self.name}] Failed to connect: {e}")
+            return False
+    
+    async def disconnect(self) -> None:
+        """Disconnect from Discord."""
+        if self._client:
+            try:
+                await self._client.close()
+            except Exception as e:
+                print(f"[{self.name}] Error during disconnect: {e}")
+        
+        self._running = False
+        self._client = None
+        self._ready_event.clear()
+        print(f"[{self.name}] Disconnected")
+    
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> SendResult:
+        """Send a message to a Discord channel."""
+        if not self._client:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            # Get the channel
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            
+            if not channel:
+                return SendResult(success=False, error=f"Channel {chat_id} not found")
+            
+            # Format and split message if needed
+            formatted = self.format_message(content)
+            chunks = self.truncate_message(formatted, self.MAX_MESSAGE_LENGTH)
+            
+            message_ids = []
+            reference = None
+            
+            if reply_to:
+                try:
+                    ref_msg = await channel.fetch_message(int(reply_to))
+                    reference = ref_msg
+                except Exception:
+                    pass  # Ignore if we can't find the referenced message
+            
+            for i, chunk in enumerate(chunks):
+                msg = await channel.send(
+                    content=chunk,
+                    reference=reference if i == 0 else None,
+                )
+                message_ids.append(str(msg.id))
+            
+            return SendResult(
+                success=True,
+                message_id=message_ids[0] if message_ids else None,
+                raw_response={"message_ids": message_ids}
+            )
+            
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+    
+    async def send_typing(self, chat_id: str) -> None:
+        """Send typing indicator."""
+        if self._client:
+            try:
+                channel = self._client.get_channel(int(chat_id))
+                if channel:
+                    await channel.typing()
+            except Exception:
+                pass  # Ignore typing indicator failures
+    
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Get information about a Discord channel."""
+        if not self._client:
+            return {"name": "Unknown", "type": "dm"}
+        
+        try:
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            
+            if not channel:
+                return {"name": str(chat_id), "type": "dm"}
+            
+            # Determine channel type
+            if isinstance(channel, discord.DMChannel):
+                chat_type = "dm"
+                name = channel.recipient.name if channel.recipient else str(chat_id)
+            elif isinstance(channel, discord.Thread):
+                chat_type = "thread"
+                name = channel.name
+            elif isinstance(channel, discord.TextChannel):
+                chat_type = "channel"
+                name = f"#{channel.name}"
+                if channel.guild:
+                    name = f"{channel.guild.name} / {name}"
+            else:
+                chat_type = "channel"
+                name = getattr(channel, "name", str(chat_id))
+            
+            return {
+                "name": name,
+                "type": chat_type,
+                "guild_id": str(channel.guild.id) if hasattr(channel, "guild") and channel.guild else None,
+                "guild_name": channel.guild.name if hasattr(channel, "guild") and channel.guild else None,
+            }
+        except Exception as e:
+            return {"name": str(chat_id), "type": "dm", "error": str(e)}
+    
+    def format_message(self, content: str) -> str:
+        """
+        Format message for Discord.
+        
+        Discord uses its own markdown variant.
+        """
+        # Discord markdown is fairly standard, no special escaping needed
+        return content
+    
+    async def _handle_message(self, message: DiscordMessage) -> None:
+        """Handle incoming Discord messages."""
+        # Determine message type
+        msg_type = MessageType.TEXT
+        if message.content.startswith("/"):
+            msg_type = MessageType.COMMAND
+        elif message.attachments:
+            # Check attachment types
+            for att in message.attachments:
+                if att.content_type:
+                    if att.content_type.startswith("image/"):
+                        msg_type = MessageType.PHOTO
+                    elif att.content_type.startswith("video/"):
+                        msg_type = MessageType.VIDEO
+                    elif att.content_type.startswith("audio/"):
+                        msg_type = MessageType.AUDIO
+                    else:
+                        msg_type = MessageType.DOCUMENT
+                    break
+        
+        # Determine chat type
+        if isinstance(message.channel, discord.DMChannel):
+            chat_type = "dm"
+            chat_name = message.author.name
+        elif isinstance(message.channel, discord.Thread):
+            chat_type = "thread"
+            chat_name = message.channel.name
+        else:
+            chat_type = "group"  # Treat server channels as groups
+            chat_name = getattr(message.channel, "name", str(message.channel.id))
+            if hasattr(message.channel, "guild") and message.channel.guild:
+                chat_name = f"{message.channel.guild.name} / #{chat_name}"
+        
+        # Get thread ID if in a thread
+        thread_id = None
+        if isinstance(message.channel, discord.Thread):
+            thread_id = str(message.channel.id)
+        
+        # Build source
+        source = self.build_source(
+            chat_id=str(message.channel.id),
+            chat_name=chat_name,
+            chat_type=chat_type,
+            user_id=str(message.author.id),
+            user_name=message.author.display_name,
+            thread_id=thread_id,
+        )
+        
+        # Build media URLs
+        media_urls = [att.url for att in message.attachments]
+        media_types = [att.content_type or "unknown" for att in message.attachments]
+        
+        event = MessageEvent(
+            text=message.content,
+            message_type=msg_type,
+            source=source,
+            raw_message=message,
+            message_id=str(message.id),
+            media_urls=media_urls,
+            media_types=media_types,
+            reply_to_message_id=str(message.reference.message_id) if message.reference else None,
+            timestamp=message.created_at,
+        )
+        
+        await self.handle_message(event)
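Review note on `_handle_message`: the attachment classification is a prefix match on the MIME type. A minimal sketch of that mapping as a pure function follows. The name `classify_attachment` and the string categories are hypothetical stand-ins for `MessageType`, and returning `"unknown"` for a missing content type is an assumption of this sketch (the adapter instead moves on to the next attachment).

```python
from typing import Optional


def classify_attachment(content_type: Optional[str]) -> str:
    """Map a MIME type to a coarse media category."""
    if not content_type:
        return "unknown"
    # Compare only the top-level type, ignoring parameters such as "; charset=utf-8".
    top_level = content_type.split(";", 1)[0].strip().lower().split("/", 1)[0]
    if top_level == "image":
        return "photo"
    if top_level in ("video", "audio"):
        return top_level
    return "document"


if __name__ == "__main__":
    for mime in ("image/png", "video/mp4", "audio/ogg", "application/pdf", None):
        print(mime, "->", classify_attachment(mime))
```

Keeping the mapping in a pure function like this would make it straightforward to unit-test without a live Discord connection.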
--- a/gateway/platforms/telegram.py
+++ b/gateway/platforms/telegram.py
@@ -0,0 +1,298 @@
+"""
+Telegram platform adapter.
+
+Uses python-telegram-bot library for:
+- Receiving messages from users/groups
+- Sending responses back
+- Handling media and commands
+"""
+
+import asyncio
+from typing import Dict, List, Optional, Any
+
+try:
+    from telegram import Update, Bot, Message
+    from telegram.ext import (
+        Application,
+        CommandHandler,
+        MessageHandler as TelegramMessageHandler,
+        ContextTypes,
+        filters,
+    )
+    from telegram.constants import ParseMode, ChatType
+    TELEGRAM_AVAILABLE = True
+except ImportError:
+    TELEGRAM_AVAILABLE = False
+    Update = Any
+    Bot = Any
+    Message = Any
+    Application = Any
+    ContextTypes = Any
+
+import sys
+sys.path.insert(0, str(__file__).rsplit("/", 3)[0])
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+)
+
+
+def check_telegram_requirements() -> bool:
+    """Check if Telegram dependencies are available."""
+    return TELEGRAM_AVAILABLE
+
+
+class TelegramAdapter(BasePlatformAdapter):
+    """
+    Telegram bot adapter.
+    
+    Handles:
+    - Receiving messages from users and groups
+    - Sending responses with Telegram markdown
+    - Forum topics (thread_id support)
+    - Media messages
+    """
+    
+    # Telegram message limits
+    MAX_MESSAGE_LENGTH = 4096
+    
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.TELEGRAM)
+        self._app: Optional[Application] = None
+        self._bot: Optional[Bot] = None
+    
+    async def connect(self) -> bool:
+        """Connect to Telegram and start polling for updates."""
+        if not TELEGRAM_AVAILABLE:
+            print(f"[{self.name}] python-telegram-bot not installed. Run: pip install python-telegram-bot")
+            return False
+        
+        if not self.config.token:
+            print(f"[{self.name}] No bot token configured")
+            return False
+        
+        try:
+            # Build the application
+            self._app = Application.builder().token(self.config.token).build()
+            self._bot = self._app.bot
+            
+            # Register handlers
+            self._app.add_handler(TelegramMessageHandler(
+                filters.TEXT & ~filters.COMMAND,
+                self._handle_text_message
+            ))
+            self._app.add_handler(TelegramMessageHandler(
+                filters.COMMAND,
+                self._handle_command
+            ))
+            self._app.add_handler(TelegramMessageHandler(
+                filters.PHOTO | filters.VIDEO | filters.AUDIO | filters.VOICE | filters.Document.ALL,
+                self._handle_media_message
+            ))
+            
+            # Start polling in background
+            await self._app.initialize()
+            await self._app.start()
+            await self._app.updater.start_polling(allowed_updates=Update.ALL_TYPES)
+            
+            self._running = True
+            print(f"[{self.name}] Connected and polling for updates")
+            return True
+            
+        except Exception as e:
+            print(f"[{self.name}] Failed to connect: {e}")
+            return False
+    
+    async def disconnect(self) -> None:
+        """Stop polling and disconnect."""
+        if self._app:
+            try:
+                await self._app.updater.stop()
+                await self._app.stop()
+                await self._app.shutdown()
+            except Exception as e:
+                print(f"[{self.name}] Error during disconnect: {e}")
+        
+        self._running = False
+        self._app = None
+        self._bot = None
+        print(f"[{self.name}] Disconnected")
+    
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> SendResult:
+        """Send a message to a Telegram chat."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            # Format and split message if needed
+            formatted = self.format_message(content)
+            chunks = self.truncate_message(formatted, self.MAX_MESSAGE_LENGTH)
+            
+            message_ids = []
+            thread_id = metadata.get("thread_id") if metadata else None
+            
+            for i, chunk in enumerate(chunks):
+                # Try Markdown first, fall back to plain text if it fails
+                try:
+                    msg = await self._bot.send_message(
+                        chat_id=int(chat_id),
+                        text=chunk,
+                        parse_mode=ParseMode.MARKDOWN,
+                        reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
+                        message_thread_id=int(thread_id) if thread_id else None,
+                    )
+                except Exception as md_error:
+                    # Markdown parsing failed, try plain text
+                    if "parse" in str(md_error).lower() or "markdown" in str(md_error).lower():
+                        msg = await self._bot.send_message(
+                            chat_id=int(chat_id),
+                            text=chunk,
+                            parse_mode=None,  # Plain text
+                            reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
+                            message_thread_id=int(thread_id) if thread_id else None,
+                        )
+                    else:
+                        raise  # Re-raise if not a parse error
+                message_ids.append(str(msg.message_id))
+            
+            return SendResult(
+                success=True,
+                message_id=message_ids[0] if message_ids else None,
+                raw_response={"message_ids": message_ids}
+            )
+            
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+    
+    async def send_typing(self, chat_id: str) -> None:
+        """Send typing indicator."""
+        if self._bot:
+            try:
+                await self._bot.send_chat_action(
+                    chat_id=int(chat_id),
+                    action="typing"
+                )
+            except Exception:
+                pass  # Ignore typing indicator failures
+    
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Get information about a Telegram chat."""
+        if not self._bot:
+            return {"name": "Unknown", "type": "dm"}
+        
+        try:
+            chat = await self._bot.get_chat(int(chat_id))
+            
+            chat_type = "dm"
+            if chat.type == ChatType.GROUP:
+                chat_type = "group"
+            elif chat.type == ChatType.SUPERGROUP:
+                chat_type = "group"
+                if chat.is_forum:
+                    chat_type = "forum"
+            elif chat.type == ChatType.CHANNEL:
+                chat_type = "channel"
+            
+            return {
+                "name": chat.title or chat.full_name or str(chat_id),
+                "type": chat_type,
+                "username": chat.username,
+                "is_forum": getattr(chat, "is_forum", False),
+            }
+        except Exception as e:
+            return {"name": str(chat_id), "type": "dm", "error": str(e)}
+    
+    def format_message(self, content: str) -> str:
+        """
+        Format message for Telegram.
+        
+        Telegram uses a subset of markdown. We'll use the simpler
+        Markdown mode (not MarkdownV2) for compatibility.
+        """
+        # No escaping is applied yet; send() retries as plain text when Telegram
+        # rejects the Markdown, so malformed markup degrades instead of failing
+        return content
+    
+    async def _handle_text_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+        """Handle incoming text messages."""
+        if not update.message or not update.message.text:
+            return
+        
+        event = self._build_message_event(update.message, MessageType.TEXT)
+        await self.handle_message(event)
+    
+    async def _handle_command(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+        """Handle incoming command messages."""
+        if not update.message or not update.message.text:
+            return
+        
+        event = self._build_message_event(update.message, MessageType.COMMAND)
+        await self.handle_message(event)
+    
+    async def _handle_media_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+        """Handle incoming media messages."""
+        if not update.message:
+            return
+        
+        msg = update.message
+        
+        # Determine media type
+        if msg.photo:
+            msg_type = MessageType.PHOTO
+        elif msg.video:
+            msg_type = MessageType.VIDEO
+        elif msg.audio:
+            msg_type = MessageType.AUDIO
+        elif msg.voice:
+            msg_type = MessageType.VOICE
+        else:
+            msg_type = MessageType.DOCUMENT
+        
+        event = self._build_message_event(msg, msg_type)
+        
+        # Add caption as text
+        if msg.caption:
+            event.text = msg.caption
+        
+        await self.handle_message(event)
+    
+    def _build_message_event(self, message: Message, msg_type: MessageType) -> MessageEvent:
+        """Build a MessageEvent from a Telegram message."""
+        chat = message.chat
+        user = message.from_user
+        
+        # Determine chat type
+        chat_type = "dm"
+        if chat.type in (ChatType.GROUP, ChatType.SUPERGROUP):
+            chat_type = "group"
+        elif chat.type == ChatType.CHANNEL:
+            chat_type = "channel"
+        
+        # Build source
+        source = self.build_source(
+            chat_id=str(chat.id),
+            chat_name=chat.title or (chat.full_name if hasattr(chat, "full_name") else None),
+            chat_type=chat_type,
+            user_id=str(user.id) if user else None,
+            user_name=user.full_name if user else None,
+            thread_id=str(message.message_thread_id) if message.message_thread_id else None,
+        )
+        
+        return MessageEvent(
+            text=message.text or "",
+            message_type=msg_type,
+            source=source,
+            raw_message=message,
+            message_id=str(message.message_id),
+            timestamp=message.date,
+        )
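Review note on `TelegramAdapter.send`: the Markdown-first, plain-text-fallback behaviour can be illustrated without the Telegram library. This is a sketch under stated assumptions: `send_with_fallback`, the `Sender` type, and `fake_send` are hypothetical, and `fake_send` only imitates a parse failure; it is not the real Bot API.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical sender signature: (text, parse_mode) -> message id.
Sender = Callable[[str, Optional[str]], Awaitable[int]]


async def send_with_fallback(send: Sender, text: str) -> int:
    """Try Markdown first; retry as plain text only on parse-related errors."""
    try:
        return await send(text, "Markdown")
    except Exception as exc:
        reason = str(exc).lower()
        if "parse" in reason or "markdown" in reason:
            return await send(text, None)
        raise  # Other failures (network, permissions) are not masked.


async def fake_send(text: str, parse_mode: Optional[str]) -> int:
    """Stand-in for a bot API call; rejects an odd number of '*' in Markdown mode."""
    if parse_mode == "Markdown" and text.count("*") % 2:
        raise ValueError("Can't parse entities: unbalanced '*'")
    # Return 1 for a Markdown send and 2 for a plain-text send.
    return 1 if parse_mode else 2


if __name__ == "__main__":
    print(asyncio.run(send_with_fallback(fake_send, "*bold*")))
    print(asyncio.run(send_with_fallback(fake_send, "2 * 3")))
```

Matching on the error text mirrors the adapter's approach; catching a library-specific exception type would be stricter, but which type applies depends on the library version, so it is left out here.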
--- a/gateway/platforms/whatsapp.py
+++ b/gateway/platforms/whatsapp.py
@@ -0,0 +1,327 @@
+"""
+WhatsApp platform adapter.
+
+WhatsApp integration is more complex than Telegram/Discord because:
+- No official bot API for personal accounts
+- Business API requires Meta Business verification
+- Most solutions use web-based automation
+
+This adapter supports multiple backends:
+1. WhatsApp Business API (requires Meta verification)
+2. whatsapp-web.js (via Node.js subprocess) - for personal accounts
+3. Baileys (via Node.js subprocess) - alternative for personal accounts
+
+For simplicity, we'll implement a generic interface that can work
+with different backends via a bridge pattern.
+"""
+
+import asyncio
+import json
+import subprocess
+from pathlib import Path
+from typing import Dict, List, Optional, Any
+
+import sys
+sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+)
+
+
+def check_whatsapp_requirements() -> bool:
+    """
+    Check if WhatsApp dependencies are available.
+    
+    WhatsApp requires a Node.js bridge for most implementations.
+    """
+    # Check for Node.js
+    try:
+        result = subprocess.run(
+            ["node", "--version"],
+            capture_output=True,
+            text=True,
+            timeout=5
+        )
+        return result.returncode == 0
+    except Exception:
+        return False
+
+
+class WhatsAppAdapter(BasePlatformAdapter):
+    """
+    WhatsApp adapter.
+    
+    This implementation uses a simple HTTP bridge pattern where:
+    1. A Node.js process runs the WhatsApp Web client
+    2. Messages are forwarded via HTTP/IPC to this Python adapter
+    3. Responses are sent back through the bridge
+    
+    The actual Node.js bridge implementation can vary:
+    - whatsapp-web.js based
+    - Baileys based
+    - Business API based
+    
+    Configuration:
+    - bridge_script: Path to the Node.js bridge script
+    - bridge_port: Port for HTTP communication (default: 3000)
+    - session_path: Path to store WhatsApp session data
+    """
+    
+    # WhatsApp message limits
+    MAX_MESSAGE_LENGTH = 65536  # WhatsApp allows longer messages
+    
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.WHATSAPP)
+        self._bridge_process: Optional[subprocess.Popen] = None
+        self._bridge_port: int = config.extra.get("bridge_port", 3000)
+        self._bridge_script: Optional[str] = config.extra.get("bridge_script")
+        self._session_path: Path = Path(config.extra.get(
+            "session_path",
+            Path.home() / ".hermes" / "whatsapp" / "session"
+        ))
+        self._message_queue: asyncio.Queue = asyncio.Queue()
+    
+    async def connect(self) -> bool:
+        """
+        Start the WhatsApp bridge.
+        
+        This launches the Node.js bridge process and waits for it to be ready.
+        """
+        if not check_whatsapp_requirements():
+            print(f"[{self.name}] Node.js not found. WhatsApp requires Node.js.")
+            return False
+        
+        if not self._bridge_script:
+            print(f"[{self.name}] No bridge script configured.")
+            print(f"[{self.name}] Set 'bridge_script' in whatsapp.extra config.")
+            print(f"[{self.name}] See docs/messaging.md for WhatsApp setup instructions.")
+            return False
+        
+        bridge_path = Path(self._bridge_script)
+        if not bridge_path.exists():
+            print(f"[{self.name}] Bridge script not found: {bridge_path}")
+            return False
+        
+        try:
+            # Ensure session directory exists
+            self._session_path.mkdir(parents=True, exist_ok=True)
+            
+            # Start the bridge process
+            self._bridge_process = subprocess.Popen(
+                [
+                    "node",
+                    str(bridge_path),
+                    "--port", str(self._bridge_port),
+                    "--session", str(self._session_path),
+                ],
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                text=True,
+            )
+            
+            # Wait for bridge to be ready (look for ready signal)
+            # This is a simplified version - real implementation would
+            # wait for an HTTP health check or specific stdout message
+            await asyncio.sleep(5)
+            
+            if self._bridge_process.poll() is not None:
+                stderr = self._bridge_process.stderr.read() if self._bridge_process.stderr else ""
+                print(f"[{self.name}] Bridge process died: {stderr}")
+                return False
+            
+            # Mark as running before scheduling the poller, which loops on self._running
+            self._running = True
+            
+            asyncio.create_task(self._poll_messages())
+            print(f"[{self.name}] Bridge started on port {self._bridge_port}")
+            print(f"[{self.name}] Scan QR code if prompted (check bridge output)")
+            return True
+            
+        except Exception as e:
+            print(f"[{self.name}] Failed to start bridge: {e}")
+            return False
+    
+    async def disconnect(self) -> None:
+        """Stop the WhatsApp bridge."""
+        if self._bridge_process:
+            try:
+                self._bridge_process.terminate()
+                await asyncio.sleep(1)
+                if self._bridge_process.poll() is None:
+                    self._bridge_process.kill()
+            except Exception as e:
+                print(f"[{self.name}] Error stopping bridge: {e}")
+        
+        self._running = False
+        self._bridge_process = None
+        print(f"[{self.name}] Disconnected")
+    
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> SendResult:
+        """Send a message via the WhatsApp bridge."""
+        if not self._running:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import aiohttp
+            
+            async with aiohttp.ClientSession() as session:
+                payload = {
+                    "chatId": chat_id,
+                    "message": content,
+                }
+                if reply_to:
+                    payload["replyTo"] = reply_to
+                
+                async with session.post(
+                    f"http://localhost:{self._bridge_port}/send",
+                    json=payload,
+                    timeout=aiohttp.ClientTimeout(total=30)
+                ) as resp:
+                    if resp.status == 200:
+                        data = await resp.json()
+                        return SendResult(
+                            success=True,
+                            message_id=data.get("messageId"),
+                            raw_response=data
+                        )
+                    else:
+                        error = await resp.text()
+                        return SendResult(success=False, error=error)
+                        
+        except ImportError:
+            return SendResult(
+                success=False, 
+                error="aiohttp not installed. Run: pip install aiohttp"
+            )
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+    
+    async def send_typing(self, chat_id: str) -> None:
+        """Send typing indicator via bridge."""
+        if not self._running:
+            return
+        
+        try:
+            import aiohttp
+            
+            async with aiohttp.ClientSession() as session:
+                await session.post(
+                    f"http://localhost:{self._bridge_port}/typing",
+                    json={"chatId": chat_id},
+                    timeout=aiohttp.ClientTimeout(total=5)
+                )
+        except Exception:
+            pass  # Ignore typing indicator failures
+    
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Get information about a WhatsApp chat."""
+        if not self._running:
+            return {"name": "Unknown", "type": "dm"}
+        
+        try:
+            import aiohttp
+            
+            async with aiohttp.ClientSession() as session:
+                async with session.get(
+                    f"http://localhost:{self._bridge_port}/chat/{chat_id}",
+                    timeout=aiohttp.ClientTimeout(total=10)
+                ) as resp:
+                    if resp.status == 200:
+                        data = await resp.json()
+                        return {
+                            "name": data.get("name", chat_id),
+                            "type": "group" if data.get("isGroup") else "dm",
+                            "participants": data.get("participants", []),
+                        }
+        except Exception:
+            pass
+        
+        return {"name": chat_id, "type": "dm"}
+    
+    async def _poll_messages(self) -> None:
+        """Poll the bridge for incoming messages."""
+        try:
+            import aiohttp
+        except ImportError:
+            print(f"[{self.name}] aiohttp not installed, message polling disabled")
+            return
+        
+        while self._running:
+            try:
+                async with aiohttp.ClientSession() as session:
+                    async with session.get(
+                        f"http://localhost:{self._bridge_port}/messages",
+                        timeout=aiohttp.ClientTimeout(total=30)
+                    ) as resp:
+                        if resp.status == 200:
+                            messages = await resp.json()
+                            for msg_data in messages:
+                                event = self._build_message_event(msg_data)
+                                if event:
+                                    await self.handle_message(event)
+            except asyncio.CancelledError:
+                break
+            except Exception as e:
+                print(f"[{self.name}] Poll error: {e}")
+                await asyncio.sleep(5)
+            
+            await asyncio.sleep(1)  # Poll interval
+    
+    def _build_message_event(self, data: Dict[str, Any]) -> Optional[MessageEvent]:
+        """Build a MessageEvent from bridge message data."""
+        try:
+            # Determine message type
+            msg_type = MessageType.TEXT
+            if data.get("hasMedia"):
+                media_type = data.get("mediaType", "")
+                if "image" in media_type:
+                    msg_type = MessageType.PHOTO
+                elif "video" in media_type:
+                    msg_type = MessageType.VIDEO
+                elif "audio" in media_type or "ptt" in media_type:  # ptt = voice note
+                    msg_type = MessageType.VOICE
+                else:
+                    msg_type = MessageType.DOCUMENT
+            
+            # Determine chat type
+            is_group = data.get("isGroup", False)
+            chat_type = "group" if is_group else "dm"
+            
+            # Build source
+            source = self.build_source(
+                chat_id=data.get("chatId", ""),
+                chat_name=data.get("chatName"),
+                chat_type=chat_type,
+                user_id=data.get("senderId"),
+                user_name=data.get("senderName"),
+            )
+            
+            return MessageEvent(
+                text=data.get("body", ""),
+                message_type=msg_type,
+                source=source,
+                raw_message=data,
+                message_id=data.get("messageId"),
+                media_urls=data.get("mediaUrls", []),
+            )
+        except Exception as e:
+            print(f"[{self.name}] Error building event: {e}")
+            return None
+
+
+# Note: A reference Node.js bridge script would be provided in scripts/whatsapp-bridge/
+# It would use whatsapp-web.js or Baileys to:
+# 1. Handle WhatsApp Web authentication (QR code)
+# 2. Listen for incoming messages
+# 3. Expose HTTP endpoints for send/receive/status
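The fixed `asyncio.sleep(5)` in `connect()` is flagged in the code as a simplification. One possible replacement is sketched below: poll the bridge port until it accepts a TCP connection. This is only a sketch under stated assumptions. `wait_for_port` is a hypothetical helper that is not part of this commit, and a plain TCP probe stands in for an HTTP health check because the bridge's status endpoint is not specified here.

```python
import asyncio
import time


async def wait_for_port(host: str, port: int, timeout: float = 30.0,
                        interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only proves that something is listening, not that
    the WhatsApp session behind the bridge is authenticated.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            _reader, writer = await asyncio.open_connection(host, port)
        except OSError:
            # Nothing is listening yet; retry after a short pause.
            await asyncio.sleep(interval)
            continue
        writer.close()
        try:
            await writer.wait_closed()
        except OSError:
            pass  # Peer closed first; the port was still reachable.
        return True
    return False
```

In `connect()`, a call such as `await wait_for_port("localhost", self._bridge_port)` could take the place of the fixed sleep, with the existing `poll()` check kept afterwards to catch a bridge that exited early.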
--- a/gateway/run.py
+++ b/gateway/run.py
@@ -0,0 +1,666 @@
+"""
+Gateway runner - entry point for messaging platform integrations.
+
+This module provides:
+- start_gateway(): Start all configured platform adapters
+- GatewayRunner: Main class managing the gateway lifecycle
+
+Usage:
+    # Start the gateway
+    python -m gateway.run
+    
+    # Or from CLI
+    python cli.py --gateway
+"""
+
+import asyncio
+import os
+import sys
+import signal
+from pathlib import Path
+from datetime import datetime
+from typing import Dict, Optional, Any, List
+
+# Add parent directory to path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+# Load environment variables from ~/.hermes/.env first
+from dotenv import load_dotenv
+_env_path = Path.home() / '.hermes' / '.env'
+if _env_path.exists():
+    load_dotenv(_env_path)
+# Also try project .env as fallback
+load_dotenv()
+
+# Gateway runs in quiet mode - suppress debug output and use cwd directly (no temp dirs)
+os.environ["HERMES_QUIET"] = "1"
+
+# Set terminal working directory for messaging platforms
+# Uses MESSAGING_CWD if set, otherwise defaults to home directory
+# This is separate from CLI which uses the directory where `hermes` is run
+messaging_cwd = os.getenv("MESSAGING_CWD") or str(Path.home())
+os.environ["TERMINAL_CWD"] = messaging_cwd
+
+from gateway.config import (
+    Platform,
+    GatewayConfig,
+    load_gateway_config,
+)
+from gateway.session import (
+    SessionStore,
+    SessionSource,
+    SessionContext,
+    build_session_context,
+    build_session_context_prompt,
+)
+from gateway.delivery import DeliveryRouter, DeliveryTarget
+from gateway.platforms.base import BasePlatformAdapter, MessageEvent
+
+
+class GatewayRunner:
+    """
+    Main gateway controller.
+    
+    Manages the lifecycle of all platform adapters and routes
+    messages to/from the agent.
+    """
+    
+    def __init__(self, config: Optional[GatewayConfig] = None):
+        self.config = config or load_gateway_config()
+        self.adapters: Dict[Platform, BasePlatformAdapter] = {}
+        self.session_store = SessionStore(self.config.sessions_dir, self.config)
+        self.delivery_router = DeliveryRouter(self.config)
+        self._running = False
+        self._shutdown_event = asyncio.Event()
+        
+        # Track running agents per session for interrupt support
+        # Key: session_key, Value: AIAgent instance
+        self._running_agents: Dict[str, Any] = {}
+        self._pending_messages: Dict[str, str] = {}  # Queued messages during interrupt
+    
+    async def start(self) -> bool:
+        """
+        Start the gateway and all configured platform adapters.
+        
+        Returns True if at least one adapter connected successfully.
+        """
+        print("[gateway] Starting Hermes Gateway...")
+        print(f"[gateway] Session storage: {self.config.sessions_dir}")
+        
+        connected_count = 0
+        
+        # Initialize and connect each configured platform
+        for platform, platform_config in self.config.platforms.items():
+            if not platform_config.enabled:
+                continue
+            
+            adapter = self._create_adapter(platform, platform_config)
+            if not adapter:
+                print(f"[gateway] No adapter available for {platform.value}")
+                continue
+            
+            # Set up message handler
+            adapter.set_message_handler(self._handle_message)
+            
+            # Try to connect
+            print(f"[gateway] Connecting to {platform.value}...")
+            try:
+                success = await adapter.connect()
+                if success:
+                    self.adapters[platform] = adapter
+                    connected_count += 1
+                    print(f"[gateway] ✓ {platform.value} connected")
+                else:
+                    print(f"[gateway] ✗ {platform.value} failed to connect")
+            except Exception as e:
+                print(f"[gateway] ✗ {platform.value} error: {e}")
+        
+        if connected_count == 0:
+            print("[gateway] No platforms connected. Check your configuration.")
+            return False
+        
+        # Update delivery router with adapters
+        self.delivery_router.adapters = self.adapters
+        
+        self._running = True
+        print(f"[gateway] Gateway running with {connected_count} platform(s)")
+        print("[gateway] Press Ctrl+C to stop")
+        
+        return True
+    
+    async def stop(self) -> None:
+        """Stop the gateway and disconnect all adapters."""
+        print("[gateway] Stopping gateway...")
+        self._running = False
+        
+        for platform, adapter in self.adapters.items():
+            try:
+                await adapter.disconnect()
+                print(f"[gateway] ✓ {platform.value} disconnected")
+            except Exception as e:
+                print(f"[gateway] ✗ {platform.value} disconnect error: {e}")
+        
+        self.adapters.clear()
+        self._shutdown_event.set()
+        print("[gateway] Gateway stopped")
+    
+    async def wait_for_shutdown(self) -> None:
+        """Wait for shutdown signal."""
+        await self._shutdown_event.wait()
+    
+    def _create_adapter(
+        self, 
+        platform: Platform, 
+        config: Any
+    ) -> Optional[BasePlatformAdapter]:
+        """Create the appropriate adapter for a platform."""
+        if platform == Platform.TELEGRAM:
+            from gateway.platforms.telegram import TelegramAdapter, check_telegram_requirements
+            if not check_telegram_requirements():
+                print("[gateway] Telegram: python-telegram-bot not installed")
+                return None
+            return TelegramAdapter(config)
+        
+        elif platform == Platform.DISCORD:
+            from gateway.platforms.discord import DiscordAdapter, check_discord_requirements
+            if not check_discord_requirements():
+                print("[gateway] Discord: discord.py not installed")
+                return None
+            return DiscordAdapter(config)
+        
+        elif platform == Platform.WHATSAPP:
+            from gateway.platforms.whatsapp import WhatsAppAdapter, check_whatsapp_requirements
+            if not check_whatsapp_requirements():
+                print("[gateway] WhatsApp: Node.js not installed or bridge not configured")
+                return None
+            return WhatsAppAdapter(config)
+        
+        return None
+    
+    def _is_user_authorized(self, source: SessionSource) -> bool:
+        """
+        Check if a user is authorized to use the bot.
+        
+        Authorization is checked via environment variables:
+        - GATEWAY_ALLOWED_USERS: Comma-separated list of user IDs (all platforms)
+        - TELEGRAM_ALLOWED_USERS, DISCORD_ALLOWED_USERS, WHATSAPP_ALLOWED_USERS:
+          Comma-separated list of user IDs for that platform only
+        
+        If no allowlist is configured, all users are allowed (open access).
+        """
+        user_id = source.user_id
+        if not user_id:
+            return False  # Can't verify unknown users
+        
+        # Check platform-specific allowlist first
+        platform_env_map = {
+            Platform.TELEGRAM: "TELEGRAM_ALLOWED_USERS",
+            Platform.DISCORD: "DISCORD_ALLOWED_USERS",
+            Platform.WHATSAPP: "WHATSAPP_ALLOWED_USERS",
+        }
+        
+        platform_allowlist = os.getenv(platform_env_map.get(source.platform, ""))
+        global_allowlist = os.getenv("GATEWAY_ALLOWED_USERS", "")
+        
+        # If no allowlists configured, allow all (backward compatible)
+        if not platform_allowlist and not global_allowlist:
+            return True
+        
+        # Check if user is in any allowlist
+        allowed_ids = set()
+        if platform_allowlist:
+            allowed_ids.update(uid.strip() for uid in platform_allowlist.split(","))
+        if global_allowlist:
+            allowed_ids.update(uid.strip() for uid in global_allowlist.split(","))
+        
+        return user_id in allowed_ids
+    
+    async def _handle_message(self, event: MessageEvent) -> Optional[str]:
+        """
+        Handle an incoming message from any platform.
+        
+        This is the core message processing pipeline:
+        1. Check user authorization
+        2. Check for commands (/new, /reset, etc.)
+        3. Check for running agent and interrupt if needed
+        4. Get or create session
+        5. Build context for agent
+        6. Run agent conversation
+        7. Return response
+        """
+        source = event.source
+        
+        # Check if user is authorized
+        if not self._is_user_authorized(source):
+            print(f"[gateway] Unauthorized user: {source.user_id} ({source.user_name}) on {source.platform.value}")
+            return None  # Silently ignore unauthorized users
+        
+        # Check for commands
+        command = event.get_command()
+        if command in ["new", "reset"]:
+            return await self._handle_reset_command(event)
+        
+        if command == "status":
+            return await self._handle_status_command(event)
+        
+        if command == "stop":
+            return await self._handle_stop_command(event)
+        
+        # Get or create session
+        session_entry = self.session_store.get_or_create_session(source)
+        session_key = session_entry.session_key
+        
+        # Check if there's already a running agent for this session
+        if session_key in self._running_agents:
+            running_agent = self._running_agents[session_key]
+            print(f"[gateway] ⚡ Interrupting running agent for session {session_key[:20]}...")
+            running_agent.interrupt(event.text)
+            # Store the new message to be processed after current agent finishes
+            self._pending_messages[session_key] = event.text
+            return None  # Don't respond yet - let the interrupt handle it
+        
+        # Build session context
+        context = build_session_context(source, self.config, session_entry)
+        
+        # Set environment variables for tools
+        self._set_session_env(context)
+        
+        # Build the context prompt to inject
+        context_prompt = build_session_context_prompt(context)
+        
+        # Load conversation history from transcript
+        history = self.session_store.load_transcript(session_entry.session_id)
+        
+        try:
+            # Run the agent
+            response = await self._run_agent(
+                message=event.text,
+                context_prompt=context_prompt,
+                history=history,
+                source=source,
+                session_id=session_entry.session_id,
+                session_key=session_key
+            )
+            
+            # Append to transcript
+            self.session_store.append_to_transcript(
+                session_entry.session_id,
+                {"role": "user", "content": event.text, "timestamp": datetime.now().isoformat()}
+            )
+            self.session_store.append_to_transcript(
+                session_entry.session_id,
+                {"role": "assistant", "content": response, "timestamp": datetime.now().isoformat()}
+            )
+            
+            # Update session
+            self.session_store.update_session(session_entry.session_key)
+            
+            return response
+            
+        except Exception as e:
+            print(f"[gateway] Agent error: {e}")
+            return f"Sorry, I encountered an error: {e}"
+        finally:
+            # Clear session env
+            self._clear_session_env()
+    
+    async def _handle_reset_command(self, event: MessageEvent) -> str:
+        """Handle /new or /reset command."""
+        source = event.source
+        
+        # Get existing session key
+        session_key = f"agent:main:{source.platform.value}:" + \
+                      ("dm" if source.chat_type == "dm" else f"{source.chat_type}:{source.chat_id}")
+        
+        # Reset the session
+        new_entry = self.session_store.reset_session(session_key)
+        
+        if new_entry:
+            return "✨ Session reset! I've started fresh with no memory of our previous conversation."
+        else:
+            # No existing session, just create one
+            self.session_store.get_or_create_session(source, force_new=True)
+            return "✨ New session started!"
+    
+    async def _handle_status_command(self, event: MessageEvent) -> str:
+        """Handle /status command."""
+        source = event.source
+        session_entry = self.session_store.get_or_create_session(source)
+        
+        connected_platforms = [p.value for p in self.adapters.keys()]
+        
+        # Check if there's an active agent
+        session_key = session_entry.session_key
+        is_running = session_key in self._running_agents
+        
+        lines = [
+            "📊 **Hermes Gateway Status**",
+            "",
+            f"**Session ID:** `{session_entry.session_id[:12]}...`",
+            f"**Created:** {session_entry.created_at.strftime('%Y-%m-%d %H:%M')}",
+            f"**Last Activity:** {session_entry.updated_at.strftime('%Y-%m-%d %H:%M')}",
+            f"**Tokens:** {session_entry.total_tokens:,}",
+            f"**Agent Running:** {'Yes ⚡' if is_running else 'No'}",
+            "",
+            f"**Connected Platforms:** {', '.join(connected_platforms)}",
+        ]
+        
+        return "\n".join(lines)
+    
+    async def _handle_stop_command(self, event: MessageEvent) -> str:
+        """Handle /stop command - interrupt a running agent."""
+        source = event.source
+        session_entry = self.session_store.get_or_create_session(source)
+        session_key = session_entry.session_key
+        
+        if session_key in self._running_agents:
+            agent = self._running_agents[session_key]
+            agent.interrupt()
+            return "⚡ Stopping the current task... The agent will finish its current step and respond."
+        else:
+            return "No active task to stop."
+    
+    def _set_session_env(self, context: SessionContext) -> None:
+        """Set environment variables for the current session."""
+        os.environ["HERMES_SESSION_PLATFORM"] = context.source.platform.value
+        os.environ["HERMES_SESSION_CHAT_ID"] = context.source.chat_id
+        if context.source.chat_name:
+            os.environ["HERMES_SESSION_CHAT_NAME"] = context.source.chat_name
+    
+    def _clear_session_env(self) -> None:
+        """Clear session environment variables."""
+        for var in ("HERMES_SESSION_PLATFORM", "HERMES_SESSION_CHAT_ID",
+                    "HERMES_SESSION_CHAT_NAME"):
+            os.environ.pop(var, None)
+    
+    async def _run_agent(
+        self,
+        message: str,
+        context_prompt: str,
+        history: List[Dict[str, Any]],
+        source: SessionSource,
+        session_id: str,
+        session_key: Optional[str] = None
+    ) -> str:
+        """
+        Run the agent with the given message and context.
+        
+        This is run in a thread pool to not block the event loop.
+        Supports interruption via new messages.
+        """
+        from run_agent import AIAgent
+        import queue
+        
+        # Determine toolset based on platform
+        toolset_map = {
+            Platform.LOCAL: "hermes-cli",
+            Platform.TELEGRAM: "hermes-telegram",
+            Platform.DISCORD: "hermes-discord",
+            Platform.WHATSAPP: "hermes-whatsapp",
+        }
+        toolset = toolset_map.get(source.platform, "hermes-telegram")
+        
+        # Check if tool progress notifications are enabled
+        tool_progress_enabled = os.getenv("HERMES_TOOL_PROGRESS", "").lower() in ("1", "true", "yes")
+        progress_mode = os.getenv("HERMES_TOOL_PROGRESS_MODE", "new")  # "all" or "new" (only new tools)
+        
+        # Queue for progress messages (thread-safe)
+        progress_queue = queue.Queue() if tool_progress_enabled else None
+        last_tool = [None]  # Mutable container for tracking in closure
+        
+        def progress_callback(tool_name: str, preview: Optional[str] = None):
+            """Callback invoked by agent when a tool is called."""
+            if not progress_queue:
+                return
+            
+            # "new" mode: only report when tool changes
+            if progress_mode == "new" and tool_name == last_tool[0]:
+                return
+            last_tool[0] = tool_name
+            
+            # Build progress message
+            tool_emojis = {
+                "terminal": "💻",
+                "web_search": "🔍",
+                "web_extract": "📄",
+                "read_file": "📖",
+                "write_file": "✍️",
+                "list_directory": "📂",
+                "image_generate": "🎨",
+                "browser_navigate": "🌐",
+                "browser_click": "👆",
+                "moa_query": "🧠",
+            }
+            emoji = tool_emojis.get(tool_name, "⚙️")
+            
+            if tool_name == "terminal" and preview:
+                msg = f"{emoji} `{preview}`..."
+            else:
+                msg = f"{emoji} {tool_name}..."
+            
+            progress_queue.put(msg)
+        
+        # Background task to send progress messages
+        async def send_progress_messages():
+            if not progress_queue:
+                return
+            
+            adapter = self.adapters.get(source.platform)
+            if not adapter:
+                return
+            
+            while True:
+                try:
+                    # Non-blocking check; queue.Empty is handled below with a short sleep
+                    msg = progress_queue.get_nowait()
+                    await adapter.send(chat_id=source.chat_id, content=msg)
+                    # Restore typing indicator after sending progress message
+                    await asyncio.sleep(0.3)
+                    await adapter.send_typing(source.chat_id)
+                except queue.Empty:
+                    await asyncio.sleep(0.3)  # Check again soon
+                except asyncio.CancelledError:
+                    # Drain remaining messages
+                    while not progress_queue.empty():
+                        try:
+                            msg = progress_queue.get_nowait()
+                            await adapter.send(chat_id=source.chat_id, content=msg)
+                        except Exception:
+                            break
+                    return
+                except Exception as e:
+                    print(f"[gateway] Progress message error: {e}")
+                    await asyncio.sleep(1)
+        
+        # We need to share the agent instance for interrupt support
+        agent_holder = [None]  # Mutable container for the agent instance
+        result_holder = [None]  # Mutable container for the result
+        
+        def run_sync():
+            # Read from env var or use default (same as CLI)
+            max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "60"))
+            
+            agent = AIAgent(
+                model=os.getenv("HERMES_MODEL", "anthropic/claude-sonnet-4"),
+                max_iterations=max_iterations,
+                quiet_mode=True,
+                enabled_toolsets=[toolset],
+                ephemeral_system_prompt=context_prompt,
+                session_id=session_id,
+                tool_progress_callback=progress_callback if tool_progress_enabled else None,
+            )
+            
+            # Store agent reference for interrupt support
+            agent_holder[0] = agent
+            
+            # Convert transcript history to agent format
+            # Transcript has timestamps; agent expects {"role": ..., "content": ...}
+            agent_history = []
+            for msg in history:
+                role = msg.get("role")
+                content = msg.get("content")
+                if role and content:
+                    agent_history.append({"role": role, "content": content})
+            
+            result = agent.run_conversation(message, conversation_history=agent_history)
+            result_holder[0] = result
+            
+            # Return final response, or a message if something went wrong
+            final_response = result.get("final_response")
+            if final_response:
+                return final_response
+            elif result.get("error"):
+                # Agent couldn't recover - show the error
+                return f"⚠️ {result['error']}"
+            else:
+                return "(No response generated)"
+        
+        # Start progress message sender if enabled
+        progress_task = None
+        if tool_progress_enabled:
+            progress_task = asyncio.create_task(send_progress_messages())
+        
+        # Track this agent as running for this session (for interrupt support)
+        # A helper task registers the agent once run_sync() has created it
+        async def track_agent():
+            # Wait for agent to be created
+            while agent_holder[0] is None:
+                await asyncio.sleep(0.05)
+            if session_key:
+                self._running_agents[session_key] = agent_holder[0]
+        
+        tracking_task = asyncio.create_task(track_agent())
+        
+        # Monitor for interrupts from the adapter (new messages arriving)
+        async def monitor_for_interrupt():
+            adapter = self.adapters.get(source.platform)
+            if not adapter:
+                return
+            
+            chat_id = source.chat_id
+            while True:
+                await asyncio.sleep(0.2)  # Check every 200ms
+                # Check if adapter has a pending interrupt for this session
+                if hasattr(adapter, 'has_pending_interrupt') and adapter.has_pending_interrupt(chat_id):
+                    agent = agent_holder[0]
+                    if agent:
+                        pending_event = adapter.get_pending_message(chat_id)
+                        pending_text = pending_event.text if pending_event else None
+                        print("[gateway] ⚡ Interrupt detected from adapter, signaling agent...")
+                        agent.interrupt(pending_text)
+                        break
+        
+        interrupt_monitor = asyncio.create_task(monitor_for_interrupt())
+        
+        try:
+            # Run in thread pool to not block
+            loop = asyncio.get_running_loop()
+            response = await loop.run_in_executor(None, run_sync)
+            
+            # Check if we were interrupted and have a pending message
+            result = result_holder[0]
+            adapter = self.adapters.get(source.platform)
+            
+            # Get pending message from adapter if interrupted
+            pending = None
+            if result and result.get("interrupted") and adapter:
+                pending_event = adapter.get_pending_message(source.chat_id)
+                if pending_event:
+                    pending = pending_event.text
+                elif result.get("interrupt_message"):
+                    pending = result.get("interrupt_message")
+            
+            if pending:
+                print(f"[gateway] 📨 Processing interrupted message: '{pending[:40]}...'")
+                # Add an indicator to the response
+                if response:
+                    response = response + "\n\n---\n_[Interrupted - processing your new message]_"
+                
+                # Send the interrupted response first
+                if adapter and response:
+                    await adapter.send(chat_id=source.chat_id, content=response)
+                
+                # Now process the pending message with updated history
+                updated_history = result.get("messages", history)
+                return await self._run_agent(
+                    message=pending,
+                    context_prompt=context_prompt,
+                    history=updated_history,
+                    source=source,
+                    session_id=session_id,
+                    session_key=session_key
+                )
+        finally:
+            # Stop progress sender and interrupt monitor
+            if progress_task:
+                progress_task.cancel()
+            interrupt_monitor.cancel()
+            
+            # Clean up tracking
+            tracking_task.cancel()
+            if session_key and session_key in self._running_agents:
+                del self._running_agents[session_key]
+            
+            # Wait for cancelled tasks
+            for task in [progress_task, interrupt_monitor, tracking_task]:
+                if task:
+                    try:
+                        await task
+                    except asyncio.CancelledError:
+                        pass
+        
+        return response
+
+
+async def start_gateway(config: Optional[GatewayConfig] = None) -> None:
+    """
+    Start the gateway and run until interrupted.
+    
+    This is the main entry point for running the gateway.
+    """
+    runner = GatewayRunner(config)
+    
+    # Set up signal handlers
+    def signal_handler():
+        asyncio.create_task(runner.stop())
+    
+    loop = asyncio.get_running_loop()
+    for sig in (signal.SIGINT, signal.SIGTERM):
+        try:
+            loop.add_signal_handler(sig, signal_handler)
+        except NotImplementedError:
+            # Windows doesn't support add_signal_handler
+            pass
+    
+    # Start the gateway
+    success = await runner.start()
+    if not success:
+        return
+    
+    # Wait for shutdown
+    await runner.wait_for_shutdown()
+
+
+def main():
+    """CLI entry point for the gateway."""
+    import argparse
+    
+    parser = argparse.ArgumentParser(description="Hermes Gateway - Multi-platform messaging")
+    parser.add_argument("--config", "-c", help="Path to gateway config file")
+    parser.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
+    
+    args = parser.parse_args()
+    
+    config = None
+    if args.config:
+        import json
+        with open(args.config) as f:
+            data = json.load(f)
+            config = GatewayConfig.from_dict(data)
+    
+    # Run the gateway
+    asyncio.run(start_gateway(config))
+
+
+if __name__ == "__main__":
+    main()
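
Review note (gateway/run.py): the interrupt handling above combines a blocking agent run in a thread pool with an asyncio task that polls for a pending message. The sketch below isolates that pattern so it can be exercised on its own. It is a minimal illustration, not the gateway's code: `blocking_work`, `run_with_interrupt`, the `pending` dict, and the `threading.Event` stop flag are hypothetical stand-ins for `run_sync`, `_run_agent`, the adapter's pending-message lookup, and `agent.interrupt()`.

```python
import asyncio
import threading
import time


def blocking_work(stop: threading.Event) -> str:
    """Hypothetical stand-in for the synchronous agent run.

    Checks the stop flag between steps, as an interruptible agent loop would.
    """
    for _ in range(50):
        if stop.is_set():
            return "interrupted"
        time.sleep(0.01)
    return "finished"


async def run_with_interrupt(pending: dict, chat_id: str) -> str:
    """Run blocking work in the default executor while a monitor watches for interrupts."""
    stop = threading.Event()

    async def monitor() -> None:
        # Poll for a pending message; signal the worker once, then exit.
        while True:
            await asyncio.sleep(0.005)
            if chat_id in pending:
                stop.set()
                return

    monitor_task = asyncio.create_task(monitor())
    try:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, blocking_work, stop)
    finally:
        # Always stop the monitor, then wait for the cancellation to settle.
        monitor_task.cancel()
        try:
            await monitor_task
        except asyncio.CancelledError:
            pass


if __name__ == "__main__":
    print(asyncio.run(run_with_interrupt({}, "c1")))
    print(asyncio.run(run_with_interrupt({"c1": "new message"}, "c1")))
```

Cancelling the monitor in `finally` and then awaiting it mirrors the cleanup loop in the diff: the task is guaranteed to be finished before the coroutine returns, whether or not an interrupt fired.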
--- a/gateway/session.py
+++ b/gateway/session.py
@@ -0,0 +1,522 @@
+"""
+Session management for the gateway.
+
+Handles:
+- Session context tracking (where messages come from)
+- Session storage (conversations persisted to disk)
+- Reset policy evaluation (when to start fresh)
+- Dynamic system prompt injection (agent knows its context)
+"""
+
+import os
+import json
+import uuid
+from pathlib import Path
+from datetime import datetime, timedelta
+from dataclasses import dataclass, field
+from typing import Dict, List, Optional, Any
+
+from .config import (
+    Platform,
+    GatewayConfig,
+    SessionResetPolicy,
+    HomeChannel,
+)
+
+
+@dataclass
+class SessionSource:
+    """
+    Describes where a message originated from.
+    
+    This information is used to:
+    1. Route responses back to the right place
+    2. Inject context into the system prompt
+    3. Track origin for cron job delivery
+    """
+    platform: Platform
+    chat_id: str
+    chat_name: Optional[str] = None
+    chat_type: str = "dm"  # "dm", "group", "channel", "thread"
+    user_id: Optional[str] = None
+    user_name: Optional[str] = None
+    thread_id: Optional[str] = None  # For forum topics, Discord threads, etc.
+    
+    @property
+    def description(self) -> str:
+        """Human-readable description of the source."""
+        if self.platform == Platform.LOCAL:
+            return "CLI terminal"
+        
+        parts = []
+        if self.chat_type == "dm":
+            parts.append(f"DM with {self.user_name or self.user_id or 'user'}")
+        elif self.chat_type == "group":
+            parts.append(f"group: {self.chat_name or self.chat_id}")
+        elif self.chat_type == "channel":
+            parts.append(f"channel: {self.chat_name or self.chat_id}")
+        else:
+            parts.append(self.chat_name or self.chat_id)
+        
+        if self.thread_id:
+            parts.append(f"thread: {self.thread_id}")
+        
+        return ", ".join(parts)
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "platform": self.platform.value,
+            "chat_id": self.chat_id,
+            "chat_name": self.chat_name,
+            "chat_type": self.chat_type,
+            "user_id": self.user_id,
+            "user_name": self.user_name,
+            "thread_id": self.thread_id,
+        }
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "SessionSource":
+        return cls(
+            platform=Platform(data["platform"]),
+            chat_id=str(data["chat_id"]),
+            chat_name=data.get("chat_name"),
+            chat_type=data.get("chat_type", "dm"),
+            user_id=data.get("user_id"),
+            user_name=data.get("user_name"),
+            thread_id=data.get("thread_id"),
+        )
+    
+    @classmethod
+    def local_cli(cls) -> "SessionSource":
+        """Create a source representing the local CLI."""
+        return cls(
+            platform=Platform.LOCAL,
+            chat_id="cli",
+            chat_name="CLI terminal",
+            chat_type="dm",
+        )
+
+
+@dataclass
+class SessionContext:
+    """
+    Full context for a session, used for dynamic system prompt injection.
+    
+    The agent receives this information to understand:
+    - Where messages are coming from
+    - What platforms are available
+    - Where it can deliver scheduled task outputs
+    """
+    source: SessionSource
+    connected_platforms: List[Platform]
+    home_channels: Dict[Platform, HomeChannel]
+    
+    # Session metadata
+    session_key: str = ""
+    session_id: str = ""
+    created_at: Optional[datetime] = None
+    updated_at: Optional[datetime] = None
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "source": self.source.to_dict(),
+            "connected_platforms": [p.value for p in self.connected_platforms],
+            "home_channels": {
+                p.value: hc.to_dict() for p, hc in self.home_channels.items()
+            },
+            "session_key": self.session_key,
+            "session_id": self.session_id,
+            "created_at": self.created_at.isoformat() if self.created_at else None,
+            "updated_at": self.updated_at.isoformat() if self.updated_at else None,
+        }
+
+
+def build_session_context_prompt(context: SessionContext) -> str:
+    """
+    Build the dynamic system prompt section that tells the agent about its context.
+    
+    This is injected into the system prompt so the agent knows:
+    - Where messages are coming from
+    - What platforms are connected
+    - Where it can deliver scheduled task outputs
+    """
+    lines = [
+        "## Current Session Context",
+        "",
+    ]
+    
+    # Source info
+    platform_name = context.source.platform.value.title()
+    if context.source.platform == Platform.LOCAL:
+        lines.append(f"**Source:** {platform_name} (the machine running this agent)")
+    else:
+        lines.append(f"**Source:** {platform_name} ({context.source.description})")
+    
+    # Connected platforms
+    platforms_list = ["local (files on this machine)"]
+    for p in context.connected_platforms:
+        if p != Platform.LOCAL:
+            platforms_list.append(f"{p.value}: Connected ✓")
+    
+    lines.append(f"**Connected Platforms:** {', '.join(platforms_list)}")
+    
+    # Home channels
+    if context.home_channels:
+        lines.append("")
+        lines.append("**Home Channels (default destinations):**")
+        for platform, home in context.home_channels.items():
+            lines.append(f"  - {platform.value}: {home.name} (ID: {home.chat_id})")
+    
+    # Delivery options for scheduled tasks
+    lines.append("")
+    lines.append("**Delivery options for scheduled tasks:**")
+    
+    # Origin delivery
+    if context.source.platform == Platform.LOCAL:
+        lines.append("- `\"origin\"` → Local output (saved to files)")
+    else:
+        lines.append(f"- `\"origin\"` → Back to this chat ({context.source.chat_name or context.source.chat_id})")
+    
+    # Local always available
+    lines.append("- `\"local\"` → Save to local files only (~/.hermes/cron/output/)")
+    
+    # Platform home channels
+    for platform, home in context.home_channels.items():
+        lines.append(f"- `\"{platform.value}\"` → Home channel ({home.name})")
+    
+    # Note about explicit targeting
+    lines.append("")
+    lines.append("*For explicit targeting, use `\"platform:chat_id\"` format if the user provides a specific chat ID.*")
+    
+    return "\n".join(lines)
+
+
+@dataclass
+class SessionEntry:
+    """
+    Entry in the session store.
+    
+    Maps a session key to its current session ID and metadata.
+    """
+    session_key: str
+    session_id: str
+    created_at: datetime
+    updated_at: datetime
+    
+    # Origin metadata for delivery routing
+    origin: Optional[SessionSource] = None
+    
+    # Display metadata
+    display_name: Optional[str] = None
+    platform: Optional[Platform] = None
+    chat_type: str = "dm"
+    
+    # Token tracking
+    input_tokens: int = 0
+    output_tokens: int = 0
+    total_tokens: int = 0
+    
+    def to_dict(self) -> Dict[str, Any]:
+        result = {
+            "session_key": self.session_key,
+            "session_id": self.session_id,
+            "created_at": self.created_at.isoformat(),
+            "updated_at": self.updated_at.isoformat(),
+            "display_name": self.display_name,
+            "platform": self.platform.value if self.platform else None,
+            "chat_type": self.chat_type,
+            "input_tokens": self.input_tokens,
+            "output_tokens": self.output_tokens,
+            "total_tokens": self.total_tokens,
+        }
+        if self.origin:
+            result["origin"] = self.origin.to_dict()
+        return result
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "SessionEntry":
+        origin = None
+        if "origin" in data and data["origin"]:
+            origin = SessionSource.from_dict(data["origin"])
+        
+        platform = None
+        if data.get("platform"):
+            try:
+                platform = Platform(data["platform"])
+            except ValueError:
+                pass
+        
+        return cls(
+            session_key=data["session_key"],
+            session_id=data["session_id"],
+            created_at=datetime.fromisoformat(data["created_at"]),
+            updated_at=datetime.fromisoformat(data["updated_at"]),
+            origin=origin,
+            display_name=data.get("display_name"),
+            platform=platform,
+            chat_type=data.get("chat_type", "dm"),
+            input_tokens=data.get("input_tokens", 0),
+            output_tokens=data.get("output_tokens", 0),
+            total_tokens=data.get("total_tokens", 0),
+        )
+
+
+class SessionStore:
+    """
+    Manages session storage and retrieval.
+    
+    Sessions are stored in:
+    - sessions.json: Index mapping session keys to session IDs
+    - {session_id}.jsonl: Conversation transcripts
+    """
+    
+    def __init__(self, sessions_dir: Path, config: GatewayConfig):
+        self.sessions_dir = sessions_dir
+        self.config = config
+        self._entries: Dict[str, SessionEntry] = {}
+        self._loaded = False
+    
+    def _ensure_loaded(self) -> None:
+        """Load sessions from disk if not already loaded."""
+        if self._loaded:
+            return
+        
+        self.sessions_dir.mkdir(parents=True, exist_ok=True)
+        sessions_file = self.sessions_dir / "sessions.json"
+        
+        if sessions_file.exists():
+            try:
+                with open(sessions_file, "r") as f:
+                    data = json.load(f)
+                    for key, entry_data in data.items():
+                        self._entries[key] = SessionEntry.from_dict(entry_data)
+            except Exception as e:
+                print(f"[gateway] Warning: Failed to load sessions: {e}")
+        
+        self._loaded = True
+    
+    def _save(self) -> None:
+        """Save sessions index to disk."""
+        self.sessions_dir.mkdir(parents=True, exist_ok=True)
+        sessions_file = self.sessions_dir / "sessions.json"
+        
+        data = {key: entry.to_dict() for key, entry in self._entries.items()}
+        with open(sessions_file, "w") as f:
+            json.dump(data, f, indent=2)
+    
+    def _generate_session_key(self, source: SessionSource) -> str:
+        """Generate a session key from a source."""
+        platform = source.platform.value
+        
+        if source.chat_type == "dm":
+            # DMs share the main session per platform
+            return f"agent:main:{platform}:dm"
+        else:
+            # Groups/channels get their own keys
+            return f"agent:main:{platform}:{source.chat_type}:{source.chat_id}"
+    
+    def _should_reset(self, entry: SessionEntry, source: SessionSource) -> bool:
+        """
+        Check if a session should be reset based on policy.
+        
+        Returns True if the session is stale and should start fresh.
+        """
+        policy = self.config.get_reset_policy(
+            platform=source.platform,
+            session_type=source.chat_type
+        )
+        
+        now = datetime.now()
+        
+        # Check idle timeout
+        if policy.mode in ("idle", "both"):
+            idle_deadline = entry.updated_at + timedelta(minutes=policy.idle_minutes)
+            if now > idle_deadline:
+                return True
+        
+        # Check daily reset
+        if policy.mode in ("daily", "both"):
+            # Find the most recent reset boundary
+            today_reset = now.replace(
+                hour=policy.at_hour, 
+                minute=0, 
+                second=0, 
+                microsecond=0
+            )
+            if now.hour < policy.at_hour:
+                # Reset boundary was yesterday
+                today_reset -= timedelta(days=1)
+            
+            if entry.updated_at < today_reset:
+                return True
+        
+        return False
+    
+    def get_or_create_session(
+        self, 
+        source: SessionSource,
+        force_new: bool = False
+    ) -> SessionEntry:
+        """
+        Get an existing session or create a new one.
+        
+        Evaluates reset policy to determine if the existing session is stale.
+        """
+        self._ensure_loaded()
+        
+        session_key = self._generate_session_key(source)
+        now = datetime.now()
+        
+        # Check for existing session
+        if session_key in self._entries and not force_new:
+            entry = self._entries[session_key]
+            
+            # Check if session should be reset
+            if not self._should_reset(entry, source):
+                # Update timestamp and return existing
+                entry.updated_at = now
+                self._save()
+                return entry
+        
+        # Create new session
+        session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
+        
+        entry = SessionEntry(
+            session_key=session_key,
+            session_id=session_id,
+            created_at=now,
+            updated_at=now,
+            origin=source,
+            display_name=source.chat_name,
+            platform=source.platform,
+            chat_type=source.chat_type,
+        )
+        
+        self._entries[session_key] = entry
+        self._save()
+        
+        return entry
+    
+    def update_session(
+        self, 
+        session_key: str,
+        input_tokens: int = 0,
+        output_tokens: int = 0
+    ) -> None:
+        """Update a session's metadata after an interaction."""
+        self._ensure_loaded()
+        
+        if session_key in self._entries:
+            entry = self._entries[session_key]
+            entry.updated_at = datetime.now()
+            entry.input_tokens += input_tokens
+            entry.output_tokens += output_tokens
+            entry.total_tokens = entry.input_tokens + entry.output_tokens
+            self._save()
+    
+    def reset_session(self, session_key: str) -> Optional[SessionEntry]:
+        """Force reset a session, creating a new session ID."""
+        self._ensure_loaded()
+        
+        if session_key not in self._entries:
+            return None
+        
+        old_entry = self._entries[session_key]
+        now = datetime.now()
+        session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
+        
+        new_entry = SessionEntry(
+            session_key=session_key,
+            session_id=session_id,
+            created_at=now,
+            updated_at=now,
+            origin=old_entry.origin,
+            display_name=old_entry.display_name,
+            platform=old_entry.platform,
+            chat_type=old_entry.chat_type,
+        )
+        
+        self._entries[session_key] = new_entry
+        self._save()
+        
+        return new_entry
+    
+    def list_sessions(self, active_minutes: Optional[int] = None) -> List[SessionEntry]:
+        """
+        List all sessions, optionally filtered by activity.
+        
+        Args:
+            active_minutes: If provided, only return sessions updated within this many minutes
+        """
+        self._ensure_loaded()
+        
+        entries = list(self._entries.values())
+        
+        if active_minutes is not None:
+            cutoff = datetime.now() - timedelta(minutes=active_minutes)
+            entries = [e for e in entries if e.updated_at >= cutoff]
+        
+        # Sort by most recently updated
+        entries.sort(key=lambda e: e.updated_at, reverse=True)
+        
+        return entries
+    
+    def get_transcript_path(self, session_id: str) -> Path:
+        """Get the path to a session's transcript file."""
+        return self.sessions_dir / f"{session_id}.jsonl"
+    
+    def append_to_transcript(self, session_id: str, message: Dict[str, Any]) -> None:
+        """Append a message to a session's transcript."""
+        transcript_path = self.get_transcript_path(session_id)
+        
+        with open(transcript_path, "a", encoding="utf-8") as f:
+            f.write(json.dumps(message, ensure_ascii=False) + "\n")
+    
+    def load_transcript(self, session_id: str) -> List[Dict[str, Any]]:
+        """Load all messages from a session's transcript."""
+        transcript_path = self.get_transcript_path(session_id)
+        
+        if not transcript_path.exists():
+            return []
+        
+        messages = []
+        with open(transcript_path, "r", encoding="utf-8") as f:
+            for line in f:
+                line = line.strip()
+                if line:
+                    messages.append(json.loads(line))
+        
+        return messages
+
+
+def build_session_context(
+    source: SessionSource,
+    config: GatewayConfig,
+    session_entry: Optional[SessionEntry] = None
+) -> SessionContext:
+    """
+    Build a full session context from a source and config.
+    
+    This is used to inject context into the agent's system prompt.
+    """
+    connected = config.get_connected_platforms()
+    
+    home_channels = {}
+    for platform in connected:
+        home = config.get_home_channel(platform)
+        if home:
+            home_channels[platform] = home
+    
+    context = SessionContext(
+        source=source,
+        connected_platforms=connected,
+        home_channels=home_channels,
+    )
+    
+    if session_entry:
+        context.session_key = session_entry.session_key
+        context.session_id = session_entry.session_id
+        context.created_at = session_entry.created_at
+        context.updated_at = session_entry.updated_at
+    
+    return context
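
Review note (gateway/session.py): the reset-policy check in `_should_reset` is easiest to reason about when time is passed in explicitly. The sketch below restates the same idle and daily logic as a pure function. `ResetPolicy` is a hypothetical stand-in for `SessionResetPolicy` (its definition is not part of this diff); the field names follow the attributes used above, and the defaults of 120 idle minutes and a 4am boundary are assumptions taken from the README text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ResetPolicy:
    """Hypothetical stand-in for SessionResetPolicy (assumed fields and defaults)."""
    mode: str = "both"       # "idle", "daily", or "both"
    idle_minutes: int = 120  # assumed default: 2 hours
    at_hour: int = 4         # assumed default: 4am


def should_reset(updated_at: datetime, now: datetime, policy: ResetPolicy) -> bool:
    """Return True if a session last touched at `updated_at` is stale at `now`."""
    # Idle timeout: too long since the last activity.
    if policy.mode in ("idle", "both"):
        if now > updated_at + timedelta(minutes=policy.idle_minutes):
            return True

    # Daily reset: last activity predates the most recent reset boundary.
    if policy.mode in ("daily", "both"):
        boundary = now.replace(hour=policy.at_hour, minute=0, second=0, microsecond=0)
        if now.hour < policy.at_hour:
            # Today's boundary has not been reached yet; use yesterday's.
            boundary -= timedelta(days=1)
        if updated_at < boundary:
            return True

    return False


if __name__ == "__main__":
    daily = ResetPolicy(mode="daily")
    # Last activity at 03:30, checked at 04:30: the 04:00 boundary was crossed.
    print(should_reset(datetime(2026, 2, 4, 3, 30), datetime(2026, 2, 4, 4, 30), daily))
    # Last activity at 23:00, checked at 03:00 the next day: no boundary crossed.
    print(should_reset(datetime(2026, 2, 4, 23, 0), datetime(2026, 2, 5, 3, 0), daily))
```

Passing `now` as a parameter rather than calling `datetime.now()` inside the function is a design choice made for this sketch so the boundary cases can be tested deterministically.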
--- a/34
+++ b/34
@@ -7,40 +7,6 @@ Usage: ./hermes [options]
 """

 if __name__ == "__main__":
-    """
-    Fire (google/python-fire) does not support POSIX-style short flags like `-p`.
-    We translate the most common shorthands to their long equivalents so wrapper
-    scripts can reliably use:
-      - `-p "..."`  -> `--prompt "..."` (no TUI/banner; print result and exit)
-      - `-q "..."`  -> `--query "..."`  (single-shot with banner UX)
-    """
-
-    import sys
-
-    def _rewrite_short_flags(argv: list[str]) -> list[str]:
-        rewritten: list[str] = []
-        i = 0
-        while i < len(argv):
-            arg = argv[i]
-            if arg == "-p":
-                rewritten.append("--prompt")
-                if i + 1 < len(argv):
-                    rewritten.append(argv[i + 1])
-                    i += 2
-                    continue
-            if arg == "-q":
-                rewritten.append("--query")
-                if i + 1 < len(argv):
-                    rewritten.append(argv[i + 1])
-                    i += 2
-                    continue
-            rewritten.append(arg)
-            i += 1
-        return rewritten
-
-    sys.argv = [sys.argv[0]] + _rewrite_short_flags(sys.argv[1:])
-
    from cli import main
    import fire
-
    fire.Fire(main)
--- a/hermes_agent.egg-info/PKG-INFO
+++ b/hermes_agent.egg-info/PKG-INFO
@@ -13,7 +13,6 @@ Requires-Dist: httpx
 Requires-Dist: rich
 Requires-Dist: tenacity
 Requires-Dist: pyyaml
-Requires-Dist: prompt_toolkit
 Requires-Dist: requests
 Requires-Dist: jinja2
 Requires-Dist: pydantic>=2.0
@@ -28,12 +27,15 @@ Requires-Dist: boto3; extra == "modal"
 Provides-Extra: dev
 Requires-Dist: pytest; extra == "dev"
 Requires-Dist: pytest-asyncio; extra == "dev"
-Provides-Extra: atropos
-Requires-Dist: atroposlib @ git+https://github.com/NousResearch/atropos.git ; extra == "atropos"
-Requires-Dist: aiohttp; extra == "atropos"
-Requires-Dist: fastapi; extra == "atropos"
-Requires-Dist: uvicorn; extra == "atropos"
-Requires-Dist: pyte; extra == "atropos"
+Provides-Extra: messaging
+Requires-Dist: python-telegram-bot>=20.0; extra == "messaging"
+Requires-Dist: discord.py>=2.0; extra == "messaging"
+Provides-Extra: cron
+Requires-Dist: croniter; extra == "cron"
+Provides-Extra: all
+Requires-Dist: croniter; extra == "all"
+Requires-Dist: python-telegram-bot>=20.0; extra == "all"
+Requires-Dist: discord.py>=2.0; extra == "all"

 # Hermes Agent

@@ -42,6 +44,7 @@ An AI agent with advanced tool-calling capabilities, featuring a flexible toolse
 ## Features

 - **Interactive CLI**: Beautiful terminal interface with animated feedback, personalities, and session management
+- **Messaging Gateway**: Connect to Telegram, Discord, and WhatsApp for conversational AI anywhere
 - **Web Tools**: Search, extract content, and crawl websites
 - **Terminal Tools**: Execute commands via local, Docker, Singularity, Modal, or SSH backends
 - **Browser Tools**: Automate web browsers to navigate, click, type, and extract content
@@ -50,13 +53,85 @@ An AI agent with advanced tool-calling capabilities, featuring a flexible toolse
 - **Creative Tools**: Generate images from text prompts
 - **Skills Tools**: On-demand knowledge documents with progressive disclosure
 - **Toolsets System**: Organize tools into logical groups for different scenarios
+- **Scheduled Tasks**: Cron jobs for automated agent tasks with delivery to platforms
+- **Context Compression**: Automatic summarization when approaching context limits
 - **Batch Processing**: Process datasets in parallel with checkpointing and statistics tracking
 - **Ephemeral System Prompts**: Guide model behavior without polluting training datasets

-## Quick Start (CLI)
+## Installation
+
+### Quick Install (Recommended)
+
+**Linux/macOS:**
+```bash
+curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
+```
+
+**Windows (PowerShell):**
+```powershell
+irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
+```
+
+This installer will:
+- Clone the repository to `~/.hermes-agent`
+- Create a virtual environment and install dependencies
+- Set up the `hermes` command in your PATH
+- Run an interactive setup wizard to configure API keys
+
+### Manual Installation
+
+If you prefer to install manually:

 ```bash
-# After setup (see below), just run:
+# Clone with submodules
+git clone --recurse-submodules https://github.com/NousResearch/Hermes-Agent.git
+cd Hermes-Agent
+
+# Run the setup script
+./setup-hermes.sh
+```
+
+Or step-by-step:
+
+```bash
+# Create and activate virtual environment
+python3 -m venv venv
+source venv/bin/activate  # Windows: venv\Scripts\activate
+
+# Install in editable mode with all extras
+pip install -e ".[all]"
+
+# Or install dependencies manually
+pip install -r requirements.txt
+pip install -e ./mini-swe-agent
+
+# Copy and configure environment
+cp .env.example .env
+# Edit .env with your API keys
+
+# Run the setup wizard
+hermes setup
+```
+
+## Quick Start
+
+Once installed, the `hermes` command is your main entry point:
+
+```bash
+hermes                    # Interactive chat (default)
+hermes chat               # Same as above
+hermes chat -q "Hello"    # Single query, then exit
+hermes setup              # Configure API keys and settings
+hermes status             # Show configuration status
+hermes doctor             # Diagnose issues
+hermes gateway            # Start messaging gateway (Telegram/Discord/WhatsApp)
+hermes cron daemon        # Run cron job scheduler
+hermes version            # Show version info
+```
+
+**Legacy `./hermes` script:**
+```bash
+# The old CLI script still works:
 ./hermes

 # Or with options:
@@ -70,35 +145,9 @@ The CLI provides:
 - Customizable personalities (`/personality kawaii`, `/personality pirate`, etc.)
 - Persistent configuration via `cli-config.yaml`

-## Setup
+## Configuration

-### 1. Clone the Repository
-```bash
-# Clone with submodules (recommended)
-git clone --recurse-submodules https://github.com/NousResearch/Hermes-Agent.git
-cd Hermes-Agent
-
-# Or if already cloned without submodules:
-git submodule update --init --recursive
-```
-
-### 2. Install Dependencies
-```bash
-# Create and activate virtual environment (recommended)
-python3 -m venv venv
-source venv/bin/activate  # On Windows: venv\Scripts\activate
-
-# Install Python packages
-pip install -r requirements.txt
-
-# Install mini-swe-agent for terminal tools
-pip install -e ./mini-swe-agent
-
-# Install Node.js dependencies for browser tools (requires Node.js)
-npm install
-```
-
-### 3. Configure Environment Variables
+### Environment Variables
 ```bash
 # Copy the example environment file
 cp .env.example .env
@@ -327,6 +376,169 @@ logs/
 - **Trajectory Format**: Uses the same format as batch processing for consistency
 - **Git Ignored**: `logs/` is in `.gitignore` so logs aren't committed

+## Context Compression
+
+Long conversations can exceed the model's context limit. Hermes Agent automatically compresses context when approaching the limit:
+
+**How it works:**
+1. Tracks actual token usage from API responses (`usage.prompt_tokens`)
+2. When tokens reach 85% of model's context limit, triggers compression
+3. Protects first 3 turns (system prompt, initial request, first response)
+4. Protects last 4 turns (recent context is most relevant)
+5. Summarizes middle turns using a fast/cheap model (Gemini Flash)
+6. Inserts summary as a user message so the conversation continues seamlessly
+
+**Configuration (`cli-config.yaml`):**
+```yaml
+compression:
+  enabled: true                    # Enable auto-compression (default)
+  threshold: 0.85                  # Compress at 85% of context limit
+  summary_model: "google/gemini-2.0-flash-001"
+```
+
+**Or via environment variables:**
+```bash
+CONTEXT_COMPRESSION_ENABLED=true
+CONTEXT_COMPRESSION_THRESHOLD=0.85
+CONTEXT_COMPRESSION_MODEL=google/gemini-2.0-flash-001
+```
+
+**When compression triggers, you'll see:**
+```
+📦 Context compression triggered (170,000 tokens ≥ 170,000 threshold)
+   📊 Model context limit: 200,000 tokens (85% = 170,000)
+   🗜️  Summarizing turns 4-15 (12 turns)
+   ✅ Compressed: 20 → 9 messages (~45,000 tokens saved)
+```
+
+## Scheduled Tasks (Cron Jobs)
+
+Hermes Agent can schedule automated tasks to run in the future - either one-time reminders or recurring jobs.
+
+### CLI Commands
+
+```bash
+# List scheduled jobs
+/cron
+
+# Add a one-shot reminder (runs once in 30 minutes)
+/cron add 30m Remind me to check the build status
+
+# Add a recurring job (every 2 hours)
+/cron add "every 2h" Check server status at 192.168.1.100 and report any issues
+
+# Add a cron expression (daily at 9am)
+/cron add "0 9 * * *" Generate a morning briefing summarizing GitHub notifications
+
+# Remove a job
+/cron remove abc123def456
+```
+
+### Agent Self-Scheduling
+
+The agent can also schedule its own follow-up tasks using tools:
+
+```python
+# Available when using hermes-cli toolset (default for CLI)
+schedule_cronjob(prompt="...", schedule="30m", repeat=1)  # One-shot
+schedule_cronjob(prompt="...", schedule="every 2h")       # Recurring
+list_cronjobs()                                            # View all jobs
+remove_cronjob(job_id="...")                              # Cancel a job
+```
+
+**⚠️ Important:** Cronjobs run in **isolated sessions with NO prior context**. The prompt must be completely self-contained with all necessary information (file paths, URLs, server addresses, etc.). The future agent will not remember anything from the current conversation.
+
+### Schedule Formats
+
+| Format | Example | Description |
+|--------|---------|-------------|
+| Duration | `30m`, `2h`, `1d` | One-shot delay from now |
+| Interval | `every 30m`, `every 2h` | Recurring at fixed intervals |
+| Cron | `0 9 * * *` | Cron expression (requires `croniter`) |
+| Timestamp | `2026-02-03T14:00` | One-shot at specific time |
+
+### Repeat Options
+
+| repeat | Behavior |
+|--------|----------|
+| (omitted) | One-shot schedules run once; intervals/cron run forever |
+| `1` | Run once then auto-delete |
+| `N` | Run N times then auto-delete |
+
+### Running the Cron Daemon
+
+Jobs are stored in `~/.hermes/cron/jobs.json` and executed by a scheduler:
+
+```bash
+# Option 1: Built-in daemon (checks every 60 seconds)
+python cli.py --cron-daemon
+
+# Option 2: System cron integration (run once per minute)
+# Add to crontab: crontab -e
+*/1 * * * * cd ~/hermes-agent && python cli.py --cron-tick-once >> ~/.hermes/cron/cron.log 2>&1
+```
+
+### Job Output
+
+Job outputs are saved to `~/.hermes/cron/output/{job_id}/{timestamp}.md` for review.
+
+## Messaging Gateway (Telegram, Discord, WhatsApp)
+
+Connect Hermes Agent to messaging platforms so you can chat from anywhere.
+
+### Quick Start
+
+```bash
+# 1. Add your bot token to .env
+echo 'TELEGRAM_BOT_TOKEN="your_token"' >> .env
+
+# 2. Test the gateway (foreground)
+./scripts/hermes-gateway run
+
+# 3. Install as a background service
+./scripts/hermes-gateway install
+
+# 4. Manage the service
+./scripts/hermes-gateway start   # Start
+./scripts/hermes-gateway stop    # Stop
+./scripts/hermes-gateway status  # Check status
+```
+
+### Supported Platforms
+
+| Platform | Setup | Toolset |
+|----------|-------|---------|
+| Telegram | Bot via @BotFather | `hermes-telegram` |
+| Discord | Bot via Developer Portal | `hermes-discord` |
+| WhatsApp | Node.js bridge | `hermes-whatsapp` |
+
+### Session Management
+
+- Sessions persist across messages (agent remembers context)
+- Reset policies: daily (4am), idle (2 hours), or both
+- Manual reset: send `/new` or `/reset`
+
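As a rough illustration of the reset policies listed above, the sketch below decides whether a session should be reset. `should_reset` and its parameters are hypothetical names, not the gateway's actual API. It also assumes that "both" means a reset happens when either the daily or the idle condition is met.

```python
from datetime import datetime, timedelta


def should_reset(
    last_activity: datetime,
    now: datetime,
    policy: str = "both",
    daily_hour: int = 4,
    idle_limit: timedelta = timedelta(hours=2),
) -> bool:
    """Return True if the session should start fresh under the given policy."""
    # Idle policy: too much time has passed since the last message
    idle_expired = now - last_activity >= idle_limit

    # Daily policy: find the most recent daily boundary at or before `now`
    boundary = now.replace(hour=daily_hour, minute=0, second=0, microsecond=0)
    if boundary > now:
        boundary -= timedelta(days=1)
    daily_expired = last_activity < boundary

    if policy == "daily":
        return daily_expired
    if policy == "idle":
        return idle_expired
    if policy == "both":
        # Assumption: either condition is enough to trigger a reset
        return daily_expired or idle_expired
    raise ValueError(f"Unknown reset policy: {policy!r}")
```

With these defaults, a session last active at 23:00 is reset by a message at 05:00 the next morning under the daily policy, because the 04:00 boundary falls in between.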
+### Cron Job Delivery
+
+Schedule tasks that deliver to specific platforms:
+
+```python
+schedule_cronjob(
+    prompt="Check server status...",
+    schedule="every 1h",
+    deliver="telegram"  # or "origin", "discord", etc.
+)
+```
+
+### CLI Commands
+
+| Command | Description |
+|---------|-------------|
+| `/platforms` | Show gateway configuration status |
+| `--gateway` | Start the gateway (CLI flag) |
+
+See [docs/messaging.md](docs/messaging.md) for full setup instructions.
+
 ## Interactive CLI

 The CLI provides a rich interactive experience for working with the agent.
@@ -359,6 +571,8 @@ The CLI provides a rich interactive experience for working with the agent.
 | `/history` | Show conversation history |
 | `/save` | Save current conversation to file |
 | `/config` | Show current configuration |
+| `/cron` | Manage scheduled tasks (list, add, remove) |
+| `/platforms` | Show gateway/messaging platform status |
 | `/quit` | Exit the CLI |

 ### Configuration
@@ -616,6 +830,11 @@ All environment variables can be configured in the `.env` file (copy from `.env.
 - `TERMINAL_SSH_PORT`: SSH port (default: `22`)
 - `TERMINAL_SSH_KEY`: Path to SSH private key (optional, uses ssh-agent if not set)

+**Context Compression (auto-shrinks long conversations):**
+- `CONTEXT_COMPRESSION_ENABLED`: Enable auto-compression (default: `true`)
+- `CONTEXT_COMPRESSION_THRESHOLD`: Compress once usage reaches this fraction of the context limit (default: `0.85`, i.e. 85%)
+- `CONTEXT_COMPRESSION_MODEL`: Model for generating summaries (default: `google/gemini-2.0-flash-001`)
+
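To make the threshold semantics concrete, here is a minimal sketch of the check. `should_compress` and its arguments are hypothetical; the real trigger may count tokens differently. The assumption is that the threshold is a fraction of the model's context limit.

```python
def should_compress(
    used_tokens: int,
    context_limit: int,
    threshold: float = 0.85,
    enabled: bool = True,
) -> bool:
    """Return True once the conversation has used `threshold` of the context limit."""
    if not enabled or context_limit <= 0:
        return False
    return used_tokens >= threshold * context_limit
```

With the default of `0.85` and a 200,000-token context window, compression would start once the conversation reaches about 170,000 tokens.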
 **Browser Tool Configuration (agent-browser + Browserbase):**
 - `BROWSERBASE_API_KEY`: Browserbase API key for cloud browser execution
 - `BROWSERBASE_PROJECT_ID`: Browserbase project ID
@@ -647,13 +866,3 @@ All environment variables can be configured in the `.env` file (copy from `.env.
 | `skills/` | On-demand knowledge documents |
 | `docs/` | Documentation |
 | `configs/` | Example batch run scripts |
-
-# Atropos Integrations & RL Training
-
-## Nomad Setup
-Follow this: https://developer.hashicorp.com/nomad/docs/deploy
-
-## Atropos dependencies
-python3 -m venv .venv
-source .venv/bin/activate
-pip install -e '.[atropos]'
--- a/hermes_agent.egg-info/SOURCES.txt
+++ b/hermes_agent.egg-info/SOURCES.txt
@@ -1,66 +1,43 @@
 README.md
-atropos_compatible_agent.py
 batch_runner.py
-local_server.py
+cli.py
 model_tools.py
 pyproject.toml
 run_agent.py
 toolset_distributions.py
 toolsets.py
 trajectory_compressor.py
-atropos/__init__.py
-atropos/sandbox_server.py
-atropos/agent/__init__.py
-atropos/agent/atropos_agent.py
-atropos/api/__init__.py
-atropos/api/tool_executor_server.py
-atropos/api/tool_server.py
-atropos/backends/__init__.py
-atropos/backends/base.py
-atropos/backends/modal_backend.py
-atropos/backends/nomad_backend.py
-atropos/envs/__init__.py
-atropos/envs/agent_env.py
-atropos/envs/hermes_compat_test_env.py
-atropos/envs/sandbox_terminal_smoke_env.py
-atropos/envs/swe_smith_oracle_env.py
-atropos/envs/test_env.py
-atropos/envs/toolserver_smoke_env.py
-atropos/nomad/__init__.py
-atropos/nomad/client.py
-atropos/slots/__init__.py
-atropos/slots/executor.py
-atropos/slots/pool.py
-atropos/slots/slot.py
-atropos/terminal/__init__.py
-atropos/terminal/asciinema_stream.py
-atropos/tools/__init__.py
-atropos/tools/base.py
-atropos/tools/build_registry.py
-atropos/tools/hermes_external_tools.py
-atropos/tools/sandbox_stubs.py
-atropos/tools/terminal_stateful_tool.py
-atropos/tools/tmux_tool.py
-atropos/tools/tool_executor.py
-atropos/tools/toolset_resolver.py
+cron/__init__.py
+cron/jobs.py
+cron/scheduler.py
+gateway/__init__.py
+gateway/config.py
+gateway/delivery.py
+gateway/run.py
+gateway/session.py
 hermes_agent.egg-info/PKG-INFO
 hermes_agent.egg-info/SOURCES.txt
 hermes_agent.egg-info/dependency_links.txt
 hermes_agent.egg-info/entry_points.txt
 hermes_agent.egg-info/requires.txt
 hermes_agent.egg-info/top_level.txt
+hermes_cli/__init__.py
+hermes_cli/cron.py
+hermes_cli/doctor.py
+hermes_cli/gateway.py
+hermes_cli/main.py
+hermes_cli/setup.py
+hermes_cli/status.py
 tests/test_batch_runner.py
 tests/test_checkpoint_resumption.py
-tests/test_modal_integration.py
-tests/test_modal_stress.py
 tests/test_modal_terminal.py
 tests/test_nous_api_limits.py
 tests/test_nous_api_pattern.py
 tests/test_temperature_fix.py
-tests/test_tool_call_parsing.py
 tests/test_web_tools.py
 tools/__init__.py
 tools/browser_tool.py
+tools/cronjob_tools.py
 tools/image_generation_tool.py
 tools/mixture_of_agents_tool.py
 tools/skills_tool.py
--- a/hermes_agent.egg-info/entry_points.txt
+++ b/hermes_agent.egg-info/entry_points.txt
@@ -1,4 +1,3 @@
 [console_scripts]
+hermes = hermes_cli.main:main
 hermes-agent = run_agent:main
-hermes-atropos-sandbox-smoke = atropos.envs.sandbox_terminal_smoke_env:SandboxTerminalSmokeEnv.cli
-hermes-atropos-toolserver-smoke = atropos.envs.toolserver_smoke_env:ToolServerSmokeEnv.cli
--- a/hermes_agent.egg-info/requires.txt
+++ b/hermes_agent.egg-info/requires.txt
@@ -5,7 +5,6 @@ httpx
 rich
 tenacity
 pyyaml
-prompt_toolkit
 requests
 jinja2
 pydantic>=2.0
@@ -15,17 +14,22 @@ litellm>=1.75.5
 typer
 platformdirs

-[atropos]
-atroposlib @ git+https://github.com/NousResearch/atropos.git
-aiohttp
-fastapi
-uvicorn
-pyte
+[all]
+croniter
+python-telegram-bot>=20.0
+discord.py>=2.0
+
+[cron]
+croniter

 [dev]
 pytest
 pytest-asyncio

+[messaging]
+python-telegram-bot>=20.0
+discord.py>=2.0
+
 [modal]
 modal
 boto3
--- a/hermes_agent.egg-info/top_level.txt
+++ b/hermes_agent.egg-info/top_level.txt
@@ -1,7 +1,8 @@
-atropos
-atropos_compatible_agent
 batch_runner
-local_server
+cli
+cron
+gateway
+hermes_cli
 model_tools
 run_agent
 tools
--- a/hermes_cli/__init__.py
+++ b/hermes_cli/__init__.py
@@ -0,0 +1,14 @@
+"""
+Hermes CLI - Unified command-line interface for Hermes Agent.
+
+Provides subcommands for:
+- hermes chat          - Interactive chat (same as ./hermes)
+- hermes gateway       - Run gateway in foreground
+- hermes gateway start - Start gateway service
+- hermes gateway stop  - Stop gateway service  
+- hermes setup         - Interactive setup wizard
+- hermes status        - Show status of all components
+- hermes cron          - Manage cron jobs
+"""
+
+__version__ = "0.1.0"
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@@ -0,0 +1,785 @@
+"""
+Configuration management for Hermes Agent.
+
+Config files are stored in ~/.hermes/ for easy access:
+- ~/.hermes/config.yaml  - All settings (model, toolsets, terminal, etc.)
+- ~/.hermes/.env         - API keys and secrets
+
+This module provides:
+- hermes config          - Show current configuration
+- hermes config edit     - Open config in editor
+- hermes config set      - Set a specific value
+- hermes config check    - Report missing or outdated settings
+- hermes config migrate  - Add new options and prompt for missing keys
+"""
+
+import os
+import sys
+import subprocess
+from pathlib import Path
+from typing import Dict, Any, Optional, List, Tuple
+
+import yaml
+
+# ANSI colors
+class Colors:
+    RESET = "\033[0m"
+    BOLD = "\033[1m"
+    DIM = "\033[2m"
+    RED = "\033[31m"
+    GREEN = "\033[32m"
+    YELLOW = "\033[33m"
+    BLUE = "\033[34m"
+    MAGENTA = "\033[35m"
+    CYAN = "\033[36m"
+
+def color(text: str, *codes) -> str:
+    if not sys.stdout.isatty():
+        return text
+    return "".join(codes) + text + Colors.RESET
+
+
+# =============================================================================
+# Config paths
+# =============================================================================
+
+def get_hermes_home() -> Path:
+    """Get the Hermes home directory (~/.hermes)."""
+    return Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
+
+def get_config_path() -> Path:
+    """Get the main config file path."""
+    return get_hermes_home() / "config.yaml"
+
+def get_env_path() -> Path:
+    """Get the .env file path (for API keys)."""
+    return get_hermes_home() / ".env"
+
+def get_project_root() -> Path:
+    """Get the project installation directory."""
+    return Path(__file__).parent.parent.resolve()
+
+def ensure_hermes_home():
+    """Ensure ~/.hermes directory structure exists."""
+    home = get_hermes_home()
+    (home / "cron").mkdir(parents=True, exist_ok=True)
+    (home / "sessions").mkdir(parents=True, exist_ok=True)
+    (home / "logs").mkdir(parents=True, exist_ok=True)
+
+
+# =============================================================================
+# Config loading/saving
+# =============================================================================
+
+DEFAULT_CONFIG = {
+    "model": "anthropic/claude-sonnet-4.5",
+    "toolsets": ["hermes-cli"],
+    "max_turns": 100,
+    
+    "terminal": {
+        "backend": "local",
+        "cwd": ".",  # Use current directory
+        "timeout": 180,
+        "docker_image": "nikolaik/python-nodejs:python3.11-nodejs20",
+        "singularity_image": "docker://nikolaik/python-nodejs:python3.11-nodejs20",
+        "modal_image": "nikolaik/python-nodejs:python3.11-nodejs20",
+    },
+    
+    "browser": {
+        "inactivity_timeout": 120,
+    },
+    
+    "compression": {
+        "enabled": True,
+        "threshold": 0.85,
+        "summary_model": "google/gemini-2.0-flash-001",
+    },
+    
+    "display": {
+        "compact": False,
+        "personality": "kawaii",
+    },
+    
+    # Permanently allowed dangerous command patterns (added via "always" approval)
+    "command_allowlist": [],
+    
+    # Config schema version - bump this when adding new required fields
+    "_config_version": 1,
+}
+
+# =============================================================================
+# Config Migration System
+# =============================================================================
+
+# Required environment variables with metadata for migration prompts
+REQUIRED_ENV_VARS = {
+    "OPENROUTER_API_KEY": {
+        "description": "OpenRouter API key (required for vision, web scraping, and tools)",
+        "prompt": "OpenRouter API key",
+        "url": "https://openrouter.ai/keys",
+        "required": True,
+        "password": True,
+    },
+}
+
+# Optional environment variables that enhance functionality
+OPTIONAL_ENV_VARS = {
+    "FIRECRAWL_API_KEY": {
+        "description": "Firecrawl API key for web search and scraping",
+        "prompt": "Firecrawl API key",
+        "url": "https://firecrawl.dev/",
+        "tools": ["web_search", "web_extract"],
+        "password": True,
+    },
+    "BROWSERBASE_API_KEY": {
+        "description": "Browserbase API key for browser automation",
+        "prompt": "Browserbase API key", 
+        "url": "https://browserbase.com/",
+        "tools": ["browser_navigate", "browser_click", "etc."],
+        "password": True,
+    },
+    "BROWSERBASE_PROJECT_ID": {
+        "description": "Browserbase project ID",
+        "prompt": "Browserbase project ID",
+        "url": "https://browserbase.com/",
+        "tools": ["browser_navigate", "browser_click", "etc."],
+        "password": False,
+    },
+    "FAL_KEY": {
+        "description": "FAL API key for image generation",
+        "prompt": "FAL API key",
+        "url": "https://fal.ai/",
+        "tools": ["image_generate"],
+        "password": True,
+    },
+    "TINKER_API_KEY": {
+        "description": "Tinker API key for RL training",
+        "prompt": "Tinker API key",
+        "url": "https://tinker-console.thinkingmachines.ai/keys",
+        "tools": ["rl_start_training", "rl_check_status", "rl_stop_training"],
+        "password": True,
+    },
+    "WANDB_API_KEY": {
+        "description": "Weights & Biases API key for experiment tracking",
+        "prompt": "WandB API key",
+        "url": "https://wandb.ai/authorize",
+        "tools": ["rl_get_results", "rl_check_status"],
+        "password": True,
+    },
+    "OPENAI_BASE_URL": {
+        "description": "Custom OpenAI-compatible API endpoint URL",
+        "prompt": "API base URL (e.g., https://api.example.com/v1)",
+        "url": None,
+        "password": False,
+    },
+    "OPENAI_API_KEY": {
+        "description": "API key for custom OpenAI-compatible endpoint",
+        "prompt": "API key for custom endpoint",
+        "url": None,
+        "password": True,
+    },
+    # Messaging platform tokens
+    "TELEGRAM_BOT_TOKEN": {
+        "description": "Telegram bot token from @BotFather",
+        "prompt": "Telegram bot token",
+        "url": "https://t.me/BotFather",
+        "password": True,
+    },
+    "TELEGRAM_ALLOWED_USERS": {
+        "description": "Comma-separated Telegram user IDs allowed to use the bot (get ID from @userinfobot)",
+        "prompt": "Allowed Telegram user IDs (comma-separated)",
+        "url": "https://t.me/userinfobot",
+        "password": False,
+    },
+    "DISCORD_BOT_TOKEN": {
+        "description": "Discord bot token from Developer Portal",
+        "prompt": "Discord bot token",
+        "url": "https://discord.com/developers/applications",
+        "password": True,
+    },
+    "DISCORD_ALLOWED_USERS": {
+        "description": "Comma-separated Discord user IDs allowed to use the bot",
+        "prompt": "Allowed Discord user IDs (comma-separated)",
+        "url": None,
+        "password": False,
+    },
+    # Terminal configuration
+    "MESSAGING_CWD": {
+        "description": "Working directory for terminal commands via messaging (Telegram/Discord/etc). CLI always uses current directory.",
+        "prompt": "Messaging working directory (default: home)",
+        "url": None,
+        "password": False,
+    },
+    "SUDO_PASSWORD": {
+        "description": "Sudo password for terminal commands requiring root access",
+        "prompt": "Sudo password",
+        "url": None,
+        "password": True,
+    },
+    # Agent configuration
+    "HERMES_MAX_ITERATIONS": {
+        "description": "Maximum tool-calling iterations per conversation (default: 60)",
+        "prompt": "Max iterations",
+        "url": None,
+        "password": False,
+    },
+    "HERMES_TOOL_PROGRESS": {
+        "description": "Send tool progress messages in messaging channels (true/false)",
+        "prompt": "Enable tool progress messages",
+        "url": None,
+        "password": False,
+    },
+    "HERMES_TOOL_PROGRESS_MODE": {
+        "description": "Progress mode: 'all' (every tool) or 'new' (only when tool changes)",
+        "prompt": "Progress mode (all/new)",
+        "url": None,
+        "password": False,
+    },
+}
+
+
+def get_missing_env_vars(required_only: bool = False) -> List[Dict[str, Any]]:
+    """
+    Check which environment variables are missing.
+    
+    Returns list of dicts with var info for missing variables.
+    """
+    missing = []
+    
+    # Check required vars
+    for var_name, info in REQUIRED_ENV_VARS.items():
+        if not get_env_value(var_name):
+            missing.append({"name": var_name, **info, "is_required": True})
+    
+    # Check optional vars (if not required_only)
+    if not required_only:
+        for var_name, info in OPTIONAL_ENV_VARS.items():
+            if not get_env_value(var_name):
+                missing.append({"name": var_name, **info, "is_required": False})
+    
+    return missing
+
+
+def get_missing_config_fields() -> List[Dict[str, Any]]:
+    """
+    Check which config fields are missing or outdated.
+    
+    Returns list of missing/outdated fields.
+    """
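+    # Read the raw user file. load_config() merges DEFAULT_CONFIG first, so
+    # every default key would look present and nothing would be reported.
+    config: Dict[str, Any] = {}
+    config_path = get_config_path()
+    if config_path.exists():
+        with open(config_path) as f:
+            config = yaml.safe_load(f) or {}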
+    missing = []
+    
+    # Check for new top-level keys in DEFAULT_CONFIG
+    for key, default_value in DEFAULT_CONFIG.items():
+        if key.startswith('_'):
+            continue  # Skip internal keys
+        if key not in config:
+            missing.append({
+                "key": key,
+                "default": default_value,
+                "description": f"New config section: {key}",
+            })
+        elif isinstance(default_value, dict):
+            # Check nested keys
+            for subkey, subvalue in default_value.items():
+                if subkey not in config.get(key, {}):
+                    missing.append({
+                        "key": f"{key}.{subkey}",
+                        "default": subvalue,
+                        "description": f"New config option: {key}.{subkey}",
+                    })
+    
+    return missing
+
+
+def check_config_version() -> Tuple[int, int]:
+    """
+    Check config version.
+    
+    Returns (current_version, latest_version).
+    """
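+    # Read the version from the raw user file. load_config() merges
+    # DEFAULT_CONFIG, so the merged value would never fall back to 0.
+    config: Dict[str, Any] = {}
+    config_path = get_config_path()
+    if config_path.exists():
+        with open(config_path) as f:
+            config = yaml.safe_load(f) or {}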
+    current = config.get("_config_version", 0)
+    latest = DEFAULT_CONFIG.get("_config_version", 1)
+    return current, latest
+
+
+def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, Any]:
+    """
+    Migrate config to latest version, prompting for new required fields.
+    
+    Args:
+        interactive: If True, prompt user for missing values
+        quiet: If True, suppress output
+        
+    Returns:
+        Dict with migration results: {"env_added": [...], "config_added": [...], "warnings": [...]}
+    """
+    results = {"env_added": [], "config_added": [], "warnings": []}
+    
+    # Check config version
+    current_ver, latest_ver = check_config_version()
+    
+    if current_ver < latest_ver and not quiet:
+        print(f"Config version: {current_ver} → {latest_ver}")
+    
+    # Check for missing required env vars
+    missing_env = get_missing_env_vars(required_only=True)
+    
+    if missing_env and not quiet:
+        print("\n⚠️  Missing required environment variables:")
+        for var in missing_env:
+            print(f"   • {var['name']}: {var['description']}")
+    
+    if interactive and missing_env:
+        print("\nLet's configure them now:\n")
+        for var in missing_env:
+            if var.get("url"):
+                print(f"  Get your key at: {var['url']}")
+            
+            if var.get("password"):
+                import getpass
+                value = getpass.getpass(f"  {var['prompt']}: ")
+            else:
+                value = input(f"  {var['prompt']}: ").strip()
+            
+            if value:
+                save_env_value(var["name"], value)
+                results["env_added"].append(var["name"])
+                print(f"  ✓ Saved {var['name']}")
+            else:
+                results["warnings"].append(f"Skipped {var['name']} - some features may not work")
+            print()
+    
+    # Check for missing config fields
+    missing_config = get_missing_config_fields()
+    
+    if missing_config:
+        config = load_config()
+        
+        for field in missing_config:
+            key = field["key"]
+            default = field["default"]
+            
+            # Add with default value
+            if "." in key:
+                # Nested key
+                parent, child = key.split(".", 1)
+                if parent not in config:
+                    config[parent] = {}
+                config[parent][child] = default
+            else:
+                config[key] = default
+            
+            results["config_added"].append(key)
+            if not quiet:
+                print(f"  ✓ Added {key} = {default}")
+        
+        # Update version and save
+        config["_config_version"] = latest_ver
+        save_config(config)
+    elif current_ver < latest_ver:
+        # Just update version
+        config = load_config()
+        config["_config_version"] = latest_ver
+        save_config(config)
+    
+    return results
+
+
+def load_config() -> Dict[str, Any]:
+    """Load configuration from ~/.hermes/config.yaml."""
+    config_path = get_config_path()
+    
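+    # Deep-copy so that merging user values below never mutates the nested
+    # dicts inside DEFAULT_CONFIG (a shallow .copy() would share them).
+    import copy
+    config = copy.deepcopy(DEFAULT_CONFIG)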
+    
+    if config_path.exists():
+        try:
+            with open(config_path) as f:
+                user_config = yaml.safe_load(f) or {}
+            
+            # Deep merge
+            for key, value in user_config.items():
+                if isinstance(value, dict) and key in config and isinstance(config[key], dict):
+                    config[key].update(value)
+                else:
+                    config[key] = value
+        except Exception as e:
+            print(f"Warning: Failed to load config: {e}")
+    
+    return config
+
+
+def save_config(config: Dict[str, Any]):
+    """Save configuration to ~/.hermes/config.yaml."""
+    ensure_hermes_home()
+    config_path = get_config_path()
+    
+    with open(config_path, 'w') as f:
+        yaml.dump(config, f, default_flow_style=False, sort_keys=False)
+
+
+def load_env() -> Dict[str, str]:
+    """Load environment variables from ~/.hermes/.env."""
+    env_path = get_env_path()
+    env_vars = {}
+    
+    if env_path.exists():
+        with open(env_path) as f:
+            for line in f:
+                line = line.strip()
+                if line and not line.startswith('#') and '=' in line:
+                    key, _, value = line.partition('=')
+                    env_vars[key.strip()] = value.strip().strip('"\'')
+    
+    return env_vars
+
+
+def save_env_value(key: str, value: str):
+    """Save or update a value in ~/.hermes/.env."""
+    ensure_hermes_home()
+    env_path = get_env_path()
+    
+    # Load existing
+    lines = []
+    if env_path.exists():
+        with open(env_path) as f:
+            lines = f.readlines()
+    
+    # Find and update or append
+    found = False
+    for i, line in enumerate(lines):
+        if line.strip().startswith(f"{key}="):
+            lines[i] = f"{key}={value}\n"
+            found = True
+            break
+    
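+    if not found:
+        # Guard against a last line without a trailing newline, which would
+        # otherwise be glued onto the new entry.
+        if lines and not lines[-1].endswith("\n"):
+            lines[-1] += "\n"
+        lines.append(f"{key}={value}\n")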
+    
+    with open(env_path, 'w') as f:
+        f.writelines(lines)
+
+
+def get_env_value(key: str) -> Optional[str]:
+    """Get a value from ~/.hermes/.env or environment."""
+    # Check environment first
+    if key in os.environ:
+        return os.environ[key]
+    
+    # Then check .env file
+    env_vars = load_env()
+    return env_vars.get(key)
+
+
+# =============================================================================
+# Config display
+# =============================================================================
+
+def redact_key(key: str) -> str:
+    """Redact an API key for display."""
+    if not key:
+        return color("(not set)", Colors.DIM)
+    if len(key) < 12:
+        return "***"
+    return key[:4] + "..." + key[-4:]
+
+
+def show_config():
+    """Display current configuration."""
+    config = load_config()
+    env_vars = load_env()
+    
+    print()
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.CYAN))
+    print(color("│              🦋 Hermes Configuration                    │", Colors.CYAN))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.CYAN))
+    
+    # Paths
+    print()
+    print(color("◆ Paths", Colors.CYAN, Colors.BOLD))
+    print(f"  Config:       {get_config_path()}")
+    print(f"  Secrets:      {get_env_path()}")
+    print(f"  Install:      {get_project_root()}")
+    
+    # API Keys
+    print()
+    print(color("◆ API Keys", Colors.CYAN, Colors.BOLD))
+    
+    keys = [
+        ("OPENROUTER_API_KEY", "OpenRouter"),
+        ("ANTHROPIC_API_KEY", "Anthropic"),
+        ("OPENAI_API_KEY", "OpenAI"),
+        ("FIRECRAWL_API_KEY", "Firecrawl"),
+        ("BROWSERBASE_API_KEY", "Browserbase"),
+        ("FAL_KEY", "FAL"),
+    ]
+    
+    for env_key, name in keys:
+        value = get_env_value(env_key)
+        print(f"  {name:<14} {redact_key(value)}")
+    
+    # Model settings
+    print()
+    print(color("◆ Model", Colors.CYAN, Colors.BOLD))
+    print(f"  Model:        {config.get('model', 'not set')}")
+    print(f"  Max turns:    {config.get('max_turns', 100)}")
+    print(f"  Toolsets:     {', '.join(config.get('toolsets', ['all']))}")
+    
+    # Terminal
+    print()
+    print(color("◆ Terminal", Colors.CYAN, Colors.BOLD))
+    terminal = config.get('terminal', {})
+    print(f"  Backend:      {terminal.get('backend', 'local')}")
+    print(f"  Working dir:  {terminal.get('cwd', '.')}")
+    print(f"  Timeout:      {terminal.get('timeout', 180)}s")
+    
+    if terminal.get('backend') == 'docker':
+        print(f"  Docker image: {terminal.get('docker_image', 'python:3.11-slim')}")
+    elif terminal.get('backend') == 'singularity':
+        print(f"  Image:        {terminal.get('singularity_image', 'docker://python:3.11')}")
+    elif terminal.get('backend') == 'modal':
+        print(f"  Modal image:  {terminal.get('modal_image', 'python:3.11')}")
+        modal_token = get_env_value('MODAL_TOKEN_ID')
+        print(f"  Modal token:  {'configured' if modal_token else '(not set)'}")
+    elif terminal.get('backend') == 'ssh':
+        ssh_host = get_env_value('TERMINAL_SSH_HOST')
+        ssh_user = get_env_value('TERMINAL_SSH_USER')
+        print(f"  SSH host:     {ssh_host or '(not set)'}")
+        print(f"  SSH user:     {ssh_user or '(not set)'}")
+    
+    # Compression
+    print()
+    print(color("◆ Context Compression", Colors.CYAN, Colors.BOLD))
+    compression = config.get('compression', {})
+    enabled = compression.get('enabled', True)
+    print(f"  Enabled:      {'yes' if enabled else 'no'}")
+    if enabled:
+        print(f"  Threshold:    {compression.get('threshold', 0.85) * 100:.0f}%")
+        print(f"  Model:        {compression.get('summary_model', 'google/gemini-2.0-flash-001')}")
+    
+    # Messaging
+    print()
+    print(color("◆ Messaging Platforms", Colors.CYAN, Colors.BOLD))
+    
+    telegram_token = get_env_value('TELEGRAM_BOT_TOKEN')
+    discord_token = get_env_value('DISCORD_BOT_TOKEN')
+    
+    print(f"  Telegram:     {'configured' if telegram_token else color('not configured', Colors.DIM)}")
+    print(f"  Discord:      {'configured' if discord_token else color('not configured', Colors.DIM)}")
+    
+    print()
+    print(color("─" * 60, Colors.DIM))
+    print(color("  hermes config edit     # Edit config file", Colors.DIM))
+    print(color("  hermes config set KEY VALUE", Colors.DIM))
+    print(color("  hermes setup           # Run setup wizard", Colors.DIM))
+    print()
+
+
+def edit_config():
+    """Open config file in user's editor."""
+    config_path = get_config_path()
+    
+    # Ensure config exists
+    if not config_path.exists():
+        save_config(DEFAULT_CONFIG)
+        print(f"Created {config_path}")
+    
+    # Find editor
+    editor = os.getenv('EDITOR') or os.getenv('VISUAL')
+    
+    if not editor:
+        # Try common editors
+        import shutil
+        for cmd in ['nano', 'vim', 'vi', 'code', 'notepad']:
+            if shutil.which(cmd):
+                editor = cmd
+                break
+    
+    if not editor:
+        print("No editor found. Config file is at:")
+        print(f"  {config_path}")
+        return
+    
+    print(f"Opening {config_path} in {editor}...")
+    subprocess.run([editor, str(config_path)])
+
+
+def set_config_value(key: str, value: str):
+    """Set a configuration value."""
+    # Check if it's an API key (goes to .env)
+    api_keys = [
+        'OPENROUTER_API_KEY', 'ANTHROPIC_API_KEY', 'OPENAI_API_KEY',
+        'FIRECRAWL_API_KEY', 'BROWSERBASE_API_KEY', 'BROWSERBASE_PROJECT_ID',
+        'FAL_KEY', 'TELEGRAM_BOT_TOKEN', 'DISCORD_BOT_TOKEN',
+        'TERMINAL_SSH_HOST', 'TERMINAL_SSH_USER', 'TERMINAL_SSH_KEY',
+        'SUDO_PASSWORD'
+    ]
+    
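+    # Also route every variable declared in REQUIRED_ENV_VARS/OPTIONAL_ENV_VARS
+    # to .env, so secrets such as WANDB_API_KEY never land in config.yaml.
+    env_keys = set(api_keys) | set(REQUIRED_ENV_VARS) | set(OPTIONAL_ENV_VARS)
+    if key.upper() in env_keys or key.upper().startswith('TERMINAL_SSH'):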
+        save_env_value(key.upper(), value)
+        print(f"✓ Set {key} in {get_env_path()}")
+        return
+    
+    # Otherwise it goes to config.yaml
+    config = load_config()
+    
+    # Handle nested keys (e.g., "terminal.backend")
+    parts = key.split('.')
+    current = config
+    
+    for part in parts[:-1]:
+        if part not in current:
+            current[part] = {}
+        current = current[part]
+    
+    # Convert value to appropriate type
+    if value.lower() in ('true', 'yes', 'on'):
+        value = True
+    elif value.lower() in ('false', 'no', 'off'):
+        value = False
+    elif value.isdigit():
+        value = int(value)
+    elif value.replace('.', '', 1).isdigit():
+        value = float(value)
+    
+    current[parts[-1]] = value
+    save_config(config)
+    print(f"✓ Set {key} = {value} in {get_config_path()}")
+
+
+# =============================================================================
+# Command handler
+# =============================================================================
+
+def config_command(args):
+    """Handle config subcommands."""
+    subcmd = getattr(args, 'config_command', None)
+    
+    if subcmd is None or subcmd == "show":
+        show_config()
+    
+    elif subcmd == "edit":
+        edit_config()
+    
+    elif subcmd == "set":
+        key = getattr(args, 'key', None)
+        value = getattr(args, 'value', None)
+        if not key or not value:
+            print("Usage: hermes config set KEY VALUE")
+            print()
+            print("Examples:")
+            print("  hermes config set model anthropic/claude-sonnet-4")
+            print("  hermes config set terminal.backend docker")
+            print("  hermes config set OPENROUTER_API_KEY sk-or-...")
+            sys.exit(1)
+        set_config_value(key, value)
+    
+    elif subcmd == "path":
+        print(get_config_path())
+    
+    elif subcmd == "env-path":
+        print(get_env_path())
+    
+    elif subcmd == "migrate":
+        print()
+        print(color("🔄 Checking configuration for updates...", Colors.CYAN, Colors.BOLD))
+        print()
+        
+        # Check what's missing
+        missing_env = get_missing_env_vars(required_only=False)
+        missing_config = get_missing_config_fields()
+        current_ver, latest_ver = check_config_version()
+        
+        if not missing_env and not missing_config and current_ver >= latest_ver:
+            print(color("✓ Configuration is up to date!", Colors.GREEN))
+            print()
+            return
+        
+        # Show what needs to be updated
+        if current_ver < latest_ver:
+            print(f"  Config version: {current_ver} → {latest_ver}")
+        
+        if missing_config:
+            print(f"\n  {len(missing_config)} new config option(s) will be added with defaults")
+        
+        required_missing = [v for v in missing_env if v.get("is_required")]
+        optional_missing = [v for v in missing_env if not v.get("is_required")]
+        
+        if required_missing:
+            print(f"\n  ⚠️  {len(required_missing)} required API key(s) missing:")
+            for var in required_missing:
+                print(f"     • {var['name']}")
+        
+        if optional_missing:
+            print(f"\n  ℹ️  {len(optional_missing)} optional API key(s) not configured:")
+            for var in optional_missing:
+                tools = var.get("tools", [])
+                tools_str = f" (enables: {', '.join(tools[:2])})" if tools else ""
+                print(f"     • {var['name']}{tools_str}")
+        
+        print()
+        
+        # Run migration
+        results = migrate_config(interactive=True, quiet=False)
+        
+        print()
+        if results["env_added"] or results["config_added"]:
+            print(color("✓ Configuration updated!", Colors.GREEN))
+        
+        if results["warnings"]:
+            print()
+            for warning in results["warnings"]:
+                print(color(f"  ⚠️  {warning}", Colors.YELLOW))
+        
+        print()
+    
+    elif subcmd == "check":
+        # Non-interactive check for what's missing
+        print()
+        print(color("📋 Configuration Status", Colors.CYAN, Colors.BOLD))
+        print()
+        
+        current_ver, latest_ver = check_config_version()
+        if current_ver >= latest_ver:
+            print(f"  Config version: {current_ver} ✓")
+        else:
+            print(color(f"  Config version: {current_ver} → {latest_ver} (update available)", Colors.YELLOW))
+        
+        print()
+        print(color("  Required:", Colors.BOLD))
+        for var_name in REQUIRED_ENV_VARS:
+            if get_env_value(var_name):
+                print(f"    ✓ {var_name}")
+            else:
+                print(color(f"    ✗ {var_name} (missing)", Colors.RED))
+        
+        print()
+        print(color("  Optional:", Colors.BOLD))
+        for var_name, info in OPTIONAL_ENV_VARS.items():
+            if get_env_value(var_name):
+                print(f"    ✓ {var_name}")
+            else:
+                tools = info.get("tools", [])
+                tools_str = f" → {', '.join(tools[:2])}" if tools else ""
+                print(color(f"    ○ {var_name}{tools_str}", Colors.DIM))
+        
+        missing_config = get_missing_config_fields()
+        if missing_config:
+            print()
+            print(color(f"  {len(missing_config)} new config option(s) available", Colors.YELLOW))
+            print("    Run 'hermes config migrate' to add them")
+        
+        print()
+    
+    else:
+        print(f"Unknown config command: {subcmd}")
+        print()
+        print("Available commands:")
+        print("  hermes config           Show current configuration")
+        print("  hermes config edit      Open config in editor")
+        print("  hermes config set K V   Set a config value")
+        print("  hermes config check     Check for missing/outdated config")
+        print("  hermes config migrate   Update config with new options")
+        print("  hermes config path      Show config file path")
+        print("  hermes config env-path  Show .env file path")
+        sys.exit(1)
--- a/hermes_cli/cron.py
+++ b/hermes_cli/cron.py
@@ -0,0 +1,131 @@
+"""
+Cron subcommand for hermes CLI.
+
+Handles: hermes cron [list|daemon|tick]
+"""
+
+import json
+import sys
+import time
+from pathlib import Path
+from datetime import datetime
+
+PROJECT_ROOT = Path(__file__).parent.parent.resolve()
+sys.path.insert(0, str(PROJECT_ROOT))
+
+# ANSI colors
+class Colors:
+    RESET = "\033[0m"
+    BOLD = "\033[1m"
+    DIM = "\033[2m"
+    RED = "\033[31m"
+    GREEN = "\033[32m"
+    YELLOW = "\033[33m"
+    CYAN = "\033[36m"
+
+def color(text: str, *codes) -> str:
+    if not sys.stdout.isatty():
+        return text
+    return "".join(codes) + text + Colors.RESET
+
+
+def cron_list(show_all: bool = False):
+    """List all scheduled jobs."""
+    from cron.jobs import list_jobs
+    
+    jobs = list_jobs(include_disabled=show_all)
+    
+    if not jobs:
+        print(color("No scheduled jobs.", Colors.DIM))
+        print(color("Create one with: hermes cron add <schedule> <prompt>", Colors.DIM))
+        return
+    
+    print()
+    print(color("┌─────────────────────────────────────────────────────────────────────────┐", Colors.CYAN))
+    print(color("│                         Scheduled Jobs                                  │", Colors.CYAN))
+    print(color("└─────────────────────────────────────────────────────────────────────────┘", Colors.CYAN))
+    print()
+    
+    for job in jobs:
+        job_id = job.get("id", "?")[:8]
+        name = job.get("name", "(unnamed)")
+        schedule = job.get("schedule_display", job.get("schedule", {}).get("value", "?"))
+        enabled = job.get("enabled", True)
+        next_run = job.get("next_run_at", "?")
+        
+        # Repeat info
+        repeat_info = job.get("repeat", {})
+        repeat_times = repeat_info.get("times")
+        repeat_completed = repeat_info.get("completed", 0)
+        
+        if repeat_times:
+            repeat_str = f"{repeat_completed}/{repeat_times}"
+        else:
+            repeat_str = "∞"
+        
+        # Delivery targets
+        deliver = job.get("deliver", ["local"])
+        if isinstance(deliver, str):
+            deliver = [deliver]
+        deliver_str = ", ".join(deliver)
+        
+        # Status indicator
+        if not enabled:
+            status = color("[disabled]", Colors.RED)
+        else:
+            status = color("[active]", Colors.GREEN)
+        
+        print(f"  {color(job_id, Colors.YELLOW)} {status}")
+        print(f"    Name:      {name}")
+        print(f"    Schedule:  {schedule}")
+        print(f"    Repeat:    {repeat_str}")
+        print(f"    Next run:  {next_run}")
+        print(f"    Deliver:   {deliver_str}")
+        print()
+
+
+def cron_daemon(interval: int = 60):
+    """Run the cron daemon."""
+    from cron.scheduler import start_daemon
+    
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.CYAN))
+    print(color("│              🦋 Hermes Cron Daemon                      │", Colors.CYAN))
+    print(color("├─────────────────────────────────────────────────────────┤", Colors.CYAN))
+    print(color("│  Press Ctrl+C to stop                                   │", Colors.CYAN))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.CYAN))
+    print()
+    
+    try:
+        start_daemon(interval=interval)
+    except KeyboardInterrupt:
+        print()
+        print(color("Cron daemon stopped.", Colors.YELLOW))
+
+
+def cron_tick():
+    """Run due jobs once (for system cron integration)."""
+    from cron.scheduler import tick
+    
+    print(f"[{datetime.now().isoformat()}] Running cron tick...")
+    tick()
+
+
+def cron_command(args):
+    """Handle cron subcommands."""
+    subcmd = getattr(args, 'cron_command', None)
+    
+    if subcmd is None or subcmd == "list":
+        show_all = getattr(args, 'all', False)
+        cron_list(show_all)
+    
+    elif subcmd == "daemon":
+        interval = getattr(args, 'interval', 60)
+        cron_daemon(interval)
+    
+    elif subcmd == "tick":
+        cron_tick()
+    
+    else:
+        print(f"Unknown cron command: {subcmd}")
+        print("Usage: hermes cron [list|daemon|tick]")
+        sys.exit(1)
--- a/hermes_cli/doctor.py
+++ b/hermes_cli/doctor.py
@@ -0,0 +1,316 @@
+"""
+Doctor command for hermes CLI.
+
+Diagnoses issues with Hermes Agent setup.
+"""
+
+import os
+import sys
+import subprocess
+import shutil
+from pathlib import Path
+
+PROJECT_ROOT = Path(__file__).parent.parent.resolve()
+
+# ANSI colors
+class Colors:
+    RESET = "\033[0m"
+    BOLD = "\033[1m"
+    DIM = "\033[2m"
+    RED = "\033[31m"
+    GREEN = "\033[32m"
+    YELLOW = "\033[33m"
+    CYAN = "\033[36m"
+
+def color(text: str, *codes) -> str:
+    if not sys.stdout.isatty():
+        return text
+    return "".join(codes) + text + Colors.RESET
+
+def check_ok(text: str, detail: str = ""):
+    print(f"  {color('✓', Colors.GREEN)} {text}" + (f" {color(detail, Colors.DIM)}" if detail else ""))
+
+def check_warn(text: str, detail: str = ""):
+    print(f"  {color('⚠', Colors.YELLOW)} {text}" + (f" {color(detail, Colors.DIM)}" if detail else ""))
+
+def check_fail(text: str, detail: str = ""):
+    print(f"  {color('✗', Colors.RED)} {text}" + (f" {color(detail, Colors.DIM)}" if detail else ""))
+
+def check_info(text: str):
+    print(f"    {color('→', Colors.CYAN)} {text}")
+
+
+def run_doctor(args):
+    """Run diagnostic checks."""
+    should_fix = getattr(args, 'fix', False)
+    
+    issues = []
+    
+    print()
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.CYAN))
+    print(color("│                 🩺 Hermes Doctor                        │", Colors.CYAN))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.CYAN))
+    
+    # =========================================================================
+    # Check: Python version
+    # =========================================================================
+    print()
+    print(color("◆ Python Environment", Colors.CYAN, Colors.BOLD))
+    
+    py_version = sys.version_info
+    if py_version >= (3, 10):
+        check_ok(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}")
+    elif py_version >= (3, 8):
+        check_warn(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}", "(3.10+ recommended)")
+    else:
+        check_fail(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}", "(3.10+ required)")
+        issues.append("Upgrade Python to 3.10+")
+    
+    # Check if in virtual environment
+    in_venv = sys.prefix != sys.base_prefix
+    if in_venv:
+        check_ok("Virtual environment active")
+    else:
+        check_warn("Not in virtual environment", "(recommended)")
+    
+    # =========================================================================
+    # Check: Required packages
+    # =========================================================================
+    print()
+    print(color("◆ Required Packages", Colors.CYAN, Colors.BOLD))
+    
+    required_packages = [  # (import name, display name, pip package name)
+        ("openai", "OpenAI SDK", "openai"),
+        ("rich", "Rich (terminal UI)", "rich"),
+        ("dotenv", "python-dotenv", "python-dotenv"),
+        ("yaml", "PyYAML", "pyyaml"),
+        ("httpx", "HTTPX", "httpx"),
+    ]
+    
+    optional_packages = [
+        ("croniter", "Croniter (cron expressions)"),
+        ("browserbase", "Browserbase SDK"),
+        ("telegram", "python-telegram-bot"),
+        ("discord", "discord.py"),
+    ]
+    
+    for module, name, pip_name in required_packages:
+        try:
+            __import__(module)
+            check_ok(name)
+        except ImportError:
+            check_fail(name, "(missing)")
+            issues.append(f"Install {name}: pip install {pip_name}")
+    
+    for module, name in optional_packages:
+        try:
+            __import__(module)
+            check_ok(name, "(optional)")
+        except ImportError:
+            check_warn(name, "(optional, not installed)")
+    
+    # =========================================================================
+    # Check: Configuration files
+    # =========================================================================
+    print()
+    print(color("◆ Configuration Files", Colors.CYAN, Colors.BOLD))
+    
+    env_path = PROJECT_ROOT / '.env'
+    if env_path.exists():
+        check_ok(".env file exists")
+        
+        # Check for common issues
+        content = env_path.read_text()
+        if "OPENROUTER_API_KEY" in content or "ANTHROPIC_API_KEY" in content:
+            check_ok("API key configured")
+        else:
+            check_warn("No API key found in .env")
+            issues.append("Run 'hermes setup' to configure API keys")
+    else:
+        check_fail(".env file missing")
+        check_info("Run 'hermes setup' to create one")
+        issues.append("Run 'hermes setup' to create .env")
+    
+    config_path = PROJECT_ROOT / 'cli-config.yaml'
+    if config_path.exists():
+        check_ok("cli-config.yaml exists")
+    else:
+        check_warn("cli-config.yaml not found", "(using defaults)")
+    
+    # =========================================================================
+    # Check: Directory structure
+    # =========================================================================
+    print()
+    print(color("◆ Directory Structure", Colors.CYAN, Colors.BOLD))
+    
+    hermes_home = Path.home() / ".hermes"
+    if hermes_home.exists():
+        check_ok("~/.hermes directory exists")
+    else:
+        check_warn("~/.hermes not found", "(will be created on first use)")
+    
+    logs_dir = PROJECT_ROOT / "logs"
+    if logs_dir.exists():
+        check_ok("logs/ directory exists")
+    else:
+        check_warn("logs/ not found", "(will be created on first use)")
+    
+    # =========================================================================
+    # Check: External tools
+    # =========================================================================
+    print()
+    print(color("◆ External Tools", Colors.CYAN, Colors.BOLD))
+    
+    # Git
+    if shutil.which("git"):
+        check_ok("git")
+    else:
+        check_warn("git not found", "(optional)")
+    
+    # ripgrep (optional, for faster file search)
+    if shutil.which("rg"):
+        check_ok("ripgrep (rg)", "(faster file search)")
+    else:
+        check_warn("ripgrep (rg) not found", "(file search uses grep fallback)")
+        check_info("Install for faster search, e.g.: sudo apt install ripgrep (Debian/Ubuntu) or brew install ripgrep (macOS)")
+    
+    # Docker (optional)
+    terminal_env = os.getenv("TERMINAL_ENV", "local")
+    if terminal_env == "docker":
+        if shutil.which("docker"):
+            # Check if docker daemon is running
+            result = subprocess.run(["docker", "info"], capture_output=True)
+            if result.returncode == 0:
+                check_ok("docker", "(daemon running)")
+            else:
+                check_fail("docker daemon not running")
+                issues.append("Start Docker daemon")
+        else:
+            check_fail("docker not found", "(required for TERMINAL_ENV=docker)")
+            issues.append("Install Docker or change TERMINAL_ENV")
+    else:
+        if shutil.which("docker"):
+            check_ok("docker", "(optional)")
+        else:
+            check_warn("docker not found", "(optional)")
+    
+    # SSH (if using ssh backend)
+    if terminal_env == "ssh":
+        ssh_host = os.getenv("TERMINAL_SSH_HOST")
+        if ssh_host:
+            # Try to connect
+            result = subprocess.run(
+                ["ssh", "-o", "ConnectTimeout=5", "-o", "BatchMode=yes", ssh_host, "echo ok"],
+                capture_output=True,
+                text=True
+            )
+            if result.returncode == 0:
+                check_ok(f"SSH connection to {ssh_host}")
+            else:
+                check_fail(f"SSH connection to {ssh_host}")
+                issues.append(f"Check SSH configuration for {ssh_host}")
+        else:
+            check_fail("TERMINAL_SSH_HOST not set", "(required for TERMINAL_ENV=ssh)")
+            issues.append("Set TERMINAL_SSH_HOST in .env")
+    
+    # =========================================================================
+    # Check: API connectivity
+    # =========================================================================
+    print()
+    print(color("◆ API Connectivity", Colors.CYAN, Colors.BOLD))
+    
+    openrouter_key = os.getenv("OPENROUTER_API_KEY")
+    if openrouter_key:
+        try:
+            import httpx
+            response = httpx.get(
+                "https://openrouter.ai/api/v1/models",
+                headers={"Authorization": f"Bearer {openrouter_key}"},
+                timeout=10
+            )
+            if response.status_code == 200:
+                check_ok("OpenRouter API")
+            elif response.status_code == 401:
+                check_fail("OpenRouter API", "(invalid API key)")
+                issues.append("Check OPENROUTER_API_KEY in .env")
+            else:
+                check_fail("OpenRouter API", f"(HTTP {response.status_code})")
+        except Exception as e:
+            check_fail("OpenRouter API", f"({e})")
+            issues.append("Check network connectivity")
+    else:
+        check_warn("OpenRouter API", "(not configured)")
+    
+    anthropic_key = os.getenv("ANTHROPIC_API_KEY")
+    if anthropic_key:
+        try:
+            import httpx
+            response = httpx.get(
+                "https://api.anthropic.com/v1/models",
+                headers={
+                    "x-api-key": anthropic_key,
+                    "anthropic-version": "2023-06-01"
+                },
+                timeout=10
+            )
+            if response.status_code == 200:
+                check_ok("Anthropic API")
+            elif response.status_code == 401:
+                check_fail("Anthropic API", "(invalid API key)")
+            else:
+                # Any other status (e.g. 403, 429, 5xx) leaves the key unverified
+                check_warn("Anthropic API", "(couldn't verify)")
+        except Exception as e:
+            check_warn("Anthropic API", f"({e})")
+    
+    # =========================================================================
+    # Check: Tool Availability
+    # =========================================================================
+    print()
+    print(color("◆ Tool Availability", Colors.CYAN, Colors.BOLD))
+    
+    try:
+        # Add project root to path for imports
+        sys.path.insert(0, str(PROJECT_ROOT))
+        from model_tools import check_tool_availability, TOOLSET_REQUIREMENTS
+        
+        available, unavailable = check_tool_availability()
+        
+        for tid in available:
+            info = TOOLSET_REQUIREMENTS.get(tid, {})
+            check_ok(info.get("name", tid))
+        
+        for item in unavailable:
+            if item["missing_vars"]:
+                vars_str = ", ".join(item["missing_vars"])
+                check_warn(item["name"], f"(missing {vars_str})")
+            else:
+                check_warn(item["name"], "(system dependency not met)")
+        
+        # Count disabled tools with API key requirements
+        api_disabled = [u for u in unavailable if u["missing_vars"]]
+        if api_disabled:
+            issues.append("Run 'hermes setup' to configure missing API keys for full tool access")
+    except Exception as e:
+        check_warn("Could not check tool availability", f"({e})")
+    
+    # =========================================================================
+    # Summary
+    # =========================================================================
+    print()
+    if issues:
+        print(color("─" * 60, Colors.YELLOW))
+        print(color(f"  Found {len(issues)} issue(s) to address:", Colors.YELLOW, Colors.BOLD))
+        print()
+        for i, issue in enumerate(issues, 1):
+            print(f"  {i}. {issue}")
+        print()
+        
+        if should_fix:
+            print(color("  Auto-fix is not yet implemented.", Colors.DIM))
+            print(color("  Please resolve issues manually.", Colors.DIM))
+    else:
+        print(color("─" * 60, Colors.GREEN))
+        print(color("  All checks passed! 🎉", Colors.GREEN, Colors.BOLD))
+    
+    print()
--- a/hermes_cli/gateway.py
+++ b/hermes_cli/gateway.py
@@ -0,0 +1,487 @@
+"""
+Gateway subcommand for hermes CLI.
+
+Handles: hermes gateway [run|start|stop|restart|status|install|uninstall]
+"""
+
+import asyncio
+import os
+import signal
+import subprocess
+import sys
+from pathlib import Path
+
+PROJECT_ROOT = Path(__file__).parent.parent.resolve()
+
+
+# =============================================================================
+# Process Management (for manual gateway runs)
+# =============================================================================
+
+def find_gateway_pids() -> list:
+    """Find PIDs of running gateway processes."""
+    pids = []
+    try:
+        # Look for gateway processes with multiple patterns
+        patterns = [
+            "hermes_cli.main gateway",
+            "hermes gateway",
+            "gateway/run.py",
+        ]
+        
+        result = subprocess.run(
+            ["ps", "aux"],
+            capture_output=True,
+            text=True
+        )
+        
+        for line in result.stdout.split('\n'):
+            # Skip grep invocations and lines without a PID column
+            parts = line.split()
+            if 'grep' in line or len(parts) < 2:
+                continue
+            
+            for pattern in patterns:
+                if pattern in line:
+                    try:
+                        pid = int(parts[1])
+                    except ValueError:
+                        break
+                    # Compare PIDs exactly: a substring test would skip unrelated lines
+                    if pid != os.getpid() and pid not in pids:
+                        pids.append(pid)
+                    break
+    except Exception:
+        pass
+    
+    return pids
+
+
+def kill_gateway_processes(force: bool = False) -> int:
+    """Kill any running gateway processes. Returns count killed."""
+    pids = find_gateway_pids()
+    killed = 0
+    
+    for pid in pids:
+        try:
+            if force:
+                os.kill(pid, signal.SIGKILL)
+            else:
+                os.kill(pid, signal.SIGTERM)
+            killed += 1
+        except ProcessLookupError:
+            # Process already gone
+            pass
+        except PermissionError:
+            print(f"⚠ Permission denied to kill PID {pid}")
+    
+    return killed
+
+
+def is_linux() -> bool:
+    return sys.platform.startswith('linux')
+
+def is_macos() -> bool:
+    return sys.platform == 'darwin'
+
+def is_windows() -> bool:
+    return sys.platform == 'win32'
+
+
+# =============================================================================
+# Service Configuration
+# =============================================================================
+
+SERVICE_NAME = "hermes-gateway"
+SERVICE_DESCRIPTION = "Hermes Agent Gateway - Messaging Platform Integration"
+
+def get_systemd_unit_path() -> Path:
+    return Path.home() / ".config" / "systemd" / "user" / f"{SERVICE_NAME}.service"
+
+def get_launchd_plist_path() -> Path:
+    return Path.home() / "Library" / "LaunchAgents" / "ai.hermes.gateway.plist"
+
+def get_python_path() -> str:
+    venv_python = PROJECT_ROOT / "venv" / "bin" / "python"
+    if venv_python.exists():
+        return str(venv_python)
+    return sys.executable
+
+def get_hermes_cli_path() -> str:
+    """Get the path to the hermes CLI."""
+    # Check if installed via pip
+    import shutil
+    hermes_bin = shutil.which("hermes")
+    if hermes_bin:
+        return hermes_bin
+    
+    # Fallback to direct module execution
+    return f"{get_python_path()} -m hermes_cli.main"
+
+
+# =============================================================================
+# Systemd (Linux)
+# =============================================================================
+
+def generate_systemd_unit() -> str:
+    python_path = get_python_path()
+    working_dir = str(PROJECT_ROOT)
+    
+    return f"""[Unit]
+Description={SERVICE_DESCRIPTION}
+After=network.target
+
+[Service]
+Type=simple
+ExecStart={python_path} -m hermes_cli.main gateway run
+WorkingDirectory={working_dir}
+Restart=on-failure
+RestartSec=10
+StandardOutput=journal
+StandardError=journal
+
+[Install]
+WantedBy=default.target
+"""
+
+def systemd_install(force: bool = False):
+    unit_path = get_systemd_unit_path()
+    
+    if unit_path.exists() and not force:
+        print(f"Service already installed at: {unit_path}")
+        print("Use --force to reinstall")
+        return
+    
+    unit_path.parent.mkdir(parents=True, exist_ok=True)
+    print(f"Installing systemd service to: {unit_path}")
+    unit_path.write_text(generate_systemd_unit())
+    
+    subprocess.run(["systemctl", "--user", "daemon-reload"], check=True)
+    subprocess.run(["systemctl", "--user", "enable", SERVICE_NAME], check=True)
+    
+    print()
+    print("✓ Service installed and enabled!")
+    print()
+    print("Next steps:")
+    print("  hermes gateway start              # Start the service")
+    print("  hermes gateway status             # Check status")
+    print(f"  journalctl --user -u {SERVICE_NAME} -f  # View logs")
+    print()
+    print("To enable lingering (keeps running after logout):")
+    print("  sudo loginctl enable-linger $USER")
+
+def systemd_uninstall():
+    subprocess.run(["systemctl", "--user", "stop", SERVICE_NAME], check=False)
+    subprocess.run(["systemctl", "--user", "disable", SERVICE_NAME], check=False)
+    
+    unit_path = get_systemd_unit_path()
+    if unit_path.exists():
+        unit_path.unlink()
+        print(f"✓ Removed {unit_path}")
+    
+    subprocess.run(["systemctl", "--user", "daemon-reload"], check=True)
+    print("✓ Service uninstalled")
+
+def systemd_start():
+    subprocess.run(["systemctl", "--user", "start", SERVICE_NAME], check=True)
+    print("✓ Service started")
+
+def systemd_stop():
+    subprocess.run(["systemctl", "--user", "stop", SERVICE_NAME], check=True)
+    print("✓ Service stopped")
+
+def systemd_restart():
+    subprocess.run(["systemctl", "--user", "restart", SERVICE_NAME], check=True)
+    print("✓ Service restarted")
+
+def systemd_status(deep: bool = False):
+    # Check if service unit file exists
+    unit_path = get_systemd_unit_path()
+    if not unit_path.exists():
+        print("✗ Gateway service is not installed")
+        print("  Run: hermes gateway install")
+        return
+    
+    # Show detailed status first
+    subprocess.run(
+        ["systemctl", "--user", "status", SERVICE_NAME, "--no-pager"],
+        capture_output=False
+    )
+    
+    # Check if service is active
+    result = subprocess.run(
+        ["systemctl", "--user", "is-active", SERVICE_NAME],
+        capture_output=True,
+        text=True
+    )
+    
+    status = result.stdout.strip()
+    
+    if status == "active":
+        print("✓ Gateway service is running")
+    else:
+        print("✗ Gateway service is stopped")
+        print("  Run: hermes gateway start")
+    
+    if deep:
+        print()
+        print("Recent logs:")
+        subprocess.run([
+            "journalctl", "--user", "-u", SERVICE_NAME,
+            "-n", "20", "--no-pager"
+        ])
+
+
+# =============================================================================
+# Launchd (macOS)
+# =============================================================================
+
+def generate_launchd_plist() -> str:
+    python_path = get_python_path()
+    working_dir = str(PROJECT_ROOT)
+    log_dir = Path.home() / ".hermes" / "logs"
+    log_dir.mkdir(parents=True, exist_ok=True)
+    
+    return f"""<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>ai.hermes.gateway</string>
+    
+    <key>ProgramArguments</key>
+    <array>
+        <string>{python_path}</string>
+        <string>-m</string>
+        <string>hermes_cli.main</string>
+        <string>gateway</string>
+        <string>run</string>
+    </array>
+    
+    <key>WorkingDirectory</key>
+    <string>{working_dir}</string>
+    
+    <key>RunAtLoad</key>
+    <true/>
+    
+    <key>KeepAlive</key>
+    <dict>
+        <key>SuccessfulExit</key>
+        <false/>
+    </dict>
+    
+    <key>StandardOutPath</key>
+    <string>{log_dir}/gateway.log</string>
+    
+    <key>StandardErrorPath</key>
+    <string>{log_dir}/gateway.error.log</string>
+</dict>
+</plist>
+"""
+
+def launchd_install(force: bool = False):
+    plist_path = get_launchd_plist_path()
+    
+    if plist_path.exists() and not force:
+        print(f"Service already installed at: {plist_path}")
+        print("Use --force to reinstall")
+        return
+    
+    plist_path.parent.mkdir(parents=True, exist_ok=True)
+    print(f"Installing launchd service to: {plist_path}")
+    plist_path.write_text(generate_launchd_plist())
+    
+    subprocess.run(["launchctl", "load", str(plist_path)], check=True)
+    
+    print()
+    print("✓ Service installed and loaded!")
+    print()
+    print("Next steps:")
+    print("  hermes gateway status             # Check status")
+    print("  tail -f ~/.hermes/logs/gateway.log  # View logs")
+
+def launchd_uninstall():
+    plist_path = get_launchd_plist_path()
+    subprocess.run(["launchctl", "unload", str(plist_path)], check=False)
+    
+    if plist_path.exists():
+        plist_path.unlink()
+        print(f"✓ Removed {plist_path}")
+    
+    print("✓ Service uninstalled")
+
+def launchd_start():
+    subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
+    print("✓ Service started")
+
+def launchd_stop():
+    subprocess.run(["launchctl", "stop", "ai.hermes.gateway"], check=True)
+    print("✓ Service stopped")
+
+def launchd_restart():
+    launchd_stop()
+    launchd_start()
+
+def launchd_status(deep: bool = False):
+    result = subprocess.run(
+        ["launchctl", "list", "ai.hermes.gateway"],
+        capture_output=True,
+        text=True
+    )
+    
+    if result.returncode == 0:
+        print("✓ Gateway service is loaded")
+        print(result.stdout)
+    else:
+        print("✗ Gateway service is not loaded")
+    
+    if deep:
+        log_file = Path.home() / ".hermes" / "logs" / "gateway.log"
+        if log_file.exists():
+            print()
+            print("Recent logs:")
+            subprocess.run(["tail", "-20", str(log_file)])
+
+
+# =============================================================================
+# Gateway Runner
+# =============================================================================
+
+def run_gateway(verbose: bool = False):
+    """Run the gateway in foreground."""
+    sys.path.insert(0, str(PROJECT_ROOT))
+    
+    from gateway.run import start_gateway
+    
+    print("┌─────────────────────────────────────────────────────────┐")
+    print("│           🦋 Hermes Gateway Starting...                 │")
+    print("├─────────────────────────────────────────────────────────┤")
+    print("│  Press Ctrl+C to stop                                   │")
+    print("└─────────────────────────────────────────────────────────┘")
+    print()
+    
+    asyncio.run(start_gateway())
+
+
+# =============================================================================
+# Main Command Handler
+# =============================================================================
+
+def gateway_command(args):
+    """Handle gateway subcommands."""
+    subcmd = getattr(args, 'gateway_command', None)
+    
+    # Default to run if no subcommand
+    if subcmd is None or subcmd == "run":
+        verbose = getattr(args, 'verbose', False)
+        run_gateway(verbose)
+        return
+    
+    # Service management commands
+    if subcmd == "install":
+        force = getattr(args, 'force', False)
+        if is_linux():
+            systemd_install(force)
+        elif is_macos():
+            launchd_install(force)
+        else:
+            print("Service installation not supported on this platform.")
+            print("Run manually: hermes gateway run")
+            sys.exit(1)
+    
+    elif subcmd == "uninstall":
+        if is_linux():
+            systemd_uninstall()
+        elif is_macos():
+            launchd_uninstall()
+        else:
+            print("Not supported on this platform.")
+            sys.exit(1)
+    
+    elif subcmd == "start":
+        if is_linux():
+            systemd_start()
+        elif is_macos():
+            launchd_start()
+        else:
+            print("Not supported on this platform.")
+            sys.exit(1)
+    
+    elif subcmd == "stop":
+        # Try service first, fall back to killing processes directly
+        service_available = False
+        
+        if is_linux() and get_systemd_unit_path().exists():
+            try:
+                systemd_stop()
+                service_available = True
+            except subprocess.CalledProcessError:
+                pass  # Fall through to process kill
+        elif is_macos() and get_launchd_plist_path().exists():
+            try:
+                launchd_stop()
+                service_available = True
+            except subprocess.CalledProcessError:
+                pass
+        
+        if not service_available:
+            # Kill gateway processes directly
+            killed = kill_gateway_processes()
+            if killed:
+                print(f"✓ Stopped {killed} gateway process(es)")
+            else:
+                print("✗ No gateway processes found")
+    
+    elif subcmd == "restart":
+        # Try service first, fall back to killing and restarting
+        service_available = False
+        
+        if is_linux() and get_systemd_unit_path().exists():
+            try:
+                systemd_restart()
+                service_available = True
+            except subprocess.CalledProcessError:
+                pass
+        elif is_macos() and get_launchd_plist_path().exists():
+            try:
+                launchd_restart()
+                service_available = True
+            except subprocess.CalledProcessError:
+                pass
+        
+        if not service_available:
+            # Manual restart: kill existing processes
+            killed = kill_gateway_processes()
+            if killed:
+                print(f"✓ Stopped {killed} gateway process(es)")
+            
+            import time
+            time.sleep(2)
+            
+            # Start fresh
+            print("Starting gateway...")
+            run_gateway(verbose=False)
+    
+    elif subcmd == "status":
+        deep = getattr(args, 'deep', False)
+        
+        # Check for service first
+        if is_linux() and get_systemd_unit_path().exists():
+            systemd_status(deep)
+        elif is_macos() and get_launchd_plist_path().exists():
+            launchd_status(deep)
+        else:
+            # Check for manually running processes
+            pids = find_gateway_pids()
+            if pids:
+                print(f"✓ Gateway is running (PID: {', '.join(map(str, pids))})")
+                print("  (Running manually, not as a system service)")
+                print()
+                print("To install as a service:")
+                print("  hermes gateway install")
+            else:
+                print("✗ Gateway is not running")
+                print()
+                print("To start:")
+                print("  hermes gateway          # Run in foreground")
+                print("  hermes gateway install  # Install as service")
--- a/hermes_cli/main.py
+++ b/hermes_cli/main.py
@@ -0,0 +1,509 @@
+#!/usr/bin/env python3
+"""
+Hermes CLI - Main entry point.
+
+Usage:
+    hermes                     # Interactive chat (default)
+    hermes chat                # Interactive chat
+    hermes gateway             # Run gateway in foreground
+    hermes gateway start       # Start gateway as service
+    hermes gateway stop        # Stop gateway service
+    hermes gateway restart     # Restart gateway service
+    hermes gateway status      # Show gateway status
+    hermes gateway install     # Install gateway service
+    hermes gateway uninstall   # Uninstall gateway service
+    hermes setup               # Interactive setup wizard
+    hermes config              # View and edit configuration
+    hermes status              # Show status of all components
+    hermes cron                # Manage cron jobs
+    hermes cron list           # List cron jobs
+    hermes cron daemon         # Run cron daemon
+    hermes doctor              # Check configuration and dependencies
+    hermes version             # Show version
+    hermes update              # Update to latest version
+    hermes uninstall           # Uninstall Hermes Agent
+"""
+
+import argparse
+import os
+import sys
+from pathlib import Path
+
+# Add project root to path
+PROJECT_ROOT = Path(__file__).parent.parent.resolve()
+sys.path.insert(0, str(PROJECT_ROOT))
+
+# Load .env file
+from dotenv import load_dotenv
+env_path = PROJECT_ROOT / '.env'
+if env_path.exists():
+    load_dotenv(dotenv_path=env_path)
+
+from hermes_cli import __version__
+
+
+def cmd_chat(args):
+    """Run interactive chat CLI."""
+    # Import and run the CLI
+    from cli import main as cli_main
+    
+    # Build kwargs from args
+    kwargs = {
+        "model": args.model,
+        "toolsets": args.toolsets,
+        "verbose": args.verbose,
+        "query": args.query,
+    }
+    # Filter out None values
+    kwargs = {k: v for k, v in kwargs.items() if v is not None}
+    
+    cli_main(**kwargs)
+
+
+def cmd_gateway(args):
+    """Gateway management commands."""
+    from hermes_cli.gateway import gateway_command
+    gateway_command(args)
+
+
+def cmd_setup(args):
+    """Interactive setup wizard."""
+    from hermes_cli.setup import run_setup_wizard
+    run_setup_wizard(args)
+
+
+def cmd_status(args):
+    """Show status of all components."""
+    from hermes_cli.status import show_status
+    show_status(args)
+
+
+def cmd_cron(args):
+    """Cron job management."""
+    from hermes_cli.cron import cron_command
+    cron_command(args)
+
+
+def cmd_doctor(args):
+    """Check configuration and dependencies."""
+    from hermes_cli.doctor import run_doctor
+    run_doctor(args)
+
+
+def cmd_config(args):
+    """Configuration management."""
+    from hermes_cli.config import config_command
+    config_command(args)
+
+
+def cmd_version(args):
+    """Show version."""
+    print(f"Hermes Agent v{__version__}")
+    print(f"Project: {PROJECT_ROOT}")
+    
+    # Show Python version
+    print(f"Python: {sys.version.split()[0]}")
+    
+    # Check for key dependencies
+    try:
+        import openai
+        print(f"OpenAI SDK: {openai.__version__}")
+    except ImportError:
+        print("OpenAI SDK: Not installed")
+
+
+def cmd_uninstall(args):
+    """Uninstall Hermes Agent."""
+    from hermes_cli.uninstall import run_uninstall
+    run_uninstall(args)
+
+
+def cmd_update(args):
+    """Update Hermes Agent to the latest version."""
+    import subprocess
+    
+    print("🦋 Updating Hermes Agent...")
+    print()
+    
+    # Check if we're in a git repo
+    git_dir = PROJECT_ROOT / '.git'
+    if not git_dir.exists():
+        print("✗ Not a git repository. Please reinstall:")
+        print("  curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash")
+        sys.exit(1)
+    
+    # Fetch and pull
+    try:
+        print("→ Fetching updates...")
+        subprocess.run(["git", "fetch", "origin"], cwd=PROJECT_ROOT, check=True)
+        
+        # Get current branch
+        result = subprocess.run(
+            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
+            cwd=PROJECT_ROOT,
+            capture_output=True,
+            text=True,
+            check=True
+        )
+        branch = result.stdout.strip()
+        
+        # Check if there are updates
+        result = subprocess.run(
+            ["git", "rev-list", f"HEAD..origin/{branch}", "--count"],
+            cwd=PROJECT_ROOT,
+            capture_output=True,
+            text=True,
+            check=True
+        )
+        commit_count = int(result.stdout.strip())
+        
+        if commit_count == 0:
+            print("✓ Already up to date!")
+            return
+        
+        print(f"→ Found {commit_count} new commit(s)")
+        print("→ Pulling updates...")
+        subprocess.run(["git", "pull", "origin", branch], cwd=PROJECT_ROOT, check=True)
+        
+        # Reinstall Python dependencies
+        print("→ Updating Python dependencies...")
+        venv_pip = PROJECT_ROOT / "venv" / "bin" / "pip"
+        if venv_pip.exists():
+            subprocess.run([str(venv_pip), "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
+        else:
+            subprocess.run([sys.executable, "-m", "pip", "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
+        
+        # Check for Node.js deps
+        if (PROJECT_ROOT / "package.json").exists():
+            import shutil
+            if shutil.which("npm"):
+                print("→ Updating Node.js dependencies...")
+                subprocess.run(["npm", "install", "--silent"], cwd=PROJECT_ROOT, check=False)
+        
+        print()
+        print("✓ Code updated!")
+        
+        # Check for config migrations
+        print()
+        print("→ Checking configuration for new options...")
+        
+        from hermes_cli.config import (
+            get_missing_env_vars, get_missing_config_fields, 
+            check_config_version, migrate_config
+        )
+        
+        missing_env = get_missing_env_vars(required_only=True)
+        missing_config = get_missing_config_fields()
+        current_ver, latest_ver = check_config_version()
+        
+        needs_migration = missing_env or missing_config or current_ver < latest_ver
+        
+        if needs_migration:
+            print()
+            if missing_env:
+                print(f"  ⚠️  {len(missing_env)} new required setting(s) need configuration")
+            if missing_config:
+                print(f"  ℹ️  {len(missing_config)} new config option(s) available")
+            
+            print()
+            response = input("Would you like to configure them now? [Y/n]: ").strip().lower()
+            
+            if response in ('', 'y', 'yes'):
+                print()
+                results = migrate_config(interactive=True, quiet=False)
+                
+                if results["env_added"] or results["config_added"]:
+                    print()
+                    print("✓ Configuration updated!")
+            else:
+                print()
+                print("Skipped. Run 'hermes config migrate' later to configure.")
+        else:
+            print("  ✓ Configuration is up to date")
+        
+        print()
+        print("✓ Update complete!")
+        print()
+        print("Note: If you have the gateway service running, restart it:")
+        print("  hermes gateway restart")
+        
+    except subprocess.CalledProcessError as e:
+        print(f"✗ Update failed: {e}")
+        sys.exit(1)
+
+
+def main():
+    """Main entry point for hermes CLI."""
+    parser = argparse.ArgumentParser(
+        prog="hermes",
+        description="Hermes Agent - AI assistant with tool-calling capabilities",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+    hermes                        Start interactive chat
+    hermes chat -q "Hello"        Single query mode
+    hermes setup                  Run setup wizard
+    hermes config                 View configuration
+    hermes config edit            Edit config in $EDITOR
+    hermes config set model gpt-4 Set a config value
+    hermes gateway                Run messaging gateway
+    hermes gateway install        Install as system service
+    hermes update                 Update to latest version
+
+For more help on a command:
+    hermes <command> --help
+"""
+    )
+    
+    parser.add_argument(
+        "--version", "-V",
+        action="store_true",
+        help="Show version and exit"
+    )
+    
+    subparsers = parser.add_subparsers(dest="command", help="Command to run")
+    
+    # =========================================================================
+    # chat command
+    # =========================================================================
+    chat_parser = subparsers.add_parser(
+        "chat",
+        help="Interactive chat with the agent",
+        description="Start an interactive chat session with Hermes Agent"
+    )
+    chat_parser.add_argument(
+        "-q", "--query",
+        help="Single query (non-interactive mode)"
+    )
+    chat_parser.add_argument(
+        "-m", "--model",
+        help="Model to use (e.g., anthropic/claude-sonnet-4)"
+    )
+    chat_parser.add_argument(
+        "-t", "--toolsets",
+        help="Comma-separated toolsets to enable"
+    )
+    chat_parser.add_argument(
+        "-v", "--verbose",
+        action="store_true",
+        help="Verbose output"
+    )
+    chat_parser.set_defaults(func=cmd_chat)
+    
+    # =========================================================================
+    # gateway command
+    # =========================================================================
+    gateway_parser = subparsers.add_parser(
+        "gateway",
+        help="Messaging gateway management",
+        description="Manage the messaging gateway (Telegram, Discord, WhatsApp)"
+    )
+    gateway_subparsers = gateway_parser.add_subparsers(dest="gateway_command")
+    
+    # gateway run (default)
+    gateway_run = gateway_subparsers.add_parser("run", help="Run gateway in foreground")
+    gateway_run.add_argument("-v", "--verbose", action="store_true")
+    
+    # gateway start
+    gateway_start = gateway_subparsers.add_parser("start", help="Start gateway service")
+    
+    # gateway stop
+    gateway_stop = gateway_subparsers.add_parser("stop", help="Stop gateway service")
+    
+    # gateway restart
+    gateway_restart = gateway_subparsers.add_parser("restart", help="Restart gateway service")
+    
+    # gateway status
+    gateway_status = gateway_subparsers.add_parser("status", help="Show gateway status")
+    gateway_status.add_argument("--deep", action="store_true", help="Deep status check")
+    
+    # gateway install
+    gateway_install = gateway_subparsers.add_parser("install", help="Install gateway as service")
+    gateway_install.add_argument("--force", action="store_true", help="Force reinstall")
+    
+    # gateway uninstall
+    gateway_uninstall = gateway_subparsers.add_parser("uninstall", help="Uninstall gateway service")
+    
+    gateway_parser.set_defaults(func=cmd_gateway)
+    
+    # =========================================================================
+    # setup command
+    # =========================================================================
+    setup_parser = subparsers.add_parser(
+        "setup",
+        help="Interactive setup wizard",
+        description="Configure Hermes Agent with an interactive wizard"
+    )
+    setup_parser.add_argument(
+        "--non-interactive",
+        action="store_true",
+        help="Non-interactive mode (use defaults/env vars)"
+    )
+    setup_parser.add_argument(
+        "--reset",
+        action="store_true",
+        help="Reset configuration to defaults"
+    )
+    setup_parser.set_defaults(func=cmd_setup)
+    
+    # =========================================================================
+    # status command
+    # =========================================================================
+    status_parser = subparsers.add_parser(
+        "status",
+        help="Show status of all components",
+        description="Display status of Hermes Agent components"
+    )
+    status_parser.add_argument(
+        "--all",
+        action="store_true",
+        help="Show all details (redacted for sharing)"
+    )
+    status_parser.add_argument(
+        "--deep",
+        action="store_true",
+        help="Run deep checks (may take longer)"
+    )
+    status_parser.set_defaults(func=cmd_status)
+    
+    # =========================================================================
+    # cron command
+    # =========================================================================
+    cron_parser = subparsers.add_parser(
+        "cron",
+        help="Cron job management",
+        description="Manage scheduled tasks"
+    )
+    cron_subparsers = cron_parser.add_subparsers(dest="cron_command")
+    
+    # cron list
+    cron_list = cron_subparsers.add_parser("list", help="List scheduled jobs")
+    cron_list.add_argument("--all", action="store_true", help="Include disabled jobs")
+    
+    # cron daemon
+    cron_daemon = cron_subparsers.add_parser("daemon", help="Run cron daemon")
+    cron_daemon.add_argument("--interval", type=int, default=60, help="Check interval in seconds")
+    
+    # cron tick
+    cron_tick = cron_subparsers.add_parser("tick", help="Run due jobs once (for system cron)")
+    
+    cron_parser.set_defaults(func=cmd_cron)
+    
+    # =========================================================================
+    # doctor command
+    # =========================================================================
+    doctor_parser = subparsers.add_parser(
+        "doctor",
+        help="Check configuration and dependencies",
+        description="Diagnose issues with Hermes Agent setup"
+    )
+    doctor_parser.add_argument(
+        "--fix",
+        action="store_true",
+        help="Attempt to fix issues automatically"
+    )
+    doctor_parser.set_defaults(func=cmd_doctor)
+    
+    # =========================================================================
+    # config command
+    # =========================================================================
+    config_parser = subparsers.add_parser(
+        "config",
+        help="View and edit configuration",
+        description="Manage Hermes Agent configuration"
+    )
+    config_subparsers = config_parser.add_subparsers(dest="config_command")
+    
+    # config show (default)
+    config_show = config_subparsers.add_parser("show", help="Show current configuration")
+    
+    # config edit
+    config_edit = config_subparsers.add_parser("edit", help="Open config file in editor")
+    
+    # config set
+    config_set = config_subparsers.add_parser("set", help="Set a configuration value")
+    config_set.add_argument("key", nargs="?", help="Configuration key (e.g., model, terminal.backend)")
+    config_set.add_argument("value", nargs="?", help="Value to set")
+    
+    # config path
+    config_path = config_subparsers.add_parser("path", help="Print config file path")
+    
+    # config env-path
+    config_env = config_subparsers.add_parser("env-path", help="Print .env file path")
+    
+    # config check
+    config_check = config_subparsers.add_parser("check", help="Check for missing/outdated config")
+    
+    # config migrate
+    config_migrate = config_subparsers.add_parser("migrate", help="Update config with new options")
+    
+    config_parser.set_defaults(func=cmd_config)
+    
+    # =========================================================================
+    # version command
+    # =========================================================================
+    version_parser = subparsers.add_parser(
+        "version",
+        help="Show version information"
+    )
+    version_parser.set_defaults(func=cmd_version)
+    
+    # =========================================================================
+    # update command
+    # =========================================================================
+    update_parser = subparsers.add_parser(
+        "update",
+        help="Update Hermes Agent to the latest version",
+        description="Pull the latest changes from git and reinstall dependencies"
+    )
+    update_parser.set_defaults(func=cmd_update)
+    
+    # =========================================================================
+    # uninstall command
+    # =========================================================================
+    uninstall_parser = subparsers.add_parser(
+        "uninstall",
+        help="Uninstall Hermes Agent",
+        description="Remove Hermes Agent from your system. Can keep configs/data for reinstall."
+    )
+    uninstall_parser.add_argument(
+        "--full",
+        action="store_true",
+        help="Full uninstall - remove everything including configs and data"
+    )
+    uninstall_parser.add_argument(
+        "--yes", "-y",
+        action="store_true",
+        help="Skip confirmation prompts"
+    )
+    uninstall_parser.set_defaults(func=cmd_uninstall)
+    
+    # =========================================================================
+    # Parse and execute
+    # =========================================================================
+    args = parser.parse_args()
+    
+    # Handle --version flag
+    if args.version:
+        cmd_version(args)
+        return
+    
+    # Default to chat if no command specified
+    if args.command is None:
+        # No command = run chat
+        args.query = None
+        args.model = None
+        args.toolsets = None
+        args.verbose = False
+        cmd_chat(args)
+        return
+    
+    # Execute the command
+    if hasattr(args, 'func'):
+        args.func(args)
+    else:
+        parser.print_help()
+
+
+if __name__ == "__main__":
+    main()
--- a/hermes_cli/setup.py
+++ b/hermes_cli/setup.py
@@ -0,0 +1,989 @@
+"""
+Interactive setup wizard for Hermes Agent.
+
+Guides users through:
+1. Installation directory confirmation
+2. API key configuration
+3. Model selection  
+4. Terminal backend selection
+5. Messaging platform setup
+6. Optional features
+
+Config files are stored in ~/.hermes/ for easy access.
+"""
+
+import os
+import sys
+from pathlib import Path
+from typing import Optional, Dict, Any
+
+PROJECT_ROOT = Path(__file__).parent.parent.resolve()
+
+# Import config helpers
+from hermes_cli.config import (
+    get_hermes_home, get_config_path, get_env_path,
+    load_config, save_config, save_env_value, get_env_value,
+    ensure_hermes_home, DEFAULT_CONFIG
+)
+
+# ANSI colors
+class Colors:
+    RESET = "\033[0m"
+    BOLD = "\033[1m"
+    DIM = "\033[2m"
+    RED = "\033[31m"
+    GREEN = "\033[32m"
+    YELLOW = "\033[33m"
+    BLUE = "\033[34m"
+    MAGENTA = "\033[35m"
+    CYAN = "\033[36m"
+
+def color(text: str, *codes) -> str:
+    """Apply color codes to text."""
+    if not sys.stdout.isatty():
+        return text
+    return "".join(codes) + text + Colors.RESET
+
+def print_header(title: str):
+    """Print a section header."""
+    print()
+    print(color(f"◆ {title}", Colors.CYAN, Colors.BOLD))
+
+def print_info(text: str):
+    """Print info text."""
+    print(color(f"  {text}", Colors.DIM))
+
+def print_success(text: str):
+    """Print success message."""
+    print(color(f"✓ {text}", Colors.GREEN))
+
+def print_warning(text: str):
+    """Print warning message."""
+    print(color(f"⚠ {text}", Colors.YELLOW))
+
+def print_error(text: str):
+    """Print error message."""
+    print(color(f"✗ {text}", Colors.RED))
+
+def prompt(question: str, default: Optional[str] = None, password: bool = False) -> str:
+    """Prompt for input with optional default."""
+    if default:
+        display = f"{question} [{default}]: "
+    else:
+        display = f"{question}: "
+    
+    try:
+        if password:
+            import getpass
+            value = getpass.getpass(color(display, Colors.YELLOW))
+        else:
+            value = input(color(display, Colors.YELLOW))
+        
+        return value.strip() or default or ""
+    except (KeyboardInterrupt, EOFError):
+        print()
+        sys.exit(1)
+
+def prompt_choice(question: str, choices: list, default: int = 0) -> int:
+    """Prompt for a choice from a list with arrow key navigation."""
+    print(color(question, Colors.YELLOW))
+    
+    # Try to use interactive menu if available
+    try:
+        from simple_term_menu import TerminalMenu
+        
+        # Add visual indicators
+        menu_choices = [f"  {choice}" for choice in choices]
+        
+        terminal_menu = TerminalMenu(
+            menu_choices,
+            cursor_index=default,
+            menu_cursor="→ ",
+            menu_cursor_style=("fg_green", "bold"),
+            menu_highlight_style=("fg_green",),
+            cycle_cursor=True,
+            clear_screen=False,
+        )
+        
+        idx = terminal_menu.show()
+        if idx is None:  # User pressed Escape or Ctrl+C
+            print()
+            sys.exit(1)
+        print()  # Add newline after selection
+        return idx
+        
+    except ImportError:
+        # Fallback to number-based selection
+        for i, choice in enumerate(choices):
+            marker = "●" if i == default else "○"
+            if i == default:
+                print(color(f"  {marker} {choice}", Colors.GREEN))
+            else:
+                print(f"  {marker} {choice}")
+        
+        while True:
+            try:
+                value = input(color(f"  Select [1-{len(choices)}] ({default + 1}): ", Colors.DIM))
+                if not value:
+                    return default
+                idx = int(value) - 1
+                if 0 <= idx < len(choices):
+                    return idx
+                print_error(f"Please enter a number between 1 and {len(choices)}")
+            except ValueError:
+                print_error("Please enter a number")
+            except (KeyboardInterrupt, EOFError):
+                print()
+                sys.exit(1)
+
+def prompt_yes_no(question: str, default: bool = True) -> bool:
+    """Prompt for yes/no."""
+    default_str = "Y/n" if default else "y/N"
+    
+    while True:
+        value = prompt(f"{question} [{default_str}]").lower()
+        
+        if not value:
+            return default
+        if value in ('y', 'yes'):
+            return True
+        if value in ('n', 'no'):
+            return False
+        print_error("Please enter 'y' or 'n'")
+
+
+def _print_setup_summary(config: dict, hermes_home):
+    """Print the setup completion summary."""
+    # Tool availability summary
+    print()
+    print_header("Tool Availability Summary")
+    
+    tool_status = []
+    
+    # OpenRouter (required for vision, moa)
+    if get_env_value('OPENROUTER_API_KEY'):
+        tool_status.append(("Vision (image analysis)", True, None))
+        tool_status.append(("Mixture of Agents", True, None))
+    else:
+        tool_status.append(("Vision (image analysis)", False, "OPENROUTER_API_KEY"))
+        tool_status.append(("Mixture of Agents", False, "OPENROUTER_API_KEY"))
+    
+    # Firecrawl (web tools)
+    if get_env_value('FIRECRAWL_API_KEY'):
+        tool_status.append(("Web Search & Extract", True, None))
+    else:
+        tool_status.append(("Web Search & Extract", False, "FIRECRAWL_API_KEY"))
+    
+    # Browserbase (browser tools)
+    if get_env_value('BROWSERBASE_API_KEY'):
+        tool_status.append(("Browser Automation", True, None))
+    else:
+        tool_status.append(("Browser Automation", False, "BROWSERBASE_API_KEY"))
+    
+    # FAL (image generation)
+    if get_env_value('FAL_KEY'):
+        tool_status.append(("Image Generation", True, None))
+    else:
+        tool_status.append(("Image Generation", False, "FAL_KEY"))
+    
+    # Tinker + WandB (RL training)
+    if get_env_value('TINKER_API_KEY') and get_env_value('WANDB_API_KEY'):
+        tool_status.append(("RL Training (Tinker)", True, None))
+    elif get_env_value('TINKER_API_KEY'):
+        tool_status.append(("RL Training (Tinker)", False, "WANDB_API_KEY"))
+    else:
+        tool_status.append(("RL Training (Tinker)", False, "TINKER_API_KEY"))
+    
+    # Terminal (always available if system deps met)
+    tool_status.append(("Terminal/Commands", True, None))
+    
+    # Skills (always available if skills dir exists)
+    tool_status.append(("Skills Knowledge Base", True, None))
+    
+    # Print status
+    available_count = sum(1 for _, avail, _ in tool_status if avail)
+    total_count = len(tool_status)
+    
+    print_info(f"{available_count}/{total_count} tool categories available:")
+    print()
+    
+    for name, available, missing_var in tool_status:
+        if available:
+            print(f"   {color('✓', Colors.GREEN)} {name}")
+        else:
+            print(f"   {color('✗', Colors.RED)} {name} {color(f'(missing {missing_var})', Colors.DIM)}")
+    
+    print()
+    
+    disabled_tools = [(name, var) for name, avail, var in tool_status if not avail]
+    if disabled_tools:
+        print_warning("Some tools are disabled. Run 'hermes setup' again to configure them,")
+        print_warning("or edit ~/.hermes/.env directly to add the missing API keys.")
+        print()
+    
+    # Done banner
+    print()
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.GREEN))
+    print(color("│              ✓ Setup Complete!                          │", Colors.GREEN))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.GREEN))
+    print()
+    
+    # Show file locations prominently
+    print(color("📁 All your files are in ~/.hermes/:", Colors.CYAN, Colors.BOLD))
+    print()
+    print(f"   {color('Settings:', Colors.YELLOW)}  {get_config_path()}")
+    print(f"   {color('API Keys:', Colors.YELLOW)}  {get_env_path()}")
+    print(f"   {color('Data:', Colors.YELLOW)}      {hermes_home}/cron/, sessions/, logs/")
+    print()
+    
+    print(color("─" * 60, Colors.DIM))
+    print()
+    print(color("📝 To edit your configuration:", Colors.CYAN, Colors.BOLD))
+    print()
+    print(f"   {color('hermes config', Colors.GREEN)}        View current settings")
+    print(f"   {color('hermes config edit', Colors.GREEN)}   Open config in your editor")
+    print(f"   {color('hermes config set KEY VALUE', Colors.GREEN)}")
+    print("                         Set a specific value")
+    print()
+    print("   Or edit the files directly:")
+    print(f"   {color(f'nano {get_config_path()}', Colors.DIM)}")
+    print(f"   {color(f'nano {get_env_path()}', Colors.DIM)}")
+    print()
+    
+    print(color("─" * 60, Colors.DIM))
+    print()
+    print(color("🚀 Ready to go!", Colors.CYAN, Colors.BOLD))
+    print()
+    print(f"   {color('hermes', Colors.GREEN)}              Start chatting")
+    print(f"   {color('hermes gateway', Colors.GREEN)}      Start messaging gateway")
+    print(f"   {color('hermes doctor', Colors.GREEN)}       Check for issues")
+    print()
+
+
+def run_setup_wizard(args):
+    """Run the interactive setup wizard."""
+    ensure_hermes_home()
+    
+    config = load_config()
+    hermes_home = get_hermes_home()
+    
+    # Check if this is an existing installation with config
+    is_existing = get_env_value("OPENROUTER_API_KEY") is not None or get_config_path().exists()
+    
+    # Import migration helpers
+    from hermes_cli.config import (
+        get_missing_env_vars, get_missing_config_fields,
+        check_config_version, migrate_config,
+        REQUIRED_ENV_VARS, OPTIONAL_ENV_VARS
+    )
+    
+    # Check what's missing
+    missing_required = [v for v in get_missing_env_vars(required_only=False) if v.get("is_required")]
+    missing_optional = [v for v in get_missing_env_vars(required_only=False) if not v.get("is_required")]
+    missing_config = get_missing_config_fields()
+    current_ver, latest_ver = check_config_version()
+    
+    has_missing = missing_required or missing_optional or missing_config or current_ver < latest_ver
+    
+    print()
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.MAGENTA))
+    print(color("│             🦋 Hermes Agent Setup Wizard                │", Colors.MAGENTA))
+    print(color("├─────────────────────────────────────────────────────────┤", Colors.MAGENTA))
+    print(color("│  Let's configure your Hermes Agent installation.       │", Colors.MAGENTA))
+    print(color("│  Press Ctrl+C at any time to exit.                     │", Colors.MAGENTA))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.MAGENTA))
+    
+    # If existing installation, show what's missing and offer quick mode
+    quick_mode = False
+    if is_existing and has_missing:
+        print()
+        print_header("Existing Installation Detected")
+        print_success("You already have Hermes configured!")
+        print()
+        
+        if missing_required:
+            print_warning(f"  {len(missing_required)} required setting(s) missing:")
+            for var in missing_required:
+                print(f"     • {var['name']}")
+        
+        if missing_optional:
+            print_info(f"  {len(missing_optional)} optional tool(s) not configured:")
+            for var in missing_optional[:3]:  # Show first 3
+                tools = var.get("tools", [])
+                tools_str = f" → {', '.join(tools[:2])}" if tools else ""
+                print(f"     • {var['name']}{tools_str}")
+            if len(missing_optional) > 3:
+                print(f"     • ...and {len(missing_optional) - 3} more")
+        
+        if missing_config:
+            print_info(f"  {len(missing_config)} new config option(s) available")
+        
+        print()
+        
+        setup_choices = [
+            "Quick setup - just configure missing items",
+            "Full setup - reconfigure everything",
+            "Skip - exit setup"
+        ]
+        
+        choice = prompt_choice("What would you like to do?", setup_choices, 0)
+        
+        if choice == 0:
+            quick_mode = True
+        elif choice == 2:
+            print()
+            print_info("Exiting. Run 'hermes setup' again when ready.")
+            return
+        # choice == 1 continues with full setup
+        
+    elif is_existing and not has_missing:
+        print()
+        print_header("Configuration Status")
+        print_success("Your configuration is complete!")
+        print()
+        
+        if not prompt_yes_no("Would you like to reconfigure anyway?", False):
+            print()
+            print_info("Exiting. Your configuration is already set up.")
+            print_info(f"Config: {get_config_path()}")
+            print_info(f"Secrets: {get_env_path()}")
+            return
+    
+    # Quick mode: only configure missing items
+    if quick_mode:
+        print()
+        print_header("Quick Setup - Missing Items Only")
+        
+        # Handle missing required env vars
+        if missing_required:
+            for var in missing_required:
+                print()
+                print(color(f"  {var['name']}", Colors.CYAN))
+                print_info(f"  {var.get('description', '')}")
+                if var.get("url"):
+                    print_info(f"  Get key at: {var['url']}")
+                
+                if var.get("password"):
+                    value = prompt(f"  {var.get('prompt', var['name'])}", password=True)
+                else:
+                    value = prompt(f"  {var.get('prompt', var['name'])}")
+                
+                if value:
+                    save_env_value(var["name"], value)
+                    print_success(f"  Saved {var['name']}")
+                else:
+                    print_warning(f"  Skipped {var['name']}")
+        
+        # Handle missing optional env vars
+        if missing_optional:
+            print()
+            print_header("Optional Tools (Quick Setup)")
+            
+            for var in missing_optional:
+                tools = var.get("tools", [])
+                tools_str = f" (enables: {', '.join(tools[:2])})" if tools else ""
+                
+                if prompt_yes_no(f"Configure {var['name']}{tools_str}?", False):
+                    if var.get("url"):
+                        print_info(f"  Get key at: {var['url']}")
+                    
+                    if var.get("password"):
+                        value = prompt(f"  {var.get('prompt', var['name'])}", password=True)
+                    else:
+                        value = prompt(f"  {var.get('prompt', var['name'])}")
+                    
+                    if value:
+                        save_env_value(var["name"], value)
+                        print_success(f"  Saved {var['name']}")
+        
+        # Handle missing config fields
+        if missing_config:
+            print()
+            print_info(f"Adding {len(missing_config)} new config option(s) with defaults...")
+            for field in missing_config:
+                print_success(f"  Added {field['key']} = {field['default']}")
+            
+            # Update config version
+            config["_config_version"] = latest_ver
+            save_config(config)
+        
+        # Jump to summary
+        _print_setup_summary(config, hermes_home)
+        return
+    
+    # =========================================================================
+    # Step 0: Show paths (full setup)
+    # =========================================================================
+    print_header("Configuration Location")
+    print_info(f"Config file:  {get_config_path()}")
+    print_info(f"Secrets file: {get_env_path()}")
+    print_info(f"Data folder:  {hermes_home}")
+    print_info(f"Install dir:  {PROJECT_ROOT}")
+    print()
+    print_info("You can edit these files directly or use 'hermes config edit'")
+    
+    # =========================================================================
+    # Step 1: OpenRouter API Key (Required for tools)
+    # =========================================================================
+    print_header("OpenRouter API Key (Required)")
+    print_info("OpenRouter is used for vision, web scraping, and tool operations")
+    print_info("even if you use a custom endpoint for your main agent.")
+    print_info("Get your API key at: https://openrouter.ai/keys")
+    
+    existing_or = get_env_value("OPENROUTER_API_KEY")
+    if existing_or:
+        print_info(f"Current: {existing_or[:8]}... (configured)")
+        if prompt_yes_no("Update OpenRouter API key?", False):
+            api_key = prompt("  OpenRouter API key", password=True)
+            if api_key:
+                save_env_value("OPENROUTER_API_KEY", api_key)
+                print_success("OpenRouter API key updated")
+    else:
+        api_key = prompt("  OpenRouter API key", password=True)
+        if api_key:
+            save_env_value("OPENROUTER_API_KEY", api_key)
+            print_success("OpenRouter API key saved")
+        else:
+            print_warning("Skipped - some tools (vision, web scraping) won't work without this")
+    
+    # =========================================================================
+    # Step 2: Main Agent Provider
+    # =========================================================================
+    print_header("Main Agent Provider")
+    print_info("Choose how to connect to your main chat model.")
+    
+    existing_custom = get_env_value("OPENAI_BASE_URL")
+    
+    provider_choices = [
+        "OpenRouter (use same key for agent - recommended)",
+        "Custom OpenAI-compatible endpoint (separate from OpenRouter)",
+        "Keep current" + (f" ({existing_custom})" if existing_custom else " (OpenRouter)")
+    ]
+    
+    provider_idx = prompt_choice("Select your main agent provider:", provider_choices, 2)
+    
+    if provider_idx == 0:  # OpenRouter for agent too
+        # Clear any custom endpoint - will use OpenRouter
+        if existing_custom:
+            save_env_value("OPENAI_BASE_URL", "")
+            save_env_value("OPENAI_API_KEY", "")
+        print_success("Agent will use OpenRouter")
+    
+    elif provider_idx == 1:  # Custom endpoint
+        print_info("Custom OpenAI-Compatible Endpoint Configuration:")
+        print_info("Works with any API that follows OpenAI's chat completions spec")
+        
+        # Show current values if set
+        current_url = get_env_value("OPENAI_BASE_URL") or ""
+        current_key = get_env_value("OPENAI_API_KEY")
+        current_model = config.get('model', '')
+        
+        if current_url:
+            print_info(f"  Current URL: {current_url}")
+        if current_key:
+            print_info(f"  Current key: {current_key[:8]}... (configured)")
+        
+        base_url = prompt("  API base URL (e.g., https://api.example.com/v1)", current_url)
+        api_key = prompt("  API key", password=True)
+        model_name = prompt("  Model name (e.g., gpt-4, claude-3-opus)", current_model)
+        
+        if base_url:
+            save_env_value("OPENAI_BASE_URL", base_url)
+        if api_key:
+            save_env_value("OPENAI_API_KEY", api_key)
+        if model_name:
+            config['model'] = model_name
+        print_success("Custom endpoint configured")
+    # else: Keep current (provider_idx == 2)
+    
+    # =========================================================================
+    # Step 3: Model Selection
+    # =========================================================================
+    print_header("Default Model")
+    
+    current_model = config.get('model', 'anthropic/claude-sonnet-4')
+    print_info(f"Current: {current_model}")
+    
+    model_choices = [
+        "anthropic/claude-sonnet-4.5 (recommended)",
+        "anthropic/claude-opus-4.5",
+        "openai/gpt-5.2",
+        "openai/gpt-5.2-codex",
+        "google/gemini-3-pro-preview",
+        "google/gemini-3-flash-preview",
+        "z-ai/glm-4.7",
+        "moonshotai/kimi-k2.5",
+        "minimax/minimax-m2.1",
+        "Custom model",
+        f"Keep current ({current_model})"
+    ]
+    
+    model_idx = prompt_choice("Select default model:", model_choices, 10)  # Default: keep current
+    
+    model_map = {
+        0: "anthropic/claude-sonnet-4.5",
+        1: "anthropic/claude-opus-4.5",
+        2: "openai/gpt-5.2",
+        3: "openai/gpt-5.2-codex",
+        4: "google/gemini-3-pro-preview",
+        5: "google/gemini-3-flash-preview",
+        6: "z-ai/glm-4.7",
+        7: "moonshotai/kimi-k2.5",
+        8: "minimax/minimax-m2.1",
+    }
+    
+    if model_idx in model_map:
+        config['model'] = model_map[model_idx]
+    elif model_idx == 9:  # Custom
+        custom = prompt("Enter model name (e.g., anthropic/claude-sonnet-4.5)")
+        if custom:
+            config['model'] = custom
+    # else: Keep current (model_idx == 10)
+    
+    # =========================================================================
+    # Step 4: Terminal Backend
+    # =========================================================================
+    print_header("Terminal Backend")
+    print_info("The terminal tool allows the agent to run commands.")
+    
+    current_backend = config.get('terminal', {}).get('backend', 'local')
+    print_info(f"Current: {current_backend}")
+    
+    # Detect platform for backend availability
+    import platform
+    is_linux = platform.system() == "Linux"
+    is_macos = platform.system() == "Darwin"
+    is_windows = platform.system() == "Windows"
+    
+    # Build choices based on platform
+    terminal_choices = [
+        "Local (run commands on this machine - no isolation)",
+        "Docker (isolated containers - recommended for security)",
+    ]
+    
+    # Singularity/Apptainer is Linux-only (HPC)
+    if is_linux:
+        terminal_choices.append("Singularity/Apptainer (HPC clusters, shared compute)")
+    
+    terminal_choices.extend([
+        "Modal (cloud execution, GPU access, serverless)",
+        "SSH (run commands on a remote server)",
+        f"Keep current ({current_backend})"
+    ])
+    
+    # Build index map based on available choices
+    if is_linux:
+        backend_to_idx = {'local': 0, 'docker': 1, 'singularity': 2, 'modal': 3, 'ssh': 4}
+        idx_to_backend = {0: 'local', 1: 'docker', 2: 'singularity', 3: 'modal', 4: 'ssh'}
+        keep_current_idx = 5
+    else:
+        backend_to_idx = {'local': 0, 'docker': 1, 'modal': 2, 'ssh': 3}
+        idx_to_backend = {0: 'local', 1: 'docker', 2: 'modal', 3: 'ssh'}
+        keep_current_idx = 4
+        if current_backend == 'singularity':
+            print_warning("Singularity is only available on Linux - please select a different backend")
+    
+    # Default based on current
+    default_terminal = backend_to_idx.get(current_backend, 0)
+    
+    terminal_idx = prompt_choice("Select terminal backend:", terminal_choices, keep_current_idx)
+    
+    # Map index to backend name (handles platform differences)
+    selected_backend = idx_to_backend.get(terminal_idx)
+    
+    if selected_backend == 'local':
+        config.setdefault('terminal', {})['backend'] = 'local'
+        print_info("Local Execution Configuration:")
+        print_info("Commands run directly on this machine (no isolation)")
+        
+        if is_windows:
+            print_info("Note: On Windows, commands run via cmd.exe or PowerShell")
+        
+        # Messaging working directory configuration
+        print_info("")
+        print_info("Working Directory for Messaging (Telegram/Discord/etc):")
+        print_info("  The CLI always uses the directory you run 'hermes' from")
+        print_info("  But messaging bots need a static starting directory")
+        
+        current_cwd = get_env_value('MESSAGING_CWD') or str(Path.home())
+        print_info(f"  Current: {current_cwd}")
+        
+        cwd_input = prompt("  Messaging working directory", current_cwd)
+        # Expand ~ to full path
+        if cwd_input.startswith('~'):
+            cwd_expanded = os.path.expanduser(cwd_input)
+        else:
+            cwd_expanded = cwd_input
+        save_env_value("MESSAGING_CWD", cwd_expanded)
+        
+        if prompt_yes_no("  Enable sudo support? (allows agent to run sudo commands)", False):
+            print_warning("  SECURITY WARNING: Sudo password will be stored in plaintext")
+            sudo_pass = prompt("  Sudo password (leave empty to skip)", password=True)
+            if sudo_pass:
+                save_env_value("SUDO_PASSWORD", sudo_pass)
+                print_success("  Sudo password saved")
+        
+        print_success("Terminal set to local")
+    
+    elif selected_backend == 'docker':
+        config.setdefault('terminal', {})['backend'] = 'docker'
+        default_docker = config.get('terminal', {}).get('docker_image', 'nikolaik/python-nodejs:python3.11-nodejs20')
+        print_info("Docker Configuration:")
+        if is_macos:
+            print_info("Requires Docker Desktop for Mac")
+        elif is_windows:
+            print_info("Requires Docker Desktop for Windows")
+        docker_image = prompt("  Docker image", default_docker)
+        config['terminal']['docker_image'] = docker_image
+        print_success("Terminal set to Docker")
+    
+    elif selected_backend == 'singularity':
+        config.setdefault('terminal', {})['backend'] = 'singularity'
+        default_singularity = config.get('terminal', {}).get('singularity_image', 'docker://nikolaik/python-nodejs:python3.11-nodejs20')
+        print_info("Singularity/Apptainer Configuration:")
+        print_info("Requires apptainer or singularity to be installed")
+        singularity_image = prompt("  Image (docker:// prefix for Docker Hub)", default_singularity)
+        config['terminal']['singularity_image'] = singularity_image
+        print_success("Terminal set to Singularity/Apptainer")
+    
+    elif selected_backend == 'modal':
+        config.setdefault('terminal', {})['backend'] = 'modal'
+        default_modal = config.get('terminal', {}).get('modal_image', 'nikolaik/python-nodejs:python3.11-nodejs20')
+        print_info("Modal Cloud Configuration:")
+        print_info("Get credentials at: https://modal.com/settings")
+        
+        # Always show current status and allow reconfiguration
+        current_token = get_env_value('MODAL_TOKEN_ID')
+        if current_token:
+            print_info(f"  Token ID: {current_token[:8]}... (configured)")
+        
+        modal_image = prompt("  Container image", default_modal)
+        config['terminal']['modal_image'] = modal_image
+        
+        token_id = prompt("  Modal token ID", current_token or "")
+        token_secret = prompt("  Modal token secret", password=True)
+        
+        if token_id:
+            save_env_value("MODAL_TOKEN_ID", token_id)
+        if token_secret:
+            save_env_value("MODAL_TOKEN_SECRET", token_secret)
+        
+        print_success("Terminal set to Modal")
+    
+    elif selected_backend == 'ssh':
+        config.setdefault('terminal', {})['backend'] = 'ssh'
+        print_info("SSH Remote Execution Configuration:")
+        print_info("Commands will run on a remote server over SSH")
+        
+        current_host = get_env_value('TERMINAL_SSH_HOST') or ''
+        current_user = get_env_value('TERMINAL_SSH_USER') or os.getenv("USER", "")
+        current_port = get_env_value('TERMINAL_SSH_PORT') or '22'
+        current_key = get_env_value('TERMINAL_SSH_KEY') or '~/.ssh/id_rsa'
+        
+        if current_host:
+            print_info(f"  Current host: {current_user}@{current_host}:{current_port}")
+        
+        ssh_host = prompt("  SSH host", current_host)
+        ssh_user = prompt("  SSH user", current_user)
+        ssh_port = prompt("  SSH port", current_port)
+        ssh_key = prompt("  SSH key path (or leave empty for ssh-agent)", current_key)
+        
+        if ssh_host:
+            save_env_value("TERMINAL_SSH_HOST", ssh_host)
+        if ssh_user:
+            save_env_value("TERMINAL_SSH_USER", ssh_user)
+        if ssh_port and ssh_port != current_port:
+            save_env_value("TERMINAL_SSH_PORT", ssh_port)
+        if ssh_key:
+            save_env_value("TERMINAL_SSH_KEY", ssh_key)
+        
+        print_success("Terminal set to SSH")
+    # else: Keep current (selected_backend is None)
+    
+    # =========================================================================
+    # Step 5: Agent Settings
+    # =========================================================================
+    print_header("Agent Settings")
+    
+    # Max iterations
+    current_max = get_env_value('HERMES_MAX_ITERATIONS') or '60'
+    print_info("Maximum tool-calling iterations per conversation.")
+    print_info("Higher = more complex tasks, but costs more tokens.")
+    print_info("Recommended: 30-60 for most tasks, 100+ for open exploration.")
+    
+    max_iter_str = prompt("Max iterations", current_max)
+    try:
+        max_iter = int(max_iter_str)
+        if max_iter > 0:
+            save_env_value("HERMES_MAX_ITERATIONS", str(max_iter))
+            config['max_turns'] = max_iter
+            print_success(f"Max iterations set to {max_iter}")
+    except ValueError:
+        print_warning("Invalid number, keeping current value")
+    
+    # Tool progress notifications (for messaging)
+    print_info("")
+    print_info("Tool Progress Notifications (Messaging only)")
+    print_info("Send status messages when the agent uses tools.")
+    print_info("Example: '💻 ls -la...' or '🔍 web_search...'")
+    
+    current_progress = get_env_value('HERMES_TOOL_PROGRESS') or 'false'
+    if prompt_yes_no("Enable tool progress messages?", current_progress.lower() in ('1', 'true', 'yes')):
+        save_env_value("HERMES_TOOL_PROGRESS", "true")
+        
+        # Progress mode
+        current_mode = get_env_value('HERMES_TOOL_PROGRESS_MODE') or 'new'
+        print_info("  Mode options:")
+        print_info("    'new' - Only when switching tools (less spam)")
+        print_info("    'all' - Every tool call")
+        mode = prompt("  Progress mode", current_mode)
+        if mode.lower() in ('all', 'new'):
+            save_env_value("HERMES_TOOL_PROGRESS_MODE", mode.lower())
+        print_success("Tool progress enabled")
+    else:
+        save_env_value("HERMES_TOOL_PROGRESS", "false")
+    
+    # =========================================================================
+    # Step 6: Context Compression
+    # =========================================================================
+    print_header("Context Compression")
+    print_info("Automatically summarize old messages when context gets too long.")
+    
+    compression = config.get('compression', {})
+    current_enabled = compression.get('enabled', True)
+    
+    if prompt_yes_no("Enable context compression?", current_enabled):
+        config.setdefault('compression', {})['enabled'] = True
+        
+        current_threshold = compression.get('threshold', 0.85)
+        threshold_str = prompt("Compression threshold (0.5-0.95)", str(current_threshold))
+        try:
+            threshold = float(threshold_str)
+            if 0.5 <= threshold <= 0.95:
+                config['compression']['threshold'] = threshold
+        except ValueError:
+            pass
+        
+        print_success("Context compression enabled")
+    else:
+        config.setdefault('compression', {})['enabled'] = False
+    
+    # =========================================================================
+    # Step 7: Messaging Platforms (Optional)
+    # =========================================================================
+    print_header("Messaging Platforms (Optional)")
+    print_info("Connect to messaging platforms to chat with Hermes from anywhere.")
+    
+    # Telegram
+    existing_telegram = get_env_value('TELEGRAM_BOT_TOKEN')
+    if existing_telegram:
+        print_info("Telegram: already configured")
+        if prompt_yes_no("Reconfigure Telegram?", False):
+            existing_telegram = None
+    
+    if not existing_telegram and prompt_yes_no("Set up Telegram bot?", False):
+        print_info("Create a bot via @BotFather on Telegram")
+        token = prompt("Telegram bot token", password=True)
+        if token:
+            save_env_value("TELEGRAM_BOT_TOKEN", token)
+            print_success("Telegram token saved")
+            
+            # Allowed users (security)
+            print()
+            print_info("🔒 Security: Restrict who can use your bot")
+            print_info("   To find your Telegram user ID:")
+            print_info("   1. Message @userinfobot on Telegram")
+            print_info("   2. It will reply with your numeric ID (e.g., 123456789)")
+            print()
+            allowed_users = prompt("Allowed user IDs (comma-separated, leave empty for open access)")
+            if allowed_users:
+                save_env_value("TELEGRAM_ALLOWED_USERS", allowed_users.replace(" ", ""))
+                print_success("Telegram allowlist configured - only listed users can use the bot")
+            else:
+                print_info("⚠️  No allowlist set - anyone who finds your bot can use it!")
+            
+            home_channel = prompt("Home channel ID (optional, for cron delivery)")
+            if home_channel:
+                save_env_value("TELEGRAM_HOME_CHANNEL", home_channel)
+    
+    # Check/update existing Telegram allowlist
+    elif existing_telegram:
+        existing_allowlist = get_env_value('TELEGRAM_ALLOWED_USERS')
+        if not existing_allowlist:
+            print_info("⚠️  Telegram has no user allowlist - anyone can use your bot!")
+            if prompt_yes_no("Add allowed users now?", True):
+                print_info("   To find your Telegram user ID: message @userinfobot")
+                allowed_users = prompt("Allowed user IDs (comma-separated)")
+                if allowed_users:
+                    save_env_value("TELEGRAM_ALLOWED_USERS", allowed_users.replace(" ", ""))
+                    print_success("Telegram allowlist configured")
+    
+    # Discord
+    existing_discord = get_env_value('DISCORD_BOT_TOKEN')
+    if existing_discord:
+        print_info("Discord: already configured")
+        if prompt_yes_no("Reconfigure Discord?", False):
+            existing_discord = None
+    
+    if not existing_discord and prompt_yes_no("Set up Discord bot?", False):
+        print_info("Create a bot at https://discord.com/developers/applications")
+        token = prompt("Discord bot token", password=True)
+        if token:
+            save_env_value("DISCORD_BOT_TOKEN", token)
+            print_success("Discord token saved")
+            
+            # Allowed users (security)
+            print()
+            print_info("🔒 Security: Restrict who can use your bot")
+            print_info("   To find your Discord user ID:")
+            print_info("   1. Enable Developer Mode in Discord settings")
+            print_info("   2. Right-click your name → Copy ID")
+            print()
+            allowed_users = prompt("Allowed user IDs (comma-separated, leave empty for open access)")
+            if allowed_users:
+                save_env_value("DISCORD_ALLOWED_USERS", allowed_users.replace(" ", ""))
+                print_success("Discord allowlist configured")
+            else:
+                print_info("⚠️  No allowlist set - anyone in servers with your bot can use it!")
+            
+            home_channel = prompt("Home channel ID (optional, for cron delivery)")
+            if home_channel:
+                save_env_value("DISCORD_HOME_CHANNEL", home_channel)
+    
+    # Check/update existing Discord allowlist
+    elif existing_discord:
+        existing_allowlist = get_env_value('DISCORD_ALLOWED_USERS')
+        if not existing_allowlist:
+            print_info("⚠️  Discord has no user allowlist - anyone can use your bot!")
+            if prompt_yes_no("Add allowed users now?", True):
+                print_info("   To find Discord ID: Enable Developer Mode, right-click name → Copy ID")
+                allowed_users = prompt("Allowed user IDs (comma-separated)")
+                if allowed_users:
+                    save_env_value("DISCORD_ALLOWED_USERS", allowed_users.replace(" ", ""))
+                    print_success("Discord allowlist configured")
+    
+    # =========================================================================
+    # Step 8: Additional Tools (Optional)
+    # =========================================================================
+    print_header("Additional Tools (Optional)")
+    print_info("These tools extend the agent's capabilities.")
+    print_info("Without their API keys, the corresponding features will be disabled.")
+    print()
+    
+    # Firecrawl - Web scraping
+    print_info("─" * 50)
+    print(color("  Web Search & Scraping (Firecrawl)", Colors.CYAN))
+    print_info("  Enables: web_search, web_extract tools")
+    print_info("  Use case: Search the web, read webpage content")
+    if get_env_value('FIRECRAWL_API_KEY'):
+        print_success("  Status: Configured ✓")
+        if prompt_yes_no("  Update Firecrawl API key?", False):
+            api_key = prompt("    API key", password=True)
+            if api_key:
+                save_env_value("FIRECRAWL_API_KEY", api_key)
+                print_success("    Updated")
+    else:
+        print_warning("  Status: Not configured (tools will be disabled)")
+        if prompt_yes_no("  Set up Firecrawl?", False):
+            print_info("    Get your API key at: https://firecrawl.dev/")
+            api_key = prompt("    API key", password=True)
+            if api_key:
+                save_env_value("FIRECRAWL_API_KEY", api_key)
+                print_success("    Configured ✓")
+    print()
+    
+    # Browserbase - Browser automation
+    print_info("─" * 50)
+    print(color("  Browser Automation (Browserbase)", Colors.CYAN))
+    print_info("  Enables: browser_navigate, browser_click, etc.")
+    print_info("  Use case: Interact with web pages, fill forms, screenshots")
+    if get_env_value('BROWSERBASE_API_KEY'):
+        print_success("  Status: Configured ✓")
+        if prompt_yes_no("  Update Browserbase credentials?", False):
+            api_key = prompt("    API key", password=True)
+            project_id = prompt("    Project ID")
+            if api_key:
+                save_env_value("BROWSERBASE_API_KEY", api_key)
+            if project_id:
+                save_env_value("BROWSERBASE_PROJECT_ID", project_id)
+            print_success("    Updated")
+    else:
+        print_warning("  Status: Not configured (tools will be disabled)")
+        if prompt_yes_no("  Set up Browserbase?", False):
+            print_info("    Get credentials at: https://browserbase.com/")
+            api_key = prompt("    API key", password=True)
+            project_id = prompt("    Project ID")
+            if api_key:
+                save_env_value("BROWSERBASE_API_KEY", api_key)
+            if project_id:
+                save_env_value("BROWSERBASE_PROJECT_ID", project_id)
+            print_success("    Configured ✓")
+    print()
+    
+    # FAL - Image generation
+    print_info("─" * 50)
+    print(color("  Image Generation (FAL)", Colors.CYAN))
+    print_info("  Enables: image_generate tool")
+    print_info("  Use case: Generate images from text prompts (FLUX)")
+    if get_env_value('FAL_KEY'):
+        print_success("  Status: Configured ✓")
+        if prompt_yes_no("  Update FAL API key?", False):
+            api_key = prompt("    API key", password=True)
+            if api_key:
+                save_env_value("FAL_KEY", api_key)
+                print_success("    Updated")
+    else:
+        print_warning("  Status: Not configured (tool will be disabled)")
+        if prompt_yes_no("  Set up FAL?", False):
+            print_info("    Get your API key at: https://fal.ai/")
+            api_key = prompt("    API key", password=True)
+            if api_key:
+                save_env_value("FAL_KEY", api_key)
+                print_success("    Configured ✓")
+    print()
+    
+    # Tinker + WandB - RL Training
+    print_info("─" * 50)
+    print(color("  RL Training (Tinker + WandB)", Colors.CYAN))
+    print_info("  Enables: rl_start_training, rl_check_status, rl_get_results tools")
+    print_info("  Use case: Run reinforcement learning training via Tinker API")
+    tinker_configured = get_env_value('TINKER_API_KEY')
+    wandb_configured = get_env_value('WANDB_API_KEY')
+    
+    if tinker_configured and wandb_configured:
+        print_success("  Status: Configured ✓")
+        if prompt_yes_no("  Update RL training credentials?", False):
+            api_key = prompt("    Tinker API key", password=True)
+            if api_key:
+                save_env_value("TINKER_API_KEY", api_key)
+            wandb_key = prompt("    WandB API key", password=True)
+            if wandb_key:
+                save_env_value("WANDB_API_KEY", wandb_key)
+            print_success("    Updated")
+    else:
+        if tinker_configured:
+            print_warning("  Status: Tinker configured, WandB missing")
+        elif wandb_configured:
+            print_warning("  Status: WandB configured, Tinker missing")
+        else:
+            print_warning("  Status: Not configured (tools will be disabled)")
+        
+        if prompt_yes_no("  Set up RL Training?", False):
+            print_info("    Get Tinker key at: https://tinker-console.thinkingmachines.ai/keys")
+            print_info("    Get WandB key at: https://wandb.ai/authorize")
+            api_key = prompt("    Tinker API key", password=True)
+            if api_key:
+                save_env_value("TINKER_API_KEY", api_key)
+            wandb_key = prompt("    WandB API key", password=True)
+            if wandb_key:
+                save_env_value("WANDB_API_KEY", wandb_key)
+            if api_key and wandb_key:
+                print_success("    Configured ✓")
+            else:
+                print_warning("    Partially configured (both keys required)")
+    
+    # =========================================================================
+    # Save config and show summary
+    # =========================================================================
+    save_config(config)
+    _print_setup_summary(config, hermes_home)
--- a/hermes_cli/status.py
+++ b/hermes_cli/status.py
@@ -0,0 +1,241 @@
+"""
+Status command for hermes CLI.
+
+Shows the status of all Hermes Agent components.
+"""
+
+import os
+import sys
+import subprocess
+from pathlib import Path
+
+PROJECT_ROOT = Path(__file__).parent.parent.resolve()
+
+# ANSI colors
+class Colors:
+    RESET = "\033[0m"
+    BOLD = "\033[1m"
+    DIM = "\033[2m"
+    RED = "\033[31m"
+    GREEN = "\033[32m"
+    YELLOW = "\033[33m"
+    CYAN = "\033[36m"
+
+def color(text: str, *codes) -> str:
+    if not sys.stdout.isatty():
+        return text
+    return "".join(codes) + text + Colors.RESET
+
+def check_mark(ok: bool) -> str:
+    if ok:
+        return color("✓", Colors.GREEN)
+    return color("✗", Colors.RED)
+
+def redact_key(key: str) -> str:
+    """Redact an API key for display."""
+    if not key:
+        return "(not set)"
+    if len(key) < 12:
+        return "***"
+    return key[:4] + "..." + key[-4:]
+
+
+def show_status(args):
+    """Show status of all Hermes Agent components."""
+    show_all = getattr(args, 'all', False)
+    deep = getattr(args, 'deep', False)
+    
+    print()
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.CYAN))
+    print(color("│                 🦋 Hermes Agent Status                  │", Colors.CYAN))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.CYAN))
+    
+    # =========================================================================
+    # Environment
+    # =========================================================================
+    print()
+    print(color("◆ Environment", Colors.CYAN, Colors.BOLD))
+    print(f"  Project:      {PROJECT_ROOT}")
+    print(f"  Python:       {sys.version.split()[0]}")
+    
+    env_path = PROJECT_ROOT / '.env'
+    print(f"  .env file:    {check_mark(env_path.exists())} {'exists' if env_path.exists() else 'not found'}")
+    
+    # =========================================================================
+    # API Keys
+    # =========================================================================
+    print()
+    print(color("◆ API Keys", Colors.CYAN, Colors.BOLD))
+    
+    keys = {
+        "OpenRouter": "OPENROUTER_API_KEY",
+        "Anthropic": "ANTHROPIC_API_KEY", 
+        "OpenAI": "OPENAI_API_KEY",
+        "Firecrawl": "FIRECRAWL_API_KEY",
+        "Browserbase": "BROWSERBASE_API_KEY",
+        "FAL": "FAL_KEY",
+        "Tinker": "TINKER_API_KEY",
+        "WandB": "WANDB_API_KEY",
+    }
+    
+    for name, env_var in keys.items():
+        value = os.getenv(env_var, "")
+        has_key = bool(value)
+        display = value if (show_all and value) else redact_key(value)
+        print(f"  {name:<12}  {check_mark(has_key)} {display}")
+    
+    # =========================================================================
+    # Terminal Configuration
+    # =========================================================================
+    print()
+    print(color("◆ Terminal Backend", Colors.CYAN, Colors.BOLD))
+    
+    terminal_env = os.getenv("TERMINAL_ENV", "local")
+    print(f"  Backend:      {terminal_env}")
+    
+    if terminal_env == "ssh":
+        ssh_host = os.getenv("TERMINAL_SSH_HOST", "")
+        ssh_user = os.getenv("TERMINAL_SSH_USER", "")
+        print(f"  SSH Host:     {ssh_host or '(not set)'}")
+        print(f"  SSH User:     {ssh_user or '(not set)'}")
+    elif terminal_env == "docker":
+        docker_image = os.getenv("TERMINAL_DOCKER_IMAGE", "python:3.11-slim")
+        print(f"  Docker Image: {docker_image}")
+    
+    sudo_password = os.getenv("SUDO_PASSWORD", "")
+    print(f"  Sudo:         {check_mark(bool(sudo_password))} {'enabled' if sudo_password else 'disabled'}")
+    
+    # =========================================================================
+    # Messaging Platforms
+    # =========================================================================
+    print()
+    print(color("◆ Messaging Platforms", Colors.CYAN, Colors.BOLD))
+    
+    platforms = {
+        "Telegram": ("TELEGRAM_BOT_TOKEN", "TELEGRAM_HOME_CHANNEL"),
+        "Discord": ("DISCORD_BOT_TOKEN", "DISCORD_HOME_CHANNEL"),
+        "WhatsApp": ("WHATSAPP_ENABLED", None),
+    }
+    
+    for name, (token_var, home_var) in platforms.items():
+        token = os.getenv(token_var, "")
+        has_token = bool(token)
+        
+        home_channel = ""
+        if home_var:
+            home_channel = os.getenv(home_var, "")
+        
+        status = "configured" if has_token else "not configured"
+        if home_channel:
+            status += f" (home: {home_channel})"
+        
+        print(f"  {name:<12}  {check_mark(has_token)} {status}")
+    
+    # =========================================================================
+    # Gateway Status
+    # =========================================================================
+    print()
+    print(color("◆ Gateway Service", Colors.CYAN, Colors.BOLD))
+    
+    if sys.platform.startswith('linux'):
+        result = subprocess.run(
+            ["systemctl", "--user", "is-active", "hermes-gateway"],
+            capture_output=True,
+            text=True
+        )
+        is_active = result.stdout.strip() == "active"
+        print(f"  Status:       {check_mark(is_active)} {'running' if is_active else 'stopped'}")
+        print("  Manager:      systemd (user)")
+        
+    elif sys.platform == 'darwin':
+        result = subprocess.run(
+            ["launchctl", "list", "ai.hermes.gateway"],
+            capture_output=True,
+            text=True
+        )
+        is_loaded = result.returncode == 0
+        print(f"  Status:       {check_mark(is_loaded)} {'loaded' if is_loaded else 'not loaded'}")
+        print("  Manager:      launchd")
+    else:
+        print(f"  Status:       {color('N/A', Colors.DIM)}")
+        print("  Manager:      (not supported on this platform)")
+    
+    # =========================================================================
+    # Cron Jobs
+    # =========================================================================
+    print()
+    print(color("◆ Scheduled Jobs", Colors.CYAN, Colors.BOLD))
+    
+    jobs_file = Path.home() / ".hermes" / "cron" / "jobs.json"
+    if jobs_file.exists():
+        import json
+        try:
+            with open(jobs_file) as f:
+                data = json.load(f)
+                jobs = data.get("jobs", [])
+                enabled_jobs = [j for j in jobs if j.get("enabled", True)]
+                print(f"  Jobs:         {len(enabled_jobs)} active, {len(jobs)} total")
+        except Exception:
+            print("  Jobs:         (error reading jobs file)")
+    else:
+        print("  Jobs:         0")
+    
+    # =========================================================================
+    # Sessions
+    # =========================================================================
+    print()
+    print(color("◆ Sessions", Colors.CYAN, Colors.BOLD))
+    
+    sessions_file = Path.home() / ".hermes" / "sessions" / "sessions.json"
+    if sessions_file.exists():
+        import json
+        try:
+            with open(sessions_file) as f:
+                data = json.load(f)
+                print(f"  Active:       {len(data)} session(s)")
+        except Exception:
+            print("  Active:       (error reading sessions file)")
+    else:
+        print("  Active:       0")
+    
+    # =========================================================================
+    # Deep checks
+    # =========================================================================
+    if deep:
+        print()
+        print(color("◆ Deep Checks", Colors.CYAN, Colors.BOLD))
+        
+        # Check OpenRouter connectivity
+        openrouter_key = os.getenv("OPENROUTER_API_KEY", "")
+        if openrouter_key:
+            try:
+                import httpx
+                response = httpx.get(
+                    "https://openrouter.ai/api/v1/models",
+                    headers={"Authorization": f"Bearer {openrouter_key}"},
+                    timeout=10
+                )
+                ok = response.status_code == 200
+                print(f"  OpenRouter:   {check_mark(ok)} {'reachable' if ok else f'error ({response.status_code})'}")
+            except Exception as e:
+                print(f"  OpenRouter:   {check_mark(False)} error: {e}")
+        
+        # Check gateway port
+        try:
+            import socket
+            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+            sock.settimeout(1)
+            result = sock.connect_ex(('127.0.0.1', 18789))
+            sock.close()
+            # Port in use = gateway likely running
+            port_in_use = result == 0
+            # This is informational, not necessarily bad
+            print(f"  Port 18789:   {'in use' if port_in_use else 'available'}")
+        except OSError:
+            pass
+    
+    print()
+    print(color("─" * 60, Colors.DIM))
+    print(color("  Run 'hermes doctor' for detailed diagnostics", Colors.DIM))
+    print(color("  Run 'hermes setup' to configure", Colors.DIM))
+    print()
--- a/hermes_cli/uninstall.py
+++ b/hermes_cli/uninstall.py
@@ -0,0 +1,341 @@
+"""
+Hermes Agent Uninstaller.
+
+Provides options for:
+- Full uninstall: Remove everything including configs and data
+- Keep data: Remove code but keep ~/.hermes/ (configs, sessions, logs)
+"""
+
+import os
+import sys
+import shutil
+import subprocess
+from pathlib import Path
+from typing import Optional
+
+# ANSI colors
+class Colors:
+    RESET = "\033[0m"
+    BOLD = "\033[1m"
+    DIM = "\033[2m"
+    RED = "\033[31m"
+    GREEN = "\033[32m"
+    YELLOW = "\033[33m"
+    BLUE = "\033[34m"
+    MAGENTA = "\033[35m"
+    CYAN = "\033[36m"
+
+def color(text: str, *codes) -> str:
+    """Apply color codes to text (only in TTY)."""
+    if not sys.stdout.isatty():
+        return text
+    return "".join(codes) + text + Colors.RESET
+
+def log_info(msg: str):
+    print(f"{color('→', Colors.CYAN)} {msg}")
+
+def log_success(msg: str):
+    print(f"{color('✓', Colors.GREEN)} {msg}")
+
+def log_warn(msg: str):
+    print(f"{color('⚠', Colors.YELLOW)} {msg}")
+
+def log_error(msg: str):
+    print(f"{color('✗', Colors.RED)} {msg}")
+
+
+def get_project_root() -> Path:
+    """Get the project installation directory."""
+    return Path(__file__).parent.parent.resolve()
+
+
+def get_hermes_home() -> Path:
+    """Get the Hermes home directory (~/.hermes)."""
+    return Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
+
+
+def find_shell_configs() -> list:
+    """Find shell configuration files that might have PATH entries."""
+    home = Path.home()
+    configs = []
+    
+    candidates = [
+        home / ".bashrc",
+        home / ".bash_profile",
+        home / ".profile",
+        home / ".zshrc",
+        home / ".zprofile",
+    ]
+    
+    for config in candidates:
+        if config.exists():
+            configs.append(config)
+    
+    return configs
+
+
+def remove_path_from_shell_configs():
+    """Remove Hermes PATH entries from shell configuration files."""
+    configs = find_shell_configs()
+    removed_from = []
+    
+    for config_path in configs:
+        try:
+            content = config_path.read_text()
+            original_content = content
+            
+            # Remove lines containing hermes-agent or hermes PATH entries
+            new_lines = []
+            skip_next = False
+            
+            for line in content.split('\n'):
+                # Skip the "# Hermes Agent" marker comment and the PATH line that follows it
+                if '# Hermes Agent' in line or '# hermes-agent' in line:
+                    skip_next = True
+                    continue
+                if skip_next and ('hermes' in line.lower() and 'PATH' in line):
+                    skip_next = False
+                    continue
+                skip_next = False
+                
+                # Remove any PATH line containing hermes
+                if 'hermes' in line.lower() and 'path=' in line.lower():
+                    continue
+                    
+                new_lines.append(line)
+            
+            new_content = '\n'.join(new_lines)
+            
+            # Clean up multiple blank lines
+            while '\n\n\n' in new_content:
+                new_content = new_content.replace('\n\n\n', '\n\n')
+            
+            if new_content != original_content:
+                config_path.write_text(new_content)
+                removed_from.append(config_path)
+                
+        except Exception as e:
+            log_warn(f"Could not update {config_path}: {e}")
+    
+    return removed_from
+
+
+def remove_wrapper_script():
+    """Remove the hermes wrapper script if it exists."""
+    wrapper_paths = [
+        Path.home() / ".local" / "bin" / "hermes",
+        Path("/usr/local/bin/hermes"),
+    ]
+    
+    removed = []
+    for wrapper in wrapper_paths:
+        if wrapper.exists():
+            try:
+                # Check if it's our wrapper (references hermes_cli or hermes-agent)
+                content = wrapper.read_text()
+                if 'hermes_cli' in content or 'hermes-agent' in content:
+                    wrapper.unlink()
+                    removed.append(wrapper)
+            except Exception as e:
+                log_warn(f"Could not remove {wrapper}: {e}")
+    
+    return removed
+
+
+def uninstall_gateway_service():
+    """Stop and uninstall the systemd user gateway service (Linux only)."""
+    import platform
+    
+    if platform.system() != "Linux":
+        return False
+    
+    service_file = Path.home() / ".config" / "systemd" / "user" / "hermes-gateway.service"
+    
+    if not service_file.exists():
+        return False
+    
+    try:
+        # Stop the service
+        subprocess.run(
+            ["systemctl", "--user", "stop", "hermes-gateway"],
+            capture_output=True,
+            check=False
+        )
+        
+        # Disable the service
+        subprocess.run(
+            ["systemctl", "--user", "disable", "hermes-gateway"],
+            capture_output=True,
+            check=False
+        )
+        
+        # Remove service file
+        service_file.unlink()
+        
+        # Reload systemd
+        subprocess.run(
+            ["systemctl", "--user", "daemon-reload"],
+            capture_output=True,
+            check=False
+        )
+        
+        return True
+        
+    except Exception as e:
+        log_warn(f"Could not fully remove gateway service: {e}")
+        return False
+
+
+def run_uninstall(args):
+    """
+    Run the uninstall process.
+    
+    Options:
+    - Full uninstall: removes code + ~/.hermes/ (configs, data, logs)
+    - Keep data: removes code but keeps ~/.hermes/ for future reinstall
+    """
+    project_root = get_project_root()
+    hermes_home = get_hermes_home()
+    
+    print()
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.MAGENTA, Colors.BOLD))
+    print(color("│            🦋 Hermes Agent Uninstaller                  │", Colors.MAGENTA, Colors.BOLD))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.MAGENTA, Colors.BOLD))
+    print()
+    
+    # Show what will be affected
+    print(color("Current Installation:", Colors.CYAN, Colors.BOLD))
+    print(f"  Code:    {project_root}")
+    print(f"  Config:  {hermes_home / 'config.yaml'}")
+    print(f"  Secrets: {hermes_home / '.env'}")
+    print(f"  Data:    {hermes_home / 'cron/'}, {hermes_home / 'sessions/'}, {hermes_home / 'logs/'}")
+    print()
+    
+    # Ask for confirmation
+    print(color("Uninstall Options:", Colors.YELLOW, Colors.BOLD))
+    print()
+    print("  1) " + color("Keep data", Colors.GREEN) + " - Remove code only, keep configs/sessions/logs")
+    print("     (Recommended - you can reinstall later with your settings intact)")
+    print()
+    print("  2) " + color("Full uninstall", Colors.RED) + " - Remove everything including all data")
+    print("     (Warning: This deletes all configs, sessions, and logs permanently)")
+    print()
+    print("  3) " + color("Cancel", Colors.CYAN) + " - Don't uninstall")
+    print()
+    
+    try:
+        choice = input(color("Select option [1/2/3]: ", Colors.BOLD)).strip()
+    except (KeyboardInterrupt, EOFError):
+        print()
+        print("Cancelled.")
+        return
+    
+    if choice not in ("1", "2"):  # "3", cancel words, and unrecognized input all abort
+        print()
+        print("Uninstall cancelled.")
+        return
+    
+    full_uninstall = (choice == "2")
+    
+    # Final confirmation
+    print()
+    if full_uninstall:
+        print(color("⚠️  WARNING: This will permanently delete ALL Hermes data!", Colors.RED, Colors.BOLD))
+        print(color("   Including: configs, API keys, sessions, scheduled jobs, logs", Colors.RED))
+    else:
+        print("This will remove the Hermes code but keep your configuration and data.")
+    
+    print()
+    try:
+        confirm = input(f"Type '{color('yes', Colors.YELLOW)}' to confirm: ").strip().lower()
+    except (KeyboardInterrupt, EOFError):
+        print()
+        print("Cancelled.")
+        return
+    
+    if confirm != "yes":
+        print()
+        print("Uninstall cancelled.")
+        return
+    
+    print()
+    print(color("Uninstalling...", Colors.CYAN, Colors.BOLD))
+    print()
+    
+    # 1. Stop and uninstall gateway service
+    log_info("Checking for gateway service...")
+    if uninstall_gateway_service():
+        log_success("Gateway service stopped and removed")
+    else:
+        log_info("No gateway service found")
+    
+    # 2. Remove PATH entries from shell configs
+    log_info("Removing PATH entries from shell configs...")
+    removed_configs = remove_path_from_shell_configs()
+    if removed_configs:
+        for config in removed_configs:
+            log_success(f"Updated {config}")
+    else:
+        log_info("No PATH entries found to remove")
+    
+    # 3. Remove wrapper script
+    log_info("Removing hermes command...")
+    removed_wrappers = remove_wrapper_script()
+    if removed_wrappers:
+        for wrapper in removed_wrappers:
+            log_success(f"Removed {wrapper}")
+    else:
+        log_info("No wrapper script found")
+    
+    # 4. Remove installation directory (code)
+    log_info("Removing installation directory...")
+    
+    # Check if we're running from within the install dir
+    # We need to be careful here
+    try:
+        if project_root.exists():
+            # If the install is inside ~/.hermes/, just remove the hermes-agent subdir
+            if hermes_home in project_root.parents or project_root.parent == hermes_home:
+                shutil.rmtree(project_root)
+                log_success(f"Removed {project_root}")
+            else:
+                # Installation is somewhere else entirely
+                shutil.rmtree(project_root)
+                log_success(f"Removed {project_root}")
+    except Exception as e:
+        log_warn(f"Could not fully remove {project_root}: {e}")
+        log_info("You may need to manually remove it")
+    
+    # 5. Optionally remove ~/.hermes/ data directory
+    if full_uninstall:
+        log_info("Removing configuration and data...")
+        try:
+            if hermes_home.exists():
+                shutil.rmtree(hermes_home)
+                log_success(f"Removed {hermes_home}")
+        except Exception as e:
+            log_warn(f"Could not fully remove {hermes_home}: {e}")
+            log_info("You may need to manually remove it")
+    else:
+        log_info(f"Keeping configuration and data in {hermes_home}")
+    
+    # Done
+    print()
+    print(color("┌─────────────────────────────────────────────────────────┐", Colors.GREEN, Colors.BOLD))
+    print(color("│              ✓ Uninstall Complete!                      │", Colors.GREEN, Colors.BOLD))
+    print(color("└─────────────────────────────────────────────────────────┘", Colors.GREEN, Colors.BOLD))
+    print()
+    
+    if not full_uninstall:
+        print(color("Your configuration and data have been preserved:", Colors.CYAN))
+        print(f"  {hermes_home}/")
+        print()
+        print("To reinstall later with your existing settings:")
+        print(color("  curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash", Colors.DIM))
+        print()
+    
+    print(color("Reload your shell to complete the process:", Colors.YELLOW))
+    print("  source ~/.bashrc  # or ~/.zshrc")
+    print()
+    print("Thank you for using Hermes Agent! 🦋")
+    print()
--- a/local_server.py
+++ b/local_server.py
@@ -1,353 +0,0 @@
-"""
-Local OpenAI-compatible server implementation for Hermes-Agent (Atropos integration).
-
-Extends the Atropos APIServer to work with local OpenAI-compatible APIs (e.g. vLLM, SGLang),
-providing tokens_and_logprobs_completion support via client-side tokenization.
-"""
-
-import asyncio
-import os
-import warnings
-from typing import Any, List, Optional
-
-import openai
-from openai.types.chat.chat_completion import ChatCompletion
-from openai.types.completion import Completion
-
-from atroposlib.envs.server_handling.server_baseline import (
-    APIServer,
-    APIServerConfig,
-    ReasoningConfig,
-)
-
-
-class LocalServer(APIServer):
-    """
-    OpenAI-compatible local server with tokens_and_logprobs support.
-    
-    Uses an OpenAI-compatible API (typically at a /v1 endpoint) and handles
-    token extraction via client-side tokenization.
-    
-    Note: Many local servers don't return per-token logprobs in the standard API,
-    so this implementation uses placeholder logprobs (0.0) for PoC purposes.
-    For production training, use vLLM/SGLang servers that return real logprobs.
-    """
-
-    def __init__(
-        self,
-        config: APIServerConfig,
-        tokenizer: Optional[Any] = None,
-        tokenizer_name: str = "gpt2",
-        reasoning_config: Optional[ReasoningConfig] = None,
-    ):
-        """
-        Initialize the local server.
-        
-        Args:
-            config: Server configuration
-            tokenizer: Pre-initialized tokenizer (optional)
-            tokenizer_name: Name of tokenizer to load if tokenizer not provided
-            reasoning_config: Optional reasoning configuration
-        """
-        # Build the OpenAI client pointing to the server's /v1 endpoint
-        base_url = config.base_url
-        if base_url and not base_url.endswith("/v1"):
-            base_url = f"{base_url.rstrip('/')}/v1"
-        
-        self.openai = openai.AsyncClient(
-            api_key=config.api_key or "local",  # Local servers often ignore auth
-            base_url=base_url,
-            timeout=config.timeout,
-        )
-        
-        # Initialize tokenizer
-        if tokenizer is not None:
-            self.tokenizer = tokenizer
-        else:
-            try:
-                from transformers import AutoTokenizer  # type: ignore
-            except ModuleNotFoundError as exc:
-                raise ModuleNotFoundError(
-                    "Missing optional dependency 'transformers'. Pass a tokenizer instance to LocalServer, "
-                    "or install transformers to enable `tokenizer_name` auto-loading."
-                ) from exc
-            self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
-            
-        # Add a simple chat template if the tokenizer doesn't have one
-        # This is needed for ManagedServer's chat_completion to work
-        if not hasattr(self.tokenizer, 'chat_template') or self.tokenizer.chat_template is None:
-            # Simple ChatML-style template
-            self.tokenizer.chat_template = (
-                "{% for message in messages %}"
-                "{% if message['role'] == 'system' %}<|im_start|>system\n{{ message['content'] }}<|im_end|>\n"
-                "{% elif message['role'] == 'user' %}<|im_start|>user\n{{ message['content'] }}<|im_end|>\n"
-                "{% elif message['role'] == 'assistant' %}<|im_start|>assistant\n{{ message['content'] }}<|im_end|>\n"
-                "{% endif %}"
-                "{% endfor %}"
-                "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
-            )
-        
-        super().__init__(config, reasoning_config=reasoning_config)
-        # Local servers are treated as always-healthy unless a status task is enabled.
-        self.server_healthy = True
-
-    @classmethod
-    def from_env(
-        cls,
-        base_url: Optional[str] = None,
-        model: Optional[str] = None,
-        api_key: Optional[str] = None,
-        tokenizer_name: str = "gpt2",
-        **kwargs,
-    ) -> "LocalServer":
-        """
-        Create a LocalServer from environment variables (or explicit overrides).
-        
-        Env vars (checked in order):
-        - base URL: ATROPOS_SERVER_BASE_URL, OPENAI_BASE_URL, LOCAL_LLM_BASE_URL, LLM_BASE_URL
-        - model:    ATROPOS_SERVER_MODEL,    LLM_MODEL,       LOCAL_LLM_MODEL
-        - api key:  ATROPOS_SERVER_API_KEY,  OPENAI_API_KEY,  LOCAL_LLM_API_KEY, LLM_API_KEY
-        """
-        from dotenv import load_dotenv
-        load_dotenv()
-        
-        base_url = (
-            base_url
-            or os.getenv("ATROPOS_SERVER_BASE_URL")
-            or os.getenv("OPENAI_BASE_URL")
-            or os.getenv("LOCAL_LLM_BASE_URL")
-            or os.getenv("LLM_BASE_URL")
-            or "http://localhost:11434"
-        )
-        model = (
-            model
-            or os.getenv("ATROPOS_SERVER_MODEL")
-            or os.getenv("LLM_MODEL")
-            or os.getenv("LOCAL_LLM_MODEL")
-            or "hermes3:8b"
-        )
-        api_key = (
-            api_key
-            or os.getenv("ATROPOS_SERVER_API_KEY")
-            or os.getenv("OPENAI_API_KEY")
-            or os.getenv("LOCAL_LLM_API_KEY")
-            or os.getenv("LLM_API_KEY")
-        )
-        
-        config = APIServerConfig(
-            model_name=model,
-            base_url=base_url,
-            api_key=api_key or "local",
-            timeout=kwargs.get("timeout", 120),
-            num_max_requests_at_once=kwargs.get("num_max_requests_at_once", 4),
-            num_requests_for_eval=kwargs.get("num_requests_for_eval", 4),
-            health_check=False,  # Local dev servers often lack /health
-        )
-        
-        return cls(config, tokenizer_name=tokenizer_name)
-
-    async def check_server_status_task(self, chat_completion: bool = True):
-        """
-        Check if the server is healthy.
-        
-        For local development, we generally assume the server is healthy.
-        """
-        while True:
-            try:
-                # Simple health check via a minimal completion
-                if chat_completion:
-                    await self.openai.chat.completions.create(
-                        model=self.config.model_name,
-                        messages=[{"role": "user", "content": "hi"}],
-                        max_tokens=1,
-                    )
-                else:
-                    await self.openai.completions.create(
-                        model=self.config.model_name,
-                        prompt="hi",
-                        max_tokens=1,
-                    )
-                self.server_healthy = True
-            except Exception:
-                self.server_healthy = False
-            await asyncio.sleep(5)
-
-    async def _chat_completion_wrapper(self, **kwargs) -> ChatCompletion:
-        """
-        Wrapper for chat completion using an OpenAI-compatible API.
-        """
-        assert kwargs.get("model") is not None, "Model is required!"
-        assert kwargs.get("messages") is not None, "Messages are required!"
-        
-        n = kwargs.get("n", 1)
-        
-        # Some OpenAI-compatible servers don't support n > 1, so we make multiple requests.
-        if n > 1:
-            completion_list = await asyncio.gather(
-                *[self.openai.chat.completions.create(**{**kwargs, "n": 1}) for _ in range(n)]
-            )
-            # Merge completions
-            completions = completion_list[0]
-            for c in completion_list[1:]:
-                for choice in c.choices:
-                    choice.index = len(completions.choices)
-                    completions.choices.append(choice)
-            return completions
-        else:
-            return await self.openai.chat.completions.create(**kwargs)
-
-    async def _completion_wrapper(self, **kwargs) -> Completion:
-        """
-        Wrapper for completion using an OpenAI-compatible API.
-        """
-        assert kwargs.get("model") is not None, "Model is required!"
-        assert kwargs.get("prompt") is not None, "Prompt is required!"
-        
-        n = kwargs.get("n", 1)
-        
-        # Some OpenAI-compatible servers don't support n > 1.
-        if n > 1:
-            completion_list = await asyncio.gather(
-                *[self.openai.completions.create(**{**kwargs, "n": 1}) for _ in range(n)]
-            )
-            completions = completion_list[0]
-            for c in completion_list[1:]:
-                for choice in c.choices:
-                    choice.index = len(completions.choices)
-                    completions.choices.append(choice)
-            return completions
-        else:
-            return await self.openai.completions.create(**kwargs)
-
-    async def _tokens_and_logprobs_completion_wrapper(
-        self, **kwargs
-    ) -> tuple[List[int], List[List[int]], List[List[float]], List[str]]:
-        """
-        Wrapper for tokens and logprobs completion.
-        
-        Returns:
-            Tuple of (prompt_tokens, output_tokens_list, output_logprobs_list, finish_reasons)
-        
-        Note: Many OpenAI-compatible local servers don't return per-token logprobs,
-        so we use placeholder logprobs (0.0). For real training, use vLLM/SGLang.
-        """
-        model = kwargs.get("model")
-        assert model is not None, "Model is required!"
-        
-        # Handle input_ids (from ManagedServer) or prompt
-        if "input_ids" in kwargs:
-            prompt_tokens = kwargs.pop("input_ids")
-            prompt = self.tokenizer.decode(prompt_tokens)
-            kwargs.pop("prompt", None)
-        else:
-            prompt = kwargs.pop("prompt", "")
-            prompt_tokens = self.tokenizer.encode(prompt, add_special_tokens=True)
-        
-        n = kwargs.pop("n", 1)
-        max_tokens = kwargs.pop("max_tokens", 256)
-        temperature = kwargs.pop("temperature", 0.7)
-        stop = kwargs.pop("stop", None)
-        
-        # Make completion requests
-        completions = []
-        for _ in range(n):
-            try:
-                response = await self.openai.completions.create(
-                    model=model,
-                    prompt=prompt,
-                    max_tokens=max_tokens,
-                    temperature=temperature,
-                    stop=stop,
-                )
-                completions.append(response)
-            except Exception as e:
-                # Fallback to chat completion if completion endpoint not supported
-                warnings.warn(f"Completion API failed, trying chat: {e}")
-                response = await self.openai.chat.completions.create(
-                    model=model,
-                    messages=[{"role": "user", "content": prompt}],
-                    max_tokens=max_tokens,
-                    temperature=temperature,
-                    stop=stop,
-                )
-                # Convert to completion-like response
-                completions.append(response)
-        
-        output_tokens_list = []
-        output_logprobs_list = []
-        finish_reasons = []
-        
-        for completion in completions:
-            # Extract text from response
-            if hasattr(completion.choices[0], "text"):
-                # Completion API response
-                text = completion.choices[0].text
-                finish_reason = completion.choices[0].finish_reason or "stop"
-            else:
-                # Chat completion API response
-                text = completion.choices[0].message.content or ""
-                finish_reason = completion.choices[0].finish_reason or "stop"
-            
-            # Tokenize output
-            output_tokens = self.tokenizer.encode(text, add_special_tokens=False)
-            
-            # Placeholder logprobs (many local servers don't provide per-token logprobs).
-            # In production, use vLLM/SGLang which return real logprobs
-            output_logprobs = [0.0] * len(output_tokens)
-            
-            output_tokens_list.append(output_tokens)
-            output_logprobs_list.append(output_logprobs)
-            finish_reasons.append(finish_reason)
-        
-        return prompt_tokens, output_tokens_list, output_logprobs_list, finish_reasons
-
-    def managed_server(self, tokenizer=None, track_tree: bool = False):
-        """
-        Create a ManagedServer context manager for this server.
-        
-        Args:
-            tokenizer: Optional tokenizer override
-            track_tree: Whether to maintain tree structure for multi-turn
-            
-        Returns:
-            ManagedServer context manager
-        """
-        from atroposlib.envs.server_handling.managed_server import ManagedServer
-        
-        return ManagedServerContext(
-            self,
-            tokenizer=tokenizer or self.tokenizer,
-            track_tree=track_tree,
-        )
-
-
-class ManagedServerContext:
-    """
-    Context manager wrapper for ManagedServer.
-    
-    Usage:
-        async with server.managed_server(tokenizer=tokenizer) as managed:
-            response = await managed.chat_completion(...)
-            state = managed.get_state()
-    """
-    
-    def __init__(self, server: LocalServer, tokenizer, track_tree: bool = False):
-        self.server = server
-        self.tokenizer = tokenizer
-        self.track_tree = track_tree
-        self.managed = None
-    
-    async def __aenter__(self):
-        from atroposlib.envs.server_handling.managed_server import ManagedServer
-        
-        self.managed = ManagedServer(
-            self.server,
-            tokenizer=self.tokenizer,
-            track_tree=self.track_tree,
-        )
-        return self.managed
-    
-    async def __aexit__(self, exc_type, exc_val, exc_tb):
-        if self.managed:
-            self.managed.reset()
-        return False
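Review note on the removed extraction loop: it tells a Completion response from a Chat Completion response by checking for a `text` attribute on the first choice. The sketch below isolates that check so it can be exercised without a server. The helper name `extract_text_and_finish` and the `SimpleNamespace` stand-ins are assumptions for illustration, not part of the deleted module or of the OpenAI SDK.

```python
from types import SimpleNamespace


def extract_text_and_finish(completion):
    """Return (text, finish_reason) for either response shape."""
    choice = completion.choices[0]
    if hasattr(choice, "text"):
        # Completion API shape: the text sits directly on the choice.
        text = choice.text
    else:
        # Chat Completion shape: the text sits on choice.message.content,
        # which can be None, so fall back to an empty string.
        text = choice.message.content or ""
    # A missing finish_reason is treated as a normal stop, as in the removed code.
    return text, choice.finish_reason or "stop"


# Stand-in objects that mimic the two shapes (not real SDK responses).
legacy = SimpleNamespace(
    choices=[SimpleNamespace(text="4", finish_reason="length")]
)
chat = SimpleNamespace(
    choices=[
        SimpleNamespace(message=SimpleNamespace(content=None), finish_reason=None)
    ]
)
```

Calling the helper on `legacy` and `chat` covers both branches, including the `None` content and `None` finish-reason fallbacks.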
--- a/memory-bank/activeContext.md
+++ b/memory-bank/activeContext.md
@@ -1,62 +0,0 @@
-# Active Context
-
-## Current Focus
-Singularity/Apptainer integration for HPC environments has been **COMPLETED AND TESTED**.
-
-## Recently Completed (Feb 6, 2026)
-
-### Singularity/Apptainer Sandbox Integration - FULLY WORKING
-Successfully adapted the Atropos implementation from Docker to Singularity/Apptainer for HPC clusters where Docker cannot run without sudo permissions.
-
-**Files Modified:**
-1. `atropos/nomad/client.py` - Added `driver` and `singularity_image` parameters to `create_sandbox_job()`; fixed port detection in `get_job_allocations()` to check both `DynamicPorts` and `ReservedPorts`
-2. `atropos/slots/pool.py` - Added `driver` and `singularity_image` to `SlotPoolConfig`
-3. `atropos/backends/nomad_backend.py` - Added driver options to `NomadBackendConfig`
-4. `atropos/envs/agent_env.py` - Added CLI arguments `--env.driver` and `--env.singularity_image` to `AgentEnvConfig`
-
-**Files Created:**
-1. `nomad-singularity.hcl` - Nomad config with raw_exec driver enabled
-2. `atropos/atropos-sandbox.sif` - Singularity image (80MB) built from Docker image
-3. `test_singularity_job.py` - Test script for Singularity integration
-
-**Key Implementation Details:**
- Uses Nomad's `raw_exec` driver to run `apptainer` commands
- Shell wrapper (`/bin/sh -c`) ensures Nomad environment variables expand correctly
- Binds Nomad allocation directory to `/data` for workspace persistence
- Uses **static ports** (`ReservedPorts`) instead of dynamic ports since raw_exec runs directly on host
- `get_job_allocations()` now checks both `DynamicPorts` (Docker) and `ReservedPorts` (Singularity)
-
-**Test Results (All Passing):**
- Health check: ✅ Server responding with 5 slots
- Bash execution: ✅ Commands execute inside Singularity container
- Write file: ✅ File written to slot workspace
- Read file: ✅ File read back successfully
-
-## Usage
-
-### For Docker (default):
-```python
-config = SlotPoolConfig(
-    driver="docker",
-    image="atropos-sandbox:local",
-)
-```
-
-### For Singularity/Apptainer:
-```python
-config = SlotPoolConfig(
-    driver="singularity",
-    singularity_image="/path/to/atropos-sandbox.sif",
-)
-```
-
-### Nomad Configuration:
-```bash
-# Start Nomad with Singularity support
-nomad agent -dev -config=nomad-singularity.hcl
-```
-
-## Next Steps
- Deploy to HPC cluster for production testing
- Consider adding bubblewrap (bwrap) support inside Singularity for additional sandboxing
- Document HPC-specific deployment procedures in skills/mlops/
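Review note on the port detection described in the removed notes: `get_job_allocations()` is said to check both `DynamicPorts` (Docker) and `ReservedPorts` (Singularity). A minimal sketch of such a lookup follows. It assumes an allocation dictionary with a `Resources.Networks` list whose port entries carry `Label` and `Value` keys; the function name `find_allocation_port` and the `"http"` label are hypothetical, and the actual code in `atropos/nomad/client.py` may differ.

```python
def find_allocation_port(allocation: dict, label: str = "http"):
    """Return the first port value registered under `label`, or None."""
    resources = allocation.get("Resources") or {}
    for network in resources.get("Networks") or []:
        # Docker jobs are expected under DynamicPorts; raw_exec (Singularity)
        # jobs use statically assigned ReservedPorts, so both lists are checked.
        for key in ("DynamicPorts", "ReservedPorts"):
            for port in network.get(key) or []:
                if port.get("Label") == label:
                    return port.get("Value")
    return None


# Hypothetical allocation payloads for the two drivers.
docker_alloc = {
    "Resources": {"Networks": [{"DynamicPorts": [{"Label": "http", "Value": 25001}]}]}
}
singularity_alloc = {
    "Resources": {
        "Networks": [
            {"DynamicPorts": None, "ReservedPorts": [{"Label": "http", "Value": 8080}]}
        ]
    }
}
```

The `or []` guards matter because a port list may be absent or null when a job declares no ports of that kind.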
--- a/memory-bank/productContext.md
+++ b/memory-bank/productContext.md
@@ -1,55 +0,0 @@
-# Product Context: Hermes-Agent
-
-## Why This Project Exists
-
-Hermes-Agent addresses several key challenges in the AI agent space:
-
-1. **Unified Tool Interface** - Provides a clean, consistent interface for LLMs to use various tools (web, terminal, browser, vision, etc.) without requiring custom integration for each model provider.
-
-2. **Training Data Generation** - Enables efficient generation of high-quality tool-calling trajectories for fine-tuning LLMs, with features like batch processing, checkpointing, and trajectory compression.
-
-3. **Flexible Deployment** - Supports multiple execution environments (local, Docker, Singularity, Modal, SSH) to accommodate different security and isolation requirements.
-
-4. **Developer Experience** - Offers a beautiful, interactive CLI with kawaii-style feedback that makes working with AI agents enjoyable.
-
-## Problems It Solves
-
-### For AI Researchers
- **Data Generation at Scale**: Parallel batch processing with content-based checkpointing for fault tolerance
- **Clean Trajectories**: Trajectory compression to fit token budgets while preserving important information
- **Toolset Distributions**: Probability-based tool selection for varied training data
-
-### For Developers
- **Tool Orchestration**: Logical grouping of tools into toolsets (research, development, debugging, etc.)
- **Session Persistence**: Conversation history and session logging for debugging
- **Multi-Model Support**: Works with any OpenAI-compatible API (OpenRouter, local models, etc.)
-
-### For MLOps
- **Skills System**: On-demand knowledge documents for specific tools/frameworks (Axolotl, vLLM, TRL, etc.)
- **Sandboxed Execution**: Terminal commands can run in isolated environments (Docker, Singularity, Modal)
- **Configurable Backends**: Easy switching between local and cloud execution
-
-## How It Should Work
-
-### User Flow (CLI)
-1. User launches `./hermes` 
-2. Beautiful welcome banner displays with caduceus logo, model info, and available tools
-3. User types a natural language request
-4. Agent processes request, potentially calling tools with animated feedback
-5. Agent responds with results, conversation continues
-6. Session is automatically logged for debugging
-
-### User Flow (Batch Processing)
-1. User prepares JSONL file with prompts
-2. Runs `batch_runner.py` with distribution and worker count
-3. System processes prompts in parallel, saves checkpoints
-4. Completed trajectories saved to `data/<run_name>/trajectories.jsonl`
-5. Optional: compress trajectories with `trajectory_compressor.py`
-
-## User Experience Goals
-
- **Delightful Interaction**: Kawaii ASCII faces, animated spinners, cute messages
- **Informative Feedback**: Clear progress indication during tool execution
- **Configurable Personalities**: From "helpful" to "pirate" to "Shakespeare"
- **Easy Configuration**: YAML config file + environment variables + CLI flags
- **Graceful Degradation**: Missing tools/APIs don't break the system, just disable features
--- a/memory-bank/progress.md
+++ b/memory-bank/progress.md
@@ -1,67 +0,0 @@
-# Progress
-
-## Completed Features
-
-### ✅ Singularity/Apptainer Sandbox Integration (Feb 6, 2026 - FULLY TESTED)
-Adapted the Atropos sandbox environment from Docker to Singularity/Apptainer for HPC clusters.
-
-**What Works:**
- `create_sandbox_job()` supports both `driver="docker"` and `driver="singularity"`
- SlotPoolConfig and NomadBackendConfig propagate driver settings
- Singularity container runs sandbox_server.py via Nomad's raw_exec driver
- All sandbox operations work: bash execution, file read/write
- Nomad environment variables properly expanded via shell wrapper
- **CLI arguments** `--env.driver` and `--env.singularity_image` for AgentEnvConfig
- **Static port binding** for Singularity (ReservedPorts vs DynamicPorts)
- **Port detection** works for both Docker and Singularity allocations
-
-**CLI Usage:**
-```bash
-python -m atropos.envs.swe_smith_oracle_env process \
-    --env.driver singularity \
-    --env.singularity_image /path/to/atropos-sandbox.sif
-```
-
-**Created Files:**
- `nomad-singularity.hcl` - Nomad config with raw_exec enabled
- `atropos/atropos-sandbox.sif` - 80MB Singularity image
- `test_singularity_job.py` - Integration test script
-
-**Modified Files:**
- `atropos/nomad/client.py` - driver support + ReservedPorts detection
- `atropos/slots/pool.py` - driver config fields
- `atropos/backends/nomad_backend.py` - driver config fields
- `atropos/envs/agent_env.py` - CLI arguments for driver selection
-
-### ✅ Memory Bank Initialized (Feb 5, 2026)
-Set up project documentation structure for context persistence.
-
-## In Progress
-None currently.
-
-## Known Issues
- `bwrap_available: false` in Singularity containers - bubblewrap sandboxing not available inside the container (kernel namespaces already in use)
- Health check timing - may need longer wait for container startup on slower systems
-
-## What's Left to Build
-
-### HPC Deployment
- [ ] Test on actual HPC cluster with Slurm/PBS integration
- [ ] Document cluster-specific deployment procedures
- [ ] Add support for shared filesystem workspace binding
-
-### Enhanced Sandboxing
- [ ] Investigate alternative sandboxing inside Singularity (seccomp, etc.)
- [ ] Add network isolation options for Singularity
-
-### Documentation
- [ ] Add Singularity deployment to README
- [ ] Create HPC deployment skill in skills/mlops/
-
-## Evolution of Decisions
-
-### Container Runtime Selection
- **Initial**: Docker-only via Nomad docker driver
- **Problem**: HPC clusters don't allow Docker without sudo
- **Solution**: Added Singularity/Apptainer support via raw_exec driver
- **Result**: Both runtimes now supported with same API
--- a/memory-bank/projectbrief.md
+++ b/memory-bank/projectbrief.md
@@ -1,44 +0,0 @@
-# Project Brief: Hermes-Agent
-
-## Overview
-Hermes-Agent is an AI agent harness for LLMs with advanced tool-calling capabilities, featuring a flexible toolsets system for organizing and managing tools. Named after Hermes, the Greek messenger god, it serves as a bridge between human intent and AI-powered task execution.
-
-## Core Requirements
-
-### Primary Goals
-1. **Interactive CLI Experience** - Beautiful terminal interface with animated feedback, personalities, and session management
-2. **Flexible Tool System** - Modular tools organized into logical toolsets for different use cases
-3. **Batch Processing** - Process multiple prompts in parallel with checkpointing and statistics
-4. **Multi-Backend Support** - Support for local, Docker, Singularity, Modal, and SSH terminal backends
-5. **Training Data Generation** - Save conversation trajectories in formats suitable for LLM fine-tuning
-
-### Target Users
- AI researchers generating training data
- Developers needing an AI assistant with tool access
- MLOps practitioners automating workflows
- Anyone needing a powerful CLI-based AI agent
-
-## Scope
-
-### In Scope
- Interactive CLI with rich formatting and kawaii-style feedback
- Web tools (search, extract, crawl via Firecrawl)
- Terminal tools (command execution across multiple backends)
- Browser automation (via agent-browser + Browserbase)
- Vision tools (image analysis)
- Image generation (FLUX via FAL.ai)
- Mixture-of-Agents reasoning
- Skills system for on-demand knowledge
- Batch processing with parallel workers
- Trajectory compression for training
-
-### Out of Scope (Current)
- Proactive suggestions (agent only runs on request)
- Clipboard integration (no local system access)
- Real-time streaming of thinking/reasoning (deferred)
-
-## Success Metrics
- Clean, maintainable tool architecture
- Reliable tool execution with proper error handling
- Efficient context management for long conversations
- High-quality trajectory data for training
--- a/memory-bank/systemPatterns.md
+++ b/memory-bank/systemPatterns.md
@@ -1,149 +0,0 @@
-# System Patterns: Hermes-Agent
-
-## Architecture Overview
-
-```
-┌─────────────────────────────────────────────────────────────────┐
-│                           CLI (cli.py)                          │
-│  - Rich welcome banner with caduceus                            │
-│  - prompt_toolkit for input with history                        │
-│  - Kawaii-style feedback and personalities                      │
-└────────────────────────────┬────────────────────────────────────┘
-                             │
-                             ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                     AIAgent (run_agent.py)                      │
-│  - Conversation loop with tool calling                          │
-│  - KawaiiSpinner for animated feedback                          │
-│  - Retry logic with exponential backoff                         │
-│  - Session logging to logs/ directory                           │
-└────────────────────────────┬────────────────────────────────────┘
-                             │
-                             ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                   Tool Routing (model_tools.py)                 │
-│  - get_tool_definitions() - returns tools for API calls         │
-│  - handle_function_call() - dispatches to tool handlers         │
-│  - Toolset filtering (enabled/disabled)                         │
-└────────────────────────────┬────────────────────────────────────┘
-                             │
-           ┌─────────────────┼─────────────────┐
-           ▼                 ▼                 ▼
-    ┌───────────┐     ┌───────────┐     ┌───────────┐
-    │ Web Tools │     │ Terminal  │     │ Browser   │
-    │(Firecrawl)│     │ (mini-swe)│     │(agent-brw)│
-    └───────────┘     └───────────┘     └───────────┘
-           │                 │                 │
-           └─────────────────┼─────────────────┘
-                             ▼
-                    ┌───────────────┐
-                    │  Toolsets     │
-                    │  (toolsets.py)│
-                    │  Composition  │
-                    └───────────────┘
-```
-
-## Key Design Patterns
-
-### 1. Toolset Composition Pattern
-Toolsets can include other toolsets, allowing flexible composition:
-
-```python
-TOOLSETS = {
-    "web": {"tools": ["web_search", "web_extract"], "includes": []},
-    "debugging": {"tools": ["terminal"], "includes": ["web"]},
-    "full_stack": {"tools": [], "includes": ["web", "terminal", "vision", "browser"]}
-}
-```
-
-Resolution is recursive with cycle detection.
-
-### 2. Graceful Degradation Pattern
-Each tool module has a `check_*_requirements()` function:
- Tools are only loaded if requirements are met
- Missing API keys disable the affected tools rather than crashing the system
- Import errors are caught and tools marked unavailable
-
-```python
-try:
-    from tools.web_tools import web_search_tool, check_firecrawl_api_key
-except ModuleNotFoundError:
-    web_search_tool = None
-    def check_firecrawl_api_key(): return False
-```
-
-### 3. Session Isolation Pattern (task_id)
-Stateful tools (terminal, browser) use `task_id` to isolate concurrent sessions:
- Each batch worker gets unique task_id
- VMs and browser sessions are tracked per task_id
- Cleanup functions release resources: `cleanup_vm(task_id)`, `cleanup_browser(task_id)`
-
-### 4. Trajectory Format Pattern
-Conversations are saved in ShareGPT format for training:
-
-```json
-{"from": "system", "value": "System prompt with <tools>...</tools>"}
-{"from": "human", "value": "User message"}
-{"from": "gpt", "value": "<think>reasoning</think>\n<tool_call>{...}</tool_call>"}
-{"from": "tool", "value": "<tool_response>{...}</tool_response>"}
-{"from": "gpt", "value": "Final response"}
-```
-
-### 5. Ephemeral System Prompt Pattern
-Guide model behavior during data collection without saving to trajectories:
- `ephemeral_system_prompt` influences execution
- Only standard tool-calling system prompt saved to trajectories
- Keeps training data clean
-
-### 6. Retry with Validation Pattern
-The agent validates responses before accepting:
- Check tool names against `valid_tool_names` set
- Validate JSON arguments can be parsed
- Check for content after `<think>` blocks
- Roll back to last valid state on persistent failures
-
-## Component Relationships
-
-### AIAgent Class
- Central orchestrator for conversations
- Manages conversation history
- Calls OpenAI-compatible API
- Routes tool calls to handlers
- Provides animated feedback (KawaiiSpinner)
-
-### Tool Modules (tools/*.py)
- Self-contained tool implementations
- Export: handler function + check function + schema
- Return JSON strings (never raw dicts)
- Accept optional `task_id` for stateful tools
-
-### Toolsets System (toolsets.py)
- Defines logical groupings of tools
- Supports composition via `includes`
- `resolve_toolset()` recursively resolves all tools
- `validate_toolset()` checks if name is valid
-
-### Model Tools (model_tools.py)
- Aggregates all tool definitions
- Routes function calls to correct handlers
- Filters tools based on enabled/disabled toolsets
- Bridge between agent and tool implementations
-
-## Critical Implementation Paths
-
-### Tool Execution Flow
-1. AIAgent receives tool_calls from API response
-2. Validates tool names against `valid_tool_names`
-3. Validates JSON arguments can be parsed
-4. Calls `handle_function_call()` with tool name, args, task_id
-5. `handle_function_call()` routes to appropriate handler
-6. Tool executes, returns JSON string
-7. Result added to conversation as tool message
-8. Loop continues until natural language response
-
-### Configuration Loading Flow
-1. `cli.py` calls `load_cli_config()`
-2. Loads `cli-config.yaml`, merges with defaults
-3. Sets environment variables for terminal config
-4. `AIAgent` reads env vars when initializing terminal tool
-5. Terminal tool creates appropriate backend based on `TERMINAL_ENV`
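Review note on the Toolset Composition Pattern in the removed `systemPatterns.md`: the notes state that resolution is "recursive with cycle detection" but do not show it. One way that could look is sketched below. This is an illustration under assumptions, not the code in `toolsets.py`: the `terminal`, `loop_a`, and `loop_b` entries are made up for the demo, and the real `resolve_toolset()` may treat unknown names differently (for example by raising).

```python
TOOLSETS = {
    "web": {"tools": ["web_search", "web_extract"], "includes": []},
    "terminal": {"tools": ["terminal"], "includes": []},
    "debugging": {"tools": ["terminal"], "includes": ["web"]},
    # Two toolsets that include each other, to exercise cycle handling.
    "loop_a": {"tools": ["a"], "includes": ["loop_b"]},
    "loop_b": {"tools": ["b"], "includes": ["loop_a"]},
}


def resolve_toolset(name, toolsets=TOOLSETS, _visited=None):
    """Return the sorted, de-duplicated tool names reachable from `name`."""
    visited = set() if _visited is None else _visited
    if name in visited or name not in toolsets:
        # Already expanded (a cycle or a shared include) or unknown:
        # contribute nothing instead of recursing forever.
        return []
    visited.add(name)
    tools = set(toolsets[name]["tools"])
    for included in toolsets[name]["includes"]:
        tools.update(resolve_toolset(included, toolsets, visited))
    return sorted(tools)
```

Sharing one `visited` set across the whole recursion handles both cycles and diamond-shaped includes, at the cost of silently ignoring a cycle rather than reporting it.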
--- a/memory-bank/techContext.md
+++ b/memory-bank/techContext.md
@@ -1,113 +0,0 @@
-# Technical Context: Hermes-Agent
-
-## Technologies Used
-
-### Core Stack
- **Python 3.11+** - Primary language
- **OpenAI SDK** - For LLM API interactions (OpenAI-compatible)
- **OpenRouter** - Default LLM provider (supports multiple models)
- **Rich** - Terminal formatting and panels
- **prompt_toolkit** - Interactive input with history
- **Fire** - CLI argument parsing
- **PyYAML** - Configuration files
- **python-dotenv** - Environment variable management
-
-### Tool Dependencies
- **Firecrawl** - Web search and extraction (`FIRECRAWL_API_KEY`)
- **mini-swe-agent** - Terminal tool backend (local/docker/singularity/modal/ssh)
- **agent-browser** - Browser automation (npm package)
- **Browserbase** - Cloud browser execution (`BROWSERBASE_API_KEY`)
- **FAL.ai** - Image generation with FLUX (`FAL_KEY`)
- **Nous API** - Vision and MoA tools (`NOUS_API_KEY`)
-
-### Optional Dependencies
- **Modal** - Cloud compute for sandboxed environments
- **Singularity/Apptainer** - Rootless containers (HPC environments)
- **Docker** - Container isolation
-
-## Development Setup
-
-### Quick Start
-```bash
-# Clone with submodules
-git clone --recurse-submodules https://github.com/NousResearch/Hermes-Agent.git
-cd Hermes-Agent
-
-# Create virtual environment
-python3 -m venv venv
-source venv/bin/activate
-
-# Install dependencies
-pip install -r requirements.txt
-pip install -e ./mini-swe-agent
-
-# Install browser tools (optional)
-npm install
-
-# Configure environment
-cp .env.example .env
-# Edit .env with your API keys
-```
-
-### Key Configuration Files
- `.env` - API keys and secrets
- `cli-config.yaml` - CLI configuration (model, terminal, toolsets, personalities)
- `configs/` - Batch run scripts and configuration
-
-### Environment Variables
-
-**Required for Full Functionality:**
- `OPENROUTER_API_KEY` - Primary LLM access
- `FIRECRAWL_API_KEY` - Web tools
- `NOUS_API_KEY` - Vision and reasoning tools
- `FAL_KEY` - Image generation
-
-**Terminal Backend:**
- `TERMINAL_ENV` - Backend type: `local`, `docker`, `singularity`, `modal`, `ssh`
- `TERMINAL_CWD` - Working directory
- `TERMINAL_DOCKER_IMAGE` / `TERMINAL_SINGULARITY_IMAGE` - Container images
- `TERMINAL_SSH_HOST/USER/KEY` - SSH backend config
- `SUDO_PASSWORD` - Optional sudo support
-
-**Browser:**
- `BROWSERBASE_API_KEY` - Browser automation
- `BROWSERBASE_PROJECT_ID` - Browserbase project
-
-## Technical Constraints
-
-1. **Context Window Limits** - Long tool outputs can exhaust context; trajectory compression helps
-2. **API Rate Limits** - OpenRouter and tool APIs have rate limits; exponential backoff implemented
-3. **Tool Availability** - Tools gracefully degrade if dependencies/keys missing
-4. **Async Compatibility** - Some tools are async, handled via `asyncio.run()` in sync context
-
-## Dependency Graph
-
-```
-tools/*.py → tools/__init__.py → model_tools.py → toolsets.py → toolset_distributions.py
-                                       ↑
-run_agent.py ──────────────────────────┘
-cli.py → run_agent.py (uses AIAgent with quiet_mode=True)
-batch_runner.py → run_agent.py + toolset_distributions.py
-```
-
-## Tool Usage Patterns
-
-### Adding a New Tool
-1. Create `tools/your_tool.py` with handler + requirements check
-2. Export in `tools/__init__.py`
-3. Register in `model_tools.py` (definitions + handler routing)
-4. Add to toolset in `toolsets.py`
-5. Optionally add to `toolset_distributions.py` for batch processing
-
-### Tool Handler Pattern
-```python
-def your_tool(param: str, task_id: str = None) -> str:
-    """Execute tool and return JSON string result."""
-    try:
-        result = {"success": True, "data": "..."}
-        return json.dumps(result, ensure_ascii=False)
-    except Exception as e:
-        return json.dumps({"error": str(e)}, ensure_ascii=False)
-```
-
-All tool handlers MUST return a JSON string, never raw dicts.
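Review note on the "Async Compatibility" constraint in the removed `techContext.md`: async tools are said to be handled via `asyncio.run()` in a sync context. A minimal sketch that combines this with the handler pattern above is shown here. The names `_fetch_async` and `fetch_tool` are hypothetical, and the coroutine body is a placeholder for real async I/O.

```python
import asyncio
import json
from typing import Optional


async def _fetch_async(query: str) -> dict:
    """Placeholder coroutine standing in for a real async tool call."""
    await asyncio.sleep(0)
    return {"success": True, "data": query.upper()}


def fetch_tool(query: str, task_id: Optional[str] = None) -> str:
    """Synchronous handler: run the coroutine, always return a JSON string."""
    try:
        # asyncio.run() creates a fresh event loop, runs the coroutine to
        # completion, and closes the loop again.
        result = asyncio.run(_fetch_async(query))
        return json.dumps(result, ensure_ascii=False)
    except Exception as e:
        return json.dumps({"error": str(e)}, ensure_ascii=False)
```

One caveat of this design: `asyncio.run()` raises `RuntimeError` when called from a thread that already has a running event loop, so a wrapper like this only suits callers that are themselves synchronous.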
--- a/modal_profiles.yaml.example
+++ b/modal_profiles.yaml.example
@@ -1,134 +0,0 @@
-# Modal Sandbox Profiles Configuration
-# =====================================
-# This file defines different sandbox profiles for heterogeneous workloads.
-# Copy to modal_profiles.yaml and customize as needed.
-#
-# Usage:
-#   terminal_tool("python train.py", profile="pytorch-gpu")
-#   terminal_tool("npm test", profile="node")
-#
-# Each profile can specify:
-#   - image: Docker image to use
-#   - gpu: GPU type (null, "T4", "A10G", "A100", "H100")
-#   - cpu: CPU cores (float)
-#   - memory: Memory in MB
-#   - min_pool: Minimum warm sandboxes (cost vs latency tradeoff)
-#   - max_pool: Maximum sandboxes (hard cost cap)
-#   - idle_timeout: Server-side auto-cleanup in seconds
-#   - max_lifetime: Maximum sandbox lifetime in seconds
-#   - scale_down_idle: Client-side scale-down threshold in seconds
-#   - workdir: Working directory inside container
-#   - secrets: List of Modal Secret names to inject (created via dashboard/CLI)
-#   - env_vars: Dict of environment variables to pass directly
-#   - use_dotenv: If true, loads local .env file into sandbox
-#
-# SECRETS SETUP:
-#   Create secrets via Modal dashboard or CLI:
-#     modal secret create huggingface-token HF_TOKEN=hf_xxx
-#     modal secret create openai-key OPENAI_API_KEY=sk-xxx
-#   Then reference by name in profile's secrets list.
-
-# Default profile used when no profile specified
-default_profile: default
-
-profiles:
-  # Default Python environment - good for most tasks
-  default:
-    image: python:3.11
-    gpu: null
-    cpu: 1.0
-    memory: 2048
-    min_pool: 1        # Keep 1 warm for fast response
-    max_pool: 5
-    idle_timeout: 120  # Modal terminates if idle 2 min
-    max_lifetime: 3600 # Max 1 hour
-    scale_down_idle: 180
-    workdir: /workspace
-    secrets: []        # Add secret names here: ["my-api-keys"]
-    env_vars: {}       # Add env vars here: {DEBUG: "1"}
-    use_dotenv: false  # Set to true to load local .env
-
-  # PyTorch with GPU for ML training/inference
-  pytorch-gpu:
-    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
-    gpu: T4            # Options: T4, A10G, A100, H100
-    cpu: 4.0
-    memory: 16384      # 16GB
-    min_pool: 0        # Don't keep GPU sandboxes warm (expensive!)
-    max_pool: 2
-    idle_timeout: 60   # Shorter idle timeout for GPU (cost)
-    max_lifetime: 1800 # 30 min max for GPU tasks
-    scale_down_idle: 60
-    workdir: /workspace
-    # ML-specific secrets
-    secrets:
-      - huggingface-token  # HF_TOKEN env var
-      - wandb-key          # WANDB_API_KEY env var
-    env_vars:
-      CUDA_VISIBLE_DEVICES: "0"
-      PYTORCH_CUDA_ALLOC_CONF: "expandable_segments:True"
-
-  # High-end GPU for large models
-  pytorch-a100:
-    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
-    gpu: A100
-    cpu: 8.0
-    memory: 65536      # 64GB
-    min_pool: 0
-    max_pool: 1        # Only 1 at a time (very expensive)
-    idle_timeout: 30
-    max_lifetime: 3600
-    scale_down_idle: 30
-    workdir: /workspace
-
-  # Node.js for JavaScript/TypeScript tasks
-  node:
-    image: node:18
-    gpu: null
-    cpu: 1.0
-    memory: 2048
-    min_pool: 0        # Create on-demand
-    max_pool: 3
-    idle_timeout: 120
-    max_lifetime: 3600
-    scale_down_idle: 180
-    workdir: /workspace
-
-  # High memory for data processing
-  high-memory:
-    image: python:3.11
-    gpu: null
-    cpu: 4.0
-    memory: 32768      # 32GB
-    min_pool: 0
-    max_pool: 2
-    idle_timeout: 120
-    max_lifetime: 3600
-    scale_down_idle: 180
-    workdir: /workspace
-
-  # Rust development environment
-  rust:
-    image: rust:1.75
-    gpu: null
-    cpu: 2.0
-    memory: 4096
-    min_pool: 0
-    max_pool: 2
-    idle_timeout: 120
-    max_lifetime: 3600
-    scale_down_idle: 180
-    workdir: /workspace
-
-  # Go development environment
-  golang:
-    image: golang:1.21
-    gpu: null
-    cpu: 2.0
-    memory: 4096
-    min_pool: 0
-    max_pool: 2
-    idle_timeout: 120
-    max_lifetime: 3600
-    scale_down_idle: 180
-    workdir: /workspace
--- a/model_tools.py
+++ b/model_tools.py
--- a/nomad-dev.hcl
+++ b/nomad-dev.hcl
@@ -1,37 +0,0 @@
-# Nomad Development Configuration (Hermes-Agent)
-# Run with: nomad agent -dev -config=nomad-dev.hcl
-#
-# This is intended for local development only.
-
-client {
-  enabled = true
-
-  options {
-    # Enable Docker volume mounts for persistent slot workspaces
-    "docker.volumes.enabled" = "true"
-  }
-}
-
-# Docker driver plugin configuration
-plugin "docker" {
-  config {
-    # CRITICAL: Enable volume mounts
-    volumes {
-      enabled = true
-    }
-
-    # Allow privileged containers if needed
-    allow_privileged = false
-
-    # Garbage collection settings
-    gc {
-      image       = true
-      # NOTE: For local dev we often rely on locally built images like `atropos-sandbox:local`.
-      # A short image GC delay can delete these between runs, causing confusing "Failed to pull"
-      # crash loops. Keep this comfortably long; tighten it for CI/production if needed.
-      image_delay = "24h"
-      container   = true
-    }
-  }
-}
-
--- a/nomad-singularity.hcl
+++ b/nomad-singularity.hcl
@@ -1,31 +0,0 @@
-# Nomad Configuration for Singularity/Apptainer Sandbox
-# Run with: nomad agent -dev -config=nomad-singularity.hcl
-#
-# This uses the raw_exec driver to run Apptainer containers.
-# Suitable for HPC environments where Docker cannot run without sudo.
-
-client {
-  enabled = true
-
-  options {
-    # Enable raw_exec driver for Singularity/Apptainer
-    "driver.raw_exec.enable" = "1"
-  }
-}
-
-# raw_exec driver plugin configuration
-plugin "raw_exec" {
-  config {
-    enabled = true
-  }
-}
-
-# Optional: If you have the nomad-driver-singularity plugin installed,
-# uncomment the following instead of using raw_exec:
-# plugin "singularity" {
-#   config {
-#     enabled = true
-#     # Allow bind mounts
-#     bind_paths = ["/tmp", "/var/tmp"]
-#   }
-# }
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -19,7 +19,6 @@ dependencies = [
  "rich",
  "tenacity",
  "pyyaml",
-  "prompt_toolkit",
  "requests",
  "jinja2",
  "pydantic>=2.0",
@@ -35,32 +34,17 @@ dependencies = [
 [project.optional-dependencies]
 modal = ["modal", "boto3"]
 dev = ["pytest", "pytest-asyncio"]
-# Install Atropos from source (PyPI is often stale for this internal dependency).
-atropos = [
-  "atroposlib @ git+https://github.com/NousResearch/atropos.git",
-  # Atropos integration runtime deps (kept optional for Hermes-only users)
-  "aiohttp",
-  "fastapi",
-  "uvicorn",
-  "pyte",
-]
+messaging = ["python-telegram-bot>=20.0", "discord.py>=2.0"]
+cron = ["croniter"]
+cli = ["simple-term-menu"]
+all = ["croniter", "python-telegram-bot>=20.0", "discord.py>=2.0", "simple-term-menu"]

 [project.scripts]
+hermes = "hermes_cli.main:main"
 hermes-agent = "run_agent:main"
-hermes-atropos-sandbox-smoke = "atropos.envs.sandbox_terminal_smoke_env:SandboxTerminalSmokeEnv.cli"
-hermes-atropos-toolserver-smoke = "atropos.envs.toolserver_smoke_env:ToolServerSmokeEnv.cli"

 [tool.setuptools]
-py-modules = [
-  "run_agent",
-  "model_tools",
-  "toolsets",
-  "batch_runner",
-  "trajectory_compressor",
-  "toolset_distributions",
-  "atropos_compatible_agent",
-  "local_server",
-]
+py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajectory_compressor", "toolset_distributions", "cli"]

 [tool.setuptools.packages.find]
-include = ["tools", "atropos", "atropos.*"]
+include = ["tools", "hermes_cli", "gateway", "cron"]
--- a/requirements.txt
+++ b/requirements.txt
@@ -30,5 +30,15 @@ platformdirs
 # modal
 # boto3

-# Optional: Legacy Hecate terminal backend
-# git+ssh://git@github.com/NousResearch/hecate.git
+# Optional: For cron expression parsing (cronjob scheduling)
+croniter
+
+# Optional: For messaging platform integrations (gateway)
+# Telegram: pip install python-telegram-bot
+python-telegram-bot>=20.0
+
+# Discord: pip install discord.py
+discord.py>=2.0
+
+# WhatsApp: Requires Node.js bridge (see docs/messaging.md)
+# aiohttp  # For WhatsApp bridge communication
--- a/rl_cli.py
+++ b/rl_cli.py
@@ -0,0 +1,448 @@
+#!/usr/bin/env python3
+"""
+RL Training CLI Runner
+
+Dedicated CLI runner for RL training workflows with:
+- Extended timeouts for long-running training
+- RL-focused system prompts
+- Full toolset including RL training tools
+- Special handling for 30-minute check intervals
+
+Usage:
+    python rl_cli.py "Train a model on GSM8k for math reasoning"
+    python rl_cli.py --interactive
+    python rl_cli.py --list-environments
+
+Environment Variables:
+    TINKER_API_KEY: API key for Tinker service (required)
+    WANDB_API_KEY: API key for WandB metrics (required)
+    OPENROUTER_API_KEY: API key for OpenRouter (required for agent)
+"""
+
+import asyncio
+import os
+import sys
+from pathlib import Path
+
+import fire
+import yaml
+
+# Load environment variables from .env file
+from dotenv import load_dotenv
+
+# Load ~/.hermes/.env if it exists; otherwise fall back to the local .env
+hermes_env_path = Path.home() / '.hermes' / '.env'
+local_env_path = Path(__file__).parent / '.env'
+
+if hermes_env_path.exists():
+    load_dotenv(dotenv_path=hermes_env_path)
+    print(f"✅ Loaded environment variables from {hermes_env_path}")
+elif local_env_path.exists():
+    load_dotenv(dotenv_path=local_env_path)
+    print(f"✅ Loaded environment variables from {local_env_path}")
+
+# Set terminal working directory to tinker-atropos submodule
+# This ensures terminal commands run in the right context for RL work
+tinker_atropos_dir = Path(__file__).parent / 'tinker-atropos'
+if tinker_atropos_dir.exists():
+    os.environ['TERMINAL_CWD'] = str(tinker_atropos_dir)
+    os.environ['HERMES_QUIET'] = '1'  # Disable temp subdirectory creation
+    print(f"📂 Terminal working directory: {tinker_atropos_dir}")
+else:
+    # Fall back to hermes-agent directory if submodule not found
+    os.environ['TERMINAL_CWD'] = str(Path(__file__).parent)
+    os.environ['HERMES_QUIET'] = '1'
+    print(f"⚠️  tinker-atropos submodule not found, using: {Path(__file__).parent}")
+
+# Import agent and tools
+from run_agent import AIAgent
+from model_tools import get_tool_definitions, check_toolset_requirements
+from tools.rl_training_tool import check_rl_api_keys, get_missing_keys
+
+
+# ============================================================================
+# Config Loading
+# ============================================================================
+
+DEFAULT_MODEL = "anthropic/claude-opus-4.5"
+DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"
+
+
+def load_hermes_config() -> dict:
+    """
+    Load configuration from ~/.hermes/config.yaml.
+    
+    Returns:
+        dict: Configuration with model, base_url, etc.
+    """
+    config_path = Path.home() / '.hermes' / 'config.yaml'
+    
+    config = {
+        "model": DEFAULT_MODEL,
+        "base_url": DEFAULT_BASE_URL,
+    }
+    
+    if config_path.exists():
+        try:
+            with open(config_path, "r") as f:
+                file_config = yaml.safe_load(f) or {}
+            
+            # Get model from config
+            if "model" in file_config:
+                if isinstance(file_config["model"], str):
+                    config["model"] = file_config["model"]
+                elif isinstance(file_config["model"], dict):
+                    config["model"] = file_config["model"].get("default", DEFAULT_MODEL)
+            
+            # Get base_url if specified
+            if "base_url" in file_config:
+                config["base_url"] = file_config["base_url"]
+                
+        except Exception as e:
+            print(f"⚠️  Warning: Failed to load config.yaml: {e}")
+    
+    return config
+
+
+# ============================================================================
+# RL-Specific Configuration
+# ============================================================================
+
+# Extended iteration budget for long-running RL operations
+RL_MAX_ITERATIONS = 200  # Allow many more iterations for long workflows
+
+# RL-focused system prompt
+RL_SYSTEM_PROMPT = """You are an automated post-training engineer specializing in reinforcement learning for language models.
+
+## Your Capabilities
+
+You have access to RL training tools for running reinforcement learning on models through Tinker-Atropos:
+
+1. **DISCOVER**: Use `rl_list_environments` to see available RL environments
+2. **INSPECT**: Read environment files to understand how they work (verifiers, data loading, rewards)
+3. **INSPECT DATA**: Use terminal to explore HuggingFace datasets and understand their format
+4. **CREATE**: Copy existing environments as templates, modify for your needs
+5. **CONFIGURE**: Use `rl_select_environment` and `rl_edit_config` to set up training
+6. **TEST**: Always use `rl_test_inference` before full training to validate your setup
+7. **TRAIN**: Use `rl_start_training` to begin, `rl_check_status` to monitor
+8. **EVALUATE**: Use `rl_get_results` and analyze WandB metrics to assess performance
+
+## Environment Files
+
+Environment files are located in: `tinker-atropos/tinker_atropos/environments/`
+
+Study existing environments to learn patterns. Look for:
+- `load_dataset()` calls - how data is loaded
+- `score_answer()` / `score()` - verification logic
+- `get_next_item()` - prompt formatting
+- `system_prompt` - instruction format
+- `config_init()` - default configuration
+
+## Creating New Environments
+
+To create a new environment:
+1. Read an existing environment file (e.g., gsm8k_tinker.py)
+2. Use terminal to explore the target dataset format
+3. Copy the environment file as a template
+4. Modify the dataset loading, prompt formatting, and verifier logic
+5. Test with `rl_test_inference` before training
+
+## Important Guidelines
+
+- **Always test before training**: Training runs take hours - verify everything works first
+- **Monitor metrics**: Check WandB for reward/mean and percent_correct
+- **Status check intervals**: Wait at least 30 minutes between status checks
+- **Early stopping**: Stop training early if metrics look bad or stagnant
+- **Iterate quickly**: Start with small total_steps to validate, then scale up
+
+## Available Toolsets
+
+You have access to:
+- **RL tools**: Environment discovery, config management, training, testing
+- **Terminal**: Run commands, inspect files, explore datasets
+- **Web**: Search for information, documentation, papers
+- **File tools**: Read and modify code files
+
+When asked to train a model, follow this workflow:
+1. List available environments
+2. Select and configure the appropriate environment
+3. Test with sample prompts
+4. Start training with conservative settings
+5. Monitor progress and adjust as needed
+"""
+
+# Toolsets to enable for RL workflows
+RL_TOOLSETS = ["terminal", "web", "file", "rl"]
+
+
+# ============================================================================
+# Helper Functions
+# ============================================================================
+
+def check_requirements():
+    """Check that all required environment variables and services are available."""
+    errors = []
+    
+    # Check API keys
+    if not os.getenv("OPENROUTER_API_KEY"):
+        errors.append("OPENROUTER_API_KEY not set - required for agent")
+    
+    missing_rl_keys = get_missing_keys()
+    if missing_rl_keys:
+        errors.append(f"Missing RL API keys: {', '.join(missing_rl_keys)}")
+    
+    if errors:
+        print("❌ Missing requirements:")
+        for error in errors:
+            print(f"   - {error}")
+        print("\nPlease set these environment variables in your .env file or shell.")
+        return False
+    
+    return True
+
+
+def check_tinker_atropos():
+    """Check if tinker-atropos submodule is properly set up."""
+    tinker_path = Path(__file__).parent / "tinker-atropos"
+    
+    if not tinker_path.exists():
+        return False, "tinker-atropos submodule not found. Run: git submodule update --init"
+    
+    envs_path = tinker_path / "tinker_atropos" / "environments"
+    if not envs_path.exists():
+        return False, f"environments directory not found at {envs_path}"
+    
+    env_files = list(envs_path.glob("*.py"))
+    env_files = [f for f in env_files if not f.name.startswith("_")]
+    
+    return True, {"path": str(tinker_path), "environments_count": len(env_files)}
+
+
+def list_environments_sync():
+    """List available environments (synchronous wrapper)."""
+    from tools.rl_training_tool import rl_list_environments
+    import json
+    
+    async def _list():
+        result = await rl_list_environments()
+        return json.loads(result)
+    
+    return asyncio.run(_list())
+
+
+# ============================================================================
+# Main CLI
+# ============================================================================
+
+def main(
+    task: str = None,
+    model: str = None,
+    api_key: str = None,
+    base_url: str = None,
+    max_iterations: int = RL_MAX_ITERATIONS,
+    interactive: bool = False,
+    list_environments: bool = False,
+    check_server: bool = False,
+    verbose: bool = False,
+    save_trajectories: bool = True,
+):
+    """
+    RL Training CLI - Dedicated runner for RL training workflows.
+    
+    Args:
+        task: The training task/goal (e.g., "Train a model on GSM8k for math")
+        model: Model to use for the agent (reads from ~/.hermes/config.yaml if not provided)
+        api_key: OpenRouter API key (uses OPENROUTER_API_KEY env var if not provided)
+        base_url: API base URL (reads from config or defaults to OpenRouter)
+        max_iterations: Maximum agent iterations (default: 200 for long workflows)
+        interactive: Run in interactive mode (multiple conversations)
+        list_environments: Just list available RL environments and exit
+        check_server: Check the tinker-atropos setup and RL API keys, then exit
+        verbose: Enable verbose logging
+        save_trajectories: Save conversation trajectories (default: True for RL)
+    
+    Examples:
+        # Train on a specific environment
+        python rl_cli.py "Train a model on GSM8k math problems"
+        
+        # Interactive mode
+        python rl_cli.py --interactive
+        
+        # List available environments
+        python rl_cli.py --list-environments
+        
+        # Check tinker-atropos setup and API keys
+        python rl_cli.py --check-server
+    """
+    # Load config from ~/.hermes/config.yaml
+    config = load_hermes_config()
+    
+    # Use config values if not explicitly provided
+    if model is None:
+        model = config["model"]
+    if base_url is None:
+        base_url = config["base_url"]
+    
+    print("🎯 RL Training Agent")
+    print("=" * 60)
+    
+    # Handle setup check
+    if check_server:
+        print("\n🔍 Checking tinker-atropos setup...")
+        ok, result = check_tinker_atropos()
+        if ok:
+            print("✅ tinker-atropos submodule found")
+            print(f"   Path: {result.get('path')}")
+            print(f"   Environments found: {result.get('environments_count', 0)}")
+            
+            # Also check API keys
+            missing = get_missing_keys()
+            if missing:
+                print(f"\n⚠️  Missing API keys: {', '.join(missing)}")
+                print("   Add them to ~/.hermes/.env")
+            else:
+                print("✅ API keys configured")
+        else:
+            print(f"❌ tinker-atropos not set up: {result}")
+            print("\nTo set up:")
+            print("  git submodule update --init")
+            print("  pip install -e ./tinker-atropos")
+        return
+    
+    # Handle environment listing
+    if list_environments:
+        print("\n📋 Available RL Environments:")
+        print("-" * 40)
+        try:
+            data = list_environments_sync()
+            if "error" in data:
+                print(f"❌ Error: {data['error']}")
+                return
+            
+            envs = data.get("environments", [])
+            if not envs:
+                print("No environments found.")
+                print("\nMake sure tinker-atropos is set up:")
+                print("  git submodule update --init")
+                return
+            
+            for env in envs:
+                print(f"\n  📦 {env['name']}")
+                print(f"     Class: {env['class_name']}")
+                print(f"     Path: {env['file_path']}")
+                if env.get('description'):
+                    desc = env['description'][:100] + "..." if len(env['description']) > 100 else env['description']
+                    print(f"     Description: {desc}")
+            
+            print(f"\n📊 Total: {len(envs)} environments")
+            print("\nUse `rl_select_environment(name)` to select an environment for training.")
+        except Exception as e:
+            print(f"❌ Error listing environments: {e}")
+            print("\nMake sure tinker-atropos is set up:")
+            print("  git submodule update --init")
+            print("  pip install -e ./tinker-atropos")
+        return
+    
+    # Check requirements
+    if not check_requirements():
+        sys.exit(1)
+    
+    # Require a task unless running in interactive mode
+    if not task and not interactive:
+        print("\n⚠️  No task provided. Use --interactive for interactive mode or provide a task.")
+        print("\nExamples:")
+        print('  python rl_cli.py "Train a model on GSM8k math problems"')
+        print('  python rl_cli.py "Create an RL environment for code generation"')
+        print('  python rl_cli.py --interactive')
+        return
+    
+    # Get API key
+    api_key = api_key or os.getenv("OPENROUTER_API_KEY")
+    if not api_key:
+        print("❌ No API key provided. Set OPENROUTER_API_KEY or pass --api-key")
+        sys.exit(1)
+    
+    print(f"\n🤖 Model: {model}")
+    print(f"🔧 Max iterations: {max_iterations}")
+    print(f"📁 Toolsets: {', '.join(RL_TOOLSETS)}")
+    print("=" * 60)
+    
+    # Create agent with RL configuration
+    agent = AIAgent(
+        base_url=base_url,
+        api_key=api_key,
+        model=model,
+        max_iterations=max_iterations,
+        enabled_toolsets=RL_TOOLSETS,
+        save_trajectories=save_trajectories,
+        verbose_logging=verbose,
+        quiet_mode=False,
+        ephemeral_system_prompt=RL_SYSTEM_PROMPT,
+    )
+    
+    if interactive:
+        # Interactive mode - multiple conversations
+        print("\n🔄 Interactive RL Training Mode")
+        print("Type 'quit', 'exit', or 'q' to end the session.")
+        print("Type 'status' to check active training runs.")
+        print("-" * 40)
+        
+        while True:
+            try:
+                user_input = input("\n🎯 RL Task> ").strip()
+                
+                if not user_input:
+                    continue
+                
+                if user_input.lower() in ('quit', 'exit', 'q'):
+                    print("\n👋 Goodbye!")
+                    break
+                
+                if user_input.lower() == 'status':
+                    # Quick status check
+                    from tools.rl_training_tool import rl_list_runs
+                    import json
+                    result = asyncio.run(rl_list_runs())
+                    runs = json.loads(result)
+                    if isinstance(runs, list) and runs:
+                        print("\n📊 Active Runs:")
+                        for run in runs:
+                            print(f"  - {run['run_id']}: {run['environment']} ({run['status']})")
+                    else:
+                        print("\nNo active runs.")
+                    continue
+                
+                # Run the agent
+                print("\n" + "=" * 60)
+                response = agent.run_conversation(user_input)
+                print("\n" + "=" * 60)
+                
+            except KeyboardInterrupt:
+                print("\n\n👋 Interrupted. Goodbye!")
+                break
+            except Exception as e:
+                print(f"\n❌ Error: {e}")
+                if verbose:
+                    import traceback
+                    traceback.print_exc()
+    else:
+        # Single task mode
+        print(f"\n📝 Task: {task}")
+        print("-" * 40)
+        
+        try:
+            response = agent.run_conversation(task)
+            print("\n" + "=" * 60)
+            print("✅ Task completed")
+        except KeyboardInterrupt:
+            print("\n\n⚠️ Interrupted by user")
+        except Exception as e:
+            print(f"\n❌ Error: {e}")
+            if verbose:
+                import traceback
+                traceback.print_exc()
+            sys.exit(1)
+
+
+if __name__ == "__main__":
+    fire.Fire(main)
--- a/run_agent.py
+++ b/run_agent.py
@@ -30,6 +30,7 @@ import threading
 import uuid
 from typing import List, Dict, Any, Optional
 from openai import OpenAI
+import fire
 from datetime import datetime
 from pathlib import Path

@@ -50,6 +51,410 @@ from model_tools import get_tool_definitions, handle_function_call, check_toolse
 from tools.terminal_tool import cleanup_vm
 from tools.browser_tool import cleanup_browser

+import requests
+
+# =============================================================================
+# Model Context Management
+# =============================================================================
+
+# Cache for model metadata from OpenRouter
+_model_metadata_cache: Dict[str, Dict[str, Any]] = {}
+_model_metadata_cache_time: float = 0
+_MODEL_CACHE_TTL = 3600  # 1 hour cache TTL
+
+# Default context lengths for common models (fallback if API fails)
+DEFAULT_CONTEXT_LENGTHS = {
+    "anthropic/claude-opus-4": 200000,
+    "anthropic/claude-opus-4.5": 200000,
+    "anthropic/claude-sonnet-4": 200000,
+    "anthropic/claude-sonnet-4-20250514": 200000,
+    "anthropic/claude-haiku-4.5": 200000,
+    "openai/gpt-4o": 128000,
+    "openai/gpt-4-turbo": 128000,
+    "openai/gpt-4o-mini": 128000,
+    "google/gemini-2.0-flash": 1048576,
+    "google/gemini-2.5-pro": 1048576,
+    "meta-llama/llama-3.3-70b-instruct": 131072,
+    "deepseek/deepseek-chat-v3": 65536,
+    "qwen/qwen-2.5-72b-instruct": 32768,
+}
+
+
+def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any]]:
+    """
+    Fetch model metadata from OpenRouter's /api/v1/models endpoint.
+    Results are cached for 1 hour to minimize API calls.
+    
+    Returns:
+        Dict mapping model_id to metadata (context_length, max_completion_tokens, etc.)
+    """
+    global _model_metadata_cache, _model_metadata_cache_time
+    
+    # Return cached data if fresh
+    if not force_refresh and _model_metadata_cache and (time.time() - _model_metadata_cache_time) < _MODEL_CACHE_TTL:
+        return _model_metadata_cache
+    
+    try:
+        response = requests.get(
+            "https://openrouter.ai/api/v1/models",
+            timeout=10
+        )
+        response.raise_for_status()
+        data = response.json()
+        
+        # Build cache mapping model_id to relevant metadata
+        cache = {}
+        for model in data.get("data", []):
+            model_id = model.get("id", "")
+            cache[model_id] = {
+                "context_length": model.get("context_length") or 128000,
+                "max_completion_tokens": (model.get("top_provider") or {}).get("max_completion_tokens") or 4096,
+                "name": model.get("name", model_id),
+                "pricing": model.get("pricing", {}),
+            }
+            # Also cache by canonical slug if different
+            canonical = model.get("canonical_slug", "")
+            if canonical and canonical != model_id:
+                cache[canonical] = cache[model_id]
+        
+        _model_metadata_cache = cache
+        _model_metadata_cache_time = time.time()
+        
+        if not os.getenv("HERMES_QUIET"):
+            logging.debug(f"Fetched metadata for {len(cache)} models from OpenRouter")
+        
+        return cache
+        
+    except Exception as e:
+        logging.warning(f"Failed to fetch model metadata from OpenRouter: {e}")
+        # Return cached data even if stale, or empty dict
+        return _model_metadata_cache or {}
+
+
+def get_model_context_length(model: str) -> int:
+    """
+    Get the context length for a specific model.
+    
+    Args:
+        model: Model identifier (e.g., "anthropic/claude-sonnet-4")
+        
+    Returns:
+        Context length in tokens (defaults to 128000 if unknown)
+    """
+    # Try to get from OpenRouter API
+    metadata = fetch_model_metadata()
+    if model in metadata:
+        return metadata[model].get("context_length", 128000)
+    
+    # Check default fallbacks (handles partial matches)
+    for default_model, length in DEFAULT_CONTEXT_LENGTHS.items():
+        if default_model in model or model in default_model:
+            return length
+    
+    # Conservative default
+    return 128000
+
+
+def estimate_tokens_rough(text: str) -> int:
+    """
+    Rough token estimate for pre-flight checks (before API call).
+    Uses ~4 chars per token heuristic.
+    
+    For accurate counts, use the `usage.prompt_tokens` from API responses.
+    
+    Args:
+        text: Text to estimate tokens for
+        
+    Returns:
+        Rough estimated token count
+    """
+    if not text:
+        return 0
+    return len(text) // 4
+
+
+def estimate_messages_tokens_rough(messages: List[Dict[str, Any]]) -> int:
+    """
+    Rough token estimate for messages (pre-flight check only).
+    
+    For accurate counts, use the `usage.prompt_tokens` from API responses.
+    
+    Args:
+        messages: List of message dicts
+        
+    Returns:
+        Rough estimated token count
+    """
+    total_chars = sum(len(str(msg)) for msg in messages)
+    return total_chars // 4
+
+
+class ContextCompressor:
+    """
+    Compresses conversation context when approaching model's context limit.
+    
+    Uses similar logic to trajectory_compressor but operates in real-time:
+    1. Protects first few turns (system, initial user, first assistant response)
+    2. Protects last N turns (recent context is most relevant)
+    3. Summarizes middle turns when threshold is reached
+    
+    Token tracking uses actual counts from API responses (usage.prompt_tokens)
+    rather than estimates for accuracy.
+    """
+    
+    def __init__(
+        self,
+        model: str,
+        threshold_percent: float = 0.85,
+        summary_model: str = "google/gemini-2.0-flash-001",
+        protect_first_n: int = 3,
+        protect_last_n: int = 4,
+        summary_target_tokens: int = 500,
+        quiet_mode: bool = False,
+    ):
+        """
+        Initialize the context compressor.
+        
+        Args:
+            model: The main model being used (to determine context limit)
+            threshold_percent: Fraction of the context window at which to trigger compression (default 0.85, i.e. 85%)
+            summary_model: Model to use for generating summaries (cheap/fast)
+            protect_first_n: Number of initial turns to always keep
+            protect_last_n: Number of recent turns to always keep
+            summary_target_tokens: Target token count for summaries
+            quiet_mode: Suppress compression notifications
+        """
+        self.model = model
+        self.threshold_percent = threshold_percent
+        self.summary_model = summary_model
+        self.protect_first_n = protect_first_n
+        self.protect_last_n = protect_last_n
+        self.summary_target_tokens = summary_target_tokens
+        self.quiet_mode = quiet_mode
+        
+        self.context_length = get_model_context_length(model)
+        self.threshold_tokens = int(self.context_length * threshold_percent)
+        self.compression_count = 0
+        
+        # Track actual token usage from API responses
+        self.last_prompt_tokens = 0
+        self.last_completion_tokens = 0
+        self.last_total_tokens = 0
+        
+        # Initialize OpenRouter client for summarization
+        api_key = os.getenv("OPENROUTER_API_KEY", "")
+        self.client = OpenAI(
+            api_key=api_key,
+            base_url="https://openrouter.ai/api/v1"
+        ) if api_key else None
+    
+    def update_from_response(self, usage: Dict[str, Any]):
+        """
+        Update tracked token usage from API response.
+        
+        Args:
+            usage: The usage dict from response (contains prompt_tokens, completion_tokens, total_tokens)
+        """
+        self.last_prompt_tokens = usage.get("prompt_tokens", 0)
+        self.last_completion_tokens = usage.get("completion_tokens", 0)
+        self.last_total_tokens = usage.get("total_tokens", 0)
+    
+    def should_compress(self, prompt_tokens: int = None) -> bool:
+        """
+        Check if context exceeds the compression threshold.
+        
+        Uses actual token count from API response for accuracy.
+        
+        Args:
+            prompt_tokens: Actual prompt tokens from last API response.
+                          If None, uses last tracked value.
+            
+        Returns:
+            True if compression should be triggered
+        """
+        tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
+        return tokens >= self.threshold_tokens
+    
+    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
+        """
+        Quick pre-flight check using rough estimate (before API call).
+        
+        Use this to avoid making an API call that would fail due to context overflow.
+        For post-response compression decisions, use should_compress() with actual tokens.
+        
+        Args:
+            messages: Current conversation messages
+            
+        Returns:
+            True if compression is likely needed
+        """
+        rough_estimate = estimate_messages_tokens_rough(messages)
+        return rough_estimate >= self.threshold_tokens
+    
+    def get_status(self) -> Dict[str, Any]:
+        """
+        Get current compression status for display/logging.
+        
+        Returns:
+            Dict with token usage and threshold info
+        """
+        return {
+            "last_prompt_tokens": self.last_prompt_tokens,
+            "threshold_tokens": self.threshold_tokens,
+            "context_length": self.context_length,
+            "usage_percent": (self.last_prompt_tokens / self.context_length * 100) if self.context_length else 0,
+            "compression_count": self.compression_count,
+        }
+    
+    def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]]) -> str:
+        """
+        Generate a concise summary of conversation turns using a fast model.
+        
+        Args:
+            turns_to_summarize: List of message dicts to summarize
+            
+        Returns:
+            Summary string
+        """
+        if not self.client:
+            # Fallback if no API key
+            return "[CONTEXT SUMMARY]: Previous conversation turns have been compressed to save space. The assistant performed various actions and received responses."
+        
+        # Format turns for summarization
+        parts = []
+        for i, msg in enumerate(turns_to_summarize):
+            role = msg.get("role", "unknown")
+            content = msg.get("content") or ""  # content can be None on tool-call messages
+            
+            # Truncate very long content
+            if len(content) > 2000:
+                content = content[:1000] + "\n...[truncated]...\n" + content[-500:]
+            
+            # Include tool call info if present
+            tool_calls = msg.get("tool_calls", [])
+            if tool_calls:
+                tool_names = [tc.get("function", {}).get("name", "?") for tc in tool_calls if isinstance(tc, dict)]
+                content += f"\n[Tool calls: {', '.join(tool_names)}]"
+            
+            parts.append(f"[{role.upper()}]: {content}")
+        
+        content_to_summarize = "\n\n".join(parts)
+        
+        prompt = f"""Summarize these conversation turns concisely. This summary will replace these turns in the conversation history.
+
+Write from a neutral perspective describing:
+1. What actions were taken (tool calls, searches, file operations)
+2. Key information or results obtained
+3. Important decisions or findings
+4. Relevant data, file names, or outputs
+
+Keep factual and informative. Target ~{self.summary_target_tokens} tokens.
+
+---
+TURNS TO SUMMARIZE:
+{content_to_summarize}
+---
+
+Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
+
+        try:
+            response = self.client.chat.completions.create(
+                model=self.summary_model,
+                messages=[{"role": "user", "content": prompt}],
+                temperature=0.3,
+                max_tokens=self.summary_target_tokens * 2,
+                timeout=30.0,
+            )
+            
+            summary = response.choices[0].message.content.strip()
+            if not summary.startswith("[CONTEXT SUMMARY]:"):
+                summary = "[CONTEXT SUMMARY]: " + summary
+            
+            return summary
+            
+        except Exception as e:
+            logging.warning(f"Failed to generate context summary: {e}")
+            return "[CONTEXT SUMMARY]: Previous conversation turns have been compressed. The assistant performed tool calls and received responses."
+    
+    def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None) -> List[Dict[str, Any]]:
+        """
+        Compress conversation messages by summarizing middle turns.
+        
+        Algorithm:
+        1. Keep first N turns (system prompt, initial context)
+        2. Keep last N turns (recent/relevant context)
+        3. Summarize everything in between
+        4. Insert summary as a user message
+        
+        Args:
+            messages: Current conversation messages
+            current_tokens: Actual token count from API (for logging). If None, falls back to the last tracked prompt tokens, then a rough estimate.
+            
+        Returns:
+            Compressed message list
+        """
+        n_messages = len(messages)
+        
+        # Not enough messages to compress
+        if n_messages <= self.protect_first_n + self.protect_last_n + 1:
+            if not self.quiet_mode:
+                print(f"⚠️  Cannot compress: only {n_messages} messages (need > {self.protect_first_n + self.protect_last_n + 1})")
+            return messages
+        
+        # Determine compression boundaries
+        compress_start = self.protect_first_n
+        compress_end = n_messages - self.protect_last_n
+        
+        # Nothing to compress
+        if compress_start >= compress_end:
+            return messages
+        
+        # Extract turns to summarize
+        turns_to_summarize = messages[compress_start:compress_end]
+        
+        # Use actual token count if provided, otherwise estimate
+        display_tokens = current_tokens if current_tokens else self.last_prompt_tokens or estimate_messages_tokens_rough(messages)
+        
+        if not self.quiet_mode:
+            print(f"\n📦 Context compression triggered ({display_tokens:,} tokens ≥ {self.threshold_tokens:,} threshold)")
+            print(f"   📊 Model context limit: {self.context_length:,} tokens ({self.threshold_percent*100:.0f}% = {self.threshold_tokens:,})")
+            print(f"   🗜️  Summarizing turns {compress_start+1}-{compress_end} ({len(turns_to_summarize)} turns)")
+        
+        # Generate summary
+        summary = self._generate_summary(turns_to_summarize)
+        
+        # Build compressed messages
+        compressed = []
+        
+        # Keep protected head turns
+        for i in range(compress_start):
+            msg = messages[i].copy()
+            # Add notice to system message on first compression
+            if i == 0 and msg.get("role") == "system" and self.compression_count == 0:
+                msg["content"] = msg.get("content", "") + "\n\n[Note: Some earlier conversation turns may be summarized to preserve context space.]"
+            compressed.append(msg)
+        
+        # Add summary as user message
+        compressed.append({
+            "role": "user",
+            "content": summary
+        })
+        
+        # Keep protected tail turns
+        for i in range(compress_end, n_messages):
+            compressed.append(messages[i].copy())
+        
+        self.compression_count += 1
+        
+        if not self.quiet_mode:
+            # Estimate new size (actual will be known after next API call)
+            new_estimate = estimate_messages_tokens_rough(compressed)
+            saved_estimate = display_tokens - new_estimate
+            print(f"   ✅ Compressed: {n_messages} → {len(compressed)} messages (~{saved_estimate:,} tokens saved)")
+            print(f"   💡 Compression #{self.compression_count} complete")
+        
+        return compressed
+

 # =============================================================================
 # Default System Prompt Components
@@ -180,7 +585,7 @@ class AIAgent:
        base_url: str = None,
        api_key: str = None,
        model: str = "anthropic/claude-sonnet-4-20250514",  # OpenRouter format
-        max_iterations: int = 10,
+        max_iterations: int = 60,  # Default tool-calling iterations
        tool_delay: float = 1.0,
        enabled_toolsets: List[str] = None,
        disabled_toolsets: List[str] = None,
@@ -195,6 +600,7 @@ class AIAgent:
        providers_order: List[str] = None,
        provider_sort: str = None,
        session_id: str = None,
+        tool_progress_callback: callable = None,
    ):
        """
        Initialize the AI Agent.
@@ -218,6 +624,7 @@ class AIAgent:
            providers_order (List[str]): OpenRouter providers to try in order (optional)
            provider_sort (str): Sort providers by price/throughput/latency (optional)
            session_id (str): Pre-generated session ID for logging (optional, auto-generated if not provided)
+            tool_progress_callback (callable): Callback function(tool_name, args_preview) for progress notifications (optional)
        """
        self.model = model
        self.max_iterations = max_iterations
@@ -229,6 +636,12 @@ class AIAgent:
        self.log_prefix_chars = log_prefix_chars
        self.log_prefix = f"{log_prefix} " if log_prefix else ""
        self.base_url = base_url or ""  # Store for OpenRouter detection
+        self.tool_progress_callback = tool_progress_callback
+        self._last_reported_tool = None  # Track for "new tool" mode
+        
+        # Interrupt mechanism for breaking out of tool loops
+        self._interrupt_requested = False
+        self._interrupt_message = None  # Optional message that triggered interrupt
        
        # Store OpenRouter provider preferences
        self.providers_allowed = providers_allowed
@@ -363,6 +776,30 @@ class AIAgent:
        
        # Track conversation messages for session logging
        self._session_messages: List[Dict[str, Any]] = []
+        
+        # Initialize context compressor for automatic context management
+        # Compresses conversation when approaching model's context limit
+        # Configuration via environment variables (can be set in .env or cli-config.yaml)
+        compression_threshold = float(os.getenv("CONTEXT_COMPRESSION_THRESHOLD", "0.85"))
+        compression_model = os.getenv("CONTEXT_COMPRESSION_MODEL", "google/gemini-2.0-flash-001")
+        compression_enabled = os.getenv("CONTEXT_COMPRESSION_ENABLED", "true").lower() in ("true", "1", "yes")
+        
+        self.context_compressor = ContextCompressor(
+            model=self.model,
+            threshold_percent=compression_threshold,
+            summary_model=compression_model,
+            protect_first_n=3,  # Keep system, first user, first assistant
+            protect_last_n=4,   # Keep recent context
+            summary_target_tokens=500,
+            quiet_mode=self.quiet_mode,
+        )
+        self.compression_enabled = compression_enabled
+        
+        if not self.quiet_mode:
+            if compression_enabled:
+                print(f"📊 Context limit: {self.context_compressor.context_length:,} tokens (compress at {int(compression_threshold*100)}% = {self.context_compressor.threshold_tokens:,})")
+            else:
+                print(f"📊 Context limit: {self.context_compressor.context_length:,} tokens (auto-compression disabled)")
    
    # Pools of kawaii faces for random selection
    KAWAII_SEARCH = [
@@ -551,6 +988,49 @@ class AIAgent:
        # Check if there's any non-whitespace content remaining
        return bool(cleaned.strip())
    
+    def _extract_reasoning(self, assistant_message) -> Optional[str]:
+        """
+        Extract reasoning/thinking content from an assistant message.
+        
+        OpenRouter and various providers can return reasoning in multiple formats:
+        1. message.reasoning - Direct reasoning field (DeepSeek, Qwen, etc.)
+        2. message.reasoning_content - Alternative field (Moonshot AI, Novita, etc.)
+        3. message.reasoning_details - Array of {type, summary, ...} objects (OpenRouter unified)
+        
+        Args:
+            assistant_message: The assistant message object from the API response
+            
+        Returns:
+            Combined reasoning text, or None if no reasoning found
+        """
+        reasoning_parts = []
+        
+        # Check direct reasoning field
+        if hasattr(assistant_message, 'reasoning') and assistant_message.reasoning:
+            reasoning_parts.append(assistant_message.reasoning)
+        
+        # Check reasoning_content field (alternative name used by some providers)
+        if hasattr(assistant_message, 'reasoning_content') and assistant_message.reasoning_content:
+            # Don't duplicate if same as reasoning
+            if assistant_message.reasoning_content not in reasoning_parts:
+                reasoning_parts.append(assistant_message.reasoning_content)
+        
+        # Check reasoning_details array (OpenRouter unified format)
+        # Format: [{"type": "reasoning.summary", "summary": "...", ...}, ...]
+        if hasattr(assistant_message, 'reasoning_details') and assistant_message.reasoning_details:
+            for detail in assistant_message.reasoning_details:
+                if isinstance(detail, dict):
+                    # Extract summary from reasoning detail object
+                    summary = detail.get('summary') or detail.get('content') or detail.get('text')
+                    if summary and summary not in reasoning_parts:
+                        reasoning_parts.append(summary)
+        
+        # Combine all reasoning parts
+        if reasoning_parts:
+            return "\n\n".join(reasoning_parts)
+        
+        return None
+    
    def _get_messages_up_to_last_assistant(self, messages: List[Dict]) -> List[Dict]:
        """
        Get messages up to (but not including) the last assistant turn.
@@ -796,9 +1276,16 @@ class AIAgent:
            return
        
        try:
-            # Convert to trajectory format (reuse existing method)
-            # Use empty string as user_query since it's embedded in messages
-            trajectory = self._convert_to_trajectory_format(messages, "", True)
+            # Extract the first user message for the trajectory format
+            # The first message should be the user's initial query
+            first_user_query = ""
+            for msg in messages:
+                if msg.get("role") == "user":
+                    first_user_query = msg.get("content", "")
+                    break
+            
+            # Convert to trajectory format
+            trajectory = self._convert_to_trajectory_format(messages, first_user_query, True)
            
            # Build the session log entry
            entry = {
@@ -819,6 +1306,42 @@ class AIAgent:
            if self.verbose_logging:
                logging.warning(f"Failed to save session log: {e}")
    
+    def interrupt(self, message: str = None) -> None:
+        """
+        Request the agent to interrupt its current tool-calling loop.
+        
+        Call this from another thread (e.g., input handler, message receiver)
+        to gracefully stop the agent and process a new message.
+        
+        Args:
+            message: Optional new message that triggered the interrupt.
+                     If provided, run_conversation() returns it as result['interrupt_message'].
+        
+        Example (CLI):
+            # In a separate input thread:
+            if user_typed_something:
+                agent.interrupt(user_input)
+        
+        Example (Messaging):
+            # When new message arrives for active session:
+            if session_has_running_agent:
+                running_agent.interrupt(new_message.text)
+        """
+        self._interrupt_requested = True
+        self._interrupt_message = message
+        if not self.quiet_mode:
+            print("\n⚡ Interrupt requested" + (f": '{message[:40]}...'" if message and len(message) > 40 else f": '{message}'" if message else ""))
+    
+    def clear_interrupt(self) -> None:
+        """Clear any pending interrupt request."""
+        self._interrupt_requested = False
+        self._interrupt_message = None
+    
+    @property
+    def is_interrupted(self) -> bool:
+        """Check if an interrupt has been requested."""
+        return self._interrupt_requested
+    
    def run_conversation(
        self,
        user_message: str,
@@ -876,8 +1399,19 @@ class AIAgent:
        # Main conversation loop
        api_call_count = 0
        final_response = None
+        interrupted = False
+        
+        # Clear any stale interrupt state at start
+        self.clear_interrupt()
        
        while api_call_count < self.max_iterations:
+            # Check for interrupt request (e.g., user sent new message)
+            if self._interrupt_requested:
+                interrupted = True
+                if not self.quiet_mode:
+                    print("\n⚡ Breaking out of tool loop due to interrupt...")
+                break
+            
            api_call_count += 1
            
            # Prepare messages for API call
@@ -889,37 +1423,25 @@ class AIAgent:
            for msg in messages:
                api_msg = msg.copy()
                
-                # For assistant messages with tool_calls, providers require 'reasoning_content' field
-                # Extract reasoning from our stored 'reasoning' field and add it as 'reasoning_content'
-                if msg.get("role") == "assistant" and msg.get("tool_calls"):
+                # For ALL assistant messages, pass reasoning back to the API
+                # This ensures multi-turn reasoning context is preserved
+                if msg.get("role") == "assistant":
                    reasoning_text = msg.get("reasoning")
                    if reasoning_text:
-                        # Add reasoning_content for API compatibility (Moonshot AI, Novita, etc.)
+                        # Add reasoning_content for API compatibility (Moonshot AI, Novita, OpenRouter)
                        api_msg["reasoning_content"] = reasoning_text
                
                # Remove 'reasoning' field - it's for trajectory storage only
-                # The reasoning is already in the content via <think> tags AND
-                # we've added reasoning_content for API compatibility above
+                # We've copied it to 'reasoning_content' for the API above
                if "reasoning" in api_msg:
                    api_msg.pop("reasoning")
-                # Remove 'reasoning_details' if present - we use reasoning_content instead
-                if "reasoning_details" in api_msg:
-                    api_msg.pop("reasoning_details")
+                # Keep 'reasoning_details' - OpenRouter uses this for multi-turn reasoning context
+                # The signature field helps maintain reasoning continuity
                api_messages.append(api_msg)
            
            if active_system_prompt:
                # Insert system message at the beginning
                api_messages = [{"role": "system", "content": active_system_prompt}] + api_messages
-
-            if os.getenv("HERMES_DEBUG_OPENAI_REQUEST") == "1":
-                meta = {
-                    "model": self.model,
-                    "base_url": self.base_url,
-                    "messages": api_messages,
-                    "tools": self.tools if self.tools else None,
-                }
-                print("\n=== HERMES_DEBUG_OPENAI_REQUEST ===", flush=True)
-                print(json.dumps(meta, ensure_ascii=False, indent=2)[:200_000], flush=True)
            
            # Calculate approximate request size for logging
            total_chars = sum(len(str(msg)) for msg in api_messages)
@@ -933,13 +1455,12 @@ class AIAgent:
                print(f"{self.log_prefix}   📊 Request size: {len(api_messages)} messages, ~{approx_tokens:,} tokens (~{total_chars:,} chars)")
                print(f"{self.log_prefix}   🔧 Available tools: {len(self.tools) if self.tools else 0}")
            else:
-                # Animated thinking spinner in quiet mode (disable for wrappers/non-TTY usage)
-                if os.getenv("HERMES_DISABLE_SPINNER") != "1":
-                    face = random.choice(KawaiiSpinner.KAWAII_THINKING)
-                    verb = random.choice(KawaiiSpinner.THINKING_VERBS)
-                    spinner_type = random.choice(['brain', 'sparkle', 'pulse', 'moon', 'star'])
-                    thinking_spinner = KawaiiSpinner(f"{face} {verb}...", spinner_type=spinner_type)
-                    thinking_spinner.start()
+                # Animated thinking spinner in quiet mode
+                face = random.choice(KawaiiSpinner.KAWAII_THINKING)
+                verb = random.choice(KawaiiSpinner.THINKING_VERBS)
+                spinner_type = random.choice(['brain', 'sparkle', 'pulse', 'moon', 'star'])
+                thinking_spinner = KawaiiSpinner(f"{face} {verb}...", spinner_type=spinner_type)
+                thinking_spinner.start()
            
            # Log request details if verbose
            if self.verbose_logging:
@@ -990,14 +1511,6 @@ class AIAgent:
                        api_kwargs["extra_body"] = extra_body
                    
                    response = self.client.chat.completions.create(**api_kwargs)
-
-                    if os.getenv("HERMES_DEBUG_OPENAI_RESPONSE") == "1":
-                        try:
-                            dumped = response.model_dump()
-                        except Exception:
-                            dumped = getattr(response, "__dict__", {"repr": repr(response)})
-                        print("\n=== HERMES_DEBUG_OPENAI_RESPONSE: ChatCompletion (raw) ===", flush=True)
-                        print(json.dumps(dumped, ensure_ascii=False, indent=2), flush=True)
                    
                    api_duration = time.time() - api_start_time
                    
@@ -1123,6 +1636,18 @@ class AIAgent:
                                "error": "First response truncated due to output length limit"
                            }
                    
+                    # Track actual token usage from response for context management
+                    if hasattr(response, 'usage') and response.usage:
+                        usage_dict = {
+                            "prompt_tokens": getattr(response.usage, 'prompt_tokens', 0),
+                            "completion_tokens": getattr(response.usage, 'completion_tokens', 0),
+                            "total_tokens": getattr(response.usage, 'total_tokens', 0),
+                        }
+                        self.context_compressor.update_from_response(usage_dict)
+                        
+                        if self.verbose_logging:
+                            logging.debug(f"Token usage: prompt={usage_dict['prompt_tokens']:,}, completion={usage_dict['completion_tokens']:,}, total={usage_dict['total_tokens']:,}")
+                    
                    break  # Success, exit retry loop

                except Exception as api_error:
@@ -1150,17 +1675,28 @@ class AIAgent:
                    ])
                    
                    if is_context_length_error:
-                        print(f"{self.log_prefix}❌ Context length exceeded - this error cannot be resolved by retrying.")
-                        print(f"{self.log_prefix}   💡 The conversation has accumulated too much content from tool responses.")
-                        logging.error(f"{self.log_prefix}Context length exceeded: {approx_tokens:,} tokens. Cannot continue.")
-                        # Return a partial result instead of crashing
-                        return {
-                            "messages": messages,
-                            "completed": False,
-                            "api_calls": api_call_count,
-                            "error": f"Context length exceeded ({approx_tokens:,} tokens). Conversation terminated early.",
-                            "partial": True
-                        }
+                        print(f"{self.log_prefix}⚠️  Context length exceeded - attempting compression...")
+                        
+                        # Try to compress and retry
+                        original_len = len(messages)
+                        messages = self.context_compressor.compress(messages, current_tokens=approx_tokens)
+                        
+                        if len(messages) < original_len:
+                            # Compression was possible, retry
+                            print(f"{self.log_prefix}   🗜️  Compressed {original_len} → {len(messages)} messages, retrying...")
+                            continue  # Retry with compressed messages
+                        else:
+                            # Can't compress further
+                            print(f"{self.log_prefix}❌ Context length exceeded and cannot compress further.")
+                            print(f"{self.log_prefix}   💡 The conversation has accumulated too much content.")
+                            logging.error(f"{self.log_prefix}Context length exceeded: {approx_tokens:,} tokens. Cannot compress further.")
+                            return {
+                                "messages": messages,
+                                "completed": False,
+                                "api_calls": api_call_count,
+                                "error": f"Context length exceeded ({approx_tokens:,} tokens). Cannot compress further.",
+                                "partial": True
+                            }
                    
                    if retry_count > max_retries:
                        print(f"{self.log_prefix}❌ Max retries ({max_retries}) exceeded. Giving up.")
@@ -1228,10 +1764,16 @@ class AIAgent:
                        self._invalid_tool_retries = 0
                    
                    # Validate tool call arguments are valid JSON
+                    # Handle empty strings as empty objects (common model quirk)
                    invalid_json_args = []
                    for tc in assistant_message.tool_calls:
+                        args = tc.function.arguments
+                        # Treat empty/whitespace strings as empty object
+                        if not args or not args.strip():
+                            tc.function.arguments = "{}"
+                            continue
                        try:
-                            json.loads(tc.function.arguments)
+                            json.loads(args)
                        except json.JSONDecodeError as e:
                            invalid_json_args.append((tc.function.name, str(e)))
                    
@@ -1247,28 +1789,34 @@ class AIAgent:
                            # Don't add anything to messages, just retry the API call
                            continue
                        else:
-                            print(f"{self.log_prefix}❌ Max retries (3) for invalid JSON arguments exceeded. Stopping as partial.")
-                            self._invalid_json_retries = 0  # Reset for next conversation
-                            return {
-                                "final_response": None,
-                                "messages": messages,  # Messages up to last valid point
-                                "api_calls": api_call_count,
-                                "completed": False,
-                                "partial": True,
-                                "error": f"Model generated invalid JSON arguments for tool '{tool_name}': {error_msg}"
-                            }
+                            # Instead of returning partial, inject a helpful message and let model recover
+                            print(f"{self.log_prefix}⚠️  Injecting recovery message for invalid JSON...")
+                            self._invalid_json_retries = 0  # Reset for next attempt
+                            
+                            # Add a user message explaining the issue
+                            recovery_msg = (
+                                f"Your tool call to '{tool_name}' had invalid JSON arguments. "
+                                f"Error: {error_msg}. "
+                                f"For tools with no required parameters, use an empty object: {{}}. "
+                                f"Please either retry the tool call with valid JSON, or respond without using that tool."
+                            )
+                            messages.append({"role": "user", "content": recovery_msg})
+                            # Continue the loop - model will see this message and can recover
+                            continue
                    
                    # Reset retry counter on successful JSON validation
                    self._invalid_json_retries = 0
                    
-                    # Extract reasoning from response if available (for reasoning models like minimax, kimi, etc.)
-                    # Extract reasoning from response for storage
-                    # The reasoning_content field will be added when preparing API messages
-                    reasoning_text = None
-                    if hasattr(assistant_message, 'reasoning') and assistant_message.reasoning:
-                        reasoning_text = assistant_message.reasoning
-                    elif hasattr(assistant_message, 'reasoning_content') and assistant_message.reasoning_content:
-                        reasoning_text = assistant_message.reasoning_content
+                    # Extract reasoning from response if available
+                    # OpenRouter can return reasoning in multiple formats:
+                    # 1. message.reasoning - direct reasoning field
+                    # 2. message.reasoning_content - alternative field (some providers)
+                    # 3. message.reasoning_details - array with {summary: "..."} objects
+                    reasoning_text = self._extract_reasoning(assistant_message)
+                    
+                    if reasoning_text and self.verbose_logging:
+                        preview = reasoning_text[:100] + "..." if len(reasoning_text) > 100 else reasoning_text
+                        logging.debug(f"Captured reasoning ({len(reasoning_text)} chars): {preview}")
                    
                    # Build assistant message with tool calls
                    # Content stays as-is; reasoning is stored separately and will be passed
@@ -1290,6 +1838,14 @@ class AIAgent:
                        ]
                    }
                    
+                    # Store reasoning_details for multi-turn reasoning context (OpenRouter)
+                    if hasattr(assistant_message, 'reasoning_details') and assistant_message.reasoning_details:
+                        assistant_msg["reasoning_details"] = [
+                            dict(d)  # Copy all fields (not only type/text/signature) so 'summary' details survive the round trip
+                            for d in assistant_message.reasoning_details
+                            if isinstance(d, dict)
+                        ]
+                    
                    messages.append(assistant_msg)
                    
                    # Execute each tool call
@@ -1309,11 +1865,24 @@ class AIAgent:
                            args_str = json.dumps(function_args, ensure_ascii=False)
                            args_preview = args_str[:self.log_prefix_chars] + "..." if len(args_str) > self.log_prefix_chars else args_str
                            print(f"  📞 Tool {i}: {function_name}({list(function_args.keys())}) - {args_preview}")
+                        
+                        # Fire progress callback if registered (for messaging platforms)
+                        if self.tool_progress_callback:
+                            try:
+                                # Build preview for terminal commands
+                                if function_name == "terminal":
+                                    cmd = function_args.get("command", "")
+                                    preview = cmd[:50] + "..." if len(cmd) > 50 else cmd
+                                else:
+                                    preview = None
+                                self.tool_progress_callback(function_name, preview)
+                            except Exception as cb_err:
+                                logging.debug(f"Tool progress callback error: {cb_err}")

                        tool_start_time = time.time()

                        # Execute the tool - with animated spinner in quiet mode
-                        if self.quiet_mode and os.getenv("HERMES_DISABLE_SPINNER") != "1":
+                        if self.quiet_mode:
                            # Tool-specific spinner animations
                            tool_spinners = {
                                'web_search': ('arrows', ['🔍', '🌐', '📡', '🔎']),
@@ -1343,9 +1912,6 @@ class AIAgent:
                                tool_duration = time.time() - tool_start_time
                                cute_msg = self._get_cute_tool_message(function_name, function_args, tool_duration)
                                spinner.stop(cute_msg)
-                        elif self.quiet_mode:
-                            function_result = handle_function_call(function_name, function_args, effective_task_id)
-                            tool_duration = time.time() - tool_start_time
                        else:
                            function_result = handle_function_call(function_name, function_args, effective_task_id)
                            tool_duration = time.time() - tool_start_time
@@ -1372,6 +1938,18 @@ class AIAgent:
                        if self.tool_delay > 0 and i < len(assistant_message.tool_calls):
                            time.sleep(self.tool_delay)
                    
+                    # Check if context compression is needed before next API call
+                    # Uses actual token count from last API response
+                    if self.compression_enabled and self.context_compressor.should_compress():
+                        messages = self.context_compressor.compress(
+                            messages, 
+                            current_tokens=self.context_compressor.last_prompt_tokens
+                        )
+                    
+                    # Save session log incrementally (so progress is visible even if interrupted)
+                    self._session_messages = messages
+                    self._save_session_log(messages)
+                    
                    # Continue loop for next response
                    continue
                
@@ -1427,11 +2005,11 @@ class AIAgent:
                        self._empty_content_retries = 0
                    
                    # Extract reasoning from response if available
-                    reasoning_text = None
-                    if hasattr(assistant_message, 'reasoning') and assistant_message.reasoning:
-                        reasoning_text = assistant_message.reasoning
-                    elif hasattr(assistant_message, 'reasoning_content') and assistant_message.reasoning_content:
-                        reasoning_text = assistant_message.reasoning_content
+                    reasoning_text = self._extract_reasoning(assistant_message)
+                    
+                    if reasoning_text and self.verbose_logging:
+                        preview = reasoning_text[:100] + "..." if len(reasoning_text) > 100 else reasoning_text
+                        logging.debug(f"Captured final reasoning ({len(reasoning_text)} chars): {preview}")
                    
                    # Build final assistant message
                    # Content stays as-is; reasoning stored separately for trajectory extraction
@@ -1441,6 +2019,14 @@ class AIAgent:
                        "reasoning": reasoning_text  # Stored for trajectory extraction
                    }
                    
+                    # Store reasoning_details for multi-turn reasoning context (OpenRouter)
+                    if hasattr(assistant_message, 'reasoning_details') and assistant_message.reasoning_details:
+                        final_msg["reasoning_details"] = [
+                            dict(d)  # Copy all fields (not only type/text/signature) so 'summary' details survive the round trip
+                            for d in assistant_message.reasoning_details
+                            if isinstance(d, dict)
+                        ]
+                    
                    messages.append(final_msg)
                    
                    if not self.quiet_mode:
@@ -1465,11 +2051,47 @@ class AIAgent:
                    final_response = f"I apologize, but I encountered repeated errors: {error_msg}"
                    break
        
-        # Handle max iterations reached
-        if api_call_count >= self.max_iterations:
-            print(f"⚠️  Reached maximum iterations ({self.max_iterations}). Stopping to prevent infinite loop.")
-            if final_response is None:
-                final_response = "I've reached the maximum number of iterations. Here's what I found so far."
+        # Handle max iterations reached - ask model to summarize what it found
+        if api_call_count >= self.max_iterations and final_response is None:
+            print(f"⚠️  Reached maximum iterations ({self.max_iterations}). Requesting summary...")
+            
+            # Inject a user message asking for a summary
+            summary_request = (
+                "You've reached the maximum number of tool-calling iterations allowed. "
+                "Please provide a final response summarizing what you've found and accomplished so far, "
+                "without calling any more tools."
+            )
+            messages.append({"role": "user", "content": summary_request})
+            
+            # Make one final API call WITHOUT tools to force a text response
+            try:
+                api_messages = messages.copy()
+                if self.ephemeral_system_prompt:
+                    api_messages = [{"role": "system", "content": self.ephemeral_system_prompt}] + api_messages
+                
+                summary_response = self.client.chat.completions.create(
+                    model=self.model,
+                    messages=api_messages,
+                    # No tools parameter - forces text response
+                    extra_headers=self.extra_headers,
+                    extra_body=self.extra_body,
+                )
+                
+                if summary_response.choices and summary_response.choices[0].message.content:
+                    final_response = summary_response.choices[0].message.content
+                    # Strip think blocks from final response
+                    if "<think>" in final_response:
+                        import re
+                        final_response = re.sub(r'<think>.*?</think>\s*', '', final_response, flags=re.DOTALL).strip()
+                    
+                    # Add to messages for session continuity
+                    messages.append({"role": "assistant", "content": final_response})
+                else:
+                    final_response = "I reached the iteration limit and couldn't generate a summary."
+                    
+            except Exception as e:
+                logging.warning(f"Failed to get summary response: {e}")
+                final_response = f"I reached the maximum iterations ({self.max_iterations}) but couldn't summarize. Error: {str(e)}"
        
        # Determine if conversation completed successfully
        completed = final_response is not None and api_call_count < self.max_iterations
@@ -1494,13 +2116,24 @@ class AIAgent:
        self._session_messages = messages
        self._save_session_log(messages)
        
-        return {
+        # Build result with interrupt info if applicable
+        result = {
            "final_response": final_response,
            "messages": messages,
            "api_calls": api_call_count,
            "completed": completed,
-            "partial": False  # True only when stopped due to invalid tool calls
+            "partial": False,  # True only when stopped due to invalid tool calls
+            "interrupted": interrupted,
        }
+        
+        # Include interrupt message if one triggered the interrupt
+        if interrupted and self._interrupt_message:
+            result["interrupt_message"] = self._interrupt_message
+        
+        # Clear interrupt state after handling
+        self.clear_interrupt()
+        
+        return result
    
    def chat(self, message: str) -> str:
        """
@@ -1732,11 +2365,4 @@ def main(


 if __name__ == "__main__":
-    try:
-        import fire  # type: ignore
-    except ModuleNotFoundError as exc:
-        raise SystemExit(
-            "Missing optional dependency 'fire'. Install hermes-agent with its CLI extras or add `fire` "
-            f"to your environment. Original error: {exc}"
-        ) from exc
    fire.Fire(main)
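Review note: the `<think>` stripping in the `AIAgent` summary fallback above can be exercised in isolation. The sketch below reuses the same regular expression from that hunk; the helper name `strip_think_blocks` is hypothetical and is not part of this change.

```python
import re

# Same pattern as in the hunk above, compiled once at import time.
# DOTALL lets "." match newlines so multi-line think blocks are removed.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think_blocks(text: str) -> str:
    """Remove every closed <think>...</think> block and trim the result.

    Hypothetical helper for illustration only.
    """
    return _THINK_BLOCK.sub("", text).strip()


if __name__ == "__main__":
    print(strip_think_blocks("<think>plan\nsteps</think>\nDone."))
```

Because the pattern requires a closing tag, an unclosed `<think>` (for example, from a truncated response) is left in the text unchanged.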
--- a/scripts/hermes-gateway
+++ b/scripts/hermes-gateway
@@ -0,0 +1,414 @@
+#!/usr/bin/env python3
+"""
+Hermes Gateway - Standalone messaging platform integration.
+
+This is the proper entry point for running the gateway as a service.
+NOT tied to the CLI - runs independently.
+
+Usage:
+    # Run in foreground (for testing)
+    ./scripts/hermes-gateway
+    
+    # Install as a user service (systemd on Linux, launchd on macOS)
+    ./scripts/hermes-gateway install
+    
+    # Manage the service
+    ./scripts/hermes-gateway start
+    ./scripts/hermes-gateway stop
+    ./scripts/hermes-gateway restart
+    ./scripts/hermes-gateway status
+    
+    # Uninstall
+    ./scripts/hermes-gateway uninstall
+"""
+
+import argparse
+import asyncio
+import os
+import subprocess
+import sys
+from pathlib import Path
+
+# Add parent directory to path
+SCRIPT_DIR = Path(__file__).parent.resolve()
+PROJECT_DIR = SCRIPT_DIR.parent
+sys.path.insert(0, str(PROJECT_DIR))
+
+# Load .env file
+from dotenv import load_dotenv
+env_path = PROJECT_DIR / '.env'
+if env_path.exists():
+    load_dotenv(dotenv_path=env_path)
+
+
+# =============================================================================
+# Service Configuration
+# =============================================================================
+
+SERVICE_NAME = "hermes-gateway"
+SERVICE_DESCRIPTION = "Hermes Agent Gateway - Messaging Platform Integration"
+
+def get_systemd_unit_path() -> Path:
+    """Get the path for the systemd user service file."""
+    return Path.home() / ".config" / "systemd" / "user" / f"{SERVICE_NAME}.service"
+
+def get_launchd_plist_path() -> Path:
+    """Get the path for the launchd plist file (macOS)."""
+    return Path.home() / "Library" / "LaunchAgents" / "ai.hermes.gateway.plist"
+
+def get_python_path() -> str:
+    """Get the path to the Python interpreter."""
+    # Prefer the venv if it exists
+    venv_python = PROJECT_DIR / "venv" / "bin" / "python"
+    if venv_python.exists():
+        return str(venv_python)
+    return sys.executable
+
+def get_gateway_script_path() -> str:
+    """Get the path to this script."""
+    return str(Path(__file__).resolve())
+
+
+# =============================================================================
+# Systemd Service (Linux)
+# =============================================================================
+
+def generate_systemd_unit() -> str:
+    """Generate the systemd unit file content."""
+    python_path = get_python_path()
+    script_path = get_gateway_script_path()
+    working_dir = str(PROJECT_DIR)
+    
+    return f"""[Unit]
+Description={SERVICE_DESCRIPTION}
+After=network.target
+
+[Service]
+Type=simple
+ExecStart={python_path} {script_path} run
+WorkingDirectory={working_dir}
+Restart=on-failure
+RestartSec=10
+StandardOutput=journal
+StandardError=journal
+
+# Environment (optional - can also use .env file)
+# Environment="TELEGRAM_BOT_TOKEN=your_token"
+# Environment="DISCORD_BOT_TOKEN=your_token"
+
+[Install]
+WantedBy=default.target
+"""
+
+def install_systemd():
+    """Install the systemd user service."""
+    unit_path = get_systemd_unit_path()
+    unit_path.parent.mkdir(parents=True, exist_ok=True)
+    
+    print(f"Installing systemd service to: {unit_path}")
+    unit_path.write_text(generate_systemd_unit())
+    
+    # Reload systemd
+    subprocess.run(["systemctl", "--user", "daemon-reload"], check=True)
+    
+    # Enable the service (starts at user login; at boot only if lingering is enabled)
+    subprocess.run(["systemctl", "--user", "enable", SERVICE_NAME], check=True)
+    
+    print(f"✓ Service installed and enabled")
+    print(f"")
+    print(f"To start the service:")
+    print(f"  systemctl --user start {SERVICE_NAME}")
+    print(f"")
+    print(f"To view logs:")
+    print(f"  journalctl --user -u {SERVICE_NAME} -f")
+    print(f"")
+    print(f"To enable lingering (keeps service running after logout):")
+    print(f"  sudo loginctl enable-linger $USER")
+
+def uninstall_systemd():
+    """Uninstall the systemd user service."""
+    unit_path = get_systemd_unit_path()
+    
+    # Stop and disable first
+    subprocess.run(["systemctl", "--user", "stop", SERVICE_NAME], check=False)
+    subprocess.run(["systemctl", "--user", "disable", SERVICE_NAME], check=False)
+    
+    # Remove the unit file
+    if unit_path.exists():
+        unit_path.unlink()
+        print(f"✓ Removed {unit_path}")
+    
+    # Reload systemd
+    subprocess.run(["systemctl", "--user", "daemon-reload"], check=True)
+    print(f"✓ Service uninstalled")
+
+def systemd_status():
+    """Show systemd service status."""
+    subprocess.run(["systemctl", "--user", "status", SERVICE_NAME])
+
+def systemd_start():
+    """Start the systemd service."""
+    subprocess.run(["systemctl", "--user", "start", SERVICE_NAME], check=True)
+    print(f"✓ Service started")
+
+def systemd_stop():
+    """Stop the systemd service."""
+    subprocess.run(["systemctl", "--user", "stop", SERVICE_NAME], check=True)
+    print(f"✓ Service stopped")
+
+def systemd_restart():
+    """Restart the systemd service."""
+    subprocess.run(["systemctl", "--user", "restart", SERVICE_NAME], check=True)
+    print(f"✓ Service restarted")
+
+
+# =============================================================================
+# Launchd Service (macOS)
+# =============================================================================
+
+def generate_launchd_plist() -> str:
+    """Generate the launchd plist file content."""
+    python_path = get_python_path()
+    script_path = get_gateway_script_path()
+    working_dir = str(PROJECT_DIR)
+    log_dir = Path.home() / ".hermes" / "logs"
+    
+    return f"""<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>ai.hermes.gateway</string>
+    
+    <key>ProgramArguments</key>
+    <array>
+        <string>{python_path}</string>
+        <string>{script_path}</string>
+        <string>run</string>
+    </array>
+    
+    <key>WorkingDirectory</key>
+    <string>{working_dir}</string>
+    
+    <key>RunAtLoad</key>
+    <true/>
+    
+    <key>KeepAlive</key>
+    <dict>
+        <key>SuccessfulExit</key>
+        <false/>
+    </dict>
+    
+    <key>StandardOutPath</key>
+    <string>{log_dir}/gateway.log</string>
+    
+    <key>StandardErrorPath</key>
+    <string>{log_dir}/gateway.error.log</string>
+    
+    <key>EnvironmentVariables</key>
+    <dict>
+        <key>PATH</key>
+        <string>/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
+    </dict>
+</dict>
+</plist>
+"""
+
+def install_launchd():
+    """Install the launchd service (macOS)."""
+    plist_path = get_launchd_plist_path()
+    plist_path.parent.mkdir(parents=True, exist_ok=True)
+    
+    # Ensure log directory exists
+    log_dir = Path.home() / ".hermes" / "logs"
+    log_dir.mkdir(parents=True, exist_ok=True)
+    
+    print(f"Installing launchd service to: {plist_path}")
+    plist_path.write_text(generate_launchd_plist())
+    
+    # Load the service
+    subprocess.run(["launchctl", "load", str(plist_path)], check=True)
+    
+    print(f"✓ Service installed and loaded")
+    print(f"")
+    print(f"To view logs:")
+    print(f"  tail -f ~/.hermes/logs/gateway.log")
+    print(f"")
+    print(f"To manage the service:")
+    print(f"  launchctl start ai.hermes.gateway")
+    print(f"  launchctl stop ai.hermes.gateway")
+
+def uninstall_launchd():
+    """Uninstall the launchd service (macOS)."""
+    plist_path = get_launchd_plist_path()
+    
+    # Unload first
+    subprocess.run(["launchctl", "unload", str(plist_path)], check=False)
+    
+    # Remove the plist file
+    if plist_path.exists():
+        plist_path.unlink()
+        print(f"✓ Removed {plist_path}")
+    
+    print(f"✓ Service uninstalled")
+
+def launchd_status():
+    """Show launchd service status."""
+    subprocess.run(["launchctl", "list", "ai.hermes.gateway"])
+
+def launchd_start():
+    """Start the launchd service."""
+    subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
+    print(f"✓ Service started")
+
+def launchd_stop():
+    """Stop the launchd service."""
+    subprocess.run(["launchctl", "stop", "ai.hermes.gateway"], check=True)
+    print(f"✓ Service stopped")
+
+def launchd_restart():
+    """Restart the launchd service."""
+    launchd_stop()
+    launchd_start()
+
+
+# =============================================================================
+# Platform Detection
+# =============================================================================
+
+def is_linux() -> bool:
+    return sys.platform.startswith('linux')
+
+def is_macos() -> bool:
+    return sys.platform == 'darwin'
+
+def is_windows() -> bool:
+    return sys.platform == 'win32'
+
+
+# =============================================================================
+# Gateway Runner
+# =============================================================================
+
+def run_gateway():
+    """Run the gateway in foreground."""
+    from gateway.run import start_gateway
+    print("Starting Hermes Gateway...")
+    print("Press Ctrl+C to stop.")
+    print()
+    asyncio.run(start_gateway())
+
+
+# =============================================================================
+# Main CLI
+# =============================================================================
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Hermes Gateway - Messaging Platform Integration",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+    # Run in foreground (for testing)
+    ./scripts/hermes-gateway run
+    
+    # Install as a user service (systemd on Linux, launchd on macOS)
+    ./scripts/hermes-gateway install
+    
+    # Manage the service
+    ./scripts/hermes-gateway start
+    ./scripts/hermes-gateway stop
+    ./scripts/hermes-gateway restart
+    ./scripts/hermes-gateway status
+    
+    # Uninstall
+    ./scripts/hermes-gateway uninstall
+
+Configuration:
+    Set environment variables in .env file or system environment:
+    - TELEGRAM_BOT_TOKEN
+    - DISCORD_BOT_TOKEN
+    - WHATSAPP_ENABLED
+    
+    Or create ~/.hermes/gateway.json for advanced configuration.
+"""
+    )
+    
+    parser.add_argument(
+        "command",
+        choices=["run", "install", "uninstall", "start", "stop", "restart", "status"],
+        nargs="?",
+        default="run",
+        help="Command to execute (default: run)"
+    )
+    
+    parser.add_argument(
+        "--verbose", "-v",
+        action="store_true",
+        help="Verbose output"
+    )
+    
+    args = parser.parse_args()
+    
+    # Detect platform and dispatch command
+    if args.command == "run":
+        run_gateway()
+    
+    elif args.command == "install":
+        if is_linux():
+            install_systemd()
+        elif is_macos():
+            install_launchd()
+        else:
+            print("Service installation not supported on this platform.")
+            print("Please run manually: ./scripts/hermes-gateway run")
+            sys.exit(1)
+    
+    elif args.command == "uninstall":
+        if is_linux():
+            uninstall_systemd()
+        elif is_macos():
+            uninstall_launchd()
+        else:
+            print("Service uninstallation not supported on this platform.")
+            sys.exit(1)
+    
+    elif args.command == "start":
+        if is_linux():
+            systemd_start()
+        elif is_macos():
+            launchd_start()
+        else:
+            print("Not supported on this platform.")
+            sys.exit(1)
+    
+    elif args.command == "stop":
+        if is_linux():
+            systemd_stop()
+        elif is_macos():
+            launchd_stop()
+        else:
+            print("Not supported on this platform.")
+            sys.exit(1)
+    
+    elif args.command == "restart":
+        if is_linux():
+            systemd_restart()
+        elif is_macos():
+            launchd_restart()
+        else:
+            print("Not supported on this platform.")
+            sys.exit(1)
+    
+    elif args.command == "status":
+        if is_linux():
+            systemd_status()
+        elif is_macos():
+            launchd_status()
+        else:
+            print("Not supported on this platform.")
+            sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
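Review note: the per-command `if is_linux() / elif is_macos() / else` chain in `main()` repeats the same shape six times. One possible alternative is a lookup table keyed by command and service manager. This is a minimal sketch, not the script's implementation: `platform_key`, `dispatch`, and `TABLE` are hypothetical names, and the handlers return the command string instead of running it so the example stays side-effect free.

```python
import sys
from typing import Callable, Dict, Optional

Handler = Callable[[], str]


def platform_key(platform: str = sys.platform) -> Optional[str]:
    """Map a sys.platform value to a service manager name (hypothetical helper)."""
    if platform.startswith("linux"):
        return "systemd"
    if platform == "darwin":
        return "launchd"
    return None


def dispatch(command: str, table: Dict[str, Dict[str, Handler]],
             platform: str = sys.platform) -> str:
    """Look up and call the handler for this command on this platform."""
    manager = platform_key(platform)
    if manager is None:
        raise SystemExit("Not supported on this platform.")
    return table[command][manager]()


# Stand-in handlers: they only describe the command that would be run.
TABLE: Dict[str, Dict[str, Handler]] = {
    "start": {
        "systemd": lambda: "systemctl --user start hermes-gateway",
        "launchd": lambda: "launchctl start ai.hermes.gateway",
    },
    "stop": {
        "systemd": lambda: "systemctl --user stop hermes-gateway",
        "launchd": lambda: "launchctl stop ai.hermes.gateway",
    },
}

if __name__ == "__main__":
    print(dispatch("start", TABLE, platform="linux"))
```

With a table like this, adding a command means adding one entry rather than another `elif` branch with its own unsupported-platform fallback.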
--- a/scripts/install.ps1
+++ b/scripts/install.ps1
@@ -0,0 +1,519 @@
+# ============================================================================
+# Hermes Agent Installer for Windows
+# ============================================================================
+# Installation script for Windows (PowerShell).
+#
+# Usage:
+#   irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
+#
+# Or download and run with options:
+#   .\install.ps1 -NoVenv -SkipSetup
+#
+# ============================================================================
+
+param(
+    [switch]$NoVenv,
+    [switch]$SkipSetup,
+    [string]$Branch = "main",
+    [string]$HermesHome = "$env:USERPROFILE\.hermes",
+    [string]$InstallDir = "$env:USERPROFILE\.hermes\hermes-agent"
+)
+
+$ErrorActionPreference = "Stop"
+
+# ============================================================================
+# Configuration
+# ============================================================================
+
+$RepoUrlSsh = "git@github.com:NousResearch/hermes-agent.git"
+$RepoUrlHttps = "https://github.com/NousResearch/hermes-agent.git"
+
+# ============================================================================
+# Helper functions
+# ============================================================================
+
+function Write-Banner {
+    Write-Host ""
+    Write-Host "┌─────────────────────────────────────────────────────────┐" -ForegroundColor Magenta
+    Write-Host "│             🦋 Hermes Agent Installer                   │" -ForegroundColor Magenta
+    Write-Host "├─────────────────────────────────────────────────────────┤" -ForegroundColor Magenta
+    Write-Host "│  I'm just a butterfly with a lot of tools.             │" -ForegroundColor Magenta
+    Write-Host "└─────────────────────────────────────────────────────────┘" -ForegroundColor Magenta
+    Write-Host ""
+}
+
+function Write-Info {
+    param([string]$Message)
+    Write-Host "→ $Message" -ForegroundColor Cyan
+}
+
+function Write-Success {
+    param([string]$Message)
+    Write-Host "✓ $Message" -ForegroundColor Green
+}
+
+function Write-Warning {
+    param([string]$Message)
+    Write-Host "⚠ $Message" -ForegroundColor Yellow
+}
+
+function Write-Error {
+    param([string]$Message)
+    Write-Host "✗ $Message" -ForegroundColor Red
+}
+
+# ============================================================================
+# Dependency checks
+# ============================================================================
+
+function Test-Python {
+    Write-Info "Checking Python..."
+    
+    # Try different python commands
+    $pythonCmds = @("python3", "python", "py -3")
+    
+    foreach ($cmd in $pythonCmds) {
+        try {
+            $version = & $cmd.Split()[0] $cmd.Split()[1..99] -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')" 2>$null
+            if ($version) {
+                $major, $minor = $version.Split('.')
+                if ([int]$major -gt 3 -or ([int]$major -eq 3 -and [int]$minor -ge 10)) {
+                    $script:PythonCmd = $cmd
+                    Write-Success "Python $version found"
+                    return $true
+                }
+            }
+        } catch {
+            # Try next command
+        }
+    }
+    
+    Write-Error "Python 3.10+ not found"
+    Write-Info "Please install Python 3.10 or newer from:"
+    Write-Info "  https://www.python.org/downloads/"
+    Write-Info ""
+    Write-Info "Make sure to check 'Add Python to PATH' during installation"
+    return $false
+}
+
+function Test-Git {
+    Write-Info "Checking Git..."
+    
+    if (Get-Command git -ErrorAction SilentlyContinue) {
+        $version = git --version
+        Write-Success "Git found ($version)"
+        return $true
+    }
+    
+    Write-Error "Git not found"
+    Write-Info "Please install Git from:"
+    Write-Info "  https://git-scm.com/download/win"
+    return $false
+}
+
+function Test-Node {
+    Write-Info "Checking Node.js (optional, for browser tools)..."
+    
+    if (Get-Command node -ErrorAction SilentlyContinue) {
+        $version = node --version
+        Write-Success "Node.js $version found"
+        $script:HasNode = $true
+        return $true
+    }
+    
+    Write-Warning "Node.js not found (browser tools will be limited)"
+    Write-Info "To install Node.js (optional):"
+    Write-Info "  https://nodejs.org/en/download/"
+    $script:HasNode = $false
+    return $true  # Don't fail - Node is optional
+}
+
+function Test-Ripgrep {
+    Write-Info "Checking ripgrep (optional, for faster file search)..."
+    
+    if (Get-Command rg -ErrorAction SilentlyContinue) {
+        $version = rg --version | Select-Object -First 1
+        Write-Success "$version found"
+        $script:HasRipgrep = $true
+        return $true
+    }
+    
+    Write-Warning "ripgrep not found (file search will use findstr fallback)"
+    
+    # Check what package managers are available
+    $hasWinget = Get-Command winget -ErrorAction SilentlyContinue
+    $hasChoco = Get-Command choco -ErrorAction SilentlyContinue
+    $hasScoop = Get-Command scoop -ErrorAction SilentlyContinue
+    
+    # Offer to install
+    Write-Host ""
+    $response = Read-Host "Would you like to install ripgrep? (faster search, recommended) [Y/n]"
+    
+    if ($response -eq "" -or $response -match "^[Yy]") {
+        Write-Info "Installing ripgrep..."
+        
+        if ($hasWinget) {
+            try {
+                winget install BurntSushi.ripgrep.MSVC --silent 2>&1 | Out-Null
+                if ($LASTEXITCODE -eq 0) {
+                    Write-Success "ripgrep installed via winget"
+                    $script:HasRipgrep = $true
+                    return $true
+                }
+            } catch { }
+        }
+        
+        if ($hasChoco) {
+            try {
+                choco install ripgrep -y 2>&1 | Out-Null
+                if ($LASTEXITCODE -eq 0) {
+                    Write-Success "ripgrep installed via chocolatey"
+                    $script:HasRipgrep = $true
+                    return $true
+                }
+            } catch { }
+        }
+        
+        if ($hasScoop) {
+            try {
+                scoop install ripgrep 2>&1 | Out-Null
+                if ($LASTEXITCODE -eq 0) {
+                    Write-Success "ripgrep installed via scoop"
+                    $script:HasRipgrep = $true
+                    return $true
+                }
+            } catch { }
+        }
+        
+        Write-Warning "Auto-install failed. You can install manually:"
+    } else {
+        Write-Info "Skipping ripgrep installation. To install manually:"
+    }
+    
+    # Show manual install instructions
+    Write-Info "  winget install BurntSushi.ripgrep.MSVC"
+    Write-Info "  Or: choco install ripgrep"
+    Write-Info "  Or: scoop install ripgrep"
+    Write-Info "  Or download from: https://github.com/BurntSushi/ripgrep/releases"
+    
+    $script:HasRipgrep = $false
+    return $true  # Don't fail - ripgrep is optional
+}
+
+# ============================================================================
+# Installation
+# ============================================================================
+
+function Install-Repository {
+    Write-Info "Installing to $InstallDir..."
+    
+    if (Test-Path $InstallDir) {
+        if (Test-Path "$InstallDir\.git") {
+            Write-Info "Existing installation found, updating..."
+            Push-Location $InstallDir
+            git fetch origin
+            git checkout $Branch
+            git pull origin $Branch
+            Pop-Location
+        } else {
+            Write-Error "Directory exists but is not a git repository: $InstallDir"
+            Write-Info "Remove it or choose a different directory with -InstallDir"
+            exit 1
+        }
+    } else {
+        # Try SSH first (for private repo access), fall back to HTTPS
+        # Use --recurse-submodules to also clone mini-swe-agent and tinker-atropos
+        Write-Info "Trying SSH clone..."
+        $sshResult = git clone --branch $Branch --recurse-submodules $RepoUrlSsh $InstallDir 2>&1
+        
+        if ($LASTEXITCODE -eq 0) {
+            Write-Success "Cloned via SSH"
+        } else {
+            Write-Info "SSH failed, trying HTTPS..."
+            $httpsResult = git clone --branch $Branch --recurse-submodules $RepoUrlHttps $InstallDir 2>&1
+            
+            if ($LASTEXITCODE -eq 0) {
+                Write-Success "Cloned via HTTPS"
+            } else {
+                Write-Error "Failed to clone repository"
+                Write-Info "For private repo access, ensure your SSH key is added to GitHub:"
+                Write-Info "  ssh-add ~/.ssh/id_rsa"
+                Write-Info "  ssh -T git@github.com  # Test connection"
+                exit 1
+            }
+        }
+    }
+    
+    # Ensure submodules are initialized and updated (for existing installs or if --recurse failed)
+    Write-Info "Initializing submodules (mini-swe-agent, tinker-atropos)..."
+    Push-Location $InstallDir
+    git submodule update --init --recursive
+    Pop-Location
+    Write-Success "Submodules ready"
+    
+    Write-Success "Repository ready"
+}
+
+function Install-Venv {
+    if ($NoVenv) {
+        Write-Info "Skipping virtual environment (-NoVenv)"
+        return
+    }
+    
+    Write-Info "Creating virtual environment..."
+    
+    Push-Location $InstallDir
+    
+    if (-not (Test-Path "venv")) {
+        & $PythonCmd.Split()[0] $PythonCmd.Split()[1..99] -m venv venv  # split so "py -3" works
+    }
+    
+    # Activate
+    & .\venv\Scripts\Activate.ps1
+    
+    # Upgrade pip
+    pip install --upgrade pip wheel setuptools | Out-Null
+    
+    Pop-Location
+    
+    Write-Success "Virtual environment ready"
+}
+
+function Install-Dependencies {
+    Write-Info "Installing dependencies..."
+    
+    Push-Location $InstallDir
+    
+    if (-not $NoVenv) {
+        & .\venv\Scripts\Activate.ps1
+    }
+    
+    # Install main package
+    try {
+        pip install -e ".[all]" 2>&1 | Out-Null
+        if ($LASTEXITCODE -ne 0) { throw "pip exited with code $LASTEXITCODE" }
+    } catch {
+        pip install -e "." | Out-Null
+    }
+    
+    Write-Success "Main package installed"
+    
+    # Install submodules
+    Write-Info "Installing mini-swe-agent (terminal tool backend)..."
+    if (Test-Path "mini-swe-agent\pyproject.toml") {
+        try {
+            pip install -e ".\mini-swe-agent" 2>&1 | Out-Null
+            if ($LASTEXITCODE -ne 0) { throw "pip exited with code $LASTEXITCODE" }
+            Write-Success "mini-swe-agent installed"
+        } catch {
+            Write-Warning "mini-swe-agent install failed (terminal tools may not work)"
+        }
+    } else {
+        Write-Warning "mini-swe-agent not found (run: git submodule update --init)"
+    }
+    
+    Write-Info "Installing tinker-atropos (RL training backend)..."
+    if (Test-Path "tinker-atropos\pyproject.toml") {
+        try {
+            pip install -e ".\tinker-atropos" 2>&1 | Out-Null
+            if ($LASTEXITCODE -ne 0) { throw "pip exited with code $LASTEXITCODE" }
+            Write-Success "tinker-atropos installed"
+        } catch {
+            Write-Warning "tinker-atropos install failed (RL tools may not work)"
+        }
+    } else {
+        Write-Warning "tinker-atropos not found (run: git submodule update --init)"
+    }
+    
+    Pop-Location
+    
+    Write-Success "All dependencies installed"
+}
+
+function Set-PathVariable {
+    Write-Info "Setting up PATH..."
+    
+    if ($NoVenv) {
+        $binDir = "$InstallDir"
+    } else {
+        $binDir = "$InstallDir\venv\Scripts"
+    }
+    
+    # Add to user PATH
+    $currentPath = [Environment]::GetEnvironmentVariable("Path", "User")
+    
+    if ($currentPath -notlike "*$binDir*") {
+        [Environment]::SetEnvironmentVariable(
+            "Path",
+            "$binDir;$currentPath",
+            "User"
+        )
+        Write-Success "Added to user PATH"
+    } else {
+        Write-Info "PATH already configured"
+    }
+    
+    # Update current session
+    $env:Path = "$binDir;$env:Path"
+}
+
+function Copy-ConfigTemplates {
+    Write-Info "Setting up configuration files..."
+    
+    # Create ~/.hermes directory structure (config at top level, code in subdir)
+    New-Item -ItemType Directory -Force -Path "$HermesHome\cron" | Out-Null
+    New-Item -ItemType Directory -Force -Path "$HermesHome\sessions" | Out-Null
+    New-Item -ItemType Directory -Force -Path "$HermesHome\logs" | Out-Null
+    
+    # Create .env at ~/.hermes/.env (top level, easy to find)
+    $envPath = "$HermesHome\.env"
+    if (-not (Test-Path $envPath)) {
+        $examplePath = "$InstallDir\.env.example"
+        if (Test-Path $examplePath) {
+            Copy-Item $examplePath $envPath
+            Write-Success "Created ~/.hermes/.env from template"
+        } else {
+            # Create empty .env if no example exists
+            New-Item -ItemType File -Force -Path $envPath | Out-Null
+            Write-Success "Created ~/.hermes/.env"
+        }
+    } else {
+        Write-Info "~/.hermes/.env already exists, keeping it"
+    }
+    
+    # Create config.yaml at ~/.hermes/config.yaml (top level, easy to find)
+    $configPath = "$HermesHome\config.yaml"
+    if (-not (Test-Path $configPath)) {
+        $examplePath = "$InstallDir\cli-config.yaml.example"
+        if (Test-Path $examplePath) {
+            Copy-Item $examplePath $configPath
+            Write-Success "Created ~/.hermes/config.yaml from template"
+        }
+    } else {
+        Write-Info "~/.hermes/config.yaml already exists, keeping it"
+    }
+    
+    Write-Success "Configuration directory ready: ~/.hermes/"
+}
+
+function Install-NodeDeps {
+    if (-not $HasNode) {
+        Write-Info "Skipping Node.js dependencies (Node not installed)"
+        return
+    }
+    
+    Push-Location $InstallDir
+    
+    if (Test-Path "package.json") {
+        Write-Info "Installing Node.js dependencies..."
+        try {
+            npm install --silent 2>&1 | Out-Null
+            if ($LASTEXITCODE -ne 0) { throw "npm exited with code $LASTEXITCODE" }
+            Write-Success "Node.js dependencies installed"
+        } catch {
+            Write-Warning "npm install failed (browser tools may not work)"
+        }
+    }
+    
+    Pop-Location
+}
+
+function Invoke-SetupWizard {
+    if ($SkipSetup) {
+        Write-Info "Skipping setup wizard (-SkipSetup)"
+        return
+    }
+    
+    Write-Host ""
+    Write-Info "Starting setup wizard..."
+    Write-Host ""
+    
+    Push-Location $InstallDir
+    
+    if (-not $NoVenv) {
+        & .\venv\Scripts\Activate.ps1
+    }
+    
+    python -m hermes_cli.main setup
+    
+    Pop-Location
+}
+
+function Write-Completion {
+    Write-Host ""
+    Write-Host "┌─────────────────────────────────────────────────────────┐" -ForegroundColor Green
+    Write-Host "│              ✓ Installation Complete!                   │" -ForegroundColor Green
+    Write-Host "└─────────────────────────────────────────────────────────┘" -ForegroundColor Green
+    Write-Host ""
+    
+    # Show file locations
+    Write-Host "📁 Your files (all in ~/.hermes/):" -ForegroundColor Cyan
+    Write-Host ""
+    Write-Host "   Config:    " -NoNewline -ForegroundColor Yellow
+    Write-Host "$HermesHome\config.yaml"
+    Write-Host "   API Keys:  " -NoNewline -ForegroundColor Yellow
+    Write-Host "$HermesHome\.env"
+    Write-Host "   Data:      " -NoNewline -ForegroundColor Yellow
+    Write-Host "$HermesHome\cron\, sessions\, logs\"
+    Write-Host "   Code:      " -NoNewline -ForegroundColor Yellow
+    Write-Host "$InstallDir\"
+    Write-Host ""
+    
+    Write-Host "─────────────────────────────────────────────────────────" -ForegroundColor Cyan
+    Write-Host ""
+    Write-Host "🚀 Commands:" -ForegroundColor Cyan
+    Write-Host ""
+    Write-Host "   hermes              " -NoNewline -ForegroundColor Green
+    Write-Host "Start chatting"
+    Write-Host "   hermes setup        " -NoNewline -ForegroundColor Green
+    Write-Host "Configure API keys & settings"
+    Write-Host "   hermes config       " -NoNewline -ForegroundColor Green
+    Write-Host "View/edit configuration"
+    Write-Host "   hermes config edit  " -NoNewline -ForegroundColor Green
+    Write-Host "Open config in editor"
+    Write-Host "   hermes gateway      " -NoNewline -ForegroundColor Green
+    Write-Host "Run messaging gateway"
+    Write-Host "   hermes update       " -NoNewline -ForegroundColor Green
+    Write-Host "Update to latest version"
+    Write-Host ""
+    
+    Write-Host "─────────────────────────────────────────────────────────" -ForegroundColor Cyan
+    Write-Host ""
+    Write-Host "⚡ Restart your terminal for PATH changes to take effect" -ForegroundColor Yellow
+    Write-Host ""
+    
+    # Show notes about optional tools
+    if (-not $HasNode) {
+        Write-Host "Note: Node.js was not found. Browser automation tools" -ForegroundColor Yellow
+        Write-Host "will have limited functionality." -ForegroundColor Yellow
+        Write-Host ""
+    }
+    
+    if (-not $HasRipgrep) {
+        Write-Host "Note: ripgrep (rg) was not found. File search will use" -ForegroundColor Yellow
+        Write-Host "findstr as a fallback. For faster search:" -ForegroundColor Yellow
+        Write-Host "  winget install BurntSushi.ripgrep.MSVC" -ForegroundColor Yellow
+        Write-Host ""
+    }
+}
+
+# ============================================================================
+# Main
+# ============================================================================
+
+function Main {
+    Write-Banner
+    
+    if (-not (Test-Python)) { exit 1 }
+    if (-not (Test-Git)) { exit 1 }
+    Test-Node      # Optional, doesn't fail
+    Test-Ripgrep   # Optional, doesn't fail
+    
+    Install-Repository
+    Install-Venv
+    Install-Dependencies
+    Install-NodeDeps
+    Set-PathVariable
+    Copy-ConfigTemplates
+    Invoke-SetupWizard
+    
+    Write-Completion
+}
+
+Main
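Review note: a minimum-version gate such as the Python 3.10+ check in `Test-Python` is easiest to get right when major and minor are compared together as a tuple, so that a higher major version with a low minor number still passes. A minimal sketch of that comparison follows; `parse_version` and `meets_minimum` are hypothetical names used only for illustration.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Parse a 'major.minor' string (extra components are ignored)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def meets_minimum(found: str, minimum: str = "3.10") -> bool:
    """Return True when `found` is at least `minimum`, comparing (major, minor) tuples."""
    return parse_version(found) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("3.9", "3.10", "3.12", "4.0"):
        print(candidate, meets_minimum(candidate))
```

Tuple comparison is lexicographic, so the minor number is only consulted when the major numbers are equal.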
--- a/scripts/install.sh
+++ b/scripts/install.sh
@@ -0,0 +1,692 @@
+#!/bin/bash
+# ============================================================================
+# Hermes Agent Installer
+# ============================================================================
+# Installation script for Linux and macOS.
+#
+# Usage:
+#   curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
+#
+# Or with options:
+#   curl -fsSL ... | bash -s -- --no-venv --skip-setup
+#
+# ============================================================================
+
+set -e
+
+# Colors
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[0;33m'
+BLUE='\033[0;34m'
+MAGENTA='\033[0;35m'
+CYAN='\033[0;36m'
+NC='\033[0m' # No Color
+BOLD='\033[1m'
+
+# Configuration
+REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
+REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
+HERMES_HOME="$HOME/.hermes"
+INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
+PYTHON_MIN_VERSION="3.10"
+
+# Options
+USE_VENV=true
+RUN_SETUP=true
+BRANCH="main"
+
+# Parse arguments
+while [[ $# -gt 0 ]]; do
+    case $1 in
+        --no-venv)
+            USE_VENV=false
+            shift
+            ;;
+        --skip-setup)
+            RUN_SETUP=false
+            shift
+            ;;
+        --branch)
+            BRANCH="$2"
+            shift 2
+            ;;
+        --dir)
+            INSTALL_DIR="$2"
+            shift 2
+            ;;
+        -h|--help)
+            echo "Hermes Agent Installer"
+            echo ""
+            echo "Usage: install.sh [OPTIONS]"
+            echo ""
+            echo "Options:"
+            echo "  --no-venv      Don't create virtual environment"
+            echo "  --skip-setup   Skip interactive setup wizard"
+            echo "  --branch NAME  Git branch to install (default: main)"
+            echo "  --dir PATH     Installation directory (default: ~/.hermes/hermes-agent)"
+            echo "  -h, --help     Show this help"
+            exit 0
+            ;;
+        *)
+            echo "Unknown option: $1"
+            exit 1
+            ;;
+    esac
+done
+
+# ============================================================================
+# Helper functions
+# ============================================================================
+
+print_banner() {
+    echo ""
+    echo -e "${MAGENTA}${BOLD}"
+    echo "┌─────────────────────────────────────────────────────────┐"
+    echo "│             🦋 Hermes Agent Installer                   │"
+    echo "├─────────────────────────────────────────────────────────┤"
+    echo "│  I'm just a butterfly with a lot of tools.             │"
+    echo "└─────────────────────────────────────────────────────────┘"
+    echo -e "${NC}"
+}
+
+log_info() {
+    echo -e "${CYAN}→${NC} $1"
+}
+
+log_success() {
+    echo -e "${GREEN}✓${NC} $1"
+}
+
+log_warn() {
+    echo -e "${YELLOW}⚠${NC} $1"
+}
+
+log_error() {
+    echo -e "${RED}✗${NC} $1"
+}
+
+# ============================================================================
+# System detection
+# ============================================================================
+
+detect_os() {
+    case "$(uname -s)" in
+        Linux*)
+            OS="linux"
+            if [ -f /etc/os-release ]; then
+                . /etc/os-release
+                DISTRO="$ID"
+            else
+                DISTRO="unknown"
+            fi
+            ;;
+        Darwin*)
+            OS="macos"
+            DISTRO="macos"
+            ;;
+        CYGWIN*|MINGW*|MSYS*)
+            OS="windows"
+            DISTRO="windows"
+            log_error "Windows detected. Please use the PowerShell installer:"
+            log_info "  irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex"
+            exit 1
+            ;;
+        *)
+            OS="unknown"
+            DISTRO="unknown"
+            log_warn "Unknown operating system"
+            ;;
+    esac
+    
+    log_success "Detected: $OS ($DISTRO)"
+}
+
+# ============================================================================
+# Dependency checks
+# ============================================================================
+
+check_python() {
+    log_info "Checking Python..."
+    
+    # Try different python commands
+    for cmd in python3.12 python3.11 python3.10 python3 python; do
+        if command -v $cmd &> /dev/null; then
+            # Check the version with the candidate interpreter itself; a hard-coded
+            # python3 would test a different binary than the one being selected.
+            if $cmd -c "import sys; exit(0 if sys.version_info >= (3, 10) else 1)" 2>/dev/null; then
+                PYTHON_CMD=$cmd
+                PYTHON_VERSION=$($cmd -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
+                log_success "Python $PYTHON_VERSION found"
+                return 0
+            fi
+        fi
+    done
+    
+    log_error "Python 3.10+ not found"
+    log_info "Please install Python 3.10 or newer:"
+    
+    case "$OS" in
+        linux)
+            case "$DISTRO" in
+                ubuntu|debian)
+                    log_info "  sudo apt update && sudo apt install python3.11 python3.11-venv"
+                    ;;
+                fedora)
+                    log_info "  sudo dnf install python3.11"
+                    ;;
+                arch)
+                    log_info "  sudo pacman -S python"
+                    ;;
+                *)
+                    log_info "  Use your package manager to install Python 3.10+"
+                    ;;
+            esac
+            ;;
+        macos)
+            log_info "  brew install python@3.11"
+            log_info "  Or download from https://www.python.org/downloads/"
+            ;;
+    esac
+    
+    exit 1
+}
+
+check_git() {
+    log_info "Checking Git..."
+    
+    if command -v git &> /dev/null; then
+        GIT_VERSION=$(git --version | awk '{print $3}')
+        log_success "Git $GIT_VERSION found"
+        return 0
+    fi
+    
+    log_error "Git not found"
+    log_info "Please install Git:"
+    
+    case "$OS" in
+        linux)
+            case "$DISTRO" in
+                ubuntu|debian)
+                    log_info "  sudo apt update && sudo apt install git"
+                    ;;
+                fedora)
+                    log_info "  sudo dnf install git"
+                    ;;
+                arch)
+                    log_info "  sudo pacman -S git"
+                    ;;
+                *)
+                    log_info "  Use your package manager to install git"
+                    ;;
+            esac
+            ;;
+        macos)
+            log_info "  xcode-select --install"
+            log_info "  Or: brew install git"
+            ;;
+    esac
+    
+    exit 1
+}
+
+check_node() {
+    log_info "Checking Node.js (optional, for browser tools)..."
+    
+    if command -v node &> /dev/null; then
+        NODE_VERSION=$(node --version)
+        log_success "Node.js $NODE_VERSION found"
+        HAS_NODE=true
+        return 0
+    fi
+    
+    log_warn "Node.js not found (browser tools will be limited)"
+    log_info "To install Node.js (optional):"
+    
+    case "$OS" in
+        linux)
+            case "$DISTRO" in
+                ubuntu|debian)
+                    log_info "  curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -"
+                    log_info "  sudo apt install -y nodejs"
+                    ;;
+                fedora)
+                    log_info "  sudo dnf install nodejs"
+                    ;;
+                arch)
+                    log_info "  sudo pacman -S nodejs npm"
+                    ;;
+                *)
+                    log_info "  https://nodejs.org/en/download/"
+                    ;;
+            esac
+            ;;
+        macos)
+            log_info "  brew install node"
+            log_info "  Or: https://nodejs.org/en/download/"
+            ;;
+    esac
+    
+    HAS_NODE=false
+    # Don't exit - Node is optional
+}
+
+check_ripgrep() {
+    log_info "Checking ripgrep (optional, for faster file search)..."
+    
+    if command -v rg &> /dev/null; then
+        RG_VERSION=$(rg --version | head -1)
+        log_success "$RG_VERSION found"
+        HAS_RIPGREP=true
+        return 0
+    fi
+    
+    log_warn "ripgrep not found (file search will use grep fallback)"
+    
+    # Offer to install (read from /dev/tty: under "curl | bash", stdin is the script itself)
+    echo ""
+    read -p "Would you like to install ripgrep? (faster search, recommended) [Y/n] " -n 1 -r < /dev/tty || REPLY="n"
+    echo
+    
+    if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
+        log_info "Installing ripgrep..."
+        
+        # Check if we can use sudo
+        CAN_SUDO=false
+        if command -v sudo &> /dev/null; then
+            # Check if user has sudo access (sudo -v may prompt for a password if none is cached)
+            if sudo -n true 2>/dev/null || sudo -v 2>/dev/null; then
+                CAN_SUDO=true
+            fi
+        fi
+        
+        case "$OS" in
+            linux)
+                if [ "$CAN_SUDO" = true ]; then
+                    case "$DISTRO" in
+                        ubuntu|debian)
+                            if sudo apt install -y ripgrep 2>/dev/null; then
+                                log_success "ripgrep installed"
+                                HAS_RIPGREP=true
+                                return 0
+                            fi
+                            ;;
+                        fedora)
+                            if sudo dnf install -y ripgrep 2>/dev/null; then
+                                log_success "ripgrep installed"
+                                HAS_RIPGREP=true
+                                return 0
+                            fi
+                            ;;
+                        arch)
+                            if sudo pacman -S --noconfirm ripgrep 2>/dev/null; then
+                                log_success "ripgrep installed"
+                                HAS_RIPGREP=true
+                                return 0
+                            fi
+                            ;;
+                    esac
+                else
+                    log_warn "sudo not available - cannot auto-install system packages"
+                    # Try cargo as fallback if available
+                    if command -v cargo &> /dev/null; then
+                        log_info "Trying cargo install (no sudo required)..."
+                        if cargo install ripgrep 2>/dev/null; then
+                            log_success "ripgrep installed via cargo"
+                            HAS_RIPGREP=true
+                            return 0
+                        fi
+                    fi
+                fi
+                ;;
+            macos)
+                if command -v brew &> /dev/null; then
+                    if brew install ripgrep 2>/dev/null; then
+                        log_success "ripgrep installed"
+                        HAS_RIPGREP=true
+                        return 0
+                    fi
+                fi
+                ;;
+        esac
+        log_warn "Auto-install failed. You can install manually later:"
+    else
+        log_info "Skipping ripgrep installation. To install manually:"
+    fi
+    
+    # Show manual install instructions
+    case "$OS" in
+        linux)
+            case "$DISTRO" in
+                ubuntu|debian)
+                    log_info "  sudo apt install ripgrep"
+                    ;;
+                fedora)
+                    log_info "  sudo dnf install ripgrep"
+                    ;;
+                arch)
+                    log_info "  sudo pacman -S ripgrep"
+                    ;;
+                *)
+                    log_info "  https://github.com/BurntSushi/ripgrep#installation"
+                    ;;
+            esac
+            # Show cargo alternative for users without sudo
+            if command -v cargo &> /dev/null; then
+                log_info "  Or without sudo: cargo install ripgrep"
+            fi
+            ;;
+        macos)
+            log_info "  brew install ripgrep"
+            ;;
+    esac
+    
+    HAS_RIPGREP=false
+    # Don't exit - ripgrep is optional (grep fallback exists)
+}
+
+# ============================================================================
+# Installation
+# ============================================================================
+
+clone_repo() {
+    log_info "Installing to $INSTALL_DIR..."
+    
+    if [ -d "$INSTALL_DIR" ]; then
+        if [ -d "$INSTALL_DIR/.git" ]; then
+            log_info "Existing installation found, updating..."
+            cd "$INSTALL_DIR"
+            git fetch origin
+            git checkout "$BRANCH"
+            git pull origin "$BRANCH"
+        else
+            log_error "Directory exists but is not a git repository: $INSTALL_DIR"
+            log_info "Remove it or choose a different directory with --dir"
+            exit 1
+        fi
+    else
+        # Try SSH first (for private repo access), fall back to HTTPS
+        # Use --recurse-submodules to also clone mini-swe-agent and tinker-atropos
+        log_info "Trying SSH clone..."
+        if git clone --branch "$BRANCH" --recurse-submodules "$REPO_URL_SSH" "$INSTALL_DIR" 2>/dev/null; then
+            log_success "Cloned via SSH"
+        else
+            log_info "SSH failed, trying HTTPS..."
+            if git clone --branch "$BRANCH" --recurse-submodules "$REPO_URL_HTTPS" "$INSTALL_DIR"; then
+                log_success "Cloned via HTTPS"
+            else
+                log_error "Failed to clone repository"
+                log_info "For private repo access, ensure your SSH key is added to GitHub:"
+                log_info "  ssh-add ~/.ssh/id_rsa"
+                log_info "  ssh -T git@github.com  # Test connection"
+                exit 1
+            fi
+        fi
+    fi
+    
+    cd "$INSTALL_DIR"
+    
+    # Ensure submodules are initialized and updated (for existing installs or if --recurse failed)
+    log_info "Initializing submodules (mini-swe-agent, tinker-atropos)..."
+    git submodule update --init --recursive
+    log_success "Submodules ready"
+    
+    log_success "Repository ready"
+}
+
+setup_venv() {
+    if [ "$USE_VENV" = false ]; then
+        log_info "Skipping virtual environment (--no-venv)"
+        return 0
+    fi
+    
+    log_info "Creating virtual environment..."
+    
+    if [ -d "venv" ]; then
+        log_info "Virtual environment already exists"
+    else
+        $PYTHON_CMD -m venv venv
+    fi
+    
+    # Activate
+    source venv/bin/activate
+    
+    # Upgrade pip
+    pip install --upgrade pip wheel setuptools > /dev/null
+    
+    log_success "Virtual environment ready"
+}
+
+install_deps() {
+    log_info "Installing dependencies..."
+    
+    if [ "$USE_VENV" = true ]; then
+        source venv/bin/activate
+    fi
+    
+    # Install the main package in editable mode with all extras
+    pip install -e ".[all]" > /dev/null 2>&1 || pip install -e "." > /dev/null
+    
+    log_success "Main package installed"
+    
+    # Install submodules
+    log_info "Installing mini-swe-agent (terminal tool backend)..."
+    if [ -d "mini-swe-agent" ] && [ -f "mini-swe-agent/pyproject.toml" ]; then
+        pip install -e "./mini-swe-agent" > /dev/null 2>&1 && log_success "mini-swe-agent installed" \
+            || log_warn "mini-swe-agent install failed (terminal tools may not work)"
+    else
+        log_warn "mini-swe-agent not found (run: git submodule update --init)"
+    fi
+    
+    log_info "Installing tinker-atropos (RL training backend)..."
+    if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
+        pip install -e "./tinker-atropos" > /dev/null 2>&1 && log_success "tinker-atropos installed" \
+            || log_warn "tinker-atropos install failed (RL tools may not work)"
+    else
+        log_warn "tinker-atropos not found (run: git submodule update --init)"
+    fi
+    
+    log_success "All dependencies installed"
+}
+
+setup_path() {
+    log_info "Setting up PATH..."
+    
+    # Determine the bin directory
+    if [ "$USE_VENV" = true ]; then
+        BIN_DIR="$INSTALL_DIR/venv/bin"
+    else
+        BIN_DIR="$HOME/.local/bin"
+        mkdir -p "$BIN_DIR"
+        
+        # Create a wrapper script
+        cat > "$BIN_DIR/hermes" << EOF
+#!/bin/bash
+cd "$INSTALL_DIR"
+exec "$PYTHON_CMD" -m hermes_cli.main "\$@"
+EOF
+        chmod +x "$BIN_DIR/hermes"
+    fi
+    
+    # Add to PATH in shell config (the script runs under bash, so go by $SHELL, not $BASH_VERSION)
+    SHELL_CONFIG=""
+    if [ "${SHELL##*/}" = "zsh" ]; then
+        SHELL_CONFIG="$HOME/.zshrc"
+    elif [ -f "$HOME/.bashrc" ]; then
+        SHELL_CONFIG="$HOME/.bashrc"
+    elif [ -f "$HOME/.bash_profile" ]; then
+        SHELL_CONFIG="$HOME/.bash_profile"
+    elif [ -f "$HOME/.zshrc" ]; then
+        SHELL_CONFIG="$HOME/.zshrc"
+    fi
+    
+    PATH_LINE="export PATH=\"$BIN_DIR:\$PATH\""
+    
+    if [ -n "$SHELL_CONFIG" ]; then
+        if ! grep -qF "$BIN_DIR" "$SHELL_CONFIG" 2>/dev/null; then
+            echo "" >> "$SHELL_CONFIG"
+            echo "# Hermes Agent" >> "$SHELL_CONFIG"
+            echo "$PATH_LINE" >> "$SHELL_CONFIG"
+            log_success "Added to $SHELL_CONFIG"
+        else
+            log_info "PATH already configured in $SHELL_CONFIG"
+        fi
+    fi
+    
+    # Also export for current session
+    export PATH="$BIN_DIR:$PATH"
+    
+    log_success "PATH configured"
+}
+
+copy_config_templates() {
+    log_info "Setting up configuration files..."
+    
+    # Create ~/.hermes directory structure (config at top level, code in subdir)
+    mkdir -p "$HERMES_HOME/cron"
+    mkdir -p "$HERMES_HOME/sessions"
+    mkdir -p "$HERMES_HOME/logs"
+    
+    # Create .env at ~/.hermes/.env (top level, easy to find)
+    if [ ! -f "$HERMES_HOME/.env" ]; then
+        if [ -f "$INSTALL_DIR/.env.example" ]; then
+            cp "$INSTALL_DIR/.env.example" "$HERMES_HOME/.env"
+            log_success "Created ~/.hermes/.env from template"
+        else
+            # Create empty .env if no example exists
+            touch "$HERMES_HOME/.env"
+            log_success "Created ~/.hermes/.env"
+        fi
+    else
+        log_info "~/.hermes/.env already exists, keeping it"
+    fi
+    
+    # Create config.yaml at ~/.hermes/config.yaml (top level, easy to find)
+    if [ ! -f "$HERMES_HOME/config.yaml" ]; then
+        if [ -f "$INSTALL_DIR/cli-config.yaml.example" ]; then
+            cp "$INSTALL_DIR/cli-config.yaml.example" "$HERMES_HOME/config.yaml"
+            log_success "Created ~/.hermes/config.yaml from template"
+        fi
+    else
+        log_info "~/.hermes/config.yaml already exists, keeping it"
+    fi
+    
+    log_success "Configuration directory ready: ~/.hermes/"
+}
+
+install_node_deps() {
+    if [ "$HAS_NODE" = false ]; then
+        log_info "Skipping Node.js dependencies (Node not installed)"
+        return 0
+    fi
+    
+    if [ -f "$INSTALL_DIR/package.json" ]; then
+        log_info "Installing Node.js dependencies..."
+        cd "$INSTALL_DIR"
+        npm install --silent 2>/dev/null || {
+            log_warn "npm install failed (browser tools may not work)"
+            return 0
+        }
+        log_success "Node.js dependencies installed"
+    fi
+}
+
+run_setup_wizard() {
+    if [ "$RUN_SETUP" = false ]; then
+        log_info "Skipping setup wizard (--skip-setup)"
+        return 0
+    fi
+    
+    echo ""
+    log_info "Starting setup wizard..."
+    echo ""
+    
+    if [ "$USE_VENV" = true ]; then
+        source "$INSTALL_DIR/venv/bin/activate"
+    fi
+    
+    cd "$INSTALL_DIR"
+    python -m hermes_cli.main setup
+}
+
+print_success() {
+    echo ""
+    echo -e "${GREEN}${BOLD}"
+    echo "┌─────────────────────────────────────────────────────────┐"
+    echo "│              ✓ Installation Complete!                   │"
+    echo "└─────────────────────────────────────────────────────────┘"
+    echo -e "${NC}"
+    echo ""
+    
+    # Show file locations
+    echo -e "${CYAN}${BOLD}📁 Your files (all in ~/.hermes/):${NC}"
+    echo ""
+    echo -e "   ${YELLOW}Config:${NC}    ~/.hermes/config.yaml"
+    echo -e "   ${YELLOW}API Keys:${NC}  ~/.hermes/.env"
+    echo -e "   ${YELLOW}Data:${NC}      ~/.hermes/cron/, sessions/, logs/"
+    echo -e "   ${YELLOW}Code:${NC}      ~/.hermes/hermes-agent/"
+    echo ""
+    
+    echo -e "${CYAN}─────────────────────────────────────────────────────────${NC}"
+    echo ""
+    echo -e "${CYAN}${BOLD}🚀 Commands:${NC}"
+    echo ""
+    echo -e "   ${GREEN}hermes${NC}              Start chatting"
+    echo -e "   ${GREEN}hermes setup${NC}        Configure API keys & settings"
+    echo -e "   ${GREEN}hermes config${NC}       View/edit configuration"
+    echo -e "   ${GREEN}hermes config edit${NC}  Open config in editor"
+    echo -e "   ${GREEN}hermes gateway${NC}      Run messaging gateway"
+    echo -e "   ${GREEN}hermes update${NC}       Update to latest version"
+    echo ""
+    
+    echo -e "${CYAN}─────────────────────────────────────────────────────────${NC}"
+    echo ""
+    echo -e "${YELLOW}⚡ Reload your shell to use 'hermes' command:${NC}"
+    echo ""
+    echo "   source ~/.bashrc   # or ~/.zshrc"
+    echo ""
+    
+    # Show Node.js warning if not installed
+    if [ "$HAS_NODE" = false ]; then
+        echo -e "${YELLOW}"
+        echo "Note: Node.js was not found. Browser automation tools"
+        echo "will have limited functionality. Install Node.js later"
+        echo "if you need full browser support."
+        echo -e "${NC}"
+    fi
+    
+    # Show ripgrep note if not installed
+    if [ "$HAS_RIPGREP" = false ]; then
+        echo -e "${YELLOW}"
+        echo "Note: ripgrep (rg) was not found. File search will use"
+        echo "grep as a fallback. For faster search in large codebases,"
+        echo "install ripgrep: sudo apt install ripgrep (or brew install ripgrep)"
+        echo -e "${NC}"
+    fi
+}
+
+# ============================================================================
+# Main
+# ============================================================================
+
+main() {
+    print_banner
+    
+    detect_os
+    check_python
+    check_git
+    check_node
+    check_ripgrep
+    
+    clone_repo
+    setup_venv
+    install_deps
+    install_node_deps
+    setup_path
+    copy_config_templates
+    run_setup_wizard
+    
+    print_success
+}
+
+main
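Two steps in `scripts/install.sh` above are easy to get subtly wrong: prompting the user when the script is piped in via `curl ... | bash`, and appending the PATH line to a shell config without duplicating it on re-runs. The following is a minimal sketch of both, not code from this commit. The helper names `ask_yes_no` and `append_line_once` are hypothetical, and unlike the installer's single-keystroke `read -n 1`, this version reads a whole line.

```bash
#!/usr/bin/env bash
# Illustrative helpers only; neither function exists in install.sh.

# ask_yes_no PROMPT [INPUT]
# Returns 0 for "y", "yes", or an empty reply; returns 1 for anything else
# or when INPUT cannot be read. INPUT defaults to /dev/tty so the prompt
# still works when the script itself arrives on stdin (curl ... | bash).
ask_yes_no() {
    local prompt="$1" input="${2:-/dev/tty}" reply=""
    printf '%s [Y/n] ' "$prompt" >&2
    # read fails on EOF or an unopenable INPUT; keep a partial last line if any.
    { IFS= read -r reply < "$input"; } 2>/dev/null || [ -n "$reply" ] || return 1
    case "$reply" in
        ""|[Yy]|[Yy][Ee][Ss]) return 0 ;;
        *) return 1 ;;
    esac
}

# append_line_once FILE LINE
# Appends LINE to FILE unless an identical line is already present.
append_line_once() {
    local file="$1" line="$2"
    # -x: whole-line match, -F: fixed string (no regex), --: LINE may start with "-".
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}
```

A caller might write `if ask_yes_no "Install ripgrep?"; then ...; fi` and `append_line_once "$HOME/.bashrc" "$PATH_LINE"`. Declining when no terminal is available keeps unattended runs (CI, containers) from blocking or installing packages silently; whether that default suits the installer is a design choice for the maintainers.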
--- a/scripts/launch_llama_cpp_glm47_flash.sh
+++ b/scripts/launch_llama_cpp_glm47_flash.sh
@@ -1,62 +0,0 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-# Launch a local llama.cpp OpenAI-compatible server running GLM-4.7-Flash (GGUF).
-#
-# Requires:
-# - `llama-server` installed (e.g. `brew install llama.cpp`)
-#
-# Default settings are chosen to avoid clashing with Atropos sandbox_server
-# (which commonly uses port 8080 in local dev).
-#
-# Usage:
-#   Hermes-Agent/scripts/launch_llama_cpp_glm47_flash.sh
-#
-# Override defaults:
-#   LLAMA_CPP_HOST=127.0.0.1 LLAMA_CPP_PORT=8082 \
-#   LLAMA_CPP_HF_REPO=ggml-org/GLM-4.7-Flash-GGUF \
-#   LLAMA_CPP_HF_FILE=GLM-4.7-Flash-Q4_K.gguf \
-#   Hermes-Agent/scripts/launch_llama_cpp_glm47_flash.sh
-
-HOST="${LLAMA_CPP_HOST:-127.0.0.1}"
-PORT="${LLAMA_CPP_PORT:-8080}"
-HF_REPO="${LLAMA_CPP_HF_REPO:-ggml-org/GLM-4.7-Flash-GGUF}"
-HF_FILE="${LLAMA_CPP_HF_FILE:-GLM-4.7-Flash-Q4_K.gguf}"
-ALIAS="${LLAMA_CPP_ALIAS:-glm-4.7-flash}"
-
-if ! command -v llama-server >/dev/null 2>&1; then
-  echo "Error: llama-server not found in PATH."
-  echo "Install via Homebrew: brew install llama.cpp"
-  exit 1
-fi
-
-echo "Launching llama.cpp server..."
-echo "  host:  $HOST"
-echo "  port:  $PORT"
-echo "  repo:  $HF_REPO"
-echo "  file:  $HF_FILE"
-echo "  alias: $ALIAS"
-echo
-echo "Suggested env vars for Hermes/Atropos integration:"
-echo "  export ATROPOS_SERVER_BASE_URL=http://${HOST}:${PORT}"
-echo "  export ATROPOS_SERVER_MODEL=${ALIAS}"
-echo "  export ATROPOS_SERVER_API_KEY=local"
-echo
-
-if command -v lsof >/dev/null 2>&1; then
-  if lsof -nP -iTCP:"$PORT" -sTCP:LISTEN >/dev/null 2>&1; then
-    echo "Error: port $PORT is already in use."
-    echo "Pick a different port, e.g.:"
-    echo "  LLAMA_CPP_PORT=8082 Hermes-Agent/scripts/launch_llama_cpp_glm47_flash.sh"
-    exit 1
-  fi
-fi
-
-exec llama-server \
-  --host "$HOST" \
-  --port "$PORT" \
-  --hf-repo "$HF_REPO" \
-  --hf-file "$HF_FILE" \
-  --alias "$ALIAS" \
-  -c 32768 \
-  -n -1
--- a/scripts/launch_llama_cpp_hermes_4_36b.sh
+++ b/scripts/launch_llama_cpp_hermes_4_36b.sh
@@ -1,70 +0,0 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-# Launch a local llama.cpp OpenAI-compatible server running Hermes 4.3 36B (GGUF).
-#
-# Requires:
-# - `llama-server` installed (e.g. `brew install llama.cpp`)
-#
-# Note: Port choice can conflict with other local dev servers. If 8080 is already
-# in use, override via `LLAMA_CPP_PORT=...`.
-#
-# Usage:
-#   Hermes-Agent/scripts/launch_llama_cpp_hermes_4_36b.sh
-#
-# Override defaults:
-#   LLAMA_CPP_HOST=127.0.0.1 LLAMA_CPP_PORT=8082 \
-#   LLAMA_CPP_HF_REPO=NousResearch/Hermes-4.3-36B-GGUF \
-#   LLAMA_CPP_HF_FILE=hermes-4_3_36b-Q4_K_M.gguf \
-#   LLAMA_CPP_ALIAS=hermes-4-36b \
-#   LLAMA_CPP_PARALLEL=4 LLAMA_CPP_THREADS_HTTP=4 \
-#   Hermes-Agent/scripts/launch_llama_cpp_hermes_4_36b.sh
-
-HOST="${LLAMA_CPP_HOST:-127.0.0.1}"
-PORT="${LLAMA_CPP_PORT:-8080}"
-HF_REPO="${LLAMA_CPP_HF_REPO:-NousResearch/Hermes-4.3-36B-GGUF}"
-HF_FILE="${LLAMA_CPP_HF_FILE:-hermes-4_3_36b-Q4_K_M.gguf}"
-ALIAS="${LLAMA_CPP_ALIAS:-hermes-4-36b}"
-PARALLEL="${LLAMA_CPP_PARALLEL:-4}"
-THREADS_HTTP="${LLAMA_CPP_THREADS_HTTP:-4}"
-
-if ! command -v llama-server >/dev/null 2>&1; then
-  echo "Error: llama-server not found in PATH."
-  echo "Install via Homebrew: brew install llama.cpp"
-  exit 1
-fi
-
-echo "Launching llama.cpp server..."
-echo "  host:  $HOST"
-echo "  port:  $PORT"
-echo "  repo:  $HF_REPO"
-echo "  file:  $HF_FILE"
-echo "  alias: $ALIAS"
-echo "  slots: $PARALLEL"
-echo
-echo "Suggested env vars for Hermes/Atropos integration:"
-echo "  export ATROPOS_SERVER_BASE_URL=http://${HOST}:${PORT}"
-echo "  export ATROPOS_SERVER_MODEL=${ALIAS}"
-echo "  export ATROPOS_TOKENIZER_NAME=NousResearch/Hermes-4.3-36B"
-echo "  export ATROPOS_SERVER_API_KEY=local"
-echo
-
-if command -v lsof >/dev/null 2>&1; then
-  if lsof -nP -iTCP:"$PORT" -sTCP:LISTEN >/dev/null 2>&1; then
-    echo "Error: port $PORT is already in use."
-    echo "Pick a different port, e.g.:"
-    echo "  LLAMA_CPP_PORT=8082 Hermes-Agent/scripts/launch_llama_cpp_hermes_4_36b.sh"
-    exit 1
-  fi
-fi
-
-exec llama-server \
-  --host "$HOST" \
-  --port "$PORT" \
-  --hf-repo "$HF_REPO" \
-  --hf-file "$HF_FILE" \
-  --alias "$ALIAS" \
-  --parallel "$PARALLEL" \
-  --threads-http "$THREADS_HTTP" \
-  -c 32768 \
-  -n -1
--- a/setup-hermes.sh
+++ b/setup-hermes.sh
@@ -1,149 +1,203 @@
 #!/bin/bash
-
+# ============================================================================
 # Hermes Agent Setup Script
-# Automated setup for all dependencies and configuration
+# ============================================================================
+# Quick setup for developers who cloned the repo manually.
+#
+# Usage:
+#   ./setup-hermes.sh
+#
+# This script:
+# 1. Creates a virtual environment (if not exists)
+# 2. Installs dependencies
+# 3. Creates .env from template (if not exists)
+# 4. Installs the 'hermes' CLI command
+# 5. Runs the setup wizard (optional)
+# ============================================================================

 set -e

-echo "========================================="
-echo "Hermes Agent Setup"
-echo "========================================="
+# Colors
+GREEN='\033[0;32m'
+YELLOW='\033[0;33m'
+CYAN='\033[0;36m'
+NC='\033[0m'
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$SCRIPT_DIR"
+
+echo ""
+echo -e "${CYAN}🦋 Hermes Agent Setup${NC}"
 echo ""

-# Change to hermes-agent directory
-cd /home/teknium/hermes-agent
+# ============================================================================
+# Python check
+# ============================================================================

-# Check Python version
-echo "[1/10] Checking Python version..."
-python_version=$(python3 --version | cut -d' ' -f2 | cut -d'.' -f1,2)
-echo "✓ Python $python_version detected"
-echo ""
+echo -e "${CYAN}→${NC} Checking Python..."

-# Install uv
-echo "[2/10] Installing uv (fast Python package installer)..."
-if ! command -v uv &> /dev/null; then
-    echo "Installing uv..."
-    curl -LsSf https://astral.sh/uv/install.sh | sh
-    export PATH="$HOME/.cargo/bin:$PATH"
-    echo "✓ uv installed"
-else
-    echo "✓ uv already installed: $(uv --version)"
+PYTHON_CMD=""
+for cmd in python3.12 python3.11 python3.10 python3 python; do
+    if command -v $cmd &> /dev/null; then
+        if $cmd -c "import sys; exit(0 if sys.version_info >= (3, 10) else 1)" 2>/dev/null; then
+            PYTHON_CMD=$cmd
+            break
+        fi
+    fi
+done
+
+if [ -z "$PYTHON_CMD" ]; then
+    echo -e "${YELLOW}✗${NC} Python 3.10+ required"
+    exit 1
 fi
-echo ""

-# Install Node.js 20 using NodeSource
-echo "[3/10] Installing Node.js 20..."
-if ! command -v node &> /dev/null || [[ $(node --version | cut -d'v' -f2 | cut -d'.' -f1) -lt 20 ]]; then
-    echo "Installing Node.js 20 LTS..."
-    curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
-    sudo apt-get install -y nodejs
-    echo "✓ Node.js installed"
+PYTHON_VERSION=$($PYTHON_CMD -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
+echo -e "${GREEN}✓${NC} Python $PYTHON_VERSION found"
+
+# ============================================================================
+# Virtual environment
+# ============================================================================
+
+echo -e "${CYAN}→${NC} Setting up virtual environment..."
+
+if [ ! -d "venv" ]; then
+    $PYTHON_CMD -m venv venv
+    echo -e "${GREEN}✓${NC} Created venv"
 else
-    echo "✓ Node.js 20+ already installed: $(node --version)"
+    echo -e "${GREEN}✓${NC} venv exists"
 fi
-echo ""

-# Initialize git submodules
-echo "[4/10] Initializing git submodules..."
-git submodule update --init --recursive
-echo "✓ Submodules initialized"
-echo ""
-
-# Create Python virtual environment with uv
-echo "[5/10] Creating Python virtual environment with uv..."
-if [ -d "venv" ]; then
-    echo "Virtual environment already exists, skipping..."
-else
-    uv venv venv
-    echo "✓ Virtual environment created with uv"
-fi
-echo ""
-
-# Activate virtual environment and install Python packages with uv
-echo "[6/10] Installing Python dependencies with uv..."
 source venv/bin/activate
-uv pip install -r requirements.txt
-echo "✓ Python packages installed"
-echo ""
+pip install --upgrade pip wheel setuptools > /dev/null

-# Install mini-swe-agent with uv
-echo "[7/10] Installing mini-swe-agent..."
-uv pip install -e ./mini-swe-agent
-echo "✓ mini-swe-agent installed"
-echo ""
+# ============================================================================
+# Dependencies
+# ============================================================================

-# Install Node.js dependencies
-echo "[8/10] Installing Node.js dependencies..."
-npm install
-echo "✓ Node.js packages installed"
-echo ""
+echo -e "${CYAN}→${NC} Installing dependencies..."

-# Set up environment file
-echo "[9/10] Setting up environment configuration..."
-if [ -f ".env" ]; then
-    echo ".env file already exists, creating backup..."
-    cp .env .env.backup.$(date +%Y%m%d_%H%M%S)
-fi
-cp .env.example .env
-echo "✓ .env file created from .env.example"
-echo ""
+pip install -e ".[all]" > /dev/null 2>&1 || pip install -e "." > /dev/null

-# Set up CLI config
-echo "[10/10] Setting up CLI configuration..."
-if [ ! -f "cli-config.yaml" ]; then
-    cp cli-config.yaml.example cli-config.yaml
-    echo "✓ cli-config.yaml created from example"
+echo -e "${GREEN}✓${NC} Dependencies installed"
+
+# ============================================================================
+# Optional: ripgrep (for faster file search)
+# ============================================================================
+
+echo -e "${CYAN}→${NC} Checking ripgrep (optional, for faster search)..."
+
+if command -v rg &> /dev/null; then
+    echo -e "${GREEN}✓${NC} ripgrep found"
 else
-    echo "cli-config.yaml already exists, skipping..."
+    echo -e "${YELLOW}⚠${NC} ripgrep not found (file search will use grep fallback)"
+    read -p "Install ripgrep for faster search? [Y/n] " -n 1 -r
+    echo
+    if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
+        INSTALLED=false
+        
+        # Check if sudo is available
+        if command -v sudo &> /dev/null && sudo -n true 2>/dev/null; then
+            if command -v apt &> /dev/null; then
+                sudo apt-get install -y ripgrep && INSTALLED=true
+            elif command -v dnf &> /dev/null; then
+                sudo dnf install -y ripgrep && INSTALLED=true
+            fi
+        fi
+        
+        # Try brew (no sudo needed)
+        if [ "$INSTALLED" = false ] && command -v brew &> /dev/null; then
+            brew install ripgrep && INSTALLED=true
+        fi
+        
+        # Try cargo (no sudo needed)
+        if [ "$INSTALLED" = false ] && command -v cargo &> /dev/null; then
+            echo -e "${CYAN}→${NC} Trying cargo install (no sudo required)..."
+            cargo install ripgrep && INSTALLED=true
+        fi
+        
+        if [ "$INSTALLED" = true ]; then
+            echo -e "${GREEN}✓${NC} ripgrep installed"
+        else
+            echo -e "${YELLOW}⚠${NC} Auto-install failed. Install options:"
+            echo "    sudo apt install ripgrep     # Debian/Ubuntu"
+            echo "    brew install ripgrep         # macOS"
+            echo "    cargo install ripgrep        # With Rust (no sudo)"
+            echo "    https://github.com/BurntSushi/ripgrep#installation"
+        fi
+    fi
 fi
+
+# ============================================================================
+# Environment file
+# ============================================================================
+
+if [ ! -f ".env" ]; then
+    if [ -f ".env.example" ]; then
+        cp .env.example .env
+        echo -e "${GREEN}✓${NC} Created .env from template"
+    fi
+else
+    echo -e "${GREEN}✓${NC} .env exists"
+fi
+
+# ============================================================================
+# PATH setup
+# ============================================================================
+
+echo -e "${CYAN}→${NC} Setting up hermes command..."
+
+BIN_DIR="$SCRIPT_DIR/venv/bin"
+
+# Add to shell config if not already there
+SHELL_CONFIG=""
+if [ -f "$HOME/.zshrc" ]; then
+    SHELL_CONFIG="$HOME/.zshrc"
+elif [ -f "$HOME/.bashrc" ]; then
+    SHELL_CONFIG="$HOME/.bashrc"
+elif [ -f "$HOME/.bash_profile" ]; then
+    SHELL_CONFIG="$HOME/.bash_profile"
+fi
+
+if [ -n "$SHELL_CONFIG" ]; then
+    if ! grep -qF -- "$BIN_DIR" "$SHELL_CONFIG" 2>/dev/null; then
+        echo "" >> "$SHELL_CONFIG"
+        echo "# Hermes Agent" >> "$SHELL_CONFIG"
+        echo "export PATH=\"$BIN_DIR:\$PATH\"" >> "$SHELL_CONFIG"
+        echo -e "${GREEN}✓${NC} Added to $SHELL_CONFIG"
+    else
+        echo -e "${GREEN}✓${NC} PATH already in $SHELL_CONFIG"
+    fi
+fi
+
+# ============================================================================
+# Done
+# ============================================================================
+
+echo ""
+echo -e "${GREEN}✓ Setup complete!${NC}"
+echo ""
+echo "Next steps:"
+echo ""
+echo "  1. Reload your shell:"
+echo "     source $SHELL_CONFIG"
+echo ""
+echo "  2. Run the setup wizard to configure API keys:"
+echo "     hermes setup"
+echo ""
+echo "  3. Start chatting:"
+echo "     hermes"
+echo ""
+echo "Other commands:"
+echo "  hermes status        # Check configuration"
+echo "  hermes gateway       # Start messaging gateway"
+echo "  hermes cron daemon   # Run cron daemon"
+echo "  hermes doctor        # Diagnose issues"
 echo ""

-# Show Node.js and Python versions
-echo "========================================="
-echo "Setup Complete!"
-echo "========================================="
-echo ""
-echo "Installed versions:"
-echo "  Node.js: $(node --version)"
-echo "  npm: $(npm --version)"
-echo "  Python: $(python3 --version)"
-echo "  uv: $(uv --version)"
-echo ""
-
-echo "========================================="
-echo "Next Steps:"
-echo "========================================="
-echo ""
-echo "1. Configure API Keys in .env file:"
-echo "   nano .env"
-echo ""
-echo "   Required API keys:"
-echo "   - OPENROUTER_API_KEY (https://openrouter.ai/keys)"
-echo "   - FIRECRAWL_API_KEY (https://firecrawl.dev/)"
-echo "   - NOUS_API_KEY (https://inference-api.nousresearch.com/)"
-echo "   - FAL_KEY (https://fal.ai/)"
-echo ""
-echo "   Optional API keys:"
-echo "   - BROWSERBASE_API_KEY (https://browserbase.com/)"
-echo "   - BROWSERBASE_PROJECT_ID"
-echo ""
-echo "2. Activate the virtual environment:"
-echo "   source venv/bin/activate"
-echo ""
-echo "3. Run the CLI:"
-echo "   ./hermes"
-echo ""
-echo "4. Or run a single query:"
-echo "   python run_agent.py --query \"your question here\""
-echo ""
-echo "5. List available tools:"
-echo "   python run_agent.py --list_tools"
-echo ""
-echo "========================================="
-echo "Configuration Files:"
-echo "========================================="
-echo "  .env - API keys and environment variables"
-echo "  cli-config.yaml - CLI settings and preferences"
-echo ""
-echo "For more information, see README.md"
-echo ""
+# Ask if they want to run setup wizard now
+read -p "Would you like to run the setup wizard now? [Y/n] " -n 1 -r
+echo
+if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
+    echo ""
+    python -m hermes_cli.main setup
+fi
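
Review note: the PATH setup section in the hunk above appends an export line to the user's shell config only when a check finds it missing. The sketch below shows one way to make that append idempotent. It assumes a hypothetical helper named `add_path_once` (not part of `setup-hermes.sh`) and matches on the bin directory itself, so the check does not depend on what the checkout directory is called.

```shell
#!/usr/bin/env bash
# Sketch only: add_path_once is a hypothetical helper, not part of setup-hermes.sh.
# Appends a PATH export for a bin directory to a shell config file at most once.
add_path_once() {
    local bin_dir="$1" config="$2"

    # -F treats the directory as a literal string, so dots or brackets in the
    # path are not interpreted as regex metacharacters.
    if grep -qF -- "$bin_dir" "$config" 2>/dev/null; then
        return 0   # already present, nothing to do
    fi

    {
        echo ""
        echo "# Hermes Agent"
        echo "export PATH=\"$bin_dir:\$PATH\""
    } >> "$config"
}
```

Calling `add_path_once "$SCRIPT_DIR/venv/bin" "$HOME/.bashrc"` twice should leave a single export line in the file. Matching on the literal directory is a design choice for this sketch; a fixed marker comment would work as well, as long as the same marker is both written and searched for.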
--- a/tests/test_data/checkpoint_test_dataset.jsonl
+++ b/tests/test_data/checkpoint_test_dataset.jsonl
@@ -1,15 +0,0 @@
-{"prompt": "Test prompt 0: What is 2+2? Just answer briefly.", "test_id": 0}
-{"prompt": "Test prompt 1: What is 2+2? Just answer briefly.", "test_id": 1}
-{"prompt": "Test prompt 2: What is 2+2? Just answer briefly.", "test_id": 2}
-{"prompt": "Test prompt 3: What is 2+2? Just answer briefly.", "test_id": 3}
-{"prompt": "Test prompt 4: What is 2+2? Just answer briefly.", "test_id": 4}
-{"prompt": "Test prompt 5: What is 2+2? Just answer briefly.", "test_id": 5}
-{"prompt": "Test prompt 6: What is 2+2? Just answer briefly.", "test_id": 6}
-{"prompt": "Test prompt 7: What is 2+2? Just answer briefly.", "test_id": 7}
-{"prompt": "Test prompt 8: What is 2+2? Just answer briefly.", "test_id": 8}
-{"prompt": "Test prompt 9: What is 2+2? Just answer briefly.", "test_id": 9}
-{"prompt": "Test prompt 10: What is 2+2? Just answer briefly.", "test_id": 10}
-{"prompt": "Test prompt 11: What is 2+2? Just answer briefly.", "test_id": 11}
-{"prompt": "Test prompt 12: What is 2+2? Just answer briefly.", "test_id": 12}
-{"prompt": "Test prompt 13: What is 2+2? Just answer briefly.", "test_id": 13}
-{"prompt": "Test prompt 14: What is 2+2? Just answer briefly.", "test_id": 14}
--- a/Show More
+++ b/Show More
@@ -1,2 +0,0 @@
-"""Terminal helpers for stateful sandbox interactions."""