fix: skip tests when atroposlib/minisweagent unavailable in CI

- test_agent_loop_tool_calling.py: import atroposlib at module level to trigger skip (environments.agent_loop is now importable without atroposlib due to __init__.py graceful fallback)
- test_modal_sandbox_fixes.py: skip TestToolResolution tests when minisweagent not installed
fix: guard all atroposlib imports for CI without atropos installed
2026-03-09 23:37:32 -05:00 · 2026-03-09 23:33:24 -05:00 · 2026-03-09 23:14:53 -05:00 · 2026-03-09 23:02:13 -05:00 · 2026-03-09 21:32:23 -05:00 · 2026-03-09 21:21:49 -05:00
109 changed files with 4311 additions and 12140 deletions
@@ -53,6 +53,10 @@ MINIMAX_CN_API_KEY=
 # Get at: https://firecrawl.dev/
 FIRECRAWL_API_KEY=

+# Nous Research API Key - Vision analysis and multi-model reasoning
+# Get at: https://inference-api.nousresearch.com/
+NOUS_API_KEY=
+
 # FAL.ai API Key - Image generation
 # Get at: https://fal.ai/
 FAL_KEY=
@@ -49,3 +49,4 @@ cli-config.yaml
 skills/.hub/
 ignored/
 .worktrees/
+environments/benchmarks/evals/
@@ -1,60 +1,79 @@
 # Hermes Agent - Development Guide

-Instructions for AI coding assistants and developers working on the hermes-agent codebase.
+Instructions for AI coding assistants (GitHub Copilot, Cursor, etc.) and human developers.
+
+Hermes Agent is an AI agent harness with tool-calling capabilities, interactive CLI, messaging integrations, and scheduled tasks.

 ## Development Environment

+**IMPORTANT**: Always use the virtual environment if it exists:
 ```bash
-source .venv/bin/activate  # ALWAYS activate before running Python
+source venv/bin/activate  # Before running any Python commands
 ```

 ## Project Structure

 ```
 hermes-agent/
-├── run_agent.py          # AIAgent class — core conversation loop
-├── model_tools.py        # Tool orchestration, _discover_tools(), handle_function_call()
-├── toolsets.py           # Toolset definitions, _HERMES_CORE_TOOLS list
-├── cli.py                # HermesCLI class — interactive CLI orchestrator
-├── hermes_state.py       # SessionDB — SQLite session store (FTS5 search)
-├── agent/                # Agent internals
-│   ├── prompt_builder.py     # System prompt assembly
+├── agent/                # Agent internals (extracted from run_agent.py)
+│   ├── model_metadata.py     # Model context lengths, token estimation
 │   ├── context_compressor.py # Auto context compression
 │   ├── prompt_caching.py     # Anthropic prompt caching
-│   ├── auxiliary_client.py   # Auxiliary LLM client (vision, summarization)
-│   ├── model_metadata.py     # Model context lengths, token estimation
+│   ├── prompt_builder.py     # System prompt assembly (identity, skills index, context files)
 │   ├── display.py            # KawaiiSpinner, tool preview formatting
-│   ├── skill_commands.py     # Skill slash commands (shared CLI/gateway)
 │   └── trajectory.py         # Trajectory saving helpers
-├── hermes_cli/           # CLI subcommands and setup
-│   ├── main.py           # Entry point — all `hermes` subcommands
-│   ├── config.py         # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
-│   ├── commands.py       # Slash command definitions + SlashCommandCompleter
-│   ├── callbacks.py      # Terminal callbacks (clarify, sudo, approval)
-│   └── setup.py          # Interactive setup wizard
-├── tools/                # Tool implementations (one file per tool)
-│   ├── registry.py       # Central tool registry (schemas, handlers, dispatch)
-│   ├── approval.py       # Dangerous command detection
-│   ├── terminal_tool.py  # Terminal orchestration
-│   ├── process_registry.py # Background process management
-│   ├── file_tools.py     # File read/write/search/patch
-│   ├── web_tools.py      # Firecrawl search/extract
-│   ├── browser_tool.py   # Browserbase browser automation
-│   ├── code_execution_tool.py # execute_code sandbox
-│   ├── delegate_tool.py  # Subagent delegation
-│   ├── mcp_tool.py       # MCP client (~1050 lines)
-│   └── environments/     # Terminal backends (local, docker, ssh, modal, daytona, singularity)
-├── gateway/              # Messaging platform gateway
-│   ├── run.py            # Main loop, slash commands, message dispatch
-│   ├── session.py        # SessionStore — conversation persistence
-│   └── platforms/        # Adapters: telegram, discord, slack, whatsapp, homeassistant, signal
-├── cron/                 # Scheduler (jobs.py, scheduler.py)
-├── environments/         # RL training environments (Atropos)
-├── tests/                # Pytest suite (~2500+ tests)
+├── hermes_cli/           # CLI implementation
+│   ├── main.py           # Entry point, command dispatcher
+│   ├── banner.py         # Welcome banner, ASCII art, skills summary
+│   ├── commands.py       # Slash command definitions + autocomplete
+│   ├── callbacks.py      # Interactive prompt callbacks (clarify, sudo, approval)
+│   ├── setup.py          # Interactive setup wizard
+│   ├── config.py         # Config management & migration
+│   ├── status.py         # Status display
+│   ├── doctor.py         # Diagnostics
+│   ├── gateway.py        # Gateway management
+│   ├── uninstall.py      # Uninstaller
+│   ├── cron.py           # Cron job management
+│   └── skills_hub.py     # Skills Hub CLI + /skills slash command
+├── tools/                # Tool implementations
+│   ├── registry.py            # Central tool registry (schemas, handlers, dispatch)
+│   ├── approval.py            # Dangerous command detection + per-session approval
+│   ├── environments/          # Terminal execution backends
+│   │   ├── base.py            # BaseEnvironment ABC
+│   │   ├── local.py           # Local execution with interrupt support
+│   │   ├── docker.py          # Docker container execution
+│   │   ├── ssh.py             # SSH remote execution
+│   │   ├── singularity.py     # Singularity/Apptainer + SIF management
+│   │   ├── modal.py           # Modal cloud execution
+│   │   └── daytona.py         # Daytona cloud sandboxes
+│   ├── terminal_tool.py       # Terminal orchestration (sudo, lifecycle, factory)
+│   ├── todo_tool.py           # Planning & task management
+│   ├── process_registry.py    # Background process management
+│   └── ...                    # Other tool files
+├── gateway/              # Messaging platform adapters
+│   ├── platforms/        # Platform-specific adapters (telegram, discord, slack, whatsapp)
+│   └── ...
+├── cron/                 # Scheduler implementation
+├── environments/         # RL training environments (Atropos integration)
+├── skills/               # Bundled skill sources
+├── optional-skills/      # Official optional skills (not activated by default)
+├── cli.py                # Interactive CLI orchestrator (HermesCLI class)
+├── hermes_state.py       # SessionDB — SQLite session store (schema, titles, FTS5 search)
+├── run_agent.py          # AIAgent class (core conversation loop)
+├── model_tools.py        # Tool orchestration (thin layer over tools/registry.py)
+├── toolsets.py           # Tool groupings
+├── toolset_distributions.py  # Probability-based tool selection
 └── batch_runner.py       # Parallel batch processing
 ```

-**User config:** `~/.hermes/config.yaml` (settings), `~/.hermes/.env` (API keys)
+**User Configuration** (stored in `~/.hermes/`):
+- `~/.hermes/config.yaml` - Settings (model, terminal, toolsets, etc.)
+- `~/.hermes/.env` - API keys and secrets
+- `~/.hermes/pairing/` - DM pairing data
+- `~/.hermes/hooks/` - Custom event hooks
+- `~/.hermes/image_cache/` - Cached user images
+- `~/.hermes/audio_cache/` - Cached user voice messages
+- `~/.hermes/sticker_cache.json` - Telegram sticker descriptions

 ## File Dependency Chain

@@ -68,175 +87,631 @@ model_tools.py  (imports tools/registry + triggers tool discovery)
 run_agent.py, cli.py, batch_runner.py, environments/
 ```

+Each tool file co-locates its schema, handler, and registration. `model_tools.py` is a thin orchestration layer.
+
 ---

-## AIAgent Class (run_agent.py)
+## AIAgent Class
+
+The main agent is implemented in `run_agent.py`:

 ```python
 class AIAgent:
-    def __init__(self,
-        model: str = "anthropic/claude-opus-4.6",
-        max_iterations: int = 90,
+    def __init__(
+        self,
+        model: str = "anthropic/claude-sonnet-4.6",
+        api_key: str = None,
+        base_url: str = "https://openrouter.ai/api/v1",
+        max_iterations: int = 60,        # Max tool-calling loops
        enabled_toolsets: list = None,
        disabled_toolsets: list = None,
-        quiet_mode: bool = False,
-        save_trajectories: bool = False,
-        platform: str = None,           # "cli", "telegram", etc.
-        session_id: str = None,
-        skip_context_files: bool = False,
-        skip_memory: bool = False,
-        # ... plus provider, api_mode, callbacks, routing params
-    ): ...
-
-    def chat(self, message: str) -> str:
-        """Simple interface — returns final response string."""
-
-    def run_conversation(self, user_message: str, system_message: str = None,
-                         conversation_history: list = None, task_id: str = None) -> dict:
-        """Full interface — returns dict with final_response + messages."""
+        verbose_logging: bool = False,
+        quiet_mode: bool = False,         # Suppress progress output
+        tool_progress_callback: callable = None,  # Called on each tool use
+    ):
+        # Initialize OpenAI client, load tools based on toolsets
+        ...
+    
+    def chat(self, user_message: str, task_id: str = None) -> str:
+        # Main entry point - runs the agent loop
+        ...
 ```

 ### Agent Loop

-The core loop is inside `run_conversation()` — entirely synchronous:
+The core loop in `_run_agent_loop()`:
+
+```
+1. Add user message to conversation
+2. Call LLM with tools
+3. If LLM returns tool calls:
+   - Execute each tool
+   - Add tool results to conversation
+   - Go to step 2
+4. If LLM returns text response:
+   - Return response to user
+```

 ```python
-while api_call_count < self.max_iterations and self.iteration_budget.remaining > 0:
-    response = client.chat.completions.create(model=model, messages=messages, tools=tool_schemas)
+while turns < max_iterations:
+    response = client.chat.completions.create(
+        model=model,
+        messages=messages,
+        tools=tool_schemas,
+    )
+    
    if response.tool_calls:
        for tool_call in response.tool_calls:
-            result = handle_function_call(tool_call.name, tool_call.args, task_id)
+            result = await execute_tool(tool_call)
            messages.append(tool_result_message(result))
-        api_call_count += 1
+        turns += 1
    else:
        return response.content
 ```
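
The loop above is pseudocode. A minimal runnable sketch of the same control flow follows, assuming a stubbed model call (`fake_llm`) and a hypothetical `execute_tool` dispatcher; neither name comes from the codebase, and the real loop talks to an OpenAI-compatible client instead.

```python
import json

def fake_llm(messages):
    """Stand-in for the chat-completions call: requests one tool, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"content": None,
                "tool_calls": [{"id": "call_1", "name": "echo", "args": {"text": "hi"}}]}
    return {"content": "done", "tool_calls": None}

def execute_tool(tool_call):
    """Hypothetical dispatcher; like real handlers, it returns a JSON string."""
    return json.dumps({"success": True, "echo": tool_call["args"]["text"]})

def run_loop(user_message, max_iterations=60):
    messages = [{"role": "user", "content": user_message}]
    turns = 0
    while turns < max_iterations:
        response = fake_llm(messages)
        if response["tool_calls"]:
            # Record the assistant's request, then one tool message per call.
            messages.append({"role": "assistant", "content": None,
                             "tool_calls": response["tool_calls"]})
            for tool_call in response["tool_calls"]:
                messages.append({"role": "tool",
                                 "tool_call_id": tool_call["id"],
                                 "content": execute_tool(tool_call)})
            turns += 1
        else:
            return response["content"], messages
    return None, messages  # iteration budget exhausted

final, history = run_loop("say hi")
```

Counting only tool-calling rounds against the budget mirrors the pseudocode, where `turns` is incremented in the tool-call branch.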

-Messages follow OpenAI format: `{"role": "system/user/assistant/tool", ...}`. Reasoning content is stored in `assistant_msg["reasoning"]`.
+### Conversation Management
+
+Messages are stored as a list of dicts following OpenAI format:
+
+```python
+messages = [
+    {"role": "system", "content": "You are a helpful assistant..."},
+    {"role": "user", "content": "Search for Python tutorials"},
+    {"role": "assistant", "content": None, "tool_calls": [...]},
+    {"role": "tool", "tool_call_id": "...", "content": "..."},
+    {"role": "assistant", "content": "Here's what I found..."},
+]
+```
+
+### Reasoning Model Support
+
+For models that support chain-of-thought reasoning:
+- Extract `reasoning_content` from API responses
+- Store in `assistant_msg["reasoning"]` for trajectory export
+- Pass back via `reasoning_content` field on subsequent turns
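
The three steps above could be sketched as a pair of helpers. This is an illustration only: the function names are hypothetical, and the API message is assumed to arrive as a plain dict.

```python
def store_assistant_message(api_message: dict) -> dict:
    """Copy an API response message, keeping reasoning under the 'reasoning' key."""
    stored = {"role": "assistant", "content": api_message.get("content")}
    reasoning = api_message.get("reasoning_content")
    if reasoning:
        stored["reasoning"] = reasoning
    return stored

def to_api_message(stored: dict) -> dict:
    """Rebuild the outgoing message, passing reasoning back as 'reasoning_content'."""
    outgoing = {k: v for k, v in stored.items() if k != "reasoning"}
    if "reasoning" in stored:
        outgoing["reasoning_content"] = stored["reasoning"]
    return outgoing

stored_msg = store_assistant_message(
    {"content": "42", "reasoning_content": "6 * 7 = 42"})
round_trip = to_api_message(stored_msg)
```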

 ---

 ## CLI Architecture (cli.py)

- **Rich** for banner/panels, **prompt_toolkit** for input with autocomplete
- **KawaiiSpinner** (`agent/display.py`) — animated faces during API calls, `┊` activity feed for tool results
- `load_cli_config()` in cli.py merges hardcoded defaults + user config YAML
- `process_command()` is a method on `HermesCLI` (not in commands.py)
- Skill slash commands: `agent/skill_commands.py` scans `~/.hermes/skills/`, injects as **user message** (not system prompt) to preserve prompt caching
+The interactive CLI uses:
+- **Rich** - For the welcome banner and styled panels
+- **prompt_toolkit** - For fixed input area with history, `patch_stdout`, slash command autocomplete, and floating completion menus
+- **KawaiiSpinner** (in `agent/display.py`) - Animated kawaii faces during API calls; clean `┊` activity feed for tool execution results
+
+Key components:
+- `HermesCLI` class - Main CLI controller with commands and conversation loop
+- `SlashCommandCompleter` - Autocomplete dropdown for `/commands` (type `/` to see all)
+- `agent/skill_commands.py` - Scans skills and builds invocation messages (shared with gateway)
+- `load_cli_config()` - Loads config, sets environment variables for terminal
+- `build_welcome_banner()` - Displays ASCII art logo, tools, and skills summary
+
+CLI UX notes:
+- Thinking spinner (during LLM API call) shows animated kawaii face + verb (`(⌐■_■) deliberating...`)
+- When LLM returns tool calls, the spinner clears silently (no "got it!" noise)
+- Tool execution results appear as a clean activity feed: `┊ {emoji} {verb} {detail} {duration}`
+- "got it!" only appears when the LLM returns a final text response (`⚕ ready`)
+- The prompt shows `⚕ ❯` when the agent is working, `❯` when idle
+- Pasting 5+ lines auto-saves to `~/.hermes/pastes/` and collapses to a reference
+- Multi-line input via Alt+Enter or Ctrl+J
+- `/commands` - Process user commands like `/help`, `/clear`, `/personality`, etc.
+- `/skill-name` - Invoke installed skills directly (e.g., `/axolotl`, `/gif-search`)
+
+CLI uses `quiet_mode=True` when creating AIAgent to suppress verbose logging.
+
+### Skill Slash Commands
+
+Every installed skill in `~/.hermes/skills/` is automatically registered as a slash command.
+The skill name (from frontmatter or folder name) becomes the command: `axolotl` → `/axolotl`.
+
+Implementation (`agent/skill_commands.py`, shared between CLI and gateway):
+1. `scan_skill_commands()` scans all SKILL.md files at startup, filtering out skills incompatible with the current OS platform (via the `platforms` frontmatter field)
+2. `build_skill_invocation_message()` loads the SKILL.md content and builds a user-turn message
+3. The message includes the full skill content, a list of supporting files (not loaded), and the user's instruction
+4. Supporting files can be loaded on demand via the `skill_view` tool
+5. Injected as a **user message** (not system prompt) to preserve prompt caching
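
A rough sketch of the scan-and-inject flow described above. The helper names here are hypothetical (they are not the functions in `agent/skill_commands.py`), the folder name is used as the skill name, and frontmatter and platform filtering are left out.

```python
import tempfile
from pathlib import Path

def scan_skills(skills_dir: Path) -> dict:
    """Map '/<folder-name>' to the SKILL.md path for every skill found."""
    return {f"/{p.parent.name}": p for p in sorted(skills_dir.rglob("SKILL.md"))}

def build_invocation(skill_md: Path, instruction: str) -> dict:
    """Build a user-turn message: skill body, supporting file list, instruction."""
    supporting = sorted(
        f.relative_to(skill_md.parent).as_posix()
        for f in skill_md.parent.rglob("*")
        if f.is_file() and f != skill_md
    )
    content = (
        f"{skill_md.read_text(encoding='utf-8')}\n\n"
        f"Supporting files (not loaded): {', '.join(supporting) or 'none'}\n\n"
        f"User instruction: {instruction}"
    )
    return {"role": "user", "content": content}

# Demo against a throwaway skills tree.
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "mlops" / "axolotl"
    (skill_dir / "references").mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text("# Axolotl\nFine-tuning notes.", encoding="utf-8")
    (skill_dir / "references" / "api.md").write_text("API notes", encoding="utf-8")
    commands = scan_skills(Path(tmp))
    message = build_invocation(commands["/axolotl"], "fine-tune a 7B model")
```

Because the message has the `user` role, the system prompt stays byte-identical across turns, which is what keeps the prompt cache valid.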

 ### Adding CLI Commands

-1. Add to `COMMANDS` dict in `hermes_cli/commands.py`
-2. Add handler in `HermesCLI.process_command()` in `cli.py`
-3. For persistent settings, use `save_config_value()` in `cli.py`
+1. Add to `COMMANDS` dict with description
+2. Add handler in `process_command()` method
+3. For persistent settings, use `save_config_value()` to update config
+
+---
+
+## Hermes CLI Commands
+
+The unified `hermes` command provides all functionality:
+
+| Command | Description |
+|---------|-------------|
+| `hermes` | Interactive chat (default) |
+| `hermes chat -q "..."` | Single query mode |
+| `hermes -c` / `hermes --continue` | Resume the most recent session |
+| `hermes -c "my project"` | Resume a session by name (latest in lineage) |
+| `hermes --resume <session_id>` | Resume a specific session by ID or title |
+| `hermes -w` / `hermes --worktree` | Start in isolated git worktree (for parallel agents) |
+| `hermes setup` | Configure API keys and settings |
+| `hermes config` | View current configuration |
+| `hermes config edit` | Open config in editor |
+| `hermes config set KEY VAL` | Set a specific value |
+| `hermes config check` | Check for missing config |
+| `hermes config migrate` | Prompt for missing config interactively |
+| `hermes status` | Show configuration status |
+| `hermes doctor` | Diagnose issues |
+| `hermes update` | Update to latest (checks for new config) |
+| `hermes uninstall` | Uninstall (can keep configs for reinstall) |
+| `hermes gateway` | Start gateway (messaging + cron scheduler) |
+| `hermes gateway setup` | Configure messaging platforms interactively |
+| `hermes gateway install` | Install gateway as system service |
+| `hermes sessions list` | List past sessions (title, preview, last active) |
+| `hermes sessions rename <id> <title>` | Rename/title a session |
+| `hermes cron list` | View scheduled jobs |
+| `hermes cron status` | Check if cron scheduler is running |
+| `hermes version` | Show version info |
+| `hermes pairing list/approve/revoke` | Manage DM pairing codes |
+
+---
+
+## Messaging Gateway
+
+The gateway connects Hermes to Telegram, Discord, Slack, and WhatsApp.
+
+### Setup
+
+The interactive setup wizard handles platform configuration:
+
+```bash
+hermes gateway setup      # Arrow-key menu of all platforms, configure tokens/allowlists/home channels
+```
+
+This is the recommended way to configure messaging. It shows which platforms are already set up, walks through each one interactively, and offers to start/restart the gateway service at the end.
+
+Platforms can also be configured manually in `~/.hermes/.env`.
+
+### Configuration (in `~/.hermes/.env`):
+
+```bash
+# Telegram
+TELEGRAM_BOT_TOKEN=123456:ABC-DEF...      # From @BotFather
+TELEGRAM_ALLOWED_USERS=123456789,987654   # Comma-separated user IDs (from @userinfobot)
+
+# Discord  
+DISCORD_BOT_TOKEN=MTIz...                 # From Developer Portal
+DISCORD_ALLOWED_USERS=123456789012345678  # Comma-separated user IDs
+
+# Agent Behavior
+HERMES_MAX_ITERATIONS=60                  # Max tool-calling iterations
+MESSAGING_CWD=/home/myuser                # Terminal working directory for messaging
+
+# Tool progress is configured in config.yaml (display.tool_progress: off|new|all|verbose)
+```
+
+### Working Directory Behavior
+
+- **CLI (`hermes` command)**: Uses current directory (`.` → `os.getcwd()`)
+- **Messaging (Telegram/Discord)**: Uses `MESSAGING_CWD` (default: home directory)
+
+This is intentional: CLI users are in a terminal and expect the agent to work in their current directory, while messaging users need a consistent starting location.
+
+### Security (User Allowlists):
+
+**IMPORTANT**: By default, the gateway denies all users who are not in an allowlist or paired via DM.
+
+The gateway checks `{PLATFORM}_ALLOWED_USERS` environment variables:
+- If set: Only listed user IDs can interact with the bot
+- If unset: All users are denied unless `GATEWAY_ALLOW_ALL_USERS=true` is set
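
The deny-by-default rule can be sketched as follows. The function name and the `paired_users` argument are hypothetical, and the choice to let a configured allowlist take precedence over `GATEWAY_ALLOW_ALL_USERS` is an assumption of this sketch rather than documented behavior.

```python
def is_authorized(platform, user_id, env, paired_users=frozenset()):
    """Return True if the user may talk to the bot on this platform."""
    if (platform, user_id) in paired_users:
        return True  # approved earlier through DM pairing
    raw = env.get(f"{platform.upper()}_ALLOWED_USERS", "").strip()
    if raw:
        allowed = {item.strip() for item in raw.split(",") if item.strip()}
        return user_id in allowed
    # No allowlist configured: deny unless the operator opted in explicitly.
    return env.get("GATEWAY_ALLOW_ALL_USERS", "").lower() == "true"

example_env = {"TELEGRAM_ALLOWED_USERS": "123456789,987654"}
```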
+
+Users can find their IDs:
+- **Telegram**: Message [@userinfobot](https://t.me/userinfobot)
+- **Discord**: Enable Developer Mode, right-click name → Copy ID
+
+### DM Pairing System
+
+Instead of static allowlists, users can pair via one-time codes:
+1. Unknown user DMs the bot → receives pairing code
+2. Owner runs `hermes pairing approve <platform> <code>`
+3. User is permanently authorized
+
+Security: 8-char codes, 1-hour expiry, rate-limited (1/10min/user), max 3 pending per platform, lockout after 5 failed attempts, `chmod 0600` on data files.
+
+Files: `gateway/pairing.py`, `hermes_cli/pairing.py`
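
To make the code lifecycle concrete, here is a small in-memory sketch covering three of the listed properties (8-character codes, 1-hour expiry, at most 3 pending per platform). The class and method names are hypothetical, and rate limiting, lockout, and file permissions are not modeled.

```python
import secrets
import string
import time

ALPHABET = string.ascii_uppercase + string.digits
CODE_TTL_SECONDS = 3600   # 1-hour expiry
MAX_PENDING = 3           # per platform

class PairingStore:
    def __init__(self):
        self.pending = {}  # (platform, code) -> (user_id, expires_at)

    def _expire(self, now):
        self.pending = {k: v for k, v in self.pending.items() if v[1] > now}

    def issue(self, platform, user_id, now=None):
        """Create a one-time code, or return None if too many are pending."""
        now = time.time() if now is None else now
        self._expire(now)
        if sum(1 for key in self.pending if key[0] == platform) >= MAX_PENDING:
            return None
        code = "".join(secrets.choice(ALPHABET) for _ in range(8))
        self.pending[(platform, code)] = (user_id, now + CODE_TTL_SECONDS)
        return code

    def approve(self, platform, code, now=None):
        """Consume a code; return the paired user ID, or None if invalid/expired."""
        now = time.time() if now is None else now
        self._expire(now)
        entry = self.pending.pop((platform, code), None)
        return entry[0] if entry else None
```

Using `secrets` rather than `random` keeps the codes unpredictable, which matters because a code is effectively a short-lived credential.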
+
+### Event Hooks
+
+Hooks fire at lifecycle points. Place hook directories in `~/.hermes/hooks/`:
+
+```
+~/.hermes/hooks/my-hook/
+├── HOOK.yaml    # name, description, events list
+└── handler.py   # async def handle(event_type, context): ...
+```
+
+Events: `gateway:startup`, `session:start`, `session:reset`, `agent:start`, `agent:step`, `agent:end`, `command:*`
+
+The `agent:step` event fires each iteration of the tool-calling loop with tool names and results.
+
+Files: `gateway/hooks.py`
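
A `handler.py` might look like the sketch below. Only the `handle(event_type, context)` signature comes from the layout above; the `tool_names` context key is an assumption made for illustration.

```python
import asyncio

recorded_steps = []

async def handle(event_type, context):
    """Record the tools used on each agent:step event; ignore everything else."""
    if event_type == "agent:step":
        recorded_steps.append(list(context.get("tool_names", [])))

# Simulate the gateway firing two events.
asyncio.run(handle("agent:step", {"tool_names": ["terminal", "web_search"]}))
asyncio.run(handle("session:start", {}))
```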
+
+### Tool Progress Notifications
+
+When `display.tool_progress` in `config.yaml` is set to a value other than `off`, the bot sends status messages as it works:
+- `💻 \`ls -la\`...` (terminal commands show the actual command)
+- `🔍 web_search...`
+- `📄 web_extract...`
+- `🐍 execute_code...` (programmatic tool calling sandbox)
+- `🔀 delegate_task...` (subagent delegation)
+- `❓ clarify...` (user question, CLI-only)
+
+Modes:
+- `new`: Only when switching to a different tool (less spam)
+- `all`: Every single tool call
+
+### Typing Indicator
+
+The gateway keeps the "typing..." indicator active throughout processing, refreshing every 4 seconds. This lets users know the bot is working even during long tool-calling sequences.
+
+### Platform Toolsets:
+
+Each platform has a dedicated toolset in `toolsets.py`:
+- `hermes-telegram`: Full tools including terminal (with safety checks)
+- `hermes-discord`: Full tools including terminal
+- `hermes-whatsapp`: Full tools including terminal
+
+---
+
+## Configuration System
+
+Configuration files are stored in `~/.hermes/` for easy user access:
+- `~/.hermes/config.yaml` - All settings (model, terminal, compression, etc.)
+- `~/.hermes/.env` - API keys and secrets
+
+### Adding New Configuration Options
+
+When adding new configuration variables, you MUST follow this process:
+
+#### For config.yaml options:
+
+1. Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`
+2. **CRITICAL**: Bump `_config_version` in `DEFAULT_CONFIG` when adding required fields
+3. This triggers migration prompts for existing users on next `hermes update` or `hermes setup`
+
+Example:
+```python
+DEFAULT_CONFIG = {
+    # ... existing config ...
+    
+    "new_feature": {
+        "enabled": True,
+        "option": "default_value",
+    },
+    
+    # BUMP THIS when adding required fields
+    "_config_version": 2,  # Was 1, now 2
+}
+```
+
+#### For .env variables (API keys/secrets):
+
+1. Add to `REQUIRED_ENV_VARS` or `OPTIONAL_ENV_VARS` in `hermes_cli/config.py`
+2. Include metadata for the migration system:
+
+```python
+OPTIONAL_ENV_VARS = {
+    # ... existing vars ...
+    "NEW_API_KEY": {
+        "description": "What this key is for",
+        "prompt": "Display name in prompts",
+        "url": "https://where-to-get-it.com/",
+        "tools": ["tools_it_enables"],  # What tools need this
+        "password": True,  # Mask input
+    },
+}
+```
+
+#### Update related files:
+
+- `hermes_cli/setup.py` - Add prompts in the setup wizard
+- `cli-config.yaml.example` - Add example with comments
+- Update README.md if user-facing
+
+### Config Version Migration
+
+The system uses `_config_version` to detect outdated configs:
+
+1. `check_for_missing_config()` compares user config to `DEFAULT_CONFIG`
+2. `migrate_config()` interactively prompts for missing values
+3. Called automatically by `hermes update` and optionally by `hermes setup`
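
The comparison step could work roughly like this. The helper names are hypothetical (this is not the body of `check_for_missing_config()`), and treating a missing `_config_version` as `0` is an assumption.

```python
def find_missing(user: dict, defaults: dict, prefix: str = "") -> list:
    """Return dotted paths of keys present in defaults but absent from user config."""
    missing = []
    for key, default in defaults.items():
        path = f"{prefix}{key}"
        if key not in user:
            missing.append(path)
        elif isinstance(default, dict) and isinstance(user[key], dict):
            missing.extend(find_missing(user[key], default, prefix=path + "."))
    return missing

def is_outdated(user: dict, defaults: dict) -> bool:
    """A config is outdated when its version is lower than the default's."""
    return user.get("_config_version", 0) < defaults["_config_version"]

defaults = {"new_feature": {"enabled": True, "option": "default_value"},
            "_config_version": 2}
user_config = {"new_feature": {"enabled": False}, "_config_version": 1}
```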
+
+---
+
+## Environment Variables
+
+API keys are loaded from `~/.hermes/.env`:
+- `OPENROUTER_API_KEY` - Main LLM API access (primary provider)
+- `FIRECRAWL_API_KEY` - Web search/extract tools
+- `FIRECRAWL_API_URL` - Self-hosted Firecrawl endpoint (optional)
+- `BROWSERBASE_API_KEY` / `BROWSERBASE_PROJECT_ID` - Browser automation
+- `FAL_KEY` - Image generation (FLUX model)
+- `NOUS_API_KEY` - Vision and Mixture-of-Agents tools
+
+Terminal tool configuration (in `~/.hermes/config.yaml`):
+- `terminal.backend` - Backend: local, docker, singularity, modal, daytona, or ssh
+- `terminal.cwd` - Working directory ("." = host CWD for local only; for remote backends set an absolute path inside the target, or omit to use the backend's default)
+- `terminal.docker_image` - Image for Docker backend
+- `terminal.singularity_image` - Image for Singularity backend
+- `terminal.modal_image` - Image for Modal backend
+- `terminal.daytona_image` - Image for Daytona backend
+- `DAYTONA_API_KEY` - API key for Daytona backend (in .env)
+- SSH: `TERMINAL_SSH_HOST`, `TERMINAL_SSH_USER`, `TERMINAL_SSH_KEY` in .env
+
+Agent behavior and messaging (in `~/.hermes/.env` unless noted):
+- `HERMES_MAX_ITERATIONS` - Max tool-calling iterations (default: 60)
+- `MESSAGING_CWD` - Working directory for messaging platforms (default: ~)
+- `display.tool_progress` in config.yaml - Tool progress: `off`, `new`, `all`, `verbose`
+- `OPENAI_API_KEY` - Voice transcription (Whisper STT)
+- `SLACK_BOT_TOKEN` / `SLACK_APP_TOKEN` - Slack integration (Socket Mode)
+- `SLACK_ALLOWED_USERS` - Comma-separated Slack user IDs
+- `HERMES_HUMAN_DELAY_MODE` - Response pacing: off/natural/custom
+- `HERMES_HUMAN_DELAY_MIN_MS` / `HERMES_HUMAN_DELAY_MAX_MS` - Custom delay range
+
+### Dangerous Command Approval
+
+The terminal tool includes safety checks for potentially destructive commands (e.g., `rm -rf`, `DROP TABLE`, `chmod 777`).
+
+**Behavior by Backend:**
+- **Docker/Singularity/Modal**: Commands run unrestricted (isolated containers)
+- **Local/SSH**: Dangerous commands trigger approval flow
+
+**Approval Flow (CLI):**
+```
+⚠️  Potentially dangerous command detected: recursive delete
+    rm -rf /tmp/test
+
+    [o]nce  |  [s]ession  |  [a]lways  |  [d]eny
+    Choice [o/s/a/D]: 
+```
+
+**Approval Flow (Messaging):**
+- Command is blocked with explanation
+- Agent explains the command was blocked for safety
+- User must add the pattern to their allowlist via `hermes config edit` or run the command directly on their machine
+
+**Configuration:**
+- `command_allowlist` in `~/.hermes/config.yaml` stores permanently allowed patterns
+- Add patterns via "always" approval or edit directly
+
+**Sudo Handling (Messaging):**
+- If sudo fails over messaging, output includes tip to add `SUDO_PASSWORD` to `~/.hermes/.env`
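
The backend-dependent check can be illustrated with a few regular expressions. The patterns, labels, and function name below are illustrative and are not the contents of `tools/approval.py`; the allowlist is assumed to hold pattern labels.

```python
import re

# Illustrative patterns only; the real detector may use different rules.
DANGEROUS_PATTERNS = [
    (re.compile(r"\brm\s+-\w*r\w*f|\brm\s+-\w*f\w*r"), "recursive delete"),
    (re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE), "SQL table drop"),
    (re.compile(r"\bchmod\s+777\b"), "world-writable permissions"),
]
ISOLATED_BACKENDS = {"docker", "singularity", "modal"}

def needs_approval(command, backend, allowlist=()):
    """Return a label describing the danger, or None if the command may run."""
    if backend in ISOLATED_BACKENDS:
        return None  # isolated containers run unrestricted
    for pattern, label in DANGEROUS_PATTERNS:
        if pattern.search(command) and label not in allowlist:
            return label
    return None
```

For example, `needs_approval("rm -rf /tmp/test", "local")` returns `"recursive delete"`, while the same command on the `docker` backend returns `None`.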
+
+---
+
+## Background Process Management
+
+The `process` tool works alongside `terminal` for managing long-running background processes:
+
+**Starting a background process:**
+```python
+terminal(command="pytest -v tests/", background=True)
+# Returns: {"session_id": "proc_abc123", "pid": 12345, ...}
+```
+
+**Managing it with the process tool:**
+- `process(action="list")` -- show all running/recent processes
+- `process(action="poll", session_id="proc_abc123")` -- check status + new output
+- `process(action="log", session_id="proc_abc123")` -- full output with pagination
+- `process(action="wait", session_id="proc_abc123", timeout=600)` -- block until done
+- `process(action="kill", session_id="proc_abc123")` -- terminate
+- `process(action="write", session_id="proc_abc123", data="y")` -- send stdin
+- `process(action="submit", session_id="proc_abc123", data="yes")` -- send + Enter
+
+**Key behaviors:**
+- Background processes execute through the configured terminal backend (local/Docker/Modal/Daytona/SSH/Singularity) -- never directly on the host unless `TERMINAL_ENV=local`
+- The `wait` action blocks the tool call until the process finishes, times out, or is interrupted by a new user message
+- PTY mode (`pty=true` on terminal) enables interactive CLI tools (Codex, Claude Code)
+- In RL training, background processes are auto-killed when the episode ends (`tool_context.cleanup()`)
+- In the gateway, sessions with active background processes are exempt from idle reset
+- The process registry checkpoints to `~/.hermes/processes.json` for crash recovery
+
+Files: `tools/process_registry.py` (registry + handler), `tools/terminal_tool.py` (spawn integration)

 ---

 ## Adding New Tools

-Requires changes in **3 files**:
+Adding a tool requires changes in **3 files** (the tool file, `toolsets.py`, and `model_tools.py`):
+
+1. **Create `tools/your_tool.py`** with handler, schema, check function, and registry call:

-**1. Create `tools/your_tool.py`:**
 ```python
-import json, os
+# tools/example_tool.py
+import json
+import os
 from tools.registry import registry

-def check_requirements() -> bool:
+def check_example_requirements() -> bool:
+    """Check if required API keys/dependencies are available."""
    return bool(os.getenv("EXAMPLE_API_KEY"))

 def example_tool(param: str, task_id: str = None) -> str:
-    return json.dumps({"success": True, "data": "..."})
+    """Execute the tool and return JSON string result."""
+    try:
+        result = {"success": True, "data": "..."}
+        return json.dumps(result, ensure_ascii=False)
+    except Exception as e:
+        return json.dumps({"error": str(e)}, ensure_ascii=False)
+
+EXAMPLE_SCHEMA = {
+    "name": "example_tool",
+    "description": "Does something useful.",
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "param": {"type": "string", "description": "The parameter"}
+        },
+        "required": ["param"]
+    }
+}

 registry.register(
    name="example_tool",
    toolset="example",
-    schema={"name": "example_tool", "description": "...", "parameters": {...}},
-    handler=lambda args, **kw: example_tool(param=args.get("param", ""), task_id=kw.get("task_id")),
-    check_fn=check_requirements,
+    schema=EXAMPLE_SCHEMA,
+    handler=lambda args, **kw: example_tool(
+        param=args.get("param", ""), task_id=kw.get("task_id")),
+    check_fn=check_example_requirements,
    requires_env=["EXAMPLE_API_KEY"],
 )
 ```

-**2. Add import** in `model_tools.py` `_discover_tools()` list.
+2. **Add to `toolsets.py`**: Add `"example_tool"` to `_HERMES_CORE_TOOLS` if it should be in all platform toolsets, or create a new toolset entry.

-**3. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.
+3. **Add discovery import** in `model_tools.py`'s `_discover_tools()` list: `"tools.example_tool"`.

-The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.
+That's it. The registry handles schema collection, dispatch, availability checking, and error wrapping automatically, so no edits to `TOOLSET_REQUIREMENTS`, `handle_function_call()`, `get_all_tool_names()`, or any other data structure are needed.

-**Agent-level tools** (todo, memory): intercepted by `run_agent.py` before `handle_function_call()`. See `todo_tool.py` for the pattern.
+**Optional:** Add to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py` for the setup wizard, and to `toolset_distributions.py` for batch processing.
+
+**Special case: tools that need agent-level state** (like `todo`, `memory`):
+These are intercepted by `run_agent.py`'s tool dispatch loop *before* `handle_function_call()`. The registry still holds their schemas, but dispatch returns a stub error as a safety fallback. See `todo_tool.py` for the pattern.
+
+All tool handlers MUST return a JSON string. The registry's `dispatch()` wraps all exceptions in `{"error": "..."}` automatically.
+
+### Dynamic Tool Availability
+
+Tools declare their requirements at registration time via `check_fn` and `requires_env`. The registry checks `check_fn()` when building tool definitions -- tools whose check fails are silently excluded.
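
A toy registry shows how check-based exclusion and error wrapping fit together. This is a sketch under assumptions: the class name and `get_definitions()` are hypothetical, and only the `register(...)` keyword names and the `{"error": "..."}` shape come from this guide.

```python
import json

class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, toolset, schema, handler,
                 check_fn=None, requires_env=None):
        self._tools[name] = {"toolset": toolset, "schema": schema,
                             "handler": handler, "check_fn": check_fn,
                             "requires_env": requires_env or []}

    def get_definitions(self):
        """Return schemas only for tools whose check passes (or that have none)."""
        return [t["schema"] for t in self._tools.values()
                if t["check_fn"] is None or t["check_fn"]()]

    def dispatch(self, name, args, **kw):
        """Run a handler; any exception becomes a JSON error string."""
        try:
            return self._tools[name]["handler"](args, **kw)
        except Exception as exc:
            return json.dumps({"error": str(exc)})

demo = ToolRegistry()
demo.register("ready_tool", "example", {"name": "ready_tool"},
              handler=lambda args, **kw: json.dumps({"success": True}),
              check_fn=lambda: True)
demo.register("missing_key_tool", "example", {"name": "missing_key_tool"},
              handler=lambda args, **kw: json.dumps({"success": True}),
              check_fn=lambda: False)
```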
+
+### Stateful Tools
+
+Tools that maintain state (terminal, browser) require:
+- `task_id` parameter for session isolation between concurrent tasks
+- `cleanup_*()` function to release resources
+- Cleanup is called automatically in run_agent.py after conversation completes

 ---

-## Adding Configuration
+## Trajectory Format

-### config.yaml options:
-1. Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`
-2. Bump `_config_version` (currently 5) to trigger migration for existing users
-
-### .env variables:
-1. Add to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py` with metadata:
-```python
-"NEW_API_KEY": {
-    "description": "What it's for",
-    "prompt": "Display name",
-    "url": "https://...",
-    "password": True,
-    "category": "tool",  # provider, tool, messaging, setting
-},
+Conversations are saved in ShareGPT format for training:
+```json
+{"from": "system", "value": "System prompt with <tools>...</tools>"}
+{"from": "human", "value": "User message"}
+{"from": "gpt", "value": "<think>reasoning</think>\n<tool_call>{...}</tool_call>"}
+{"from": "tool", "value": "<tool_response>{...}</tool_response>"}
+{"from": "gpt", "value": "Final response"}
 ```

-### Config loaders (two separate systems):
+Tool calls use `<tool_call>` XML tags, responses use `<tool_response>` tags, reasoning uses `<think>` tags.
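
The mapping from OpenAI-style messages to this format can be sketched as below. The converter name is hypothetical, and the exact serialization of tool calls inside `<tool_call>` is an assumption.

```python
import json

ROLE_MAP = {"system": "system", "user": "human", "assistant": "gpt", "tool": "tool"}

def to_sharegpt(messages):
    """Convert OpenAI-format messages to ShareGPT-style from/value records."""
    records = []
    for msg in messages:
        role = msg["role"]
        if role == "assistant":
            parts = []
            if msg.get("reasoning"):
                parts.append(f"<think>{msg['reasoning']}</think>")
            for call in msg.get("tool_calls") or []:
                parts.append(f"<tool_call>{json.dumps(call)}</tool_call>")
            if msg.get("content"):
                parts.append(msg["content"])
            value = "\n".join(parts)
        elif role == "tool":
            value = f"<tool_response>{msg['content']}</tool_response>"
        else:
            value = msg["content"]
        records.append({"from": ROLE_MAP[role], "value": value})
    return records

sample = to_sharegpt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Search for Python tutorials"},
    {"role": "assistant", "content": None, "reasoning": "need a search",
     "tool_calls": [{"name": "web_search", "arguments": {"query": "python"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"results\": []}"},
    {"role": "assistant", "content": "Here's what I found..."},
])
```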

-| Loader | Used by | Location |
-|--------|---------|----------|
-| `load_cli_config()` | CLI mode | `cli.py` |
-| `load_config()` | `hermes tools`, `hermes setup` | `hermes_cli/config.py` |
-| Direct YAML load | Gateway | `gateway/run.py` |
+### Trajectory Export
+
+```python
+agent = AIAgent(save_trajectories=True)
+agent.chat("Do something")
+# Saves to trajectories/*.jsonl in ShareGPT format
+```

 ---

-## Important Policies
+## Batch Processing (batch_runner.py)

-### Prompt Caching Must Not Break
+For processing multiple prompts:
+- Parallel execution with multiprocessing
+- Content-based resume for fault tolerance (matches on prompt text, not indices)
+- Toolset distributions control probabilistic tool availability per prompt
+- Output: `data/<run_name>/trajectories.jsonl` (combined) + individual batch files
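Content-based resume can be sketched as below. This is an illustration of the idea only, not `batch_runner.py`'s implementation: the helper names and the `"prompt"` field in each output record are assumptions.

```python
import json
from typing import Dict, Iterable, List, Set


def completed_prompts(output_lines: Iterable[str]) -> Set[str]:
    """Collect the prompt text of every record already written to the output."""
    done: Set[str] = set()
    for line in output_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a half-written final line after a crash
        # Assumption: each record stores its source prompt under "prompt".
        if isinstance(record, dict) and isinstance(record.get("prompt"), str):
            done.add(record["prompt"])
    return done


def remaining(dataset: List[Dict[str, str]], done: Set[str]) -> List[Dict[str, str]]:
    """Keep only rows whose prompt text has no finished record yet."""
    return [row for row in dataset if row["prompt"] not in done]
```

Matching on text rather than position means the dataset can be reordered or extended between runs without redoing finished prompts.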

-Hermes-Agent ensures caching remains valid throughout a conversation. **Do NOT implement changes that would:**
- Alter past context mid-conversation
- Change toolsets mid-conversation
- Reload memories or rebuild system prompts mid-conversation
+```bash
+python batch_runner.py \
+    --dataset_file=prompts.jsonl \
+    --batch_size=20 \
+    --num_workers=4 \
+    --run_name=my_run
+```

-Cache-breaking forces dramatically higher costs. The ONLY time we alter context is during context compression.
+---

-### Working Directory Behavior
- **CLI**: Uses current directory (`.` → `os.getcwd()`)
- **Messaging**: Uses `MESSAGING_CWD` env var (default: home directory)
+## Skills System
+
+Skills are on-demand knowledge documents the agent can load. Compatible with the [agentskills.io](https://agentskills.io/specification) open standard.
+
+```
+skills/
+├── mlops/                    # Category folder
+│   ├── axolotl/             # Skill folder
+│   │   ├── SKILL.md         # Main instructions (required)
+│   │   ├── references/      # Additional docs, API specs
+│   │   ├── templates/       # Output formats, configs
+│   │   └── assets/          # Supplementary files (agentskills.io)
+│   └── vllm/
+│       └── SKILL.md
+├── .hub/                    # Skills Hub state (gitignored)
+│   ├── lock.json            # Installed skill provenance
+│   ├── quarantine/          # Pending security review
+│   ├── audit.log            # Security scan history
+│   ├── taps.json            # Custom source repos
+│   └── index-cache/         # Cached remote indexes
+```
+
+**Progressive disclosure** (token-efficient):
+1. `skills_categories()` - List category names (~50 tokens)
+2. `skills_list(category)` - Name + description per skill (~3k tokens)
+3. `skill_view(name)` - Full content + tags + linked files
+
+SKILL.md files use YAML frontmatter (agentskills.io format):
+```yaml
+---
+name: skill-name
+description: Brief description for listing
+version: 1.0.0
+platforms: [macos]              # Optional — restrict to specific OS (macos/linux/windows)
+metadata:
+  hermes:
+    tags: [tag1, tag2]
+    related_skills: [other-skill]
+---
+# Skill Content...
+```
+
+**Platform filtering** — Skills with a `platforms` field are automatically excluded from the system prompt index, `skills_list()`, and slash commands on incompatible platforms. Skills without the field load everywhere (backward compatible). See `skills/apple/` for macOS-only examples (iMessage, Reminders, Notes, FindMy).
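The filtering rule can be sketched as follows. Both helper names are hypothetical, and mapping every non-macOS, non-Windows system to `linux` is a simplifying assumption; the frontmatter is taken to be already parsed into a dict.

```python
import sys
from typing import Dict, Optional


def current_platform(sys_platform: str = sys.platform) -> str:
    """Map Python's sys.platform onto the names used in SKILL.md frontmatter."""
    if sys_platform == "darwin":
        return "macos"
    if sys_platform.startswith("win"):
        return "windows"
    return "linux"  # assumption: everything else is treated as Linux


def is_skill_visible(frontmatter: Dict[str, object], platform: Optional[str] = None) -> bool:
    """A skill without a `platforms` field loads everywhere."""
    platforms = frontmatter.get("platforms")
    if not platforms:
        return True
    return (platform or current_platform()) in platforms
```

Applying the same predicate to the system prompt index, `skills_list()`, and slash commands keeps the three views consistent.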
+
+**Skills Hub** — user-driven skill search/install from online registries and official optional skills. Sources: official optional skills (shipped with repo, labeled "official"), GitHub (openai/skills, anthropics/skills, custom taps), ClawHub, Claude marketplace, LobeHub. Not exposed as an agent tool — the model cannot search for or install skills. Users manage skills via `hermes skills browse/search/install` CLI commands or the `/skills` slash command in chat.
+
+Key files:
+- `tools/skills_tool.py` — Agent-facing skill list/view (progressive disclosure)
+- `tools/skills_guard.py` — Security scanner (regex + LLM audit, trust-aware install policy)
+- `tools/skills_hub.py` — Source adapters (OptionalSkillSource, GitHub, ClawHub, Claude marketplace, LobeHub), lock file, auth
+- `hermes_cli/skills_hub.py` — CLI subcommands + `/skills` slash command handler

 ---

 ## Known Pitfalls

 ### DO NOT use `simple_term_menu` for interactive menus
-Rendering bugs in tmux/iTerm2 — ghosting on scroll. Use `curses` (stdlib) instead. See `hermes_cli/tools_config.py` for the pattern.
+
+`simple_term_menu` has rendering bugs in tmux, iTerm2, and other non-standard terminals. When the user scrolls with arrow keys, previously highlighted items "ghost": they duplicate upward and corrupt the display. This happens because the library uses ANSI cursor-up codes to redraw in place, and tmux and iTerm2 miscalculate positions when the menu is near the bottom of the viewport.
+
+**Rule:** All interactive menus in `hermes_cli/` must use `curses` (Python stdlib) instead. See `tools_config.py` for the pattern — both `_prompt_choice()` (single-select) and `_prompt_toolset_checklist()` (multi-select with space toggle) use `curses.wrapper()`. The numbered-input fallback handles Windows where curses isn't available.

 ### DO NOT use `\033[K` (ANSI erase-to-EOL) in spinner/display code
-Leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-padding: `f"\r{line}{' ' * pad}"`.
+
+The ANSI escape `\033[K` leaks as literal `?[K` text when `prompt_toolkit`'s `patch_stdout` is active. Use space-padding instead to clear lines: `f"\r{line}{' ' * pad}"`. See `agent/display.py` `KawaiiSpinner`.
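A minimal sketch of the space-padding approach, assuming the caller tracks the width of the previously drawn line. `render_line()` is a hypothetical helper, not the code in `KawaiiSpinner`.

```python
def render_line(line: str, previous_width: int) -> str:
    """Return a carriage-return redraw that overwrites leftovers with spaces."""
    # Pad only when the new line is shorter than what is already on screen.
    pad = max(previous_width - len(line), 0)
    return f"\r{line}{' ' * pad}"
```

Note that `len()` counts code points rather than terminal columns, so lines containing wide characters or emoji may need a display-width calculation instead.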

 ### `_last_resolved_tool_names` is a process-global in `model_tools.py`
-When subagents overwrite this global, `execute_code` calls after delegation may fail with missing tool imports. Known bug.
+
+The `execute_code` sandbox uses `_last_resolved_tool_names` (set by `get_tool_definitions()`) to decide which tool stubs to generate. When subagents run with restricted toolsets, they overwrite this global. After delegation returns to the parent, `execute_code` may see the child's restricted list instead of the parent's full list. This is a known bug — `execute_code` calls after delegation may fail with `ImportError: cannot import name 'patch' from 'hermes_tools'`.

 ### Tests must not write to `~/.hermes/`
-The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir. Never hardcode `~/.hermes/` paths in tests.
+
+The `autouse` fixture `_isolate_hermes_home` in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir. Every test runs in isolation. If you add a test that creates `AIAgent` instances or writes session logs, the fixture handles cleanup automatically. Never hardcode `~/.hermes/` paths in tests.

 ---

-## Testing
+## Testing Changes

-```bash
-source .venv/bin/activate
-python -m pytest tests/ -q          # Full suite (~2500 tests, ~2 min)
-python -m pytest tests/test_model_tools.py -q   # Toolset resolution
-python -m pytest tests/test_cli_init.py -q       # CLI config loading
-python -m pytest tests/gateway/ -q               # Gateway tests
-python -m pytest tests/tools/ -q                 # Tool-level tests
-```
+After making changes:

-Always run the full suite before pushing changes.
+1. Run `hermes doctor` to check setup
+2. Run `hermes config check` to verify config
+3. Test with `hermes chat -q "test message"`
+4. For new config options, test fresh install: `rm -rf ~/.hermes && hermes setup`
@@ -17,7 +17,7 @@ Use any model you want — [Nous Portal](https://portal.nousresearch.com), [Open

 <table>
 <tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
-<tr><td><b>Lives where you do</b></td><td>Telegram, Discord, Slack, WhatsApp, Signal, and CLI — all from a single gateway process. Voice memo transcription, cross-platform conversation continuity.</td></tr>
+<tr><td><b>Lives where you do</b></td><td>Telegram, Discord, Slack, WhatsApp, and CLI — all from a single gateway process. Voice memo transcription, cross-platform conversation continuity.</td></tr>
 <tr><td><b>A closed learning loop</b></td><td>Agent-curated memory with periodic nudges. Autonomous skill creation after complex tasks. Skills self-improve during use. FTS5 session search with LLM summarization for cross-session recall. <a href="https://github.com/plastic-labs/honcho">Honcho</a> dialectic user modeling. Compatible with the <a href="https://agentskills.io">agentskills.io</a> open standard.</td></tr>
 <tr><td><b>Scheduled automations</b></td><td>Built-in cron scheduler with delivery to any platform. Daily reports, nightly backups, weekly audits — all in natural language, running unattended.</td></tr>
 <tr><td><b>Delegates and parallelizes</b></td><td>Spawn isolated subagents for parallel workstreams. Write Python scripts that call tools via RPC, collapsing multi-step pipelines into zero-context-cost turns.</td></tr>
@@ -71,7 +71,7 @@ All documentation lives at **[hermes-agent.nousresearch.com/docs](https://hermes
 | [Quickstart](https://hermes-agent.nousresearch.com/docs/getting-started/quickstart) | Install → setup → first conversation in 2 minutes |
 | [CLI Usage](https://hermes-agent.nousresearch.com/docs/user-guide/cli) | Commands, keybindings, personalities, sessions |
 | [Configuration](https://hermes-agent.nousresearch.com/docs/user-guide/configuration) | Config file, providers, models, all options |
-| [Messaging Gateway](https://hermes-agent.nousresearch.com/docs/user-guide/messaging) | Telegram, Discord, Slack, WhatsApp, Signal, Home Assistant |
+| [Messaging Gateway](https://hermes-agent.nousresearch.com/docs/user-guide/messaging) | Telegram, Discord, Slack, WhatsApp, Home Assistant |
 | [Security](https://hermes-agent.nousresearch.com/docs/user-guide/security) | Command approval, DM pairing, container isolation |
 | [Tools & Toolsets](https://hermes-agent.nousresearch.com/docs/user-guide/features/tools) | 40+ tools, toolset system, terminal backends |
 | [Skills System](https://hermes-agent.nousresearch.com/docs/user-guide/features/skills) | Procedural memory, Skills Hub, creating skills |
@@ -4,7 +4,7 @@ Provides a single resolution chain so every consumer (context compression,
 session search, web extraction, vision analysis, browser vision) picks up
 the best available backend without duplicating fallback logic.

-Resolution order for text tasks (auto mode):
+Resolution order for text tasks:
  1. OpenRouter  (OPENROUTER_API_KEY)
  2. Nous Portal (~/.hermes/auth.json active provider)
  3. Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY)
@@ -14,19 +14,10 @@ Resolution order for text tasks (auto mode):
     — checked via PROVIDER_REGISTRY entries with auth_type='api_key'
  6. None

-Resolution order for vision/multimodal tasks (auto mode):
+Resolution order for vision/multimodal tasks:
  1. OpenRouter
  2. Nous Portal
-  3. None  (steps 3-5 are skipped — they may not support multimodal)
-
-Per-task provider overrides (e.g. AUXILIARY_VISION_PROVIDER,
-CONTEXT_COMPRESSION_PROVIDER) can force a specific provider for each task:
-"openrouter", "nous", "codex", or "main" (= steps 3-5).
-Default "auto" follows the chains above.
-
-Per-task model overrides (e.g. AUXILIARY_VISION_MODEL,
-AUXILIARY_WEB_EXTRACT_MODEL) let callers use a different model slug
-than the provider's default.
+  3. None  (custom endpoints can't substitute for Gemini multimodal)
 """

 import json
@@ -82,55 +73,6 @@ _CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"
 # read response.choices[0].message.content. This adapter translates those
 # calls to the Codex Responses API so callers don't need any changes.

-
-def _convert_content_for_responses(content: Any) -> Any:
-    """Convert chat.completions content to Responses API format.
-
-    chat.completions uses:
-      {"type": "text", "text": "..."}
-      {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
-
-    Responses API uses:
-      {"type": "input_text", "text": "..."}
-      {"type": "input_image", "image_url": "data:image/png;base64,..."}
-
-    If content is a plain string, it's returned as-is (the Responses API
-    accepts strings directly for text-only messages).
-    """
-    if isinstance(content, str):
-        return content
-    if not isinstance(content, list):
-        return str(content) if content else ""
-
-    converted: List[Dict[str, Any]] = []
-    for part in content:
-        if not isinstance(part, dict):
-            continue
-        ptype = part.get("type", "")
-        if ptype == "text":
-            converted.append({"type": "input_text", "text": part.get("text", "")})
-        elif ptype == "image_url":
-            # chat.completions nests the URL: {"image_url": {"url": "..."}}
-            image_data = part.get("image_url", {})
-            url = image_data.get("url", "") if isinstance(image_data, dict) else str(image_data)
-            entry: Dict[str, Any] = {"type": "input_image", "image_url": url}
-            # Preserve detail if specified
-            detail = image_data.get("detail") if isinstance(image_data, dict) else None
-            if detail:
-                entry["detail"] = detail
-            converted.append(entry)
-        elif ptype in ("input_text", "input_image"):
-            # Already in Responses format — pass through
-            converted.append(part)
-        else:
-            # Unknown content type — try to preserve as text
-            text = part.get("text", "")
-            if text:
-                converted.append({"type": "input_text", "text": text})
-
-    return converted or ""
-
-
 class _CodexCompletionsAdapter:
    """Drop-in shim that accepts chat.completions.create() kwargs and
    routes them through the Codex Responses streaming API."""
@@ -144,31 +86,30 @@ class _CodexCompletionsAdapter:
        model = kwargs.get("model", self._model)
        temperature = kwargs.get("temperature")

-        # Separate system/instructions from conversation messages.
-        # Convert chat.completions multimodal content blocks to Responses
-        # API format (input_text / input_image instead of text / image_url).
+        # Separate system/instructions from conversation messages
        instructions = "You are a helpful assistant."
        input_msgs: List[Dict[str, Any]] = []
        for msg in messages:
            role = msg.get("role", "user")
            content = msg.get("content") or ""
            if role == "system":
-                instructions = content if isinstance(content, str) else str(content)
+                instructions = content
            else:
-                input_msgs.append({
-                    "role": role,
-                    "content": _convert_content_for_responses(content),
-                })
+                input_msgs.append({"role": role, "content": content})

        resp_kwargs: Dict[str, Any] = {
            "model": model,
            "instructions": instructions,
            "input": input_msgs or [{"role": "user", "content": ""}],
+            "stream": True,
            "store": False,
        }

-        # Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
-        # support max_output_tokens or temperature — omit to avoid 400 errors.
+        max_tokens = kwargs.get("max_output_tokens") or kwargs.get("max_completion_tokens") or kwargs.get("max_tokens")
+        if max_tokens is not None:
+            resp_kwargs["max_output_tokens"] = int(max_tokens)
+        if temperature is not None:
+            resp_kwargs["temperature"] = temperature

        # Tools support for flush_memories and similar callers
        tools = kwargs.get("tools")
@@ -396,128 +337,59 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
    return None, None


-# ── Provider resolution helpers ─────────────────────────────────────────────
-
-def _get_auxiliary_provider(task: str = "") -> str:
-    """Read the provider override for a specific auxiliary task.
-
-    Checks AUXILIARY_{TASK}_PROVIDER first (e.g. AUXILIARY_VISION_PROVIDER),
-    then CONTEXT_{TASK}_PROVIDER (for the compression section's summary_provider),
-    then falls back to "auto".  Returns one of: "auto", "openrouter", "nous", "main".
-    """
-    if task:
-        for prefix in ("AUXILIARY_", "CONTEXT_"):
-            val = os.getenv(f"{prefix}{task.upper()}_PROVIDER", "").strip().lower()
-            if val and val != "auto":
-                return val
-    return "auto"
-
-
-def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
-    or_key = os.getenv("OPENROUTER_API_KEY")
-    if not or_key:
-        return None, None
-    logger.debug("Auxiliary client: OpenRouter")
-    return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
-                   default_headers=_OR_HEADERS), _OPENROUTER_MODEL
-
-
-def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]:
-    nous = _read_nous_auth()
-    if not nous:
-        return None, None
-    global auxiliary_is_nous
-    auxiliary_is_nous = True
-    logger.debug("Auxiliary client: Nous Portal")
-    return (
-        OpenAI(api_key=_nous_api_key(nous), base_url=_nous_base_url()),
-        _NOUS_MODEL,
-    )
-
-
-def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
-    custom_base = os.getenv("OPENAI_BASE_URL")
-    custom_key = os.getenv("OPENAI_API_KEY")
-    if not custom_base or not custom_key:
-        return None, None
-    model = os.getenv("OPENAI_MODEL") or os.getenv("LLM_MODEL") or "gpt-4o-mini"
-    logger.debug("Auxiliary client: custom endpoint (%s)", model)
-    return OpenAI(api_key=custom_key, base_url=custom_base), model
-
-
-def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
-    codex_token = _read_codex_access_token()
-    if not codex_token:
-        return None, None
-    logger.debug("Auxiliary client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
-    real_client = OpenAI(api_key=codex_token, base_url=_CODEX_AUX_BASE_URL)
-    return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL
-
-
-def _resolve_forced_provider(forced: str) -> Tuple[Optional[OpenAI], Optional[str]]:
-    """Resolve a specific forced provider.  Returns (None, None) if creds missing."""
-    if forced == "openrouter":
-        client, model = _try_openrouter()
-        if client is None:
-            logger.warning("auxiliary.provider=openrouter but OPENROUTER_API_KEY not set")
-        return client, model
-
-    if forced == "nous":
-        client, model = _try_nous()
-        if client is None:
-            logger.warning("auxiliary.provider=nous but Nous Portal not configured (run: hermes login)")
-        return client, model
-
-    if forced == "codex":
-        client, model = _try_codex()
-        if client is None:
-            logger.warning("auxiliary.provider=codex but no Codex OAuth token found (run: hermes model)")
-        return client, model
-
-    if forced == "main":
-        # "main" = skip OpenRouter/Nous, use the main chat model's credentials.
-        for try_fn in (_try_custom_endpoint, _try_codex, _resolve_api_key_provider):
-            client, model = try_fn()
-            if client is not None:
-                return client, model
-        logger.warning("auxiliary.provider=main but no main endpoint credentials found")
-        return None, None
-
-    # Unknown provider name — fall through to auto
-    logger.warning("Unknown auxiliary.provider=%r, falling back to auto", forced)
-    return None, None
-
-
-def _resolve_auto() -> Tuple[Optional[OpenAI], Optional[str]]:
-    """Full auto-detection chain: OpenRouter → Nous → custom → Codex → API-key → None."""
-    for try_fn in (_try_openrouter, _try_nous, _try_custom_endpoint,
-                   _try_codex, _resolve_api_key_provider):
-        client, model = try_fn()
-        if client is not None:
-            return client, model
-    logger.debug("Auxiliary client: none available")
-    return None, None
-
-
 # ── Public API ──────────────────────────────────────────────────────────────

-def get_text_auxiliary_client(task: str = "") -> Tuple[Optional[OpenAI], Optional[str]]:
-    """Return (client, default_model_slug) for text-only auxiliary tasks.
+def get_text_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
+    """Return (client, model_slug) for text-only auxiliary tasks.

-    Args:
-        task: Optional task name ("compression", "web_extract") to check
-              for a task-specific provider override.
-
-    Callers may override the returned model with a per-task env var
-    (e.g. CONTEXT_COMPRESSION_MODEL, AUXILIARY_WEB_EXTRACT_MODEL).
+    Falls through OpenRouter -> Nous Portal -> custom endpoint -> Codex OAuth
+    -> direct API-key providers -> (None, None).
    """
-    forced = _get_auxiliary_provider(task)
-    if forced != "auto":
-        return _resolve_forced_provider(forced)
-    return _resolve_auto()
+    # 1. OpenRouter
+    or_key = os.getenv("OPENROUTER_API_KEY")
+    if or_key:
+        logger.debug("Auxiliary text client: OpenRouter")
+        return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
+                       default_headers=_OR_HEADERS), _OPENROUTER_MODEL
+
+    # 2. Nous Portal
+    nous = _read_nous_auth()
+    if nous:
+        global auxiliary_is_nous
+        auxiliary_is_nous = True
+        logger.debug("Auxiliary text client: Nous Portal")
+        return (
+            OpenAI(api_key=_nous_api_key(nous), base_url=_nous_base_url()),
+            _NOUS_MODEL,
+        )
+
+    # 3. Custom endpoint (both base URL and key must be set)
+    custom_base = os.getenv("OPENAI_BASE_URL")
+    custom_key = os.getenv("OPENAI_API_KEY")
+    if custom_base and custom_key:
+        model = os.getenv("OPENAI_MODEL") or os.getenv("LLM_MODEL") or "gpt-4o-mini"
+        logger.debug("Auxiliary text client: custom endpoint (%s)", model)
+        return OpenAI(api_key=custom_key, base_url=custom_base), model
+
+    # 4. Codex OAuth -- uses the Responses API (only endpoint the token
+    # can access), wrapped to look like a chat.completions client.
+    codex_token = _read_codex_access_token()
+    if codex_token:
+        logger.debug("Auxiliary text client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
+        real_client = OpenAI(api_key=codex_token, base_url=_CODEX_AUX_BASE_URL)
+        return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL
+
+    # 5. Direct API-key providers (z.ai/GLM, Kimi/Moonshot, MiniMax, etc.)
+    api_client, api_model = _resolve_api_key_provider()
+    if api_client is not None:
+        return api_client, api_model
+
+    # 6. Nothing available
+    logger.debug("Auxiliary text client: none available")
+    return None, None


-def get_async_text_auxiliary_client(task: str = ""):
+def get_async_text_auxiliary_client():
    """Return (async_client, model_slug) for async consumers.

    For standard providers returns (AsyncOpenAI, model). For Codex returns
@@ -526,7 +398,7 @@ def get_async_text_auxiliary_client(task: str = ""):
    """
    from openai import AsyncOpenAI

-    sync_client, model = get_text_auxiliary_client(task)
+    sync_client, model = get_text_auxiliary_client()
    if sync_client is None:
        return None, None

@@ -545,27 +417,29 @@ def get_async_text_auxiliary_client(task: str = ""):


 def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
-    """Return (client, default_model_slug) for vision/multimodal auxiliary tasks.
+    """Return (client, model_slug) for vision/multimodal auxiliary tasks.

-    Checks AUXILIARY_VISION_PROVIDER for a forced provider, otherwise
-    auto-detects.  Callers may override the returned model with
-    AUXILIARY_VISION_MODEL.
-
-    In auto mode, only providers known to support multimodal are tried:
-    OpenRouter, Nous Portal, and Codex OAuth (gpt-5.3-codex supports
-    vision via the Responses API).  Custom endpoints and API-key
-    providers are skipped — they may not handle vision input.  To use
-    them, set AUXILIARY_VISION_PROVIDER explicitly.
+    Only OpenRouter and Nous Portal qualify — custom endpoints cannot
+    substitute for Gemini multimodal.
    """
-    forced = _get_auxiliary_provider("vision")
-    if forced != "auto":
-        return _resolve_forced_provider(forced)
-    # Auto: only multimodal-capable providers
-    for try_fn in (_try_openrouter, _try_nous, _try_codex):
-        client, model = try_fn()
-        if client is not None:
-            return client, model
-    logger.debug("Auxiliary vision client: none available (auto only tries OpenRouter/Nous/Codex)")
+    # 1. OpenRouter
+    or_key = os.getenv("OPENROUTER_API_KEY")
+    if or_key:
+        logger.debug("Auxiliary vision client: OpenRouter")
+        return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
+                       default_headers=_OR_HEADERS), _OPENROUTER_MODEL
+
+    # 2. Nous Portal
+    nous = _read_nous_auth()
+    if nous:
+        logger.debug("Auxiliary vision client: Nous Portal")
+        return (
+            OpenAI(api_key=_nous_api_key(nous), base_url=_nous_base_url()),
+            _NOUS_MODEL,
+        )
+
+    # 3. Nothing suitable
+    logger.debug("Auxiliary vision client: none available")
    return None, None


@@ -53,7 +53,7 @@ class ContextCompressor:
        self.last_completion_tokens = 0
        self.last_total_tokens = 0

-        self.client, default_model = get_text_auxiliary_client("compression")
+        self.client, default_model = get_text_auxiliary_client()
        self.summary_model = summary_model_override or default_model

    def update_from_response(self, usage: Dict[str, Any]):
@@ -342,9 +342,7 @@ Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
            compressed.append(msg)

        if summary:
-            last_head_role = messages[compress_start - 1].get("role", "user") if compress_start > 0 else "user"
-            summary_role = "user" if last_head_role in ("assistant", "tool") else "assistant"
-            compressed.append({"role": summary_role, "content": summary})
+            compressed.append({"role": "user", "content": summary})
        else:
            if not self.quiet_mode:
                print("   ⚠️  No summary model available — middle turns dropped without summary")
@@ -122,15 +122,6 @@ PLATFORM_HINTS = {
        "attachments, audio as file attachments. You can also include image URLs "
        "in markdown format ![alt](url) and they will be uploaded as attachments."
    ),
-    "signal": (
-        "You are on a text messaging communication platform, Signal. "
-        "Please do not use markdown as it does not render. "
-        "You can send media files natively: to deliver a file to the user, "
-        "include MEDIA:/absolute/path/to/file in your response. Images "
-        "(.png, .jpg, .webp) appear as photos, audio as attachments, and other "
-        "files arrive as downloadable documents. You can also include image "
-        "URLs in markdown format ![alt](url) and they will be sent as photos."
-    ),
    "cli": (
        "You are a CLI AI Agent. Try not to use markdown but simple text "
        "renderable inside a terminal."
@@ -8,7 +8,6 @@ the first 6 and last 4 characters for debuggability.
 """

 import logging
-import os
 import re
 from typing import Optional

@@ -16,7 +15,7 @@ logger = logging.getLogger(__name__)

 # Known API key prefixes -- match the prefix + contiguous token chars
 _PREFIX_PATTERNS = [
-    r"sk-[A-Za-z0-9_-]{10,}",           # OpenAI / OpenRouter / Anthropic (sk-ant-*)
+    r"sk-[A-Za-z0-9_-]{10,}",           # OpenAI / OpenRouter
    r"ghp_[A-Za-z0-9]{10,}",            # GitHub PAT (classic)
    r"github_pat_[A-Za-z0-9_]{10,}",    # GitHub PAT (fine-grained)
    r"xox[baprs]-[A-Za-z0-9-]{10,}",    # Slack tokens
@@ -26,18 +25,6 @@ _PREFIX_PATTERNS = [
    r"fc-[A-Za-z0-9]{10,}",             # Firecrawl
    r"bb_live_[A-Za-z0-9_-]{10,}",      # BrowserBase
    r"gAAAA[A-Za-z0-9_=-]{20,}",        # Codex encrypted tokens
-    r"AKIA[A-Z0-9]{16}",                # AWS Access Key ID
-    r"sk_live_[A-Za-z0-9]{10,}",        # Stripe secret key (live)
-    r"sk_test_[A-Za-z0-9]{10,}",        # Stripe secret key (test)
-    r"rk_live_[A-Za-z0-9]{10,}",        # Stripe restricted key
-    r"SG\.[A-Za-z0-9_-]{10,}",          # SendGrid API key
-    r"hf_[A-Za-z0-9]{10,}",             # HuggingFace token
-    r"r8_[A-Za-z0-9]{10,}",             # Replicate API token
-    r"npm_[A-Za-z0-9]{10,}",            # npm access token
-    r"pypi-[A-Za-z0-9_-]{10,}",         # PyPI API token
-    r"dop_v1_[A-Za-z0-9]{10,}",         # DigitalOcean PAT
-    r"doo_v1_[A-Za-z0-9]{10,}",         # DigitalOcean OAuth
-    r"am_[A-Za-z0-9_-]{10,}",           # AgentMail API key
 ]

 # ENV assignment patterns: KEY=value where KEY contains a secret-like name
@@ -65,22 +52,6 @@ _TELEGRAM_RE = re.compile(
    r"(bot)?(\d{8,}):([-A-Za-z0-9_]{30,})",
 )

-# Private key blocks: -----BEGIN RSA PRIVATE KEY----- ... -----END RSA PRIVATE KEY-----
-_PRIVATE_KEY_RE = re.compile(
-    r"-----BEGIN[A-Z ]*PRIVATE KEY-----[\s\S]*?-----END[A-Z ]*PRIVATE KEY-----"
-)
-
-# Database connection strings: protocol://user:PASSWORD@host
-# Catches postgres, mysql, mongodb, redis, amqp URLs and redacts the password
-_DB_CONNSTR_RE = re.compile(
-    r"((?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis|amqp)://[^:]+:)([^@]+)(@)",
-    re.IGNORECASE,
-)
-
-# E.164 phone numbers: +<country><number>, 7-15 digits
-# Negative lookahead prevents matching hex strings or identifiers
-_SIGNAL_PHONE_RE = re.compile(r"(\+[1-9]\d{6,14})(?![A-Za-z0-9])")
-
 # Compile known prefix patterns into one alternation
 _PREFIX_RE = re.compile(
    r"(?<![A-Za-z0-9_-])(" + "|".join(_PREFIX_PATTERNS) + r")(?![A-Za-z0-9_-])"
@@ -98,12 +69,9 @@ def redact_sensitive_text(text: str) -> str:
    """Apply all redaction patterns to a block of text.

    Safe to call on any string -- non-matching text passes through unchanged.
-    Disabled when security.redact_secrets is false in config.yaml.
    """
    if not text:
        return text
-    if os.getenv("HERMES_REDACT_SECRETS", "").lower() in ("0", "false", "no", "off"):
-        return text

    # Known prefixes (sk-, ghp_, etc.)
    text = _PREFIX_RE.sub(lambda m: _mask_token(m.group(1)), text)
@@ -133,20 +101,6 @@ def redact_sensitive_text(text: str) -> str:
        return f"{prefix}{digits}:***"
    text = _TELEGRAM_RE.sub(_redact_telegram, text)

-    # Private key blocks
-    text = _PRIVATE_KEY_RE.sub("[REDACTED PRIVATE KEY]", text)
-
-    # Database connection string passwords
-    text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
-
-    # E.164 phone numbers (Signal, WhatsApp)
-    def _redact_phone(m):
-        phone = m.group(1)
-        if len(phone) <= 8:
-            return phone[:2] + "****" + phone[-2:]
-        return phone[:4] + "****" + phone[-4:]
-    text = _SIGNAL_PHONE_RE.sub(_redact_phone, text)
-
    return text


@@ -209,58 +209,8 @@ compression:
  threshold: 0.85
  
  # Model to use for generating summaries (fast/cheap recommended)
-  # This model compresses the middle turns into a concise summary.
-  # IMPORTANT: it receives the full middle section of the conversation, so it
-  # MUST support a context length at least as large as your main model's.
+  # This model compresses the middle turns into a concise summary
  summary_model: "google/gemini-3-flash-preview"
-  
-  # Provider for the summary model (default: "auto")
-  # Options: "auto", "openrouter", "nous", "main"
-  # summary_provider: "auto"
-
-# =============================================================================
-# Auxiliary Models (Advanced — Experimental)
-# =============================================================================
-# Hermes uses lightweight "auxiliary" models for side tasks: image analysis,
-# browser screenshot analysis, web page summarization, and context compression.
-#
-# By default these use Gemini Flash via OpenRouter or Nous Portal and are
-# auto-detected from your credentials.  You do NOT need to change anything
-# here for normal usage.
-#
-# WARNING: Overriding these with providers other than OpenRouter or Nous Portal
-# is EXPERIMENTAL and may not work.  Not all models/providers support vision,
-# produce usable summaries, or accept the same API format.  Change at your own
-# risk — if things break, reset to "auto" / empty values.
-#
-# Each task has its own provider + model pair so you can mix providers.
-# For example: OpenRouter for vision (needs multimodal), but your main
-# local endpoint for compression (just needs text).
-#
-# Provider options:
-#   "auto"       - Best available: OpenRouter → Nous Portal → main endpoint (default)
-#   "openrouter" - Force OpenRouter (requires OPENROUTER_API_KEY)
-#   "nous"       - Force Nous Portal (requires: hermes login)
-#   "codex"      - Force Codex OAuth (requires: hermes model → Codex).
-#                  Uses gpt-5.3-codex which supports vision.
-#   "main"       - Use your custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY).
-#                  Works with OpenAI API, local models, or any OpenAI-compatible
-#                  endpoint.  Also falls back to Codex OAuth and API-key providers.
-#
-# Model: leave empty to use the provider's default.  When empty, OpenRouter
-# uses "google/gemini-3-flash-preview" and Nous uses "gemini-3-flash".
-# Other providers pick a sensible default automatically.
-#
-# auxiliary:
-#   # Image analysis: vision_analyze tool + browser screenshots
-#   vision:
-#     provider: "auto"
-#     model: ""              # e.g. "google/gemini-2.5-flash", "openai/gpt-4o"
-#
-#   # Web page scraping / summarization + browser page text extraction
-#   web_extract:
-#     provider: "auto"
-#     model: ""

 # =============================================================================
 # Persistent Memory
@@ -635,8 +585,3 @@ display:
  #   verbose: Full args, results, and debug logs (same as /verbose)
  # Toggle at runtime with /verbose in the CLI
  tool_progress: all
-
-  # Play terminal bell when agent finishes a response.
-  # Useful for long-running tasks — your terminal will ding when the agent is done.
-  # Works over SSH. Most terminals can be configured to flash the taskbar or play a sound.
-  bell_on_complete: false
@@ -161,7 +161,6 @@ def load_cli_config() -> Dict[str, Any]:
        },
        "browser": {
            "inactivity_timeout": 120,  # Auto-cleanup inactive browser sessions after 2 min
-            "record_sessions": False,  # Auto-record browser sessions as WebM videos
        },
        "compression": {
            "enabled": True,      # Auto-compress when approaching context limit
@@ -194,7 +193,6 @@ def load_cli_config() -> Dict[str, Any]:
        "toolsets": ["all"],
        "display": {
            "compact": False,
-            "resume_display": "full",
        },
        "clarify": {
            "timeout": 120,  # Seconds to wait for a clarify answer before auto-proceeding
@@ -334,43 +332,12 @@ def load_cli_config() -> Dict[str, Any]:
        "enabled": "CONTEXT_COMPRESSION_ENABLED",
        "threshold": "CONTEXT_COMPRESSION_THRESHOLD",
        "summary_model": "CONTEXT_COMPRESSION_MODEL",
-        "summary_provider": "CONTEXT_COMPRESSION_PROVIDER",
    }
    
    for config_key, env_var in compression_env_mappings.items():
        if config_key in compression_config:
            os.environ[env_var] = str(compression_config[config_key])
    
-    # Apply auxiliary model overrides to environment variables.
-    # Vision and web_extract each have their own provider + model pair.
-    # (Compression is handled in the compression section above.)
-    # Only set env vars for non-empty / non-default values so auto-detection
-    # still works.
-    auxiliary_config = defaults.get("auxiliary", {})
-    auxiliary_task_env = {
-        # config key → (provider env var, model env var)
-        "vision":      ("AUXILIARY_VISION_PROVIDER",      "AUXILIARY_VISION_MODEL"),
-        "web_extract": ("AUXILIARY_WEB_EXTRACT_PROVIDER",  "AUXILIARY_WEB_EXTRACT_MODEL"),
-    }
-    
-    for task_key, (prov_env, model_env) in auxiliary_task_env.items():
-        task_cfg = auxiliary_config.get(task_key, {})
-        if not isinstance(task_cfg, dict):
-            continue
-        prov = str(task_cfg.get("provider", "")).strip()
-        model = str(task_cfg.get("model", "")).strip()
-        if prov and prov != "auto":
-            os.environ[prov_env] = prov
-        if model:
-            os.environ[model_env] = model
-    
-    # Security settings
-    security_config = defaults.get("security", {})
-    if isinstance(security_config, dict):
-        redact = security_config.get("redact_secrets")
-        if redact is not None:
-            os.environ["HERMES_REDACT_SECRETS"] = str(redact).lower()
-
    return defaults

 # Load configuration at module startup
@@ -462,8 +429,7 @@ def _setup_worktree(repo_root: str = None) -> Optional[Dict[str, str]]:

    repo_root = repo_root or _git_repo_root()
    if not repo_root:
-        print("\033[31m✗ --worktree requires being inside a git repository.\033[0m")
-        print("  cd into your project repo first, then run hermes -w")
+        print("\033[33m⚠ --worktree: not inside a git repository, skipping.\033[0m")
        return None

    short_id = uuid.uuid4().hex[:8]
@@ -1041,19 +1007,11 @@ class HermesCLI:
        self.compact = compact if compact is not None else CLI_CONFIG["display"].get("compact", False)
        # tool_progress: "off", "new", "all", "verbose" (from config.yaml display section)
        self.tool_progress_mode = CLI_CONFIG["display"].get("tool_progress", "all")
-        # resume_display: "full" (show history) | "minimal" (one-liner only)
-        self.resume_display = CLI_CONFIG["display"].get("resume_display", "full")
-        # bell_on_complete: play terminal bell (\a) when agent finishes a response
-        self.bell_on_complete = CLI_CONFIG["display"].get("bell_on_complete", False)
        self.verbose = verbose if verbose is not None else (self.tool_progress_mode == "verbose")
        
        # Configuration - priority: CLI args > env vars > config file
        # Model can come from: CLI arg, LLM_MODEL env, OPENAI_MODEL env (custom endpoint), or config
        self.model = model or os.getenv("LLM_MODEL") or os.getenv("OPENAI_MODEL") or CLI_CONFIG["model"]["default"]
-        # Track whether model was explicitly chosen by the user or fell back
-        # to the global default.  Provider-specific normalisation may override
-        # the default silently but should warn when overriding an explicit choice.
-        self._model_is_default = not (model or os.getenv("LLM_MODEL") or os.getenv("OPENAI_MODEL"))

        self._explicit_api_key = api_key
        self._explicit_base_url = base_url
@@ -1128,10 +1086,6 @@ class HermesCLI:
        self._provider_require_params = pr.get("require_parameters", False)
        self._provider_data_collection = pr.get("data_collection")
        
-        # Fallback model config — tried when primary provider fails after retries
-        fb = CLI_CONFIG.get("fallback_model") or {}
-        self._fallback_model = fb if fb.get("provider") and fb.get("model") else None
-
        # Agent will be initialized on first use
        self.agent: Optional[AIAgent] = None
        self._app = None  # prompt_toolkit Application (set in run())
@@ -1172,60 +1126,6 @@ class HermesCLI:
            self._last_invalidate = now
            self._app.invalidate()

-    def _normalize_model_for_provider(self, resolved_provider: str) -> bool:
-        """Strip provider prefixes and swap the default model for Codex.
-
-        When the resolved provider is ``openai-codex``:
-
-        1. Strip any ``provider/`` prefix (the Codex Responses API only
-           accepts bare model slugs like ``gpt-5.4``, not ``openai/gpt-5.4``).
-        2. If the active model is still the *untouched default* (user never
-           explicitly chose a model), replace it with a Codex-compatible
-           default so the first session doesn't immediately error.
-
-        If the user explicitly chose a model — *any* model — we trust them
-        and let the API be the judge.  No allowlists, no slug checks.
-
-        Returns True when the active model was changed.
-        """
-        if resolved_provider != "openai-codex":
-            return False
-
-        current_model = (self.model or "").strip()
-        changed = False
-
-        # 1. Strip provider prefix ("openai/gpt-5.4" → "gpt-5.4")
-        if "/" in current_model:
-            slug = current_model.split("/", 1)[1]
-            if not self._model_is_default:
-                self.console.print(
-                    f"[yellow]⚠️  Stripped provider prefix from '{current_model}'; "
-                    f"using '{slug}' for OpenAI Codex.[/]"
-                )
-            self.model = slug
-            current_model = slug
-            changed = True
-
-        # 2. Replace untouched default with a Codex model
-        if self._model_is_default:
-            fallback_model = "gpt-5.3-codex"
-            try:
-                from hermes_cli.codex_models import get_codex_model_ids
-
-                available = get_codex_model_ids(
-                    access_token=self.api_key if self.api_key else None,
-                )
-                if available:
-                    fallback_model = available[0]
-            except Exception:
-                pass
-
-            if current_model != fallback_model:
-                self.model = fallback_model
-                changed = True
-
-        return changed
-
    def _ensure_runtime_credentials(self) -> bool:
        """
        Ensure runtime credentials are resolved before agent use.
@@ -1271,13 +1171,8 @@ class HermesCLI:
        self.api_key = api_key
        self.base_url = base_url

-        # Normalize model for the resolved provider (e.g. swap non-Codex
-        # models when provider is openai-codex).  Fixes #651.
-        model_changed = self._normalize_model_for_provider(resolved_provider)
-
-        # AIAgent/OpenAI client holds auth at init time, so rebuild if key,
-        # routing, or the effective model changed.
-        if (credentials_changed or routing_changed or model_changed) and self.agent is not None:
+        # AIAgent/OpenAI client holds auth at init time, so rebuild if key rotated
+        if (credentials_changed or routing_changed) and self.agent is not None:
            self.agent = None

        return True
@@ -1304,11 +1199,8 @@ class HermesCLI:
            except Exception as e:
                logger.debug("SQLite session store not available: %s", e)
        
-        # If resuming, validate the session exists and load its history.
-        # _preload_resumed_session() may have already loaded it (called from
-        # run() for immediate display).  In that case, conversation_history
-        # is non-empty and we skip the DB round-trip.
-        if self._resumed and self._session_db and not self.conversation_history:
+        # If resuming, validate the session exists and load its history
+        if self._resumed and self._session_db:
            session_meta = self._session_db.get_session(self.session_id)
            if not session_meta:
                _cprint(f"\033[1;31mSession not found: {self.session_id}{_RST}")
@@ -1363,7 +1255,6 @@ class HermesCLI:
                session_db=self._session_db,
                clarify_callback=self._clarify_callback,
                honcho_session_key=self.session_id,
-                fallback_model=self._fallback_model,
            )
            # Apply any pending title now that the session exists in the DB
            if self._pending_title and self._session_db:
@@ -1413,202 +1304,7 @@ class HermesCLI:
        self._show_tool_availability_warnings()
        
        self.console.print()
-
-    def _preload_resumed_session(self) -> bool:
-        """Load a resumed session's history from the DB early (before first chat).
-
-        Called from run() so the conversation history is available for display
-        before the user sends their first message.  Sets
-        ``self.conversation_history`` and prints the one-liner status.  Returns
-        True if history was loaded, False otherwise.
-
-        The corresponding block in ``_init_agent()`` checks whether history is
-        already populated and skips the DB round-trip.
-        """
-        if not self._resumed or not self._session_db:
-            return False
-
-        session_meta = self._session_db.get_session(self.session_id)
-        if not session_meta:
-            self.console.print(
-                f"[bold red]Session not found: {self.session_id}[/]"
-            )
-            self.console.print(
-                "[dim]Use a session ID from a previous CLI run "
-                "(hermes sessions list).[/]"
-            )
-            return False
-
-        restored = self._session_db.get_messages_as_conversation(self.session_id)
-        if restored:
-            self.conversation_history = restored
-            msg_count = len([m for m in restored if m.get("role") == "user"])
-            title_part = ""
-            if session_meta.get("title"):
-                title_part = f' "{session_meta["title"]}"'
-            self.console.print(
-                f"[#DAA520]↻ Resumed session [bold]{self.session_id}[/bold]"
-                f"{title_part} "
-                f"({msg_count} user message{'s' if msg_count != 1 else ''}, "
-                f"{len(restored)} total messages)[/]"
-            )
-        else:
-            self.console.print(
-                f"[#DAA520]Session {self.session_id} found but has no "
-                f"messages. Starting fresh.[/]"
-            )
-            return False
-
-        # Re-open the session (clear ended_at so it's active again)
-        try:
-            self._session_db._conn.execute(
-                "UPDATE sessions SET ended_at = NULL, end_reason = NULL "
-                "WHERE id = ?",
-                (self.session_id,),
-            )
-            self._session_db._conn.commit()
-        except Exception:
-            pass
-
-        return True
-
-    def _display_resumed_history(self):
-        """Render a compact recap of previous conversation messages.
-
-        Uses Rich markup with dim/muted styling so the recap is visually
-        distinct from the active conversation.  Caps the display at the
-        last ``MAX_DISPLAY_EXCHANGES`` user/assistant exchanges and shows
-        an indicator for earlier hidden messages.
-        """
-        if not self.conversation_history:
-            return
-
-        # Check config: resume_display setting
-        if self.resume_display == "minimal":
-            return
-
-        MAX_DISPLAY_EXCHANGES = 10   # max user+assistant pairs to show
-        MAX_USER_LEN = 300           # truncate user messages
-        MAX_ASST_LEN = 200           # truncate assistant text
-        MAX_ASST_LINES = 3           # max lines of assistant text
-
-        def _strip_reasoning(text: str) -> str:
-            """Remove <REASONING_SCRATCHPAD>...</REASONING_SCRATCHPAD> blocks
-            from displayed text (reasoning model internal thoughts)."""
-            import re
-            cleaned = re.sub(
-                r"<REASONING_SCRATCHPAD>.*?</REASONING_SCRATCHPAD>\s*",
-                "", text, flags=re.DOTALL,
-            )
-            # Also strip unclosed reasoning tags at the end
-            cleaned = re.sub(
-                r"<REASONING_SCRATCHPAD>.*$",
-                "", cleaned, flags=re.DOTALL,
-            )
-            return cleaned.strip()
-
-        # Collect displayable entries (skip system, tool-result messages)
-        entries = []  # list of (role, display_text)
-        for msg in self.conversation_history:
-            role = msg.get("role", "")
-            content = msg.get("content")
-            tool_calls = msg.get("tool_calls") or []
-
-            if role == "system":
-                continue
-            if role == "tool":
-                continue
-
-            if role == "user":
-                text = "" if content is None else str(content)
-                # Handle multimodal content (list of dicts)
-                if isinstance(content, list):
-                    parts = []
-                    for part in content:
-                        if isinstance(part, dict) and part.get("type") == "text":
-                            parts.append(part.get("text", ""))
-                        elif isinstance(part, dict) and part.get("type") == "image_url":
-                            parts.append("[image]")
-                    text = " ".join(parts)
-                if len(text) > MAX_USER_LEN:
-                    text = text[:MAX_USER_LEN] + "..."
-                entries.append(("user", text))
-
-            elif role == "assistant":
-                text = "" if content is None else str(content)
-                text = _strip_reasoning(text)
-                parts = []
-                if text:
-                    lines = text.splitlines()
-                    if len(lines) > MAX_ASST_LINES:
-                        text = "\n".join(lines[:MAX_ASST_LINES]) + " ..."
-                    if len(text) > MAX_ASST_LEN:
-                        text = text[:MAX_ASST_LEN] + "..."
-                    parts.append(text)
-                if tool_calls:
-                    tc_count = len(tool_calls)
-                    # Extract tool names
-                    names = []
-                    for tc in tool_calls:
-                        fn = tc.get("function", {})
-                        name = fn.get("name", "unknown") if isinstance(fn, dict) else "unknown"
-                        if name not in names:
-                            names.append(name)
-                    names_str = ", ".join(names[:4])
-                    if len(names) > 4:
-                        names_str += ", ..."
-                    noun = "call" if tc_count == 1 else "calls"
-                    parts.append(f"[{tc_count} tool {noun}: {names_str}]")
-                if not parts:
-                    # Skip pure-reasoning messages that have no visible output
-                    continue
-                entries.append(("assistant", " ".join(parts)))
-
-        if not entries:
-            return
-
-        # Determine if we need to truncate
-        skipped = 0
-        if len(entries) > MAX_DISPLAY_EXCHANGES * 2:
-            skipped = len(entries) - MAX_DISPLAY_EXCHANGES * 2
-            entries = entries[skipped:]
-
-        # Build the display using Rich
-        from rich.panel import Panel
-        from rich.text import Text
-
-        lines = Text()
-        if skipped:
-            lines.append(
-                f"  ... {skipped} earlier messages ...\n\n",
-                style="dim italic",
-            )
-
-        for i, (role, text) in enumerate(entries):
-            if role == "user":
-                lines.append("  ● You: ", style="dim bold #DAA520")
-                # Show first line inline, indent rest
-                msg_lines = text.splitlines()
-                lines.append(msg_lines[0] + "\n", style="dim")
-                for ml in msg_lines[1:]:
-                    lines.append(f"         {ml}\n", style="dim")
-            else:
-                lines.append("  ◆ Hermes: ", style="dim bold #8FBC8F")
-                msg_lines = text.splitlines()
-                lines.append(msg_lines[0] + "\n", style="dim")
-                for ml in msg_lines[1:]:
-                    lines.append(f"            {ml}\n", style="dim")
-            if i < len(entries) - 1:
-                lines.append("")  # small gap
-
-        panel = Panel(
-            lines,
-            title="[dim #DAA520]Previous Conversation[/]",
-            border_style="dim #8B8682",
-            padding=(0, 1),
-        )
-        self.console.print(panel)
-
+    
    def _try_attach_clipboard_image(self) -> bool:
        """Check clipboard for an image and attach it if found.

@@ -3135,12 +2831,6 @@ class HermesCLI:
                # nothing can interleave between the box borders.
                _cprint(f"\n{top}\n{response}\n\n{bot}")
            
-            # Play terminal bell when agent finishes (if enabled).
-            # Works over SSH — the bell propagates to the user's terminal.
-            if self.bell_on_complete:
-                sys.stdout.write("\a")
-                sys.stdout.flush()
-            
            # Combine all interrupt messages (user may have typed multiple while waiting)
            # and re-queue as one prompt for process_loop
            if pending_message and hasattr(self, '_pending_input'):
@@ -3191,13 +2881,6 @@ class HermesCLI:
    def run(self):
        """Run the interactive CLI loop with persistent input at bottom."""
        self.show_banner()
-
-        # If resuming a session, load history and display it immediately
-        # so the user has context before typing their first message.
-        if self._resumed:
-            if self._preload_resumed_session():
-                self._display_resumed_history()
-
        self.console.print("[#FFF8DC]Welcome to Hermes Agent! Type your message or /help for commands.[/]")
        self.console.print()
        
@@ -4061,10 +3744,6 @@ def main(
                _active_worktree = wt_info
                os.environ["TERMINAL_CWD"] = wt_info["path"]
                atexit.register(_cleanup_worktree, wt_info)
-            else:
-                # Worktree was explicitly requested but setup failed —
-                # don't silently run without isolation.
-                return
    else:
        wt_info = None
    
@@ -98,7 +98,6 @@ def _deliver_result(job: dict, content: str) -> None:
        "discord": Platform.DISCORD,
        "slack": Platform.SLACK,
        "whatsapp": Platform.WHATSAPP,
-        "signal": Platform.SIGNAL,
    }
    platform = platform_map.get(platform_name.lower())
    if not platform:
@@ -0,0 +1,7 @@
+# Documentation
+
+All documentation has moved to the website:
+
+**📖 [hermes-agent.nousresearch.com/docs](https://hermes-agent.nousresearch.com/docs/)**
+
+The documentation source files live in [`website/docs/`](../website/docs/).
@@ -0,0 +1,345 @@
+# send_file Integration Map — Hermes Agent Codebase Deep Dive
+
+## 1. environments/tool_context.py — Base64 File Transfer Implementation
+
+### upload_file() (lines 153-205)
+- Reads local file as raw bytes, base64-encodes to ASCII string
+- Creates parent dirs in sandbox via `self.terminal(f"mkdir -p {parent}")`
+- **Chunk size:** 60,000 chars (~60KB per shell command)
+- **Small files (<=60KB b64):** Single `printf '%s' '{b64}' | base64 -d > {remote_path}`
+- **Large files:** Writes chunks to `/tmp/_hermes_upload.b64` via `printf >> append`, then `base64 -d` to target
+- **Error handling:** Checks local file exists; returns `{exit_code, output}`
+- **Size limits:** No explicit limit. Payloads above the 60,000-char chunk size (~45KB raw) are split into chunks, which keeps each command well under the ~2MB shell argument limit
+- **No theoretical max** — but very large files would be slow (many terminal round trips)
+
+### download_file() (lines 234-278)
+- Runs `base64 {remote_path}` inside sandbox, captures stdout
+- Strips output, base64-decodes to raw bytes
+- Writes to host filesystem with parent dir creation
+- **Error handling:** Checks exit code, empty output, decode errors
+- Returns `{success: bool, bytes: int}` or `{success: false, error: str}`
+- **Size limit:** Bounded by the terminal output buffer (in practice a few MB of base64 output)
+
+### Promotion potential:
+- These methods work via `self.terminal()` — they're environment-agnostic
+- Could be directly lifted into a new tool that operates on the agent's current sandbox
+- For send_file, this `download_file()` pattern is the key: it extracts files from sandbox → host
+
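The chunked upload described above can be sketched as a pure command builder. This is an illustrative reconstruction, not the code in `environments/tool_context.py`: the function name `build_upload_commands`, the initial truncation of the staging file, and the final `rm -f` are assumptions added for the example. Only the 60,000-character chunk size, the `printf ... | base64 -d` form, and the `/tmp/_hermes_upload.b64` staging path come from the notes.

```python
import base64
import shlex
from typing import List

CHUNK_SIZE = 60_000  # base64 characters per shell command, per the notes above


def build_upload_commands(data: bytes, remote_path: str,
                          tmp_path: str = "/tmp/_hermes_upload.b64") -> List[str]:
    """Return shell commands that would recreate `data` at `remote_path`.

    Hypothetical helper: it only builds command strings and runs nothing.
    """
    b64 = base64.b64encode(data).decode("ascii")
    target = shlex.quote(remote_path)
    if len(b64) <= CHUNK_SIZE:
        # Small payload: one printf piped straight through base64 -d.
        # The base64 alphabet has no single quotes, so quoting is safe.
        return [f"printf '%s' '{b64}' | base64 -d > {target}"]

    tmp = shlex.quote(tmp_path)
    commands = [f": > {tmp}"]  # assumption: clear a stale staging file first
    for start in range(0, len(b64), CHUNK_SIZE):
        chunk = b64[start:start + CHUNK_SIZE]
        commands.append(f"printf '%s' '{chunk}' >> {tmp}")
    # Decode the staged base64 into the target, then drop the staging file.
    commands.append(f"base64 -d {tmp} > {target} && rm -f {tmp}")
    return commands
```

Each returned string would be passed to the sandbox one at a time, so a large file costs one terminal round trip per chunk plus two bookkeeping commands.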
+## 2. tools/environments/base.py — BaseEnvironment Interface
+
+### Current methods:
+- `execute(command, cwd, timeout, stdin_data)` → `{output, returncode}`
+- `cleanup()` — release resources
+- `stop()` — alias for cleanup
+- `_prepare_command()` — sudo transformation
+- `_build_run_kwargs()` — subprocess kwargs
+- `_timeout_result()` — standard timeout dict
+
+### What would need to be added for file transfer:
+- **Nothing required at this level.** File transfer can be implemented via `execute()` (base64 over terminal, like ToolContext does) or via environment-specific methods.
+- Optional: `upload_file(local_path, remote_path)` and `download_file(remote_path, local_path)` methods could be added to BaseEnvironment for optimized per-backend transfers, but the base64-over-terminal approach already works universally.
+
+## 3. tools/environments/docker.py — Docker Container Details
+
+### Container ID tracking:
+- `self._container_id` stored at init from `self._inner.container_id`
+- Inner is `minisweagent.environments.docker.DockerEnvironment`
+- Container ID is a standard Docker container hash
+
+### docker cp feasibility:
+- **YES**, `docker cp` could be used for optimized file transfer:
+  - `docker cp {container_id}:{remote_path} {local_path}` (download)
+  - `docker cp {local_path} {container_id}:{remote_path}` (upload)
+- Much faster than base64-over-terminal for large files
+- Container ID is directly accessible via `env._container_id` or `env._inner.container_id`
+
+### Volumes mounted:
+- **Persistent mode:** Bind mounts at `~/.hermes/sandboxes/docker/{task_id}/workspace` → `/workspace` and `.../home` → `/root`
+- **Ephemeral mode:** tmpfs at `/workspace` (10GB), `/home` (1GB), `/root` (1GB)
+- **User volumes:** From `config.yaml docker_volumes` (arbitrary `-v` mounts)
+- **Security tmpfs:** `/tmp` (512MB), `/var/tmp` (256MB), `/run` (64MB)
+
+### Direct host access for persistent mode:
+- If persistent, files at `/workspace/foo.txt` are just `~/.hermes/sandboxes/docker/{task_id}/workspace/foo.txt` on host — no transfer needed!
+
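A minimal sketch of the two Docker paths described above, assuming the bind-mount layout quoted in the notes. The helper names `persistent_host_path` and `docker_cp_download_cmd` are hypothetical and do not exist in the codebase. The `..` check is an added safety assumption so a container path cannot resolve outside the workspace directory.

```python
from pathlib import Path, PurePosixPath
from typing import List, Optional


def persistent_host_path(task_id: str, container_path: str,
                         hermes_home: Optional[Path] = None) -> Optional[Path]:
    """Map a container path under /workspace to its bind-mounted host path.

    Returns None when the path is not under /workspace, in which case a
    real transfer (for example `docker cp`) would be needed instead.
    """
    base = hermes_home if hermes_home is not None else Path.home() / ".hermes"
    try:
        rel = PurePosixPath(container_path).relative_to("/workspace")
    except ValueError:
        return None
    if ".." in rel.parts:
        return None  # refuse paths that could escape the workspace
    return base.joinpath("sandboxes", "docker", task_id, "workspace", *rel.parts)


def docker_cp_download_cmd(container_id: str, remote_path: str,
                           local_path: str) -> List[str]:
    """Build the argv for copying a file out of a container."""
    return ["docker", "cp", f"{container_id}:{remote_path}", local_path]
```

In persistent mode the first helper avoids any copy at all. The second covers ephemeral containers and paths outside the bind mounts.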
+## 4. tools/environments/ssh.py — SSH Connection Management
+
+### Connection management:
+- Uses SSH ControlMaster for persistent connection
+- Control socket at `/tmp/hermes-ssh/{user}@{host}:{port}.sock`
+- ControlPersist=300 (5 min keepalive)
+- BatchMode=yes (non-interactive)
+- Stores: `self.host`, `self.user`, `self.port`, `self.key_path`
+
+### SCP/SFTP feasibility:
+- **YES**, SCP can piggyback on the ControlMaster socket:
+  - `scp -o ControlPath={socket} {user}@{host}:{remote} {local}` (download)
+  - `scp -o ControlPath={socket} {local} {user}@{host}:{remote}` (upload)
+- Same SSH key and connection reuse — zero additional auth
+- Would be much faster than base64-over-terminal for large files
+
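The SCP commands above could be assembled roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the socket path follows the `/tmp/hermes-ssh/{user}@{host}:{port}.sock` pattern from the notes, and `-P` is added so a non-default port is still correct if the control socket has expired.

```python
from typing import List


def control_socket(user: str, host: str, port: int) -> str:
    """Control socket location, following the pattern quoted in the notes."""
    return f"/tmp/hermes-ssh/{user}@{host}:{port}.sock"


def scp_download_cmd(user: str, host: str, port: int,
                     remote_path: str, local_path: str) -> List[str]:
    """Build an scp argv that reuses the existing ControlMaster connection."""
    sock = control_socket(user, host, port)
    return ["scp", "-o", f"ControlPath={sock}", "-P", str(port),
            f"{user}@{host}:{remote_path}", local_path]


def scp_upload_cmd(user: str, host: str, port: int,
                   local_path: str, remote_path: str) -> List[str]:
    """Same as the download form with source and destination swapped."""
    sock = control_socket(user, host, port)
    return ["scp", "-o", f"ControlPath={sock}", "-P", str(port),
            local_path, f"{user}@{host}:{remote_path}"]
```

Returning an argv list rather than a shell string lets the caller hand it to `subprocess.run` without extra quoting.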
+## 5. tools/environments/modal.py — Modal Sandbox Filesystem
+
+### Filesystem API exposure:
+- **Not directly.** The inner `SwerexModalEnvironment` wraps Modal's sandbox
+- The sandbox object is accessible at: `env._inner.deployment._sandbox`
+- Modal's Python SDK exposes `sandbox.open()` for file I/O — but only via async API
+- Currently only used for `snapshot_filesystem()` during cleanup
+- **Could use:** `sandbox.open(path, "rb")` to read files or `sandbox.open(path, "wb")` to write
+- **Alternative:** Base64-over-terminal already works via `execute()` — simpler, no SDK dependency
+
+## 6. gateway/platforms/base.py — MEDIA: Tag Flow (Complete)
+
+### extract_media() (lines 587-620):
+- **Pattern:** `MEDIA:\S+` — extracts file paths after MEDIA: prefix
+- **Voice flag:** `[[audio_as_voice]]` global directive sets `is_voice=True` for all media in message
+- Returns `List[Tuple[str, bool]]` (path, is_voice) and cleaned content
+
+### _process_message_background() media routing (lines 752-786):
+- After extracting MEDIA tags, routes by file extension:
+  - `.ogg .opus .mp3 .wav .m4a` → `send_voice()`
+  - `.mp4 .mov .avi .mkv .3gp` → `send_video()`
+  - `.jpg .jpeg .png .webp .gif` → `send_image_file()`
+  - **Everything else** → `send_document()`
+- This routing already supports arbitrary files!
+
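The extraction and routing rules above fit in a few lines. The sketch below is a standalone approximation, not the code in `gateway/platforms/base.py`: it returns the name of the send method instead of calling an adapter, and the way it cleans the message text is an assumption.

```python
import re
from pathlib import Path
from typing import List, Tuple

MEDIA_RE = re.compile(r"MEDIA:(\S+)")
VOICE_DIRECTIVE = "[[audio_as_voice]]"

AUDIO_EXTS = {".ogg", ".opus", ".mp3", ".wav", ".m4a"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".3gp"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def extract_media(content: str) -> Tuple[List[Tuple[str, bool]], str]:
    """Return ([(path, is_voice), ...], cleaned_content)."""
    # The voice directive is global: it applies to every tag in the message.
    is_voice = VOICE_DIRECTIVE in content
    media = [(path, is_voice) for path in MEDIA_RE.findall(content)]
    cleaned = MEDIA_RE.sub("", content).replace(VOICE_DIRECTIVE, "").strip()
    return media, cleaned


def route_for(path: str) -> str:
    """Pick the send method name from the file extension."""
    ext = Path(path).suffix.lower()
    if ext in AUDIO_EXTS:
        return "send_voice"
    if ext in VIDEO_EXTS:
        return "send_video"
    if ext in IMAGE_EXTS:
        return "send_image_file"
    return "send_document"  # everything else is delivered as a generic file
```

Because unknown extensions fall through to `send_document`, an arbitrary file needs no new routing, only a working `send_document` override on each platform.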
+### send_* method inventory (base class):
+- `send(chat_id, content, reply_to, metadata)` — ABSTRACT, text
+- `send_image(chat_id, image_url, caption, reply_to)` — URL-based images
+- `send_animation(chat_id, animation_url, caption, reply_to)` — GIF animations
+- `send_voice(chat_id, audio_path, caption, reply_to)` — voice messages
+- `send_video(chat_id, video_path, caption, reply_to)` — video files
+- `send_document(chat_id, file_path, caption, file_name, reply_to)` — generic files
+- `send_image_file(chat_id, image_path, caption, reply_to)` — local image files
+- `send_typing(chat_id)` — typing indicator
+- `edit_message(chat_id, message_id, content)` — edit sent messages
+
+### What's missing:
+- **Telegram:** No override for `send_document` — falls back to text! (`send_image_file` ✅ added)
+- **Discord:** No override for `send_document` — falls back to text! (`send_image_file` ✅ added)
+- **Slack:** No override for `send_document` — falls back to text! (`send_image_file` ✅ added)
+- **WhatsApp:** Has `send_document` and `send_image_file` via bridge — COMPLETE.
+- The base class defaults just send "📎 File: /path" as text — useless for actual file delivery.
+
+## 7. gateway/platforms/telegram.py — Send Method Analysis
+
+### Implemented send methods:
+- `send()` — MarkdownV2 text with fallback to plain
+- `send_voice()` — `.ogg`/`.opus` as `send_voice()`, others as `send_audio()`
+- `send_image()` — URL-based via `send_photo()`
+- `send_image_file()` — local file via `send_photo(photo=open(path, 'rb'))` ✅
+- `send_animation()` — GIF via `send_animation()`
+- `send_typing()` — "typing" chat action
+- `edit_message()` — edit text messages
+
+### MISSING:
+- **`send_document()` NOT overridden** — Need to add `self._bot.send_document(chat_id, document=open(file_path, 'rb'), ...)`
+- **`send_video()` NOT overridden** — Need to add `self._bot.send_video(...)`
+
+## 8. gateway/platforms/discord.py — Send Method Analysis
+
+### Implemented send methods:
+- `send()` — text messages with chunking
+- `send_voice()` — discord.File attachment
+- `send_image()` — downloads URL, creates discord.File attachment
+- `send_image_file()` — local file via discord.File attachment ✅
+- `send_typing()` — channel.typing()
+- `edit_message()` — edit text messages
+
+### MISSING:
+- **`send_document()` NOT overridden** — Need to add discord.File attachment
+- **`send_video()` NOT overridden** — Need to add discord.File attachment
+
+## 9. gateway/run.py — User File Attachment Handling
+
+### Current attachment flow:
+1. **Telegram photos** (line 509-529): Download via `photo.get_file()` → `cache_image_from_bytes()` → vision auto-analysis
+2. **Telegram voice** (line 532-541): Download → `cache_audio_from_bytes()` → STT transcription
+3. **Telegram audio** (line 542-551): Same pattern
+4. **Telegram documents** (line 553-617): Extension validation against `SUPPORTED_DOCUMENT_TYPES`, 20MB limit, content injection for text files
+5. **Discord attachments** (line 717-751): Content-type detection, image/audio caching, URL fallback for other types
+6. **Gateway run.py** (lines 818-883): Auto-analyzes images with vision, transcribes audio, enriches document messages with context notes
+
+### Key insight: Files are always cached to host filesystem first, then processed. The agent sees local file paths.
+
+## 10. tools/terminal_tool.py — Terminal Tool & Environment Interaction
+
+### How it manages environments:
+- Global dict `_active_environments: Dict[str, Any]` keyed by task_id
+- Per-task creation locks prevent duplicate sandbox creation
+- Auto-cleanup thread kills idle environments after `TERMINAL_LIFETIME_SECONDS`
+- `_get_env_config()` reads all TERMINAL_* env vars for backend selection
+- `_create_environment()` factory creates the right backend type
+
+### Could send_file piggyback?
+- **YES.** send_file needs access to the same environment to extract files from sandboxes.
+- It can reuse `_active_environments[task_id]` to get the environment, then:
+  - Docker: Use `docker cp` via `env._container_id`
+  - SSH: Use `scp` via `env.control_socket`
+  - Local: Just read the file directly
+  - Modal: Use base64-over-terminal via `env.execute()`
+- The file_tools.py module already does this with `ShellFileOperations` — read_file/write_file/search/patch all share the same env instance.
+
+## 11. tools/tts_tool.py — Working Example of File Delivery
+
+### Flow:
+1. Generate audio file to `~/.hermes/audio_cache/tts_TIMESTAMP.{ogg,mp3}`
+2. Return JSON with `media_tag: "MEDIA:/path/to/file"`
+3. For Telegram voice: prepend `[[audio_as_voice]]` directive
+4. The LLM includes the MEDIA tag in its response text
+5. `BasePlatformAdapter._process_message_background()` calls `extract_media()` to find the tag
+6. Routes by extension → `send_voice()` for audio files
+7. Platform adapter sends the file natively
+
+### Key pattern: Tool saves file to host → returns MEDIA: path → LLM echoes it → gateway extracts → platform delivers
+
+## 12. tools/image_generation_tool.py — Working Example of Image Delivery
+
+### Flow:
+1. Call FAL.ai API → get image URL
+2. Return JSON with `image: "https://fal.media/..."` URL
+3. The LLM includes the URL in markdown: `![description](URL)`
+4. `BasePlatformAdapter.extract_images()` finds `![alt](url)` patterns
+5. Routes through `send_image()` (URL) or `send_animation()` (GIF)
+6. Platform downloads and sends natively
+
+### Key difference from TTS: Images are URL-based, not local files. The gateway downloads at send time.
+
+---
+
+# INTEGRATION MAP: Where send_file Hooks In
+
+## Architecture Decision: MEDIA: Tag Protocol vs. New Tool
+
+The MEDIA: tag protocol is already the established pattern for file delivery. Two options:
+
+### Option A: Pure MEDIA: Tag (Minimal Change)
+- No new tool needed
+- Agent downloads file from sandbox to host using terminal (base64)
+- Saves to known location (e.g., `~/.hermes/file_cache/`)
+- Includes `MEDIA:/path` in response text
+- Existing routing in `_process_message_background()` handles delivery
+- **Problem:** The agent has to perform the base64 transfer by hand and must know the MEDIA: convention
+
+### Option B: Dedicated send_file Tool (Recommended)
+- New tool that the agent calls with `(file_path, caption?)`
+- Tool handles the sandbox → host extraction automatically
+- Returns MEDIA: tag that gets routed through existing pipeline
+- Much cleaner agent experience
+
+## Implementation Plan for Option B
+
+### Files to CREATE:
+
+1. **`tools/send_file_tool.py`** — The new tool
+   - Accepts: `file_path` (path in sandbox), `caption` (optional)
+   - Detects environment backend from `_active_environments`
+   - Extracts file from sandbox:
+     - **local:** `shutil.copy()` or direct path
+     - **docker:** `docker cp {container_id}:{path} {local_cache}/` 
+     - **ssh:** `scp -o ControlPath=... {user}@{host}:{path} {local_cache}/`
+     - **modal:** base64-over-terminal via `env.execute("base64 {path}")`
+   - Saves to `~/.hermes/file_cache/{uuid}_{filename}`
+   - Returns: `MEDIA:/cached/path` in response for gateway to pick up
+   - Register with `registry.register(name="send_file", toolset="file", ...)`
+
+### Files to MODIFY:
+
+2. **`gateway/platforms/telegram.py`** — Add missing send methods:
+   ```python
+   async def send_document(self, chat_id, file_path, caption=None, file_name=None, reply_to=None):
+       with open(file_path, "rb") as f:
+           msg = await self._bot.send_document(
+               chat_id=int(chat_id), document=f,
+               caption=caption, filename=file_name or os.path.basename(file_path))
+       return SendResult(success=True, message_id=str(msg.message_id))
+   
+   async def send_image_file(self, chat_id, image_path, caption=None, reply_to=None):
+       with open(image_path, "rb") as f:
+           msg = await self._bot.send_photo(chat_id=int(chat_id), photo=f, caption=caption)
+       return SendResult(success=True, message_id=str(msg.message_id))
+   
+   async def send_video(self, chat_id, video_path, caption=None, reply_to=None):
+       with open(video_path, "rb") as f:
+           msg = await self._bot.send_video(chat_id=int(chat_id), video=f, caption=caption)
+       return SendResult(success=True, message_id=str(msg.message_id))
+   ```
+
+3. **`gateway/platforms/discord.py`** — Add missing send methods:
+   ```python
+   async def send_document(self, chat_id, file_path, caption=None, file_name=None, reply_to=None):
+       channel = self._client.get_channel(int(chat_id)) or await self._client.fetch_channel(int(chat_id))
+       with open(file_path, "rb") as f:
+           file = discord.File(io.BytesIO(f.read()), filename=file_name or os.path.basename(file_path))
+           msg = await channel.send(content=caption, file=file)
+       return SendResult(success=True, message_id=str(msg.id))
+   
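+   async def send_image_file(self, chat_id, image_path, caption=None, reply_to=None):
+       # Same pattern as send_document; the attachment keeps the image's filename
+       return await self.send_document(chat_id, image_path, caption=caption, reply_to=reply_to)
+
+   async def send_video(self, chat_id, video_path, caption=None, reply_to=None):
+       # Same pattern; Discord renders video attachments inline
+       return await self.send_document(chat_id, video_path, caption=caption, reply_to=reply_to)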
+   ```
+
+4. **`toolsets.py`** — Add `"send_file"` to `_HERMES_CORE_TOOLS` list
+
+5. **`agent/prompt_builder.py`** — Update platform hints to mention send_file tool
+
+### Code that can be REUSED (zero rewrite):
+
+- `BasePlatformAdapter.extract_media()` — Already extracts MEDIA: tags
+- `BasePlatformAdapter._process_message_background()` — Already routes by extension
+- `ToolContext.download_file()` — Base64-over-terminal extraction pattern
+- `tools/terminal_tool.py` _active_environments dict — Environment access
+- `tools/registry.py` — Tool registration infrastructure
+- `gateway/platforms/base.py` send_document/send_image_file/send_video signatures — Already defined
+
+### Code that needs to be WRITTEN from scratch:
+
+1. `tools/send_file_tool.py` (~150 lines):
+   - File extraction from each environment backend type
+   - Local file cache management
+   - Registry registration
+   
+2. Telegram `send_document` + `send_image_file` + `send_video` overrides (~40 lines)
+3. Discord `send_document` + `send_image_file` + `send_video` overrides (~50 lines)
+
+### Total effort: ~240 lines of new code, ~5 lines of config changes
+
+## Key Environment-Specific Extract Strategies
+
+| Backend     | Extract Method                | Speed    | Complexity |
+|-------------|-------------------------------|----------|------------|
+| local       | shutil.copy / direct path     | Instant  | None       |
+| docker      | `docker cp container:path .`  | Fast     | Low        |
+| docker+vol  | Direct host path access       | Instant  | None       |
+| ssh         | `scp -o ControlPath=...`      | Fast     | Low        |
+| modal       | base64-over-terminal          | Moderate | Medium     |
+| singularity | Direct path (overlay mount)   | Fast     | Low        |
+
+## Data Flow Summary
+
+```
+Agent calls send_file(file_path="/workspace/output.pdf", caption="Here's the report")
+    │
+    ▼
+send_file_tool.py:
+    1. Get environment from _active_environments[task_id]
+    2. Detect backend type (docker/ssh/modal/local)
+    3. Extract file to ~/.hermes/file_cache/{uuid}_{filename}
+    4. Return: '{"success": true, "media_tag": "MEDIA:/home/user/.hermes/file_cache/abc123_output.pdf"}'
+    │
+    ▼
+LLM includes MEDIA: tag in its response text
+    │
+    ▼
+BasePlatformAdapter._process_message_background():
+    1. extract_media(response) → finds MEDIA:/path
+    2. Checks extension: .pdf → send_document()
+    3. Calls platform-specific send_document(chat_id, file_path, caption)
+    │
+    ▼
+TelegramAdapter.send_document() / DiscordAdapter.send_document():
+    Opens file, sends via platform API as native document attachment
+    User receives downloadable file in chat
+```
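
Reviewer sketch (not part of the commit): the data flow above can be exercised end to end with a small, self-contained example. It covers only two pieces: writing base64 terminal output into a cache file and returning a `MEDIA:` tag, then finding that tag in a response and choosing a send method by extension. The names `cache_from_base64`, `route_media`, `_ROUTES`, and the regex are assumptions made for illustration; the real `extract_media()` and routing table in `gateway/platforms/base.py` may differ.

```python
import base64
import re
import uuid
from pathlib import Path

# Hypothetical extension routing; anything not listed falls back to send_document.
_ROUTES = {
    ".ogg": "send_voice",
    ".mp3": "send_voice",
    ".png": "send_image_file",
    ".jpg": "send_image_file",
    ".mp4": "send_video",
}

# Assumed tag shape: "MEDIA:" followed by a path without whitespace.
_MEDIA_RE = re.compile(r"MEDIA:(\S+)")


def cache_from_base64(b64_text: str, sandbox_path: str, cache_dir: Path) -> str:
    """Decode base64 terminal output into the cache and return a MEDIA: tag."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    filename = Path(sandbox_path).name
    target = cache_dir / f"{uuid.uuid4().hex[:8]}_{filename}"
    # b64decode ignores the newlines that the `base64` CLI inserts by default.
    target.write_bytes(base64.b64decode(b64_text))
    return f"MEDIA:{target}"


def route_media(response_text: str) -> list[tuple[str, str]]:
    """Return (send_method_name, path) pairs for each MEDIA: tag found."""
    routed = []
    for path in _MEDIA_RE.findall(response_text):
        method = _ROUTES.get(Path(path).suffix.lower(), "send_document")
        routed.append((method, path))
    return routed
```

A caller would pass the output of `env.execute("base64 {path}")` as `b64_text`, then hand the returned tag back to the LLM; the gateway side only needs `route_media()` on the final response text.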
@@ -18,9 +18,14 @@ Benchmarks (eval-only):
    - benchmarks/terminalbench_2/: Terminal-Bench 2.0 evaluation
 """

-from environments.agent_loop import AgentResult, HermesAgentLoop
-from environments.tool_context import ToolContext
-from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
+try:
+    from environments.agent_loop import AgentResult, HermesAgentLoop
+    from environments.tool_context import ToolContext
+    from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
+except ImportError:
+    # atroposlib not installed — environments are unavailable but
+    # submodules like tool_call_parsers can still be imported directly.
+    pass

 __all__ = [
    "AgentResult",
@@ -249,23 +249,62 @@ class HermesAgentLoop:
            reasoning = _extract_reasoning_from_message(assistant_msg)
            reasoning_per_turn.append(reasoning)

-            # Check for tool calls -- standard OpenAI spec
+            # Check for tool calls -- standard OpenAI spec.
+            # Fallback: if response has no structured tool_calls but content
+            # contains raw tool call tags (e.g. <tool_call>), parse them using
+            # hermes-agent's standalone parsers. This handles the case where
+            # ManagedServer's ToolCallTranslator couldn't parse because vLLM
+            # isn't installed.
+            if (
+                not assistant_msg.tool_calls
+                and assistant_msg.content
+                and self.tool_schemas
+                and "<tool_call>" in (assistant_msg.content or "")
+            ):
+                try:
+                    from environments.tool_call_parsers import get_parser
+                    fallback_parser = get_parser("hermes")
+                    parsed_content, parsed_calls = fallback_parser.parse(
+                        assistant_msg.content
+                    )
+                    if parsed_calls:
+                        assistant_msg.tool_calls = parsed_calls
+                        if parsed_content is not None:
+                            assistant_msg.content = parsed_content
+                        logger.debug(
+                            "Fallback parser extracted %d tool calls from raw content",
+                            len(parsed_calls),
+                        )
+                except Exception:
+                    pass  # Fall through to no tool calls
+
            if assistant_msg.tool_calls:
+                # Normalize tool calls to dicts — they may come as objects
+                # (OpenAI API) or dicts (vLLM ToolCallTranslator).
+                def _tc_to_dict(tc):
+                    if isinstance(tc, dict):
+                        return {
+                            "id": tc.get("id", f"call_{uuid.uuid4().hex[:8]}"),
+                            "type": "function",
+                            "function": {
+                                "name": tc.get("function", {}).get("name", tc.get("name", "")),
+                                "arguments": tc.get("function", {}).get("arguments", tc.get("arguments", "{}")),
+                            },
+                        }
+                    return {
+                        "id": tc.id,
+                        "type": "function",
+                        "function": {
+                            "name": tc.function.name,
+                            "arguments": tc.function.arguments,
+                        },
+                    }
+
                # Build the assistant message dict for conversation history
                msg_dict: Dict[str, Any] = {
                    "role": "assistant",
                    "content": assistant_msg.content or "",
-                    "tool_calls": [
-                        {
-                            "id": tc.id,
-                            "type": "function",
-                            "function": {
-                                "name": tc.function.name,
-                                "arguments": tc.function.arguments,
-                            },
-                        }
-                        for tc in assistant_msg.tool_calls
-                    ],
+                    "tool_calls": [_tc_to_dict(tc) for tc in assistant_msg.tool_calls],
                }

                # Preserve reasoning_content for multi-turn chat template handling
@@ -278,8 +317,13 @@ class HermesAgentLoop:

                # Execute each tool call via hermes-agent's dispatch
                for tc in assistant_msg.tool_calls:
-                    tool_name = tc.function.name
-                    tool_args_raw = tc.function.arguments
+                    # Handle both object (OpenAI) and dict (vLLM) formats
+                    if isinstance(tc, dict):
+                        tool_name = tc.get("function", {}).get("name", tc.get("name", ""))
+                        tool_args_raw = tc.get("function", {}).get("arguments", tc.get("arguments", "{}"))
+                    else:
+                        tool_name = tc.function.name
+                        tool_args_raw = tc.function.arguments

                    # Validate tool name
                    if tool_name not in self.valid_tool_names:
@@ -390,10 +434,11 @@ class HermesAgentLoop:
                            pass

                    # Add tool response to conversation
+                    tc_id = tc.get("id", "") if isinstance(tc, dict) else tc.id
                    messages.append(
                        {
                            "role": "tool",
-                            "tool_call_id": tc.id,
+                            "tool_call_id": tc_id,
                            "content": tool_result,
                        }
                    )
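
Reviewer sketch (not part of the commit): the hunks above normalize tool calls in three separate places. In the dict branch, `_tc_to_dict` generates a `call_...` id when none is present, while the later `tc_id` lookup falls back to an empty string, so a dict-style call without an id could end up with mismatched ids between the assistant message and the tool response. One way to avoid that, assuming the same two input shapes, is to normalize each call once and reuse the result. `normalize_tool_call` is a hypothetical helper name.

```python
import uuid
from typing import Any, Dict


def normalize_tool_call(tc: Any) -> Dict[str, Any]:
    """Convert a dict- or object-style tool call into one OpenAI-style dict."""
    if isinstance(tc, dict):
        fn = tc.get("function") or {}
        # Generate the fallback id exactly once so every later use agrees.
        call_id = tc.get("id") or f"call_{uuid.uuid4().hex[:8]}"
        name = fn.get("name", tc.get("name", ""))
        arguments = fn.get("arguments", tc.get("arguments", "{}"))
    else:
        call_id = tc.id
        name = tc.function.name
        arguments = tc.function.arguments
    return {
        "id": call_id,
        "type": "function",
        "function": {"name": name, "arguments": arguments},
    }
```

With this shape the loop would build `calls = [normalize_tool_call(tc) for tc in assistant_msg.tool_calls]` once, store `calls` in the assistant message, and read `call["id"]`, `call["function"]["name"]`, and `call["function"]["arguments"]` when dispatching and when appending the tool response.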
@@ -0,0 +1,38 @@
+# OpenThoughts-TBLite Evaluation -- Docker Backend (Local Compute)
+#
+# Runs tasks in Docker containers on the local machine.
+# Sandboxed like Modal but no cloud costs. Good for dev/testing.
+#
+# Usage:
+#   python environments/benchmarks/tblite/tblite_env.py evaluate \
+#       --config environments/benchmarks/tblite/local.yaml
+#
+#   # Override concurrency:
+#   python environments/benchmarks/tblite/tblite_env.py evaluate \
+#       --config environments/benchmarks/tblite/local.yaml \
+#       --env.eval_concurrency 4
+
+env:
+  enabled_toolsets: ["terminal", "file"]
+  max_agent_turns: 60
+  max_token_length: 32000
+  agent_temperature: 0.8
+  terminal_backend: "docker"
+  terminal_timeout: 300
+  tool_pool_size: 16
+  dataset_name: "NousResearch/openthoughts-tblite"
+  test_timeout: 600
+  task_timeout: 1200
+  eval_concurrency: 8          # max 8 tasks at once
+  tokenizer_name: "NousResearch/Hermes-3-Llama-3.1-8B"
+  use_wandb: false
+  wandb_name: "openthoughts-tblite-local"
+  ensure_scores_are_not_same: false
+  data_dir_to_save_evals: "environments/benchmarks/evals/openthoughts-tblite-local"
+
+openai:
+  base_url: "https://openrouter.ai/api/v1"
+  model_name: "anthropic/claude-sonnet-4"
+  server_type: "openai"
+  health_check: false
+  # api_key loaded from OPENROUTER_API_KEY in .env
@@ -0,0 +1,40 @@
+# OpenThoughts-TBLite Evaluation -- Local vLLM Backend
+#
+# Runs against a local vLLM server with Docker sandboxes.
+#
+# Start the vLLM server from the atropos directory:
+#   python -m example_trainer.vllm_api_server \
+#       --model Qwen/Qwen3-4B-Instruct-2507 \
+#       --port 9001 \
+#       --gpu-memory-utilization 0.8 \
+#       --max-model-len=32000
+#
+# Then run:
+#   python environments/benchmarks/tblite/tblite_env.py evaluate \
+#       --config environments/benchmarks/tblite/local_vllm.yaml
+
+env:
+  enabled_toolsets: ["terminal", "file"]
+  max_agent_turns: 60
+  max_token_length: 16000
+  agent_temperature: 0.6
+  terminal_backend: "docker"
+  terminal_timeout: 300
+  tool_pool_size: 16
+  dataset_name: "NousResearch/openthoughts-tblite"
+  test_timeout: 600
+  task_timeout: 1200
+  eval_concurrency: 8
+  tool_call_parser: "hermes"
+  system_prompt: "You are an expert terminal agent. You MUST use the provided tools to complete tasks. Use the terminal tool to run shell commands, read_file to read files, write_file to write files, search_files to search, and patch to edit files. Do NOT write out solutions as text - execute them using the tools. Always start by exploring the environment with terminal commands."
+  tokenizer_name: "Qwen/Qwen3-4B-Instruct-2507"
+  use_wandb: false
+  wandb_name: "tblite-qwen3-4b-instruct"
+  ensure_scores_are_not_same: false
+  data_dir_to_save_evals: "environments/benchmarks/evals/tblite-qwen3-4b-local"
+
+openai:
+  base_url: "http://localhost:9001"
+  model_name: "Qwen/Qwen3-4B-Instruct-2507"
+  server_type: "vllm"
+  health_check: false
@@ -118,6 +118,14 @@ class TerminalBench2EvalConfig(HermesAgentEnvConfig):
        "Tasks exceeding this are scored as FAIL. Default 30 minutes.",
    )

+    # --- Eval concurrency ---
+    eval_concurrency: int = Field(
+        default=0,
+        description="Maximum number of tasks to evaluate in parallel. "
+        "0 means unlimited (all tasks run concurrently). "
+        "Set to 8 for local backends to avoid overwhelming the machine.",
+    )
+

 # Tasks that cannot run properly on Modal and are excluded from scoring.
 MODAL_INCOMPATIBLE_TASKS = {
@@ -429,8 +437,13 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
                    "error": "no_image",
                }

-            # --- 2. Register per-task Modal image override ---
-            register_task_env_overrides(task_id, {"modal_image": modal_image})
+            # --- 2. Register per-task image override ---
+            # Set both modal_image and docker_image so the task image is used
+            # regardless of which backend is configured.
+            register_task_env_overrides(task_id, {
+                "modal_image": modal_image,
+                "docker_image": modal_image,
+            })
            logger.info(
                "Task %s: registered image override for task_id %s",
                task_name, task_id[:8],
@@ -445,17 +458,37 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
            messages.append({"role": "user", "content": self.format_prompt(eval_item)})

            # --- 4. Run agent loop ---
-            agent = HermesAgentLoop(
-                server=self.server,
-                tool_schemas=tools,
-                valid_tool_names=valid_names,
-                max_turns=self.config.max_agent_turns,
-                task_id=task_id,
-                temperature=self.config.agent_temperature,
-                max_tokens=self.config.max_token_length,
-                extra_body=self.config.extra_body,
-            )
-            result = await agent.run(messages)
+            # Use ManagedServer (Phase 2) for vLLM/SGLang backends to get
+            # token-level tracking via /generate. Falls back to direct
+            # ServerManager (Phase 1) for OpenAI endpoints.
+            if self._use_managed_server():
+                async with self.server.managed_server(
+                    tokenizer=self.tokenizer,
+                    preserve_think_blocks=bool(self.config.thinking_mode),
+                ) as managed:
+                    agent = HermesAgentLoop(
+                        server=managed,
+                        tool_schemas=tools,
+                        valid_tool_names=valid_names,
+                        max_turns=self.config.max_agent_turns,
+                        task_id=task_id,
+                        temperature=self.config.agent_temperature,
+                        max_tokens=self.config.max_token_length,
+                        extra_body=self.config.extra_body,
+                    )
+                    result = await agent.run(messages)
+            else:
+                agent = HermesAgentLoop(
+                    server=self.server,
+                    tool_schemas=tools,
+                    valid_tool_names=valid_names,
+                    max_turns=self.config.max_agent_turns,
+                    task_id=task_id,
+                    temperature=self.config.agent_temperature,
+                    max_tokens=self.config.max_token_length,
+                    extra_body=self.config.extra_body,
+                )
+                result = await agent.run(messages)

            # --- 5. Verify -- run test suite in the agent's sandbox ---
            # Skip verification if the agent produced no meaningful output
@@ -655,13 +688,19 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):

    async def _eval_with_timeout(self, item: Dict[str, Any]) -> Dict:
        """
-        Wrap rollout_and_score_eval with a per-task wall-clock timeout.
+        Wrap rollout_and_score_eval with a per-task wall-clock timeout
+        and optional concurrency limit via semaphore.

        If the task exceeds task_timeout seconds, it's automatically scored
        as FAIL. This prevents any single task from hanging indefinitely.
        """
        task_name = item.get("task_name", "unknown")
        category = item.get("category", "unknown")
+
+        # Acquire concurrency semaphore if configured
+        if self._eval_semaphore:
+            await self._eval_semaphore.acquire()
+
        try:
            return await asyncio.wait_for(
                self.rollout_and_score_eval(item),
@@ -679,6 +718,9 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
            }
            self._save_result(out)
            return out
+        finally:
+            if self._eval_semaphore:
+                self._eval_semaphore.release()

    async def evaluate(self, *args, **kwargs) -> None:
        """
@@ -696,6 +738,13 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
        """
        start_time = time.time()

+        # Set up concurrency limit if configured
+        if self.config.eval_concurrency > 0:
+            self._eval_semaphore = asyncio.Semaphore(self.config.eval_concurrency)
+            print(f"  Eval concurrency: {self.config.eval_concurrency} tasks at a time")
+        else:
+            self._eval_semaphore = None
+
        # Route all logging through tqdm.write() so the progress bar stays
        # pinned at the bottom while log lines scroll above it.
        from tqdm import tqdm
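
Reviewer sketch (not part of the commit): the concurrency limit added above combines an optional `asyncio.Semaphore` with a per-task `asyncio.wait_for` timeout. The standalone example below shows the same pattern outside the environment class. `eval_all` and the result dict keys are hypothetical; as in the diff, a limit of 0 means unlimited, and time spent waiting for the semaphore does not count against the task timeout.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def eval_all(
    tasks: List[Callable[[], Awaitable[Dict]]],
    concurrency: int,
    timeout: float,
) -> List[Dict]:
    """Run task factories with an optional concurrency cap and per-task timeout."""
    semaphore = asyncio.Semaphore(concurrency) if concurrency > 0 else None

    async def _guarded(factory: Callable[[], Awaitable[Dict]]) -> Dict:
        if semaphore is not None:
            await semaphore.acquire()
        try:
            # The timeout clock starts only after a slot has been acquired.
            return await asyncio.wait_for(factory(), timeout=timeout)
        except asyncio.TimeoutError:
            return {"passed": False, "error": "timeout"}
        finally:
            if semaphore is not None:
                semaphore.release()

    return await asyncio.gather(*(_guarded(t) for t in tasks))
```

Calling `asyncio.run(eval_all(factories, concurrency=8, timeout=1200))` would mirror the `eval_concurrency: 8` and `task_timeout: 1200` values in the local YAML configs.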
@@ -229,6 +229,12 @@ class HermesAgentBaseEnv(BaseEnv):
        from environments.agent_loop import resize_tool_pool
        resize_tool_pool(config.tool_pool_size)

+        # Set tool_parser on the ServerManager so ManagedServer uses it
+        # for bidirectional tool call translation (raw text ↔ OpenAI tool_calls).
+        if hasattr(self.server, 'tool_parser'):
+            self.server.tool_parser = config.tool_call_parser
+            print(f"🔧 Tool parser: {config.tool_call_parser}")
+
        # Current group's resolved tools (set in collect_trajectories)
        self._current_group_tools: Optional[Tuple[List[Dict], Set[str]]] = None

@@ -466,22 +472,14 @@ class HermesAgentBaseEnv(BaseEnv):
        # Run the agent loop
        result: AgentResult
        if self._use_managed_server():
-            # Phase 2: ManagedServer with parser -- exact tokens + logprobs
-            # Load the tool call parser from registry based on config
-            from environments.tool_call_parsers import get_parser
-            try:
-                tc_parser = get_parser(self.config.tool_call_parser)
-            except KeyError:
-                logger.warning(
-                    "Tool call parser '%s' not found, falling back to 'hermes'",
-                    self.config.tool_call_parser,
-                )
-                tc_parser = get_parser("hermes")
-
+            # Phase 2: ManagedServer with ToolCallTranslator -- exact tokens + logprobs
+            # tool_parser is set on ServerManager in __init__ and passed through
+            # to ManagedServer, which uses ToolCallTranslator for bidirectional
+            # translation between raw text and OpenAI tool_calls.
            try:
                async with self.server.managed_server(
                    tokenizer=self.tokenizer,
-                    tool_call_parser=tc_parser,
+                    preserve_think_blocks=bool(self.config.thinking_mode),
                ) as managed:
                    agent = HermesAgentLoop(
                        server=managed,
@@ -114,11 +114,27 @@ def _patch_swerex_modal():
        self._worker = _AsyncWorker()
        self._worker.start()

+        # Pre-build a modal.Image with pip fix for Modal's legacy image builder.
+        # Modal requires `python -m pip` to work during image build, but some
+        # task images (e.g., TBLite's broken-python) have intentionally broken pip.
+        # Fix: remove stale pip dist-info and reinstall via ensurepip before Modal
+        # tries to use it. This is a no-op for images where pip already works.
+        import modal as _modal
+        image_spec = self.config.image
+        if isinstance(image_spec, str):
+            image_spec = _modal.Image.from_registry(
+                image_spec,
+                setup_dockerfile_commands=[
+                    "RUN rm -rf /usr/local/lib/python*/site-packages/pip* 2>/dev/null; "
+                    "python -m ensurepip --upgrade --default-pip 2>/dev/null || true",
+                ],
+            )
+
        # Create AND start the deployment entirely on the worker's loop/thread
        # so all gRPC channels and async state are bound to that loop
        async def _create_and_start():
            deployment = ModalDeployment(
-                image=self.config.image,
+                image=image_spec,
                startup_timeout=self.config.startup_timeout,
                runtime_timeout=self.config.runtime_timeout,
                deployment_timeout=self.config.deployment_timeout,
@@ -40,8 +40,8 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
        except Exception as e:
            logger.warning("Channel directory: failed to build %s: %s", platform.value, e)

-    # Telegram, WhatsApp & Signal can't enumerate chats -- pull from session history
-    for plat_name in ("telegram", "whatsapp", "signal"):
+    # Telegram & WhatsApp can't enumerate chats -- pull from session history
+    for plat_name in ("telegram", "whatsapp"):
        if plat_name not in platforms:
            platforms[plat_name] = _build_from_sessions(plat_name)

@@ -52,7 +52,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:

    try:
        DIRECTORY_PATH.parent.mkdir(parents=True, exist_ok=True)
-        with open(DIRECTORY_PATH, "w", encoding="utf-8") as f:
+        with open(DIRECTORY_PATH, "w") as f:
            json.dump(directory, f, indent=2, ensure_ascii=False)
    except Exception as e:
        logger.warning("Channel directory: failed to write: %s", e)
@@ -115,7 +115,7 @@ def _build_from_sessions(platform_name: str) -> List[Dict[str, str]]:

    entries = []
    try:
-        with open(sessions_path, encoding="utf-8") as f:
+        with open(sessions_path) as f:
            data = json.load(f)

        seen_ids = set()
@@ -147,7 +147,7 @@ def load_directory() -> Dict[str, Any]:
    if not DIRECTORY_PATH.exists():
        return {"updated_at": None, "platforms": {}}
    try:
-        with open(DIRECTORY_PATH, encoding="utf-8") as f:
+        with open(DIRECTORY_PATH) as f:
            return json.load(f)
    except Exception:
        return {"updated_at": None, "platforms": {}}
@@ -26,7 +26,6 @@ class Platform(Enum):
    DISCORD = "discord"
    WHATSAPP = "whatsapp"
    SLACK = "slack"
-    SIGNAL = "signal"
    HOMEASSISTANT = "homeassistant"


@@ -156,16 +155,7 @@ class GatewayConfig:
        """Return list of platforms that are enabled and configured."""
        connected = []
        for platform, config in self.platforms.items():
-            if not config.enabled:
-                continue
-            # Platforms that use token/api_key auth
-            if config.token or config.api_key:
-                connected.append(platform)
-            # WhatsApp uses enabled flag only (bridge handles auth)
-            elif platform == Platform.WHATSAPP:
-                connected.append(platform)
-            # Signal uses extra dict for config (http_url + account)
-            elif platform == Platform.SIGNAL and config.extra.get("http_url"):
+            if config.enabled and (config.token or config.api_key):
                connected.append(platform)
        return connected
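
Reviewer sketch (not part of the commit): the simplified predicate in `get_connected_platforms()` can be checked in isolation. Reading this hunk alone, a platform that is enabled but has neither `token` nor `api_key` (the removed comment says WhatsApp relies on the `enabled` flag only) would no longer be reported as connected. The dataclass below is a hypothetical stand-in for the real `PlatformConfig`, and platform names are plain strings rather than the `Platform` enum.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PlatformConfig:
    """Hypothetical stand-in with only the fields the predicate reads."""
    enabled: bool = False
    token: Optional[str] = None
    api_key: Optional[str] = None


def connected_platforms(platforms: Dict[str, PlatformConfig]) -> List[str]:
    """Return names of platforms that are enabled and have a credential."""
    return [
        name
        for name, cfg in platforms.items()
        if cfg.enabled and (cfg.token or cfg.api_key)
    ]
```

If bridge-authenticated platforms should still count as connected, the predicate would need an explicit exception for them, as the removed branch had.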
    
@@ -389,26 +379,6 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
                name=os.getenv("SLACK_HOME_CHANNEL_NAME", ""),
            )
    
-    # Signal
-    signal_url = os.getenv("SIGNAL_HTTP_URL")
-    signal_account = os.getenv("SIGNAL_ACCOUNT")
-    if signal_url and signal_account:
-        if Platform.SIGNAL not in config.platforms:
-            config.platforms[Platform.SIGNAL] = PlatformConfig()
-        config.platforms[Platform.SIGNAL].enabled = True
-        config.platforms[Platform.SIGNAL].extra.update({
-            "http_url": signal_url,
-            "account": signal_account,
-            "ignore_stories": os.getenv("SIGNAL_IGNORE_STORIES", "true").lower() in ("true", "1", "yes"),
-        })
-        signal_home = os.getenv("SIGNAL_HOME_CHANNEL")
-        if signal_home:
-            config.platforms[Platform.SIGNAL].home_channel = HomeChannel(
-                platform=Platform.SIGNAL,
-                chat_id=signal_home,
-                name=os.getenv("SIGNAL_HOME_CHANNEL_NAME", "Home"),
-            )
-
    # Home Assistant
    hass_token = os.getenv("HASS_TOKEN")
    if hass_token:
@@ -73,7 +73,7 @@ def _find_session_id(platform: str, chat_id: str) -> Optional[str]:
        return None

    try:
-        with open(_SESSIONS_INDEX, encoding="utf-8") as f:
+        with open(_SESSIONS_INDEX) as f:
            data = json.load(f)
    except Exception:
        return None
@@ -103,7 +103,7 @@ def _append_to_jsonl(session_id: str, message: dict) -> None:
    """Append a message to the JSONL transcript file."""
    transcript_path = _SESSIONS_DIR / f"{session_id}.jsonl"
    try:
-        with open(transcript_path, "a", encoding="utf-8") as f:
+        with open(transcript_path, "a") as f:
            f.write(json.dumps(message, ensure_ascii=False) + "\n")
    except Exception as e:
        logger.debug("Mirror JSONL write failed: %s", e)
@@ -1,313 +0,0 @@
-# Adding a New Messaging Platform
-
-Checklist for integrating a new messaging platform into the Hermes gateway.
-Use this as a reference when building a new adapter — every item here is a
-real integration point that exists in the codebase. Missing any of them will
-cause broken functionality, missing features, or inconsistent behavior.
-
----
-
-## 1. Core Adapter (`gateway/platforms/<platform>.py`)
-
-The adapter is a subclass of `BasePlatformAdapter` from `gateway/platforms/base.py`.
-
-### Required methods
-
-| Method | Purpose |
-|--------|---------|
-| `__init__(self, config)` | Parse config, init state. Call `super().__init__(config, Platform.YOUR_PLATFORM)` |
-| `connect() -> bool` | Connect to the platform, start listeners. Return True on success |
-| `disconnect()` | Stop listeners, close connections, cancel tasks |
-| `send(chat_id, text, ...) -> SendResult` | Send a text message |
-| `send_typing(chat_id)` | Send typing indicator |
-| `send_image(chat_id, image_url, caption) -> SendResult` | Send an image |
-| `get_chat_info(chat_id) -> dict` | Return `{name, type, chat_id}` for a chat |
-
-### Optional methods (have default stubs in base)
-
-| Method | Purpose |
-|--------|---------|
-| `send_document(chat_id, path, caption)` | Send a file attachment |
-| `send_voice(chat_id, path)` | Send a voice message |
-| `send_video(chat_id, path, caption)` | Send a video |
-| `send_animation(chat_id, path, caption)` | Send a GIF/animation |
-| `send_image_file(chat_id, path, caption)` | Send image from local file |
-
-### Required function
-
-```python
-def check_<platform>_requirements() -> bool:
-    """Check if this platform's dependencies are available."""
-```
-
-### Key patterns to follow
-
-- Use `self.build_source(...)` to construct `SessionSource` objects
-- Call `self.handle_message(event)` to dispatch inbound messages to the gateway
-- Use `MessageEvent`, `MessageType`, `SendResult` from base
-- Use `cache_image_from_bytes`, `cache_audio_from_bytes`, `cache_document_from_bytes` for attachments
-- Filter self-messages (prevent reply loops)
-- Filter sync/echo messages if the platform has them
-- Redact sensitive identifiers (phone numbers, tokens) in all log output
-- Implement reconnection with exponential backoff + jitter for streaming connections
-- Set `MAX_MESSAGE_LENGTH` if the platform has message size limits
-
---
-
-## 2. Platform Enum (`gateway/config.py`)
-
-Add the platform to the `Platform` enum:
-
-```python
-class Platform(Enum):
-    ...
-    YOUR_PLATFORM = "your_platform"
-```
-
-Add env var loading in `_apply_env_overrides()`:
-
-```python
-# Your Platform
-your_token = os.getenv("YOUR_PLATFORM_TOKEN")
-if your_token:
-    if Platform.YOUR_PLATFORM not in config.platforms:
-        config.platforms[Platform.YOUR_PLATFORM] = PlatformConfig()
-    config.platforms[Platform.YOUR_PLATFORM].enabled = True
-    config.platforms[Platform.YOUR_PLATFORM].token = your_token
-```
-
-Update `get_connected_platforms()` if your platform doesn't use token/api_key
-(e.g., WhatsApp uses `enabled` flag, Signal uses `extra` dict).
-
---
-
-## 3. Adapter Factory (`gateway/run.py`)
-
-Add to `_create_adapter()`:
-
-```python
-elif platform == Platform.YOUR_PLATFORM:
-    from gateway.platforms.your_platform import YourAdapter, check_your_requirements
-    if not check_your_requirements():
-        logger.warning("Your Platform: dependencies not met")
-        return None
-    return YourAdapter(config)
-```
-
---
-
-## 4. Authorization Maps (`gateway/run.py`)
-
-Add to BOTH dicts in `_is_user_authorized()`:
-
-```python
-platform_env_map = {
-    ...
-    Platform.YOUR_PLATFORM: "YOUR_PLATFORM_ALLOWED_USERS",
-}
-platform_allow_all_map = {
-    ...
-    Platform.YOUR_PLATFORM: "YOUR_PLATFORM_ALLOW_ALL_USERS",
-}
-```
-
---
-
-## 5. Session Source (`gateway/session.py`)
-
-If your platform needs extra identity fields (e.g., Signal's UUID alongside
-phone number), add them to the `SessionSource` dataclass with `Optional` defaults,
-and update `to_dict()`, `from_dict()`, and `build_source()` in base.py.
-
---
-
-## 6. System Prompt Hints (`agent/prompt_builder.py`)
-
-Add a `PLATFORM_HINTS` entry so the agent knows what platform it's on:
-
-```python
-PLATFORM_HINTS = {
-    ...
-    "your_platform": (
-        "You are on Your Platform. "
-        "Describe formatting capabilities, media support, etc."
-    ),
-}
-```
-
-Without this, the agent won't know it's on your platform and may use
-inappropriate formatting (e.g., markdown on platforms that don't render it).
-
---
-
-## 7. Toolset (`toolsets.py`)
-
-Add a named toolset for your platform:
-
-```python
-"hermes-your-platform": {
-    "description": "Your Platform bot toolset",
-    "tools": _HERMES_CORE_TOOLS,
-    "includes": []
-},
-```
-
-And add it to the `hermes-gateway` composite:
-
-```python
-"hermes-gateway": {
-    "includes": [..., "hermes-your-platform"]
-}
-```
-
---
-
-## 8. Cron Delivery (`cron/scheduler.py`)
-
-Add to `platform_map` in `_deliver_result()`:
-
-```python
-platform_map = {
-    ...
-    "your_platform": Platform.YOUR_PLATFORM,
-}
-```
-
-Without this, `schedule_cronjob(deliver="your_platform")` silently fails.
-
---
-
-## 9. Send Message Tool (`tools/send_message_tool.py`)
-
-Add to `platform_map` in `send_message_tool()`:
-
-```python
-platform_map = {
-    ...
-    "your_platform": Platform.YOUR_PLATFORM,
-}
-```
-
-Add routing in `_send_to_platform()`:
-
-```python
-elif platform == Platform.YOUR_PLATFORM:
-    return await _send_your_platform(pconfig, chat_id, message)
-```
-
-Implement `_send_your_platform()` — a standalone async function that sends
-a single message without requiring the full adapter (for use by cron jobs
-and the send_message tool outside the gateway process).
-
-Update the tool schema `target` description to include your platform example.
-
---
-
-## 10. Cronjob Tool Schema (`tools/cronjob_tools.py`)
-
-Update the `deliver` parameter description and docstring to mention your
-platform as a delivery option.
-
---
-
-## 11. Channel Directory (`gateway/channel_directory.py`)
-
-If your platform can't enumerate chats (most can't), add it to the
-session-based discovery list:
-
-```python
-for plat_name in ("telegram", "whatsapp", "signal", "your_platform"):
-```
-
---
-
-## 12. Status Display (`hermes_cli/status.py`)
-
-Add to the `platforms` dict in the Messaging Platforms section:
-
-```python
-platforms = {
-    ...
-    "Your Platform": ("YOUR_PLATFORM_TOKEN", "YOUR_PLATFORM_HOME_CHANNEL"),
-}
-```
-
---
-
-## 13. Gateway Setup Wizard (`hermes_cli/gateway.py`)
-
-Add to the `_PLATFORMS` list:
-
-```python
-{
-    "key": "your_platform",
-    "label": "Your Platform",
-    "emoji": "📱",
-    "token_var": "YOUR_PLATFORM_TOKEN",
-    "setup_instructions": [...],
-    "vars": [...],
-}
-```
-
-If your platform needs custom setup logic (connectivity testing, QR codes,
-policy choices), add a `_setup_your_platform()` function and route to it
-in the platform selection switch.
-
-Update `_platform_status()` if your platform's "configured" check differs
-from the standard `bool(get_env_value(token_var))`.
-
---
-
-## 14. Phone/ID Redaction (`agent/redact.py`)
-
-If your platform uses sensitive identifiers (phone numbers, etc.), add a
-regex pattern and redaction function to `agent/redact.py`. This ensures
-identifiers are masked in ALL log output, not just your adapter's logs.
-
---
-
-## 15. Documentation
-
-| File | What to update |
-|------|---------------|
-| `README.md` | Platform list in feature table + documentation table |
-| `AGENTS.md` | Gateway description + env var config section |
-| `website/docs/user-guide/messaging/<platform>.md` | **NEW** — Full setup guide (see existing platform docs for template) |
-| `website/docs/user-guide/messaging/index.md` | Architecture diagram, toolset table, security examples, Next Steps links |
-| `website/docs/reference/environment-variables.md` | All env vars for the platform |
-
---
-
-## 16. Tests (`tests/gateway/test_<platform>.py`)
-
-Recommended test coverage:
-
- Platform enum exists with correct value
- Config loading from env vars via `_apply_env_overrides`
- Adapter init (config parsing, allowlist handling, default values)
- Helper functions (redaction, parsing, file type detection)
- Session source round-trip (to_dict → from_dict)
- Authorization integration (platform in allowlist maps)
- Send message tool routing (platform in platform_map)
-
-Optional but valuable:
- Async tests for message handling flow (mock the platform API)
- SSE/WebSocket reconnection logic
- Attachment processing
- Group message filtering
-
---
-
-## Quick Verification
-
-After implementing everything, verify with:
-
-```bash
-# All tests pass
-python -m pytest tests/ -q
-
-# Grep for the existing platform names to find integration points you may have missed
-grep -r "telegram\|discord\|whatsapp\|slack" gateway/ tools/ agent/ cron/ hermes_cli/ toolsets.py \
-  --include="*.py" -l | sort -u
-# Check each file in the output — if it mentions other platforms but not yours, you missed it
-```
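Reviewer note: the "Quick Verification" step in the removed guide lists files that mention existing platforms and leaves the per-file comparison to the reader. A minimal Python sketch of that comparison follows. The function name `find_missed_integration_points` is hypothetical, and unlike the `grep` command it scans every `.py` file under the given root rather than a fixed list of directories.

```python
from pathlib import Path
from typing import Iterable, List


def find_missed_integration_points(
    root: Path,
    known_platforms: Iterable[str],
    new_platform: str,
) -> List[Path]:
    """Return .py files that mention a known platform but not the new one."""
    known = [name.lower() for name in known_platforms]
    target = new_platform.lower()
    missed: List[Path] = []
    for path in sorted(root.rglob("*.py")):
        # Case-insensitive substring match, similar in spirit to the grep check.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if any(name in text for name in known) and target not in text:
            missed.append(path)
    return missed
```

For example, calling it with `Path("gateway")`, the names `["telegram", "discord", "whatsapp", "slack"]`, and `"your_platform"` would return the candidate files to inspect by hand. Substring matching can produce false positives, so the result is a starting list, not a verdict.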
@@ -838,8 +838,6 @@ class BasePlatformAdapter(ABC):
        user_name: Optional[str] = None,
        thread_id: Optional[str] = None,
        chat_topic: Optional[str] = None,
-        user_id_alt: Optional[str] = None,
-        chat_id_alt: Optional[str] = None,
    ) -> SessionSource:
        """Helper to build a SessionSource for this platform."""
        # Normalize empty topic to None
@@ -854,8 +852,6 @@ class BasePlatformAdapter(ABC):
            user_name=user_name,
            thread_id=str(thread_id) if thread_id else None,
            chat_topic=chat_topic.strip() if chat_topic else None,
-            user_id_alt=user_id_alt,
-            chat_id_alt=chat_id_alt,
        )
    
    @abstractmethod
@@ -592,89 +592,6 @@ class DiscordAdapter(BasePlatformAdapter):
            except Exception as e:
                logger.debug("Discord followup failed: %s", e)

-        @tree.command(name="compress", description="Compress conversation context")
-        async def slash_compress(interaction: discord.Interaction):
-            await interaction.response.defer(ephemeral=True)
-            event = self._build_slash_event(interaction, "/compress")
-            await self.handle_message(event)
-            try:
-                await interaction.followup.send("Done~", ephemeral=True)
-            except Exception as e:
-                logger.debug("Discord followup failed: %s", e)
-
-        @tree.command(name="title", description="Set or show the session title")
-        @discord.app_commands.describe(name="Session title. Leave empty to show current.")
-        async def slash_title(interaction: discord.Interaction, name: str = ""):
-            await interaction.response.defer(ephemeral=True)
-            event = self._build_slash_event(interaction, f"/title {name}".strip())
-            await self.handle_message(event)
-            try:
-                await interaction.followup.send("Done~", ephemeral=True)
-            except Exception as e:
-                logger.debug("Discord followup failed: %s", e)
-
-        @tree.command(name="resume", description="Resume a previously-named session")
-        @discord.app_commands.describe(name="Session name to resume. Leave empty to list sessions.")
-        async def slash_resume(interaction: discord.Interaction, name: str = ""):
-            await interaction.response.defer(ephemeral=True)
-            event = self._build_slash_event(interaction, f"/resume {name}".strip())
-            await self.handle_message(event)
-            try:
-                await interaction.followup.send("Done~", ephemeral=True)
-            except Exception as e:
-                logger.debug("Discord followup failed: %s", e)
-
-        @tree.command(name="usage", description="Show token usage for this session")
-        async def slash_usage(interaction: discord.Interaction):
-            await interaction.response.defer(ephemeral=True)
-            event = self._build_slash_event(interaction, "/usage")
-            await self.handle_message(event)
-            try:
-                await interaction.followup.send("Done~", ephemeral=True)
-            except Exception as e:
-                logger.debug("Discord followup failed: %s", e)
-
-        @tree.command(name="provider", description="Show available providers")
-        async def slash_provider(interaction: discord.Interaction):
-            await interaction.response.defer(ephemeral=True)
-            event = self._build_slash_event(interaction, "/provider")
-            await self.handle_message(event)
-            try:
-                await interaction.followup.send("Done~", ephemeral=True)
-            except Exception as e:
-                logger.debug("Discord followup failed: %s", e)
-
-        @tree.command(name="help", description="Show available commands")
-        async def slash_help(interaction: discord.Interaction):
-            await interaction.response.defer(ephemeral=True)
-            event = self._build_slash_event(interaction, "/help")
-            await self.handle_message(event)
-            try:
-                await interaction.followup.send("Done~", ephemeral=True)
-            except Exception as e:
-                logger.debug("Discord followup failed: %s", e)
-
-        @tree.command(name="insights", description="Show usage insights and analytics")
-        @discord.app_commands.describe(days="Number of days to analyze (default: 7)")
-        async def slash_insights(interaction: discord.Interaction, days: int = 7):
-            await interaction.response.defer(ephemeral=True)
-            event = self._build_slash_event(interaction, f"/insights {days}")
-            await self.handle_message(event)
-            try:
-                await interaction.followup.send("Done~", ephemeral=True)
-            except Exception as e:
-                logger.debug("Discord followup failed: %s", e)
-
-        @tree.command(name="reload-mcp", description="Reload MCP servers from config")
-        async def slash_reload_mcp(interaction: discord.Interaction):
-            await interaction.response.defer(ephemeral=True)
-            event = self._build_slash_event(interaction, "/reload-mcp")
-            await self.handle_message(event)
-            try:
-                await interaction.followup.send("Done~", ephemeral=True)
-            except Exception as e:
-                logger.debug("Discord followup failed: %s", e)
-
        @tree.command(name="update", description="Update Hermes Agent to the latest version")
        async def slash_update(interaction: discord.Interaction):
            await interaction.response.defer(ephemeral=True)
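Reviewer note: the Signal adapter removed in the next hunk reconnects its SSE stream with exponential backoff and up to 20% random jitter (initial delay 2 s, cap 60 s). A minimal sketch of that delay schedule, isolated from the adapter, is shown here. The generator name `backoff_delays` and its parameters are hypothetical; the injectable `rng` argument is an assumption added so the schedule can be checked deterministically.

```python
import random
from typing import Callable, Iterator


def backoff_delays(
    initial: float = 2.0,
    maximum: float = 60.0,
    jitter_fraction: float = 0.2,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield reconnect delays: the base doubles up to `maximum`, plus jitter."""
    base = initial
    while True:
        # Jitter is a random share (0..jitter_fraction) of the current base delay.
        yield base + base * jitter_fraction * rng()
        base = min(base * 2, maximum)
```

The adapter additionally resets its delay to the initial value after a successful connection; a caller of this sketch would do the same by creating a fresh generator.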
@@ -1,716 +0,0 @@
-"""Signal messenger platform adapter.
-
-Connects to a signal-cli daemon running in HTTP mode.
-Inbound messages arrive via SSE (Server-Sent Events) streaming.
-Outbound messages and actions use JSON-RPC 2.0 over HTTP.
-
-Based on PR #268 by ibhagwan, rebuilt with bug fixes.
-
-Requires:
-  - signal-cli installed and running: signal-cli daemon --http 127.0.0.1:8080
-  - SIGNAL_HTTP_URL and SIGNAL_ACCOUNT environment variables set
-"""
-
-import asyncio
-import base64
-import json
-import logging
-import os
-import random
-import re
-import time
-from datetime import datetime, timezone
-from pathlib import Path
-from typing import Dict, List, Optional, Any
-from urllib.parse import unquote
-
-import httpx
-
-from gateway.config import Platform, PlatformConfig
-from gateway.platforms.base import (
-    BasePlatformAdapter,
-    MessageEvent,
-    MessageType,
-    SendResult,
-    cache_image_from_bytes,
-    cache_audio_from_bytes,
-    cache_document_from_bytes,
-    cache_image_from_url,
-)
-
-logger = logging.getLogger(__name__)
-
-# ---------------------------------------------------------------------------
-# Constants
-# ---------------------------------------------------------------------------
-SIGNAL_MAX_ATTACHMENT_SIZE = 100 * 1024 * 1024  # 100 MB
-MAX_MESSAGE_LENGTH = 8000  # Signal message size limit
-TYPING_INTERVAL = 8.0  # seconds between typing indicator refreshes
-SSE_RETRY_DELAY_INITIAL = 2.0
-SSE_RETRY_DELAY_MAX = 60.0
-HEALTH_CHECK_INTERVAL = 30.0  # seconds between health checks
-HEALTH_CHECK_STALE_THRESHOLD = 120.0  # seconds without SSE activity before concern
-
-# E.164 phone number pattern for redaction
-_PHONE_RE = re.compile(r"\+[1-9]\d{6,14}")
-
-
-# ---------------------------------------------------------------------------
-# Helpers
-# ---------------------------------------------------------------------------
-
-def _redact_phone(phone: str) -> str:
-    """Redact a phone number for logging: +15551234567 -> +155****4567."""
-    if not phone:
-        return "<none>"
-    if len(phone) <= 8:
-        return phone[:2] + "****" + phone[-2:] if len(phone) > 4 else "****"
-    return phone[:4] + "****" + phone[-4:]
-
-
-def _parse_comma_list(value: str) -> List[str]:
-    """Split a comma-separated string into a list, stripping whitespace."""
-    return [v.strip() for v in value.split(",") if v.strip()]
-
-
-def _guess_extension(data: bytes) -> str:
-    """Guess file extension from magic bytes."""
-    if data[:4] == b"\x89PNG":
-        return ".png"
-    if data[:2] == b"\xff\xd8":
-        return ".jpg"
-    if data[:4] == b"GIF8":
-        return ".gif"
-    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
-        return ".webp"
-    if data[:4] == b"%PDF":
-        return ".pdf"
-    if len(data) >= 8 and data[4:8] == b"ftyp":
-        return ".mp4"
-    if data[:4] == b"OggS":
-        return ".ogg"
-    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
-        return ".mp3"
-    if data[:2] == b"PK":
-        return ".zip"
-    return ".bin"
-
-
-def _is_image_ext(ext: str) -> bool:
-    return ext.lower() in (".jpg", ".jpeg", ".png", ".gif", ".webp")
-
-
-def _is_audio_ext(ext: str) -> bool:
-    return ext.lower() in (".mp3", ".wav", ".ogg", ".m4a", ".aac")
-
-
-def _render_mentions(text: str, mentions: list) -> str:
-    """Replace Signal mention placeholders (\\uFFFC) with readable @identifiers.
-
-    Signal encodes @mentions as the Unicode object replacement character
-    with out-of-band metadata containing the mentioned user's UUID/number.
-    """
-    if not mentions or "\uFFFC" not in text:
-        return text
-    # Sort mentions by start position (reverse) to replace from end to start
-    # so indices don't shift as we replace
-    sorted_mentions = sorted(mentions, key=lambda m: m.get("start", 0), reverse=True)
-    for mention in sorted_mentions:
-        start = mention.get("start", 0)
-        length = mention.get("length", 1)
-        # Use the mention's number or UUID as the replacement
-        identifier = mention.get("number") or mention.get("uuid") or "user"
-        replacement = f"@{identifier}"
-        text = text[:start] + replacement + text[start + length:]
-    return text
-
-
-def check_signal_requirements() -> bool:
-    """Check if Signal is configured (has URL and account)."""
-    return bool(os.getenv("SIGNAL_HTTP_URL") and os.getenv("SIGNAL_ACCOUNT"))
-
-
-# ---------------------------------------------------------------------------
-# Signal Adapter
-# ---------------------------------------------------------------------------
-
-class SignalAdapter(BasePlatformAdapter):
-    """Signal messenger adapter using signal-cli HTTP daemon."""
-
-    platform = Platform.SIGNAL
-
-    def __init__(self, config: PlatformConfig):
-        super().__init__(config, Platform.SIGNAL)
-
-        extra = config.extra or {}
-        self.http_url = extra.get("http_url", "http://127.0.0.1:8080").rstrip("/")
-        self.account = extra.get("account", "")
-        self.ignore_stories = extra.get("ignore_stories", True)
-
-        # Parse allowlists — group policy is derived from presence of group allowlist
-        group_allowed_str = os.getenv("SIGNAL_GROUP_ALLOWED_USERS", "")
-        self.group_allow_from = set(_parse_comma_list(group_allowed_str))
-
-        # HTTP client
-        self.client: Optional[httpx.AsyncClient] = None
-
-        # Background tasks
-        self._sse_task: Optional[asyncio.Task] = None
-        self._health_monitor_task: Optional[asyncio.Task] = None
-        self._typing_tasks: Dict[str, asyncio.Task] = {}
-        self._running = False
-        self._last_sse_activity = 0.0
-        self._sse_response: Optional[httpx.Response] = None
-
-        # Normalize account for self-message filtering
-        self._account_normalized = self.account.strip()
-
-        logger.info("Signal adapter initialized: url=%s account=%s groups=%s",
-                     self.http_url, _redact_phone(self.account),
-                     "enabled" if self.group_allow_from else "disabled")
-
-    # ------------------------------------------------------------------
-    # Lifecycle
-    # ------------------------------------------------------------------
-
-    async def connect(self) -> bool:
-        """Connect to signal-cli daemon and start SSE listener."""
-        if not self.http_url or not self.account:
-            logger.error("Signal: SIGNAL_HTTP_URL and SIGNAL_ACCOUNT are required")
-            return False
-
-        self.client = httpx.AsyncClient(timeout=30.0)
-
-        # Health check — verify signal-cli daemon is reachable
-        try:
-            resp = await self.client.get(f"{self.http_url}/api/v1/check", timeout=10.0)
-            if resp.status_code != 200:
-                logger.error("Signal: health check failed (status %d)", resp.status_code)
-                return False
-        except Exception as e:
-            logger.error("Signal: cannot reach signal-cli at %s: %s", self.http_url, e)
-            return False
-
-        self._running = True
-        self._last_sse_activity = time.time()
-        self._sse_task = asyncio.create_task(self._sse_listener())
-        self._health_monitor_task = asyncio.create_task(self._health_monitor())
-
-        logger.info("Signal: connected to %s", self.http_url)
-        return True
-
-    async def disconnect(self) -> None:
-        """Stop SSE listener and clean up."""
-        self._running = False
-
-        if self._sse_task:
-            self._sse_task.cancel()
-            try:
-                await self._sse_task
-            except asyncio.CancelledError:
-                pass
-
-        if self._health_monitor_task:
-            self._health_monitor_task.cancel()
-            try:
-                await self._health_monitor_task
-            except asyncio.CancelledError:
-                pass
-
-        # Cancel all typing tasks
-        for task in self._typing_tasks.values():
-            task.cancel()
-        self._typing_tasks.clear()
-
-        if self.client:
-            await self.client.aclose()
-            self.client = None
-
-        logger.info("Signal: disconnected")
-
-    # ------------------------------------------------------------------
-    # SSE Streaming (inbound messages)
-    # ------------------------------------------------------------------
-
-    async def _sse_listener(self) -> None:
-        """Listen for SSE events from signal-cli daemon."""
-        url = f"{self.http_url}/api/v1/events?account={self.account}"
-        backoff = SSE_RETRY_DELAY_INITIAL
-
-        while self._running:
-            try:
-                logger.debug("Signal SSE: connecting to %s", url)
-                async with self.client.stream(
-                    "GET", url,
-                    headers={"Accept": "text/event-stream"},
-                    timeout=None,
-                ) as response:
-                    self._sse_response = response
-                    backoff = SSE_RETRY_DELAY_INITIAL  # Reset on successful connection
-                    self._last_sse_activity = time.time()
-                    logger.info("Signal SSE: connected")
-
-                    buffer = ""
-                    async for chunk in response.aiter_text():
-                        if not self._running:
-                            break
-                        buffer += chunk
-                        while "\n" in buffer:
-                            line, buffer = buffer.split("\n", 1)
-                            line = line.strip()
-                            if not line:
-                                continue
-                            # Parse SSE data lines
-                            if line.startswith("data:"):
-                                data_str = line[5:].strip()
-                                if not data_str:
-                                    continue
-                                self._last_sse_activity = time.time()
-                                try:
-                                    data = json.loads(data_str)
-                                    await self._handle_envelope(data)
-                                except json.JSONDecodeError:
-                                    logger.debug("Signal SSE: invalid JSON: %s", data_str[:100])
-                                except Exception:
-                                    logger.exception("Signal SSE: error handling event")
-
-            except asyncio.CancelledError:
-                break
-            except httpx.HTTPError as e:
-                if self._running:
-                    logger.warning("Signal SSE: HTTP error: %s (reconnecting in %.0fs)", e, backoff)
-            except Exception as e:
-                if self._running:
-                    logger.warning("Signal SSE: error: %s (reconnecting in %.0fs)", e, backoff)
-
-            if self._running:
-                # Add 20% jitter to prevent thundering herd on reconnection
-                jitter = backoff * 0.2 * random.random()
-                await asyncio.sleep(backoff + jitter)
-                backoff = min(backoff * 2, SSE_RETRY_DELAY_MAX)
-
-        self._sse_response = None
-
-    # ------------------------------------------------------------------
-    # Health Monitor
-    # ------------------------------------------------------------------
-
-    async def _health_monitor(self) -> None:
-        """Monitor SSE connection health and force reconnect if stale."""
-        while self._running:
-            await asyncio.sleep(HEALTH_CHECK_INTERVAL)
-            if not self._running:
-                break
-
-            elapsed = time.time() - self._last_sse_activity
-            if elapsed > HEALTH_CHECK_STALE_THRESHOLD:
-                logger.warning("Signal: SSE idle for %.0fs, checking daemon health", elapsed)
-                try:
-                    resp = await self.client.get(
-                        f"{self.http_url}/api/v1/check", timeout=10.0
-                    )
-                    if resp.status_code == 200:
-                        # Daemon is alive but SSE is idle — update activity to
-                        # avoid repeated warnings (connection may just be quiet)
-                        self._last_sse_activity = time.time()
-                        logger.debug("Signal: daemon healthy, SSE idle")
-                    else:
-                        logger.warning("Signal: health check failed (%d), forcing reconnect", resp.status_code)
-                        self._force_reconnect()
-                except Exception as e:
-                    logger.warning("Signal: health check error: %s, forcing reconnect", e)
-                    self._force_reconnect()
-
-    def _force_reconnect(self) -> None:
-        """Force SSE reconnection by closing the current response."""
-        if self._sse_response and not self._sse_response.is_stream_consumed:
-            try:
-                asyncio.create_task(self._sse_response.aclose())
-            except Exception:
-                pass
-            self._sse_response = None
-
-    # ------------------------------------------------------------------
-    # Message Handling
-    # ------------------------------------------------------------------
-
-    async def _handle_envelope(self, envelope: dict) -> None:
-        """Process an incoming signal-cli envelope."""
-        # Unwrap nested envelope if present
-        envelope_data = envelope.get("envelope", envelope)
-
-        # Filter syncMessage envelopes (sent transcripts, read receipts, etc.)
-        # signal-cli may set syncMessage to null vs omitting it, so check key existence
-        if "syncMessage" in envelope_data:
-            return
-
-        # Extract sender info
-        sender = (
-            envelope_data.get("sourceNumber")
-            or envelope_data.get("sourceUuid")
-            or envelope_data.get("source")
-        )
-        sender_name = envelope_data.get("sourceName", "")
-        sender_uuid = envelope_data.get("sourceUuid", "")
-
-        if not sender:
-            logger.debug("Signal: ignoring envelope with no sender")
-            return
-
-        # Self-message filtering — prevent reply loops
-        if self._account_normalized and sender == self._account_normalized:
-            return
-
-        # Filter stories
-        if self.ignore_stories and envelope_data.get("storyMessage"):
-            return
-
-        # Get data message — also check editMessage (edited messages contain
-        # their updated dataMessage inside editMessage.dataMessage)
-        data_message = (
-            envelope_data.get("dataMessage")
-            or (envelope_data.get("editMessage") or {}).get("dataMessage")
-        )
-        if not data_message:
-            return
-
-        # Check for group message
-        group_info = data_message.get("groupInfo")
-        group_id = group_info.get("groupId") if group_info else None
-        is_group = bool(group_id)
-
-        # Group message filtering — derived from SIGNAL_GROUP_ALLOWED_USERS:
-        # - No env var set → groups disabled (default safe behavior)
-        # - Env var set with group IDs → only those groups allowed
-        # - Env var set with "*" → all groups allowed
-        # DM auth is fully handled by run.py (_is_user_authorized)
-        if is_group:
-            if not self.group_allow_from:
-                logger.debug("Signal: ignoring group message (no SIGNAL_GROUP_ALLOWED_USERS)")
-                return
-            if "*" not in self.group_allow_from and group_id not in self.group_allow_from:
-                logger.debug("Signal: group %s not in allowlist", group_id[:8] if group_id else "?")
-                return
-
-        # Build chat info
-        chat_id = sender if not is_group else f"group:{group_id}"
-        chat_type = "group" if is_group else "dm"
-
-        # Extract text and render mentions
-        text = data_message.get("message", "")
-        mentions = data_message.get("mentions", [])
-        if text and mentions:
-            text = _render_mentions(text, mentions)
-
-        # Process attachments
-        attachments_data = data_message.get("attachments", [])
-        image_paths = []
-        audio_path = None
-        document_paths = []
-
-        if attachments_data and not getattr(self, "ignore_attachments", False):
-            for att in attachments_data:
-                att_id = att.get("id")
-                att_size = att.get("size", 0)
-                if not att_id:
-                    continue
-                if att_size > SIGNAL_MAX_ATTACHMENT_SIZE:
-                    logger.warning("Signal: attachment too large (%d bytes), skipping", att_size)
-                    continue
-                try:
-                    cached_path, ext = await self._fetch_attachment(att_id)
-                    if cached_path:
-                        if _is_image_ext(ext):
-                            image_paths.append(cached_path)
-                        elif _is_audio_ext(ext):
-                            audio_path = cached_path
-                        else:
-                            document_paths.append(cached_path)
-                except Exception:
-                    logger.exception("Signal: failed to fetch attachment %s", att_id)
-
-        # Build session source
-        source = self.build_source(
-            chat_id=chat_id,
-            chat_name=group_info.get("groupName") if group_info else sender_name,
-            chat_type=chat_type,
-            user_id=sender,
-            user_name=sender_name or sender,
-            user_id_alt=sender_uuid if sender_uuid else None,
-            chat_id_alt=group_id if is_group else None,
-        )
-
-        # Determine message type
-        msg_type = MessageType.TEXT
-        if audio_path:
-            msg_type = MessageType.VOICE
-        elif image_paths:
-            msg_type = MessageType.IMAGE
-
-        # Parse timestamp from envelope data (milliseconds since epoch)
-        ts_ms = envelope_data.get("timestamp", 0)
-        if ts_ms:
-            try:
-                timestamp = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
-            except (ValueError, OSError):
-                timestamp = datetime.now(tz=timezone.utc)
-        else:
-            timestamp = datetime.now(tz=timezone.utc)
-
-        # Build and dispatch event
-        event = MessageEvent(
-            source=source,
-            text=text or "",
-            message_type=msg_type,
-            image_paths=image_paths,
-            audio_path=audio_path,
-            document_paths=document_paths,
-            timestamp=timestamp,
-        )
-
-        logger.debug("Signal: message from %s in %s: %s",
-                      _redact_phone(sender), chat_id[:20], (text or "")[:50])
-
-        await self.handle_message(event)
-
-    # ------------------------------------------------------------------
-    # Attachment Handling
-    # ------------------------------------------------------------------
-
-    async def _fetch_attachment(self, attachment_id: str) -> tuple:
-        """Fetch an attachment via JSON-RPC and cache it. Returns (path, ext)."""
-        result = await self._rpc("getAttachment", {
-            "account": self.account,
-            "attachmentId": attachment_id,
-        })
-
-        if not result:
-            return None, ""
-
-        # Result is base64-encoded file content
-        raw_data = base64.b64decode(result)
-        ext = _guess_extension(raw_data)
-
-        if _is_image_ext(ext):
-            path = cache_image_from_bytes(raw_data, ext)
-        elif _is_audio_ext(ext):
-            path = cache_audio_from_bytes(raw_data, ext)
-        else:
-            path = cache_document_from_bytes(raw_data, ext)
-
-        return path, ext
-
-    # ------------------------------------------------------------------
-    # JSON-RPC Communication
-    # ------------------------------------------------------------------
-
-    async def _rpc(self, method: str, params: dict, rpc_id: str = None) -> Any:
-        """Send a JSON-RPC 2.0 request to signal-cli daemon."""
-        if not self.client:
-            logger.warning("Signal: RPC called but client not connected")
-            return None
-
-        if rpc_id is None:
-            rpc_id = f"{method}_{int(time.time() * 1000)}"
-
-        payload = {
-            "jsonrpc": "2.0",
-            "method": method,
-            "params": params,
-            "id": rpc_id,
-        }
-
-        try:
-            resp = await self.client.post(
-                f"{self.http_url}/api/v1/rpc",
-                json=payload,
-                timeout=30.0,
-            )
-            resp.raise_for_status()
-            data = resp.json()
-
-            if "error" in data:
-                logger.warning("Signal RPC error (%s): %s", method, data["error"])
-                return None
-
-            return data.get("result")
-
-        except Exception as e:
-            logger.warning("Signal RPC %s failed: %s", method, e)
-            return None
-
-    # ------------------------------------------------------------------
-    # Sending
-    # ------------------------------------------------------------------
-
-    async def send(
-        self,
-        chat_id: str,
-        text: str,
-        reply_to_message_id: Optional[str] = None,
-        **kwargs,
-    ) -> SendResult:
-        """Send a text message."""
-        await self._stop_typing_indicator(chat_id)
-
-        params: Dict[str, Any] = {
-            "account": self.account,
-            "message": text,
-        }
-
-        if chat_id.startswith("group:"):
-            params["groupId"] = chat_id[6:]
-        else:
-            params["recipient"] = [chat_id]
-
-        result = await self._rpc("send", params)
-
-        if result is not None:
-            return SendResult(success=True)
-        return SendResult(success=False, error="RPC send failed")
-
-    async def send_typing(self, chat_id: str) -> None:
-        """Send a typing indicator."""
-        params: Dict[str, Any] = {
-            "account": self.account,
-        }
-
-        if chat_id.startswith("group:"):
-            params["groupId"] = chat_id[6:]
-        else:
-            params["recipient"] = [chat_id]
-
-        await self._rpc("sendTyping", params, rpc_id="typing")
-
-    async def send_image(
-        self,
-        chat_id: str,
-        image_url: str,
-        caption: Optional[str] = None,
-        **kwargs,
-    ) -> SendResult:
-        """Send an image. Supports http(s):// and file:// URLs."""
-        await self._stop_typing_indicator(chat_id)
-
-        # Resolve image to local path
-        if image_url.startswith("file://"):
-            file_path = unquote(image_url[7:])
-        else:
-            # Download remote image to cache
-            try:
-                file_path = await cache_image_from_url(image_url)
-            except Exception as e:
-                logger.warning("Signal: failed to download image: %s", e)
-                return SendResult(success=False, error=str(e))
-
-        if not file_path or not Path(file_path).exists():
-            return SendResult(success=False, error="Image file not found")
-
-        # Validate size
-        file_size = Path(file_path).stat().st_size
-        if file_size > SIGNAL_MAX_ATTACHMENT_SIZE:
-            return SendResult(success=False, error=f"Image too large ({file_size} bytes)")
-
-        params: Dict[str, Any] = {
-            "account": self.account,
-            "message": caption or "",
-            "attachments": [file_path],
-        }
-
-        if chat_id.startswith("group:"):
-            params["groupId"] = chat_id[6:]
-        else:
-            params["recipient"] = [chat_id]
-
-        result = await self._rpc("send", params)
-        if result is not None:
-            return SendResult(success=True)
-        return SendResult(success=False, error="RPC send with attachment failed")
-
-    async def send_document(
-        self,
-        chat_id: str,
-        file_path: str,
-        caption: Optional[str] = None,
-        filename: Optional[str] = None,
-        **kwargs,
-    ) -> SendResult:
-        """Send a document/file attachment."""
-        await self._stop_typing_indicator(chat_id)
-
-        if not Path(file_path).exists():
-            return SendResult(success=False, error="File not found")
-
-        params: Dict[str, Any] = {
-            "account": self.account,
-            "message": caption or "",
-            "attachments": [file_path],
-        }
-
-        if chat_id.startswith("group:"):
-            params["groupId"] = chat_id[6:]
-        else:
-            params["recipient"] = [chat_id]
-
-        result = await self._rpc("send", params)
-        if result is not None:
-            return SendResult(success=True)
-        return SendResult(success=False, error="RPC send document failed")
-
-    # ------------------------------------------------------------------
-    # Typing Indicators
-    # ------------------------------------------------------------------
-
-    async def _start_typing_indicator(self, chat_id: str) -> None:
-        """Start a typing indicator loop for a chat."""
-        if chat_id in self._typing_tasks:
-            return  # Already running
-
-        async def _typing_loop():
-            try:
-                while True:
-                    await self.send_typing(chat_id)
-                    await asyncio.sleep(TYPING_INTERVAL)
-            except asyncio.CancelledError:
-                pass
-
-        self._typing_tasks[chat_id] = asyncio.create_task(_typing_loop())
-
-    async def _stop_typing_indicator(self, chat_id: str) -> None:
-        """Stop a typing indicator loop for a chat."""
-        task = self._typing_tasks.pop(chat_id, None)
-        if task:
-            task.cancel()
-            try:
-                await task
-            except asyncio.CancelledError:
-                pass
-
-    # ------------------------------------------------------------------
-    # Chat Info
-    # ------------------------------------------------------------------
-
-    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
-        """Get information about a chat/contact."""
-        if chat_id.startswith("group:"):
-            return {
-                "name": chat_id,
-                "type": "group",
-                "chat_id": chat_id,
-            }
-
-        # Try to resolve contact name
-        result = await self._rpc("getContact", {
-            "account": self.account,
-            "contactAddress": chat_id,
-        })
-
-        name = chat_id
-        if result and isinstance(result, dict):
-            name = result.get("name") or result.get("profileName") or chat_id
-
-        return {
-            "name": name,
-            "type": "dm",
-            "chat_id": chat_id,
-        }
@@ -155,14 +155,6 @@ class TelegramAdapter(BasePlatformAdapter):
                    BotCommand("status", "Show session info"),
                    BotCommand("stop", "Stop the running agent"),
                    BotCommand("sethome", "Set this chat as the home channel"),
-                    BotCommand("compress", "Compress conversation context"),
-                    BotCommand("title", "Set or show the session title"),
-                    BotCommand("resume", "Resume a previously-named session"),
-                    BotCommand("usage", "Show token usage for this session"),
-                    BotCommand("provider", "Show available providers"),
-                    BotCommand("insights", "Show usage insights and analytics"),
-                    BotCommand("update", "Update Hermes to the latest version"),
-                    BotCommand("reload_mcp", "Reload MCP servers from config"),
                    BotCommand("help", "Show available commands"),
                ])
            except Exception as e:
@@ -86,29 +86,10 @@ if _config_path.exists():
                "enabled": "CONTEXT_COMPRESSION_ENABLED",
                "threshold": "CONTEXT_COMPRESSION_THRESHOLD",
                "summary_model": "CONTEXT_COMPRESSION_MODEL",
-                "summary_provider": "CONTEXT_COMPRESSION_PROVIDER",
            }
            for _cfg_key, _env_var in _compression_env_map.items():
                if _cfg_key in _compression_cfg:
                    os.environ[_env_var] = str(_compression_cfg[_cfg_key])
-        # Auxiliary model overrides (vision, web_extract).
-        # Each task has provider + model; bridge non-default values to env vars.
-        _auxiliary_cfg = _cfg.get("auxiliary", {})
-        if _auxiliary_cfg and isinstance(_auxiliary_cfg, dict):
-            _aux_task_env = {
-                "vision":      ("AUXILIARY_VISION_PROVIDER",      "AUXILIARY_VISION_MODEL"),
-                "web_extract": ("AUXILIARY_WEB_EXTRACT_PROVIDER",  "AUXILIARY_WEB_EXTRACT_MODEL"),
-            }
-            for _task_key, (_prov_env, _model_env) in _aux_task_env.items():
-                _task_cfg = _auxiliary_cfg.get(_task_key, {})
-                if not isinstance(_task_cfg, dict):
-                    continue
-                _prov = str(_task_cfg.get("provider", "")).strip()
-                _model = str(_task_cfg.get("model", "")).strip()
-                if _prov and _prov != "auto":
-                    os.environ[_prov_env] = _prov
-                if _model:
-                    os.environ[_model_env] = _model
        _agent_cfg = _cfg.get("agent", {})
        if _agent_cfg and isinstance(_agent_cfg, dict):
            if "max_turns" in _agent_cfg:
@@ -118,12 +99,6 @@ if _config_path.exists():
        _tz_cfg = _cfg.get("timezone", "")
        if _tz_cfg and isinstance(_tz_cfg, str) and "HERMES_TIMEZONE" not in os.environ:
            os.environ["HERMES_TIMEZONE"] = _tz_cfg.strip()
-        # Security settings
-        _security_cfg = _cfg.get("security", {})
-        if isinstance(_security_cfg, dict):
-            _redact = _security_cfg.get("redact_secrets")
-            if _redact is not None:
-                os.environ["HERMES_REDACT_SECRETS"] = str(_redact).lower()
    except Exception:
        pass  # Non-fatal; gateway can still run with .env values

@@ -200,7 +175,6 @@ class GatewayRunner:
        self._ephemeral_system_prompt = self._load_ephemeral_system_prompt()
        self._reasoning_config = self._load_reasoning_config()
        self._provider_routing = self._load_provider_routing()
-        self._fallback_model = self._load_fallback_model()

        # Wire process registry into session store for reset protection
        from tools.process_registry import process_registry
@@ -400,26 +374,6 @@ class GatewayRunner:
            pass
        return {}

-    @staticmethod
-    def _load_fallback_model() -> dict | None:
-        """Load fallback model config from config.yaml.
-
-        Returns a dict with 'provider' and 'model' keys, or None if
-        not configured / both fields empty.
-        """
-        try:
-            import yaml as _y
-            cfg_path = _hermes_home / "config.yaml"
-            if cfg_path.exists():
-                with open(cfg_path) as _f:
-                    cfg = _y.safe_load(_f) or {}
-                fb = cfg.get("fallback_model", {}) or {}
-                if fb.get("provider") and fb.get("model"):
-                    return fb
-        except Exception:
-            pass
-        return None
-
    async def start(self) -> bool:
        """
        Start the gateway and all configured platform adapters.
@@ -618,13 +572,6 @@ class GatewayRunner:
                return None
            return SlackAdapter(config)

-        elif platform == Platform.SIGNAL:
-            from gateway.platforms.signal import SignalAdapter, check_signal_requirements
-            if not check_signal_requirements():
-                logger.warning("Signal: SIGNAL_HTTP_URL or SIGNAL_ACCOUNT not configured")
-                return None
-            return SignalAdapter(config)
-
        elif platform == Platform.HOMEASSISTANT:
            from gateway.platforms.homeassistant import HomeAssistantAdapter, check_ha_requirements
            if not check_ha_requirements():
@@ -660,14 +607,12 @@ class GatewayRunner:
            Platform.DISCORD: "DISCORD_ALLOWED_USERS",
            Platform.WHATSAPP: "WHATSAPP_ALLOWED_USERS",
            Platform.SLACK: "SLACK_ALLOWED_USERS",
-            Platform.SIGNAL: "SIGNAL_ALLOWED_USERS",
        }
        platform_allow_all_map = {
            Platform.TELEGRAM: "TELEGRAM_ALLOW_ALL_USERS",
            Platform.DISCORD: "DISCORD_ALLOW_ALL_USERS",
            Platform.WHATSAPP: "WHATSAPP_ALLOW_ALL_USERS",
            Platform.SLACK: "SLACK_ALLOW_ALL_USERS",
-            Platform.SIGNAL: "SIGNAL_ALLOW_ALL_USERS",
        }

        # Per-platform allow-all flag (e.g., DISCORD_ALLOW_ALL_USERS=true)
@@ -765,8 +710,8 @@ class GatewayRunner:
        # Emit command:* hook for any recognized slash command
        _known_commands = {"new", "reset", "help", "status", "stop", "model",
                          "personality", "retry", "undo", "sethome", "set-home",
-                          "compress", "usage", "insights", "reload-mcp", "reload_mcp",
-                          "update", "title", "resume", "provider"}
+                          "compress", "usage", "insights", "reload-mcp", "update",
+                          "title"}
        if command and command in _known_commands:
            await self.hooks.emit(f"command:{command}", {
                "platform": source.platform.value if source.platform else "",
@@ -814,7 +759,7 @@ class GatewayRunner:
        if command == "insights":
            return await self._handle_insights_command(event)

-        if command in ("reload-mcp", "reload_mcp"):
+        if command == "reload-mcp":
            return await self._handle_reload_mcp_command(event)

        if command == "update":
@@ -822,9 +767,6 @@ class GatewayRunner:

        if command == "title":
            return await self._handle_title_command(event)
-
-        if command == "resume":
-            return await self._handle_resume_command(event)
        
        # Skill slash commands: /skill-name loads the skill and sends to agent
        if command:
@@ -906,188 +848,160 @@ class GatewayRunner:
        # every new message rehydrates an oversized transcript, causing
        # repeated truncation/context failures.  Detect this early and
        # compress proactively — before the agent even starts.  (#628)
-        #
-        # Thresholds are derived from the SAME compression config the
-        # agent uses (compression.threshold × model context length) so
-        # CLI and messaging platforms behave identically.
        # -----------------------------------------------------------------
        if history and len(history) >= 4:
-            from agent.model_metadata import (
-                estimate_messages_tokens_rough,
-                get_model_context_length,
-            )
+            from agent.model_metadata import estimate_messages_tokens_rough

-            # Read model + compression config from config.yaml — same
-            # source of truth the agent itself uses.
-            _hyg_model = "anthropic/claude-sonnet-4.6"
-            _hyg_threshold_pct = 0.85
-            _hyg_compression_enabled = True
+            # Read thresholds from config.yaml → session_hygiene section
+            _hygiene_cfg = {}
            try:
                _hyg_cfg_path = _hermes_home / "config.yaml"
                if _hyg_cfg_path.exists():
                    import yaml as _hyg_yaml
                    with open(_hyg_cfg_path) as _hyg_f:
                        _hyg_data = _hyg_yaml.safe_load(_hyg_f) or {}
-
-                    # Resolve model name (same logic as run_sync)
-                    _model_cfg = _hyg_data.get("model", {})
-                    if isinstance(_model_cfg, str):
-                        _hyg_model = _model_cfg
-                    elif isinstance(_model_cfg, dict):
-                        _hyg_model = _model_cfg.get("default", _hyg_model)
-
-                    # Read compression settings
-                    _comp_cfg = _hyg_data.get("compression", {})
-                    if isinstance(_comp_cfg, dict):
-                        _hyg_threshold_pct = float(
-                            _comp_cfg.get("threshold", _hyg_threshold_pct)
-                        )
-                        _hyg_compression_enabled = str(
-                            _comp_cfg.get("enabled", True)
-                        ).lower() in ("true", "1", "yes")
+                    _hygiene_cfg = _hyg_data.get("session_hygiene", {})
+                    if not isinstance(_hygiene_cfg, dict):
+                        _hygiene_cfg = {}
            except Exception:
                pass

-            # Also check env overrides (same as run_agent.py)
-            _hyg_threshold_pct = float(
-                os.getenv("CONTEXT_COMPRESSION_THRESHOLD", str(_hyg_threshold_pct))
+            _compress_token_threshold = int(
+                _hygiene_cfg.get("auto_compress_tokens", 100_000)
+            )
+            _compress_msg_threshold = int(
+                _hygiene_cfg.get("auto_compress_messages", 200)
+            )
+            _warn_token_threshold = int(
+                _hygiene_cfg.get("warn_tokens", 200_000)
            )
-            if os.getenv("CONTEXT_COMPRESSION_ENABLED", "").lower() in ("false", "0", "no"):
-                _hyg_compression_enabled = False

-            if _hyg_compression_enabled:
-                _hyg_context_length = get_model_context_length(_hyg_model)
-                _compress_token_threshold = int(
-                    _hyg_context_length * _hyg_threshold_pct
+            _msg_count = len(history)
+            _approx_tokens = estimate_messages_tokens_rough(history)
+
+            _needs_compress = (
+                _approx_tokens >= _compress_token_threshold
+                or _msg_count >= _compress_msg_threshold
+            )
+
+            if _needs_compress:
+                logger.info(
+                    "Session hygiene: %s messages, ~%s tokens — auto-compressing "
+                    "(thresholds: %s msgs / %s tokens)",
+                    _msg_count, f"{_approx_tokens:,}",
+                    _compress_msg_threshold, f"{_compress_token_threshold:,}",
                )
-                # Warn if still huge after compression (95% of context)
-                _warn_token_threshold = int(_hyg_context_length * 0.95)
-
-                _msg_count = len(history)
-                _approx_tokens = estimate_messages_tokens_rough(history)
-
-                _needs_compress = _approx_tokens >= _compress_token_threshold
-
-                if _needs_compress:
-                    logger.info(
-                        "Session hygiene: %s messages, ~%s tokens — auto-compressing "
-                        "(threshold: %s%% of %s = %s tokens)",
-                        _msg_count, f"{_approx_tokens:,}",
-                        int(_hyg_threshold_pct * 100),
-                        f"{_hyg_context_length:,}",
-                        f"{_compress_token_threshold:,}",
-                    )
-
-                    _hyg_adapter = self.adapters.get(source.platform)
-                    if _hyg_adapter:
-                        try:
-                            await _hyg_adapter.send(
-                                source.chat_id,
-                                f"🗜️ Session is large ({_msg_count} messages, "
-                                f"~{_approx_tokens:,} tokens). Auto-compressing..."
-                            )
-                        except Exception:
-                            pass

+                _hyg_adapter = self.adapters.get(source.platform)
+                if _hyg_adapter:
                    try:
-                        from run_agent import AIAgent
-
-                        _hyg_runtime = _resolve_runtime_agent_kwargs()
-                        if _hyg_runtime.get("api_key"):
-                            _hyg_msgs = [
-                                {"role": m.get("role"), "content": m.get("content")}
-                                for m in history
-                                if m.get("role") in ("user", "assistant")
-                                and m.get("content")
-                            ]
-
-                            if len(_hyg_msgs) >= 4:
-                                _hyg_agent = AIAgent(
-                                    **_hyg_runtime,
-                                    max_iterations=4,
-                                    quiet_mode=True,
-                                    enabled_toolsets=["memory"],
-                                    session_id=session_entry.session_id,
-                                )
-
-                                loop = asyncio.get_event_loop()
-                                _compressed, _ = await loop.run_in_executor(
-                                    None,
-                                    lambda: _hyg_agent._compress_context(
-                                        _hyg_msgs, "",
-                                        approx_tokens=_approx_tokens,
-                                    ),
-                                )
-
-                                self.session_store.rewrite_transcript(
-                                    session_entry.session_id, _compressed
-                                )
-                                history = _compressed
-                                _new_count = len(_compressed)
-                                _new_tokens = estimate_messages_tokens_rough(
-                                    _compressed
-                                )
-
-                                logger.info(
-                                    "Session hygiene: compressed %s → %s msgs, "
-                                    "~%s → ~%s tokens",
-                                    _msg_count, _new_count,
-                                    f"{_approx_tokens:,}", f"{_new_tokens:,}",
-                                )
-
-                                if _hyg_adapter:
-                                    try:
-                                        await _hyg_adapter.send(
-                                            source.chat_id,
-                                            f"🗜️ Compressed: {_msg_count} → "
-                                            f"{_new_count} messages, "
-                                            f"~{_approx_tokens:,} → "
-                                            f"~{_new_tokens:,} tokens"
-                                        )
-                                    except Exception:
-                                        pass
-
-                                # Still too large after compression — warn user
-                                if _new_tokens >= _warn_token_threshold:
-                                    logger.warning(
-                                        "Session hygiene: still ~%s tokens after "
-                                        "compression — suggesting /reset",
-                                        f"{_new_tokens:,}",
-                                    )
-                                    if _hyg_adapter:
-                                        try:
-                                            await _hyg_adapter.send(
-                                                source.chat_id,
-                                                "⚠️ Session is still very large "
-                                                "after compression "
-                                                f"(~{_new_tokens:,} tokens). "
-                                                "Consider using /reset to start "
-                                                "fresh if you experience issues."
-                                            )
-                                        except Exception:
-                                            pass
-
-                    except Exception as e:
-                        logger.warning(
-                            "Session hygiene auto-compress failed: %s", e
+                        await _hyg_adapter.send(
+                            source.chat_id,
+                            f"🗜️ Session is large ({_msg_count} messages, "
+                            f"~{_approx_tokens:,} tokens). Auto-compressing..."
                        )
-                        # Compression failed and session is dangerously large
-                        if _approx_tokens >= _warn_token_threshold:
-                            _hyg_adapter = self.adapters.get(source.platform)
+                    except Exception:
+                        pass
+
+                try:
+                    from run_agent import AIAgent
+
+                    _hyg_runtime = _resolve_runtime_agent_kwargs()
+                    if _hyg_runtime.get("api_key"):
+                        _hyg_msgs = [
+                            {"role": m.get("role"), "content": m.get("content")}
+                            for m in history
+                            if m.get("role") in ("user", "assistant")
+                            and m.get("content")
+                        ]
+
+                        if len(_hyg_msgs) >= 4:
+                            _hyg_agent = AIAgent(
+                                **_hyg_runtime,
+                                max_iterations=4,
+                                quiet_mode=True,
+                                enabled_toolsets=["memory"],
+                                session_id=session_entry.session_id,
+                            )
+
+                            loop = asyncio.get_event_loop()
+                            _compressed, _ = await loop.run_in_executor(
+                                None,
+                                lambda: _hyg_agent._compress_context(
+                                    _hyg_msgs, "",
+                                    approx_tokens=_approx_tokens,
+                                ),
+                            )
+
+                            self.session_store.rewrite_transcript(
+                                session_entry.session_id, _compressed
+                            )
+                            history = _compressed
+                            _new_count = len(_compressed)
+                            _new_tokens = estimate_messages_tokens_rough(
+                                _compressed
+                            )
+
+                            logger.info(
+                                "Session hygiene: compressed %s → %s msgs, "
+                                "~%s → ~%s tokens",
+                                _msg_count, _new_count,
+                                f"{_approx_tokens:,}", f"{_new_tokens:,}",
+                            )
+
                            if _hyg_adapter:
                                try:
                                    await _hyg_adapter.send(
                                        source.chat_id,
-                                        f"⚠️ Session is very large "
-                                        f"({_msg_count} messages, "
-                                        f"~{_approx_tokens:,} tokens) and "
-                                        "auto-compression failed. Consider "
-                                        "using /compress or /reset to avoid "
-                                        "issues."
+                                        f"🗜️ Compressed: {_msg_count} → "
+                                        f"{_new_count} messages, "
+                                        f"~{_approx_tokens:,} → "
+                                        f"~{_new_tokens:,} tokens"
                                    )
                                except Exception:
                                    pass

+                            # Still too large after compression — warn user
+                            if _new_tokens >= _warn_token_threshold:
+                                logger.warning(
+                                    "Session hygiene: still ~%s tokens after "
+                                    "compression — suggesting /reset",
+                                    f"{_new_tokens:,}",
+                                )
+                                if _hyg_adapter:
+                                    try:
+                                        await _hyg_adapter.send(
+                                            source.chat_id,
+                                            "⚠️ Session is still very large "
+                                            "after compression "
+                                            f"(~{_new_tokens:,} tokens). "
+                                            "Consider using /reset to start "
+                                            "fresh if you experience issues."
+                                        )
+                                    except Exception:
+                                        pass
+
+                except Exception as e:
+                    logger.warning(
+                        "Session hygiene auto-compress failed: %s", e
+                    )
+                    # Compression failed and session is dangerously large
+                    if _approx_tokens >= _warn_token_threshold:
+                        _hyg_adapter = self.adapters.get(source.platform)
+                        if _hyg_adapter:
+                            try:
+                                await _hyg_adapter.send(
+                                    source.chat_id,
+                                    f"⚠️ Session is very large "
+                                    f"({_msg_count} messages, "
+                                    f"~{_approx_tokens:,} tokens) and "
+                                    "auto-compression failed. Consider "
+                                    "using /compress or /reset to avoid "
+                                    "issues."
+                                )
+                            except Exception:
+                                pass
+
        # First-message onboarding -- only on the very first interaction ever
        if not history and not self.session_store.has_any_sessions():
            context_prompt += (
@@ -1392,7 +1306,6 @@ class GatewayRunner:
            "`/sethome` — Set this chat as the home channel",
            "`/compress` — Compress conversation context",
            "`/title [name]` — Set or show the session title",
-            "`/resume [name]` — Resume a previously-named session",
            "`/usage` — Show token usage for this session",
            "`/insights [days]` — Show usage insights and analytics",
            "`/reload-mcp` — Reload MCP servers from config",
@@ -1817,79 +1730,6 @@ class GatewayRunner:
            else:
                return "No title set. Usage: `/title My Session Name`"

-    async def _handle_resume_command(self, event: MessageEvent) -> str:
-        """Handle /resume command — switch to a previously-named session."""
-        if not self._session_db:
-            return "Session database not available."
-
-        source = event.source
-        session_key = build_session_key(source)
-        name = event.get_command_args().strip()
-
-        if not name:
-            # List recent titled sessions for this user/platform
-            try:
-                user_source = source.platform.value if source.platform else None
-                sessions = self._session_db.list_sessions_rich(
-                    source=user_source, limit=10
-                )
-                titled = [s for s in sessions if s.get("title")]
-                if not titled:
-                    return (
-                        "No named sessions found.\n"
-                        "Use `/title My Session` to name your current session, "
-                        "then `/resume My Session` to return to it later."
-                    )
-                lines = ["📋 **Named Sessions**\n"]
-                for s in titled[:10]:
-                    title = s["title"]
-                    preview = s.get("preview", "")[:40]
-                    preview_part = f" — _{preview}_" if preview else ""
-                    lines.append(f"• **{title}**{preview_part}")
-                lines.append("\nUsage: `/resume <session name>`")
-                return "\n".join(lines)
-            except Exception as e:
-                logger.debug("Failed to list titled sessions: %s", e)
-                return f"Could not list sessions: {e}"
-
-        # Resolve the name to a session ID
-        target_id = self._session_db.resolve_session_by_title(name)
-        if not target_id:
-            return (
-                f"No session found matching '**{name}**'.\n"
-                "Use `/resume` with no arguments to see available sessions."
-            )
-
-        # Check if already on that session
-        current_entry = self.session_store.get_or_create_session(source)
-        if current_entry.session_id == target_id:
-            return f"📌 Already on session **{name}**."
-
-        # Flush memories for current session before switching
-        try:
-            asyncio.create_task(self._async_flush_memories(current_entry.session_id))
-        except Exception as e:
-            logger.debug("Memory flush on resume failed: %s", e)
-
-        # Clear any running agent for this session key
-        if session_key in self._running_agents:
-            del self._running_agents[session_key]
-
-        # Switch the session entry to point at the old session
-        new_entry = self.session_store.switch_session(session_key, target_id)
-        if not new_entry:
-            return "Failed to switch session."
-
-        # Get the title for confirmation
-        title = self._session_db.get_session_title(target_id) or name
-
-        # Count messages for context
-        history = self.session_store.load_transcript(target_id)
-        msg_count = len([m for m in history if m.get("role") == "user"]) if history else 0
-        msg_part = f" ({msg_count} message{'s' if msg_count != 1 else ''})" if msg_count else ""
-
-        return f"↻ Resumed session **{title}**{msg_part}. Conversation restored."
-
    async def _handle_usage_command(self, event: MessageEvent) -> str:
        """Handle /usage command -- show token usage for the session's last agent run."""
        source = event.source
@@ -2687,7 +2527,6 @@ class GatewayRunner:
                platform=platform_key,
                honcho_session_key=session_key,
                session_db=self._session_db,
-                fallback_model=self._fallback_model,
            )
            
            # Store agent reference for interrupt support
@@ -45,8 +45,6 @@ class SessionSource:
    user_name: Optional[str] = None
    thread_id: Optional[str] = None  # For forum topics, Discord threads, etc.
    chat_topic: Optional[str] = None  # Channel topic/description (Discord, Slack)
-    user_id_alt: Optional[str] = None  # Signal UUID (alternative to phone number)
-    chat_id_alt: Optional[str] = None  # Signal group internal ID
    
    @property
    def description(self) -> str:
@@ -70,7 +68,7 @@ class SessionSource:
        return ", ".join(parts)
    
    def to_dict(self) -> Dict[str, Any]:
-        d = {
+        return {
            "platform": self.platform.value,
            "chat_id": self.chat_id,
            "chat_name": self.chat_name,
@@ -80,11 +78,6 @@ class SessionSource:
            "thread_id": self.thread_id,
            "chat_topic": self.chat_topic,
        }
-        if self.user_id_alt:
-            d["user_id_alt"] = self.user_id_alt
-        if self.chat_id_alt:
-            d["chat_id_alt"] = self.chat_id_alt
-        return d
    
    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SessionSource":
@@ -97,8 +90,6 @@ class SessionSource:
            user_name=data.get("user_name"),
            thread_id=data.get("thread_id"),
            chat_topic=data.get("chat_topic"),
-            user_id_alt=data.get("user_id_alt"),
-            chat_id_alt=data.get("chat_id_alt"),
        )
    
    @classmethod
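Note on the `SessionSource` hunks above: with `user_id_alt` and `chat_id_alt` gone, `to_dict` returns one fixed-shape literal and `from_dict` reads each optional field with `data.get`. The round trip can be sketched as follows. `Source` is a hypothetical stand-in with a reduced field set, not the real class, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Source:
    """Hypothetical stand-in for SessionSource with a reduced field set."""

    platform: str
    chat_id: str
    thread_id: Optional[str] = None
    chat_topic: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Every field is always emitted, so the dict shape never varies.
        return {
            "platform": self.platform,
            "chat_id": self.chat_id,
            "thread_id": self.thread_id,
            "chat_topic": self.chat_topic,
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Source":
        # Only known keys are read, so extra keys left over in an older
        # serialized dict (for example "user_id_alt") are silently ignored.
        return cls(
            platform=data["platform"],
            chat_id=data["chat_id"],
            thread_id=data.get("thread_id"),
            chat_topic=data.get("chat_topic"),
        )
```

In this sketch a dict written before the change still loads, but the alternate IDs it carried are dropped on the next save.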
@@ -342,7 +333,7 @@ class SessionStore:
        
        if sessions_file.exists():
            try:
-                with open(sessions_file, "r", encoding="utf-8") as f:
+                with open(sessions_file, "r") as f:
                    data = json.load(f)
                    for key, entry_data in data.items():
                        self._entries[key] = SessionEntry.from_dict(entry_data)
@@ -357,7 +348,7 @@ class SessionStore:
        sessions_file = self.sessions_dir / "sessions.json"
        
        data = {key: entry.to_dict() for key, entry in self._entries.items()}
-        with open(sessions_file, "w", encoding="utf-8") as f:
+        with open(sessions_file, "w") as f:
            json.dump(data, f, indent=2)
    
    def _generate_session_key(self, source: SessionSource) -> str:
@@ -602,49 +593,7 @@ class SessionStore:
                logger.debug("Session DB operation failed: %s", e)
        
        return new_entry
-
-    def switch_session(self, session_key: str, target_session_id: str) -> Optional[SessionEntry]:
-        """Switch a session key to point at an existing session ID.
-
-        Used by ``/resume`` to restore a previously-named session.
-        Ends the current session in SQLite (like reset), but instead of
-        generating a fresh session ID, re-uses ``target_session_id`` so the
-        old transcript is loaded on the next message.
-        """
-        self._ensure_loaded()
-
-        if session_key not in self._entries:
-            return None
-
-        old_entry = self._entries[session_key]
-
-        # Don't switch if already on that session
-        if old_entry.session_id == target_session_id:
-            return old_entry
-
-        # End the current session in SQLite
-        if self._db:
-            try:
-                self._db.end_session(old_entry.session_id, "session_switch")
-            except Exception as e:
-                logger.debug("Session DB end_session failed: %s", e)
-
-        now = datetime.now()
-        new_entry = SessionEntry(
-            session_key=session_key,
-            session_id=target_session_id,
-            created_at=now,
-            updated_at=now,
-            origin=old_entry.origin,
-            display_name=old_entry.display_name,
-            platform=old_entry.platform,
-            chat_type=old_entry.chat_type,
-        )
-
-        self._entries[session_key] = new_entry
-        self._save()
-        return new_entry
-
+    
    def list_sessions(self, active_minutes: Optional[int] = None) -> List[SessionEntry]:
        """List all sessions, optionally filtered by activity."""
        self._ensure_loaded()
@@ -681,7 +630,7 @@ class SessionStore:
        
        # Also write legacy JSONL (keeps existing tooling working during transition)
        transcript_path = self.get_transcript_path(session_id)
-        with open(transcript_path, "a", encoding="utf-8") as f:
+        with open(transcript_path, "a") as f:
            f.write(json.dumps(message, ensure_ascii=False) + "\n")
    
    def rewrite_transcript(self, session_id: str, messages: List[Dict[str, Any]]) -> None:
@@ -708,7 +657,7 @@ class SessionStore:
        
        # JSONL: overwrite the file
        transcript_path = self.get_transcript_path(session_id)
-        with open(transcript_path, "w", encoding="utf-8") as f:
+        with open(transcript_path, "w") as f:
            for msg in messages:
                f.write(json.dumps(msg, ensure_ascii=False) + "\n")

@@ -730,7 +679,7 @@ class SessionStore:
            return []
        
        messages = []
-        with open(transcript_path, "r", encoding="utf-8") as f:
+        with open(transcript_path, "r") as f:
            for line in f:
                line = line.strip()
                if line:
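Note on the `open()` hunks above: they drop `encoding="utf-8"` while the writers still pass `ensure_ascii=False`. Non-ASCII text is therefore written as raw characters in whatever encoding `open()` defaults to, which in Python depends on the platform locale unless UTF-8 mode is enabled. A minimal sketch of the explicit-encoding form that was removed is below. `append_jsonl` and `read_jsonl` are hypothetical helper names, not functions from this repository.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def append_jsonl(path: Path, message: Dict[str, Any]) -> None:
    # Explicit UTF-8 keeps the on-disk bytes independent of the locale.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(message, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> List[Dict[str, Any]]:
    messages: List[Dict[str, Any]] = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, as the transcript loader does
                messages.append(json.loads(line))
    return messages


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        transcript = Path(tmp_dir) / "session.jsonl"
        append_jsonl(transcript, {"role": "user", "content": "héllo ⚠️"})
        print(read_jsonl(transcript))
```

Without the explicit argument, the same write can raise `UnicodeEncodeError` or produce a file that another machine cannot read back. This is most likely on Windows systems with a legacy code page.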
@@ -285,8 +285,8 @@ def _convert_to_png(path: Path) -> bool:
        logger.debug("Pillow BMP→PNG conversion failed: %s", e)

    # Fall back to ImageMagick convert
-    tmp = path.with_suffix(".bmp")
    try:
+        tmp = path.with_suffix(".bmp")
        path.rename(tmp)
        r = subprocess.run(
            ["convert", str(tmp), "png:" + str(path)],
@@ -297,12 +297,8 @@ def _convert_to_png(path: Path) -> bool:
            return True
    except FileNotFoundError:
        logger.debug("ImageMagick not installed — cannot convert BMP to PNG")
-        if tmp.exists() and not path.exists():
-            tmp.rename(path)
    except Exception as e:
        logger.debug("ImageMagick BMP→PNG conversion failed: %s", e)
-        if tmp.exists() and not path.exists():
-            tmp.rename(path)

    # Can't convert — BMP is still usable as-is for most APIs
    return path.exists() and path.stat().st_size > 0
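Note on the `_convert_to_png` hunks above: the file is renamed to `.bmp` before ImageMagick runs, and the removed lines were what moved it back when conversion failed. Without them, a failed conversion appears to leave the image under the `.bmp` name, so the final `path.exists()` check would be false. The rename-and-restore pattern is sketched below. `convert_with_rollback`, its `converter` parameter, and the `subprocess.run` arguments are assumptions for illustration, not the repository's code.

```python
import subprocess
import tempfile
from pathlib import Path


def convert_with_rollback(path: Path, converter: str = "convert") -> bool:
    """Hypothetical variant of the fallback that restores the file on failure."""
    tmp = path.with_suffix(".bmp")
    try:
        path.rename(tmp)
        result = subprocess.run(
            [converter, str(tmp), "png:" + str(path)],
            capture_output=True,  # assumed arguments, not taken from the diff
            timeout=5,
        )
        if result.returncode == 0 and path.exists() and path.stat().st_size > 0:
            if tmp != path:
                tmp.unlink(missing_ok=True)
            return True
    except (OSError, subprocess.SubprocessError):
        # OSError covers a missing converter binary (FileNotFoundError);
        # SubprocessError covers a timeout.
        pass
    # No usable output was produced: move the original back into place.
    if tmp.exists() and not path.exists():
        tmp.rename(path)
    return path.exists() and path.stat().st_size > 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        image = Path(tmp_dir) / "clip.png"
        image.write_bytes(b"BM-fake-bitmap")
        print(convert_with_rollback(image, converter="no-such-converter-binary"))
```

Putting the restore step after the `try` block means one code path handles both the missing-binary case and a failed run.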
@@ -94,6 +94,8 @@ def _read_cache_models(codex_home: Path) -> List[str]:
            if not isinstance(slug, str) or not slug.strip():
                continue
            slug = slug.strip()
+            if "codex" not in slug.lower():
+                continue
            if item.get("supported_in_api") is False:
                continue
            visibility = item.get("visibility")
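Note on the `_read_cache_models` hunk above: the added check keeps only cache entries whose slug contains `codex`, compared case-insensitively. The filter can be sketched in isolation as below. `filter_codex_slugs` is a hypothetical name, reading the slug from a `"slug"` key is an assumption, the sample slugs are made up, and the `visibility` handling from the real function is left out.

```python
from typing import Any, Dict, List


def filter_codex_slugs(items: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical reduction of the cache filter to its slug checks."""
    slugs: List[str] = []
    for item in items:
        slug = item.get("slug")  # assumed key name
        if not isinstance(slug, str) or not slug.strip():
            continue
        slug = slug.strip()
        # New in this diff: drop anything that is not a Codex model.
        if "codex" not in slug.lower():
            continue
        if item.get("supported_in_api") is False:
            continue
        slugs.append(slug)
    return slugs
```

Because the match is a substring test, any slug that merely contains `codex` passes, and a Codex-capable model whose slug omits the word is excluded.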
@@ -81,34 +81,17 @@ DEFAULT_CONFIG = {
    
    "browser": {
        "inactivity_timeout": 120,
-        "record_sessions": False,  # Auto-record browser sessions as WebM videos
    },
    
    "compression": {
        "enabled": True,
        "threshold": 0.85,
        "summary_model": "google/gemini-3-flash-preview",
-        "summary_provider": "auto",
-    },
-    
-    # Auxiliary model overrides (advanced).  By default Hermes auto-selects
-    # the provider and model for each side task.  Set these to override.
-    "auxiliary": {
-        "vision": {
-            "provider": "auto",    # auto | openrouter | nous | main
-            "model": "",           # e.g. "google/gemini-2.5-flash", "gpt-4o"
-        },
-        "web_extract": {
-            "provider": "auto",
-            "model": "",
-        },
    },
    
    "display": {
        "compact": False,
        "personality": "kawaii",
-        "resume_display": "full",  # "full" (show previous messages) | "minimal" (one-liner only)
-        "bell_on_complete": False,  # Play terminal bell (\a) when agent finishes a response
    },
    
    # Text-to-speech configuration
@@ -439,7 +422,7 @@ OPTIONAL_ENV_VARS = {
        "category": "setting",
    },
    "HERMES_MAX_ITERATIONS": {
-        "description": "Maximum tool-calling iterations per conversation (default: 90)",
+        "description": "Maximum tool-calling iterations per conversation (default: 60)",
        "prompt": "Max iterations",
        "url": None,
        "password": False,
@@ -759,36 +742,6 @@ def load_config() -> Dict[str, Any]:
    return config


-_COMMENTED_SECTIONS = """
-# ── Security ──────────────────────────────────────────────────────────
-# API keys, tokens, and passwords are redacted from tool output by default.
-# Set to false to see full values (useful for debugging auth issues).
-#
-# security:
-#   redact_secrets: false
-
-# ── Fallback Model ────────────────────────────────────────────────────
-# Automatic provider failover when primary is unavailable.
-# Uncomment and configure to enable. Triggers on rate limits (429),
-# overload (529), service errors (503), or connection failures.
-#
-# Supported providers:
-#   openrouter   (OPENROUTER_API_KEY)  — routes to any model
-#   openai-codex (OAuth — hermes login) — OpenAI Codex
-#   nous         (OAuth — hermes login) — Nous Portal
-#   zai          (ZAI_API_KEY)         — Z.AI / GLM
-#   kimi-coding  (KIMI_API_KEY)        — Kimi / Moonshot
-#   minimax      (MINIMAX_API_KEY)     — MiniMax
-#   minimax-cn   (MINIMAX_CN_API_KEY)  — MiniMax (China)
-#
-# For custom OpenAI-compatible endpoints, add base_url and api_key_env.
-#
-# fallback_model:
-#   provider: openrouter
-#   model: anthropic/claude-sonnet-4
-"""
-
-
 def save_config(config: Dict[str, Any]):
    """Save configuration to ~/.hermes/config.yaml."""
    ensure_hermes_home()
@@ -796,18 +749,6 @@ def save_config(config: Dict[str, Any]):
    
    with open(config_path, 'w') as f:
        yaml.dump(config, f, default_flow_style=False, sort_keys=False)
-        # Append commented-out sections for features that are off by default
-        # or only relevant when explicitly configured. Skip sections the
-        # user has already uncommented and configured.
-        sections = []
-        sec = config.get("security", {})
-        if not sec or sec.get("redact_secrets") is None:
-            sections.append("security")
-        fb = config.get("fallback_model", {})
-        if not fb or not (fb.get("provider") and fb.get("model")):
-            sections.append("fallback")
-        if sections:
-            f.write(_COMMENTED_SECTIONS)


 def load_env() -> Dict[str, str]:
@@ -971,31 +912,6 @@ def show_config():
    if enabled:
        print(f"  Threshold:    {compression.get('threshold', 0.85) * 100:.0f}%")
        print(f"  Model:        {compression.get('summary_model', 'google/gemini-3-flash-preview')}")
-        comp_provider = compression.get('summary_provider', 'auto')
-        if comp_provider != 'auto':
-            print(f"  Provider:     {comp_provider}")
-    
-    # Auxiliary models
-    auxiliary = config.get('auxiliary', {})
-    aux_tasks = {
-        "Vision":      auxiliary.get('vision', {}),
-        "Web extract": auxiliary.get('web_extract', {}),
-    }
-    has_overrides = any(
-        t.get('provider', 'auto') != 'auto' or t.get('model', '')
-        for t in aux_tasks.values()
-    )
-    if has_overrides:
-        print()
-        print(color("◆ Auxiliary Models (overrides)", Colors.CYAN, Colors.BOLD))
-        for label, task_cfg in aux_tasks.items():
-            prov = task_cfg.get('provider', 'auto')
-            mdl = task_cfg.get('model', '')
-            if prov != 'auto' or mdl:
-                parts = [f"provider={prov}"]
-                if mdl:
-                    parts.append(f"model={mdl}")
-                print(f"  {label:12s}  {', '.join(parts)}")
    
    # Messaging
    print()
@@ -1053,7 +969,7 @@ def set_config_value(key: str, value: str):
        'FAL_KEY', 'TELEGRAM_BOT_TOKEN', 'DISCORD_BOT_TOKEN',
        'TERMINAL_SSH_HOST', 'TERMINAL_SSH_USER', 'TERMINAL_SSH_KEY',
        'SUDO_PASSWORD', 'SLACK_BOT_TOKEN', 'SLACK_APP_TOKEN',
-        'GITHUB_TOKEN', 'HONCHO_API_KEY', 'WANDB_API_KEY',
+        'GITHUB_TOKEN', 'HONCHO_API_KEY', 'NOUS_API_KEY', 'WANDB_API_KEY',
        'TINKER_API_KEY',
    ]
    
@@ -507,12 +507,6 @@ _PLATFORMS = [
        "emoji": "📲",
        "token_var": "WHATSAPP_ENABLED",
    },
-    {
-        "key": "signal",
-        "label": "Signal",
-        "emoji": "📡",
-        "token_var": "SIGNAL_HTTP_URL",
-    },
 ]


@@ -531,13 +525,6 @@ def _platform_status(platform: dict) -> str:
                return "configured + paired"
            return "enabled, not paired"
        return "not configured"
-    if platform.get("key") == "signal":
-        account = get_env_value("SIGNAL_ACCOUNT")
-        if val and account:
-            return "configured"
-        if val or account:
-            return "partially configured"
-        return "not configured"
    if val:
        return "configured"
    return "not configured"
@@ -663,121 +650,6 @@ def _is_service_running() -> bool:
    return len(find_gateway_pids()) > 0


-def _setup_signal():
-    """Interactive setup for Signal messenger."""
-    import shutil
-
-    print()
-    print(color("  ─── 📡 Signal Setup ───", Colors.CYAN))
-
-    existing_url = get_env_value("SIGNAL_HTTP_URL")
-    existing_account = get_env_value("SIGNAL_ACCOUNT")
-    if existing_url and existing_account:
-        print()
-        print_success("Signal is already configured.")
-        if not prompt_yes_no("  Reconfigure Signal?", False):
-            return
-
-    # Check if signal-cli is available
-    print()
-    if shutil.which("signal-cli"):
-        print_success("signal-cli found on PATH.")
-    else:
-        print_warning("signal-cli not found on PATH.")
-        print_info("  Signal requires signal-cli running as an HTTP daemon.")
-        print_info("  Install options:")
-        print_info("    Linux:  sudo apt install signal-cli")
-        print_info("            or download from https://github.com/AsamK/signal-cli")
-        print_info("    macOS:  brew install signal-cli")
-        print_info("    Docker: bbernhard/signal-cli-rest-api")
-        print()
-        print_info("  After installing, link your account and start the daemon:")
-        print_info("    signal-cli link -n \"HermesAgent\"")
-        print_info("    signal-cli --account +YOURNUMBER daemon --http 127.0.0.1:8080")
-        print()
-
-    # HTTP URL
-    print()
-    print_info("  Enter the URL where signal-cli HTTP daemon is running.")
-    default_url = existing_url or "http://127.0.0.1:8080"
-    try:
-        url = input(f"  HTTP URL [{default_url}]: ").strip() or default_url
-    except (EOFError, KeyboardInterrupt):
-        print("\n  Setup cancelled.")
-        return
-
-    # Test connectivity
-    print_info("  Testing connection...")
-    try:
-        import httpx
-        resp = httpx.get(f"{url.rstrip('/')}/api/v1/check", timeout=10.0)
-        if resp.status_code == 200:
-            print_success("  signal-cli daemon is reachable!")
-        else:
-            print_warning(f"  signal-cli responded with status {resp.status_code}.")
-            if not prompt_yes_no("  Continue anyway?", False):
-                return
-    except Exception as e:
-        print_warning(f"  Could not reach signal-cli at {url}: {e}")
-        if not prompt_yes_no("  Save this URL anyway? (you can start signal-cli later)", True):
-            return
-
-    save_env_value("SIGNAL_HTTP_URL", url)
-
-    # Account phone number
-    print()
-    print_info("  Enter your Signal account phone number in E.164 format.")
-    print_info("  Example: +15551234567")
-    default_account = existing_account or ""
-    try:
-        account = input(f"  Account number{f' [{default_account}]' if default_account else ''}: ").strip()
-        if not account:
-            account = default_account
-    except (EOFError, KeyboardInterrupt):
-        print("\n  Setup cancelled.")
-        return
-
-    if not account:
-        print_error("  Account number is required.")
-        return
-
-    save_env_value("SIGNAL_ACCOUNT", account)
-
-    # Allowed users
-    print()
-    print_info("  The gateway DENIES all users by default for security.")
-    print_info("  Enter phone numbers or UUIDs of allowed users (comma-separated).")
-    existing_allowed = get_env_value("SIGNAL_ALLOWED_USERS") or ""
-    default_allowed = existing_allowed or account
-    try:
-        allowed = input(f"  Allowed users [{default_allowed}]: ").strip() or default_allowed
-    except (EOFError, KeyboardInterrupt):
-        print("\n  Setup cancelled.")
-        return
-
-    save_env_value("SIGNAL_ALLOWED_USERS", allowed)
-
-    # Group messaging
-    print()
-    if prompt_yes_no("  Enable group messaging? (disabled by default for security)", False):
-        print()
-        print_info("  Enter group IDs to allow, or * for all groups.")
-        existing_groups = get_env_value("SIGNAL_GROUP_ALLOWED_USERS") or ""
-        try:
-            groups = input(f"  Group IDs [{existing_groups or '*'}]: ").strip() or existing_groups or "*"
-        except (EOFError, KeyboardInterrupt):
-            print("\n  Setup cancelled.")
-            return
-        save_env_value("SIGNAL_GROUP_ALLOWED_USERS", groups)
-
-    print()
-    print_success("Signal configured!")
-    print_info(f"  URL: {url}")
-    print_info(f"  Account: {account}")
-    print_info(f"  DM auth: via SIGNAL_ALLOWED_USERS + DM pairing")
-    print_info(f"  Groups: {'enabled' if get_env_value('SIGNAL_GROUP_ALLOWED_USERS') else 'disabled'}")
-
-
 def gateway_setup():
    """Interactive setup for messaging platforms + gateway service."""

@@ -830,8 +702,6 @@ def gateway_setup():

        if platform["key"] == "whatsapp":
            _setup_whatsapp()
-        elif platform["key"] == "signal":
-            _setup_signal()
        else:
            _setup_standard_platform(platform)

@@ -21,7 +21,6 @@ Usage:
    hermes version             # Show version
    hermes update              # Update to latest version
    hermes uninstall           # Uninstall Hermes Agent
-    hermes sessions browse     # Interactive session picker with search
 """

 import argparse
@@ -107,279 +106,6 @@ def _has_any_provider_configured() -> bool:
    return False


-def _session_browse_picker(sessions: list) -> Optional[str]:
-    """Interactive curses-based session browser with live search filtering.
-
-    Returns the selected session ID, or None if cancelled.
-    Uses curses (not simple_term_menu) to avoid the ghost-duplication rendering
-    bug in tmux/iTerm when arrow keys are used.
-    """
-    if not sessions:
-        print("No sessions found.")
-        return None
-
-    # Try curses-based picker first
-    try:
-        import curses
-        import time as _time
-        from datetime import datetime
-
-        result_holder = [None]
-
-        def _relative_time(ts):
-            if not ts:
-                return "?"
-            delta = _time.time() - ts
-            if delta < 60:
-                return "just now"
-            elif delta < 3600:
-                return f"{int(delta / 60)}m ago"
-            elif delta < 86400:
-                return f"{int(delta / 3600)}h ago"
-            elif delta < 172800:
-                return "yesterday"
-            elif delta < 604800:
-                return f"{int(delta / 86400)}d ago"
-            else:
-                return datetime.fromtimestamp(ts).strftime("%Y-%m-%d")
-
-        def _format_row(s, max_x):
-            """Format a session row for display."""
-            title = (s.get("title") or "").strip()
-            preview = (s.get("preview") or "").strip()
-            source = s.get("source", "")[:6]
-            last_active = _relative_time(s.get("last_active"))
-            sid = s["id"][:18]
-
-            # Adaptive column widths based on terminal width
-            # Layout: [arrow 3] [title/preview flexible] [active 12] [src 6] [id 18]
-            fixed_cols = 3 + 12 + 6 + 18 + 6  # arrow + active + src + id + padding
-            name_width = max(20, max_x - fixed_cols)
-
-            if title:
-                name = title[:name_width]
-            elif preview:
-                name = preview[:name_width]
-            else:
-                name = sid
-
-            return f"{name:<{name_width}}  {last_active:<10}  {source:<5} {sid}"
-
-        def _match(s, query):
-            """Check if a session matches the search query (case-insensitive)."""
-            q = query.lower()
-            return (
-                q in (s.get("title") or "").lower()
-                or q in (s.get("preview") or "").lower()
-                or q in s.get("id", "").lower()
-                or q in (s.get("source") or "").lower()
-            )
-
-        def _curses_browse(stdscr):
-            curses.curs_set(0)
-            if curses.has_colors():
-                curses.start_color()
-                curses.use_default_colors()
-                curses.init_pair(1, curses.COLOR_GREEN, -1)   # selected
-                curses.init_pair(2, curses.COLOR_YELLOW, -1)  # header
-                curses.init_pair(3, curses.COLOR_CYAN, -1)    # search
-                curses.init_pair(4, 8, -1)                    # dim
-
-            cursor = 0
-            scroll_offset = 0
-            search_text = ""
-            filtered = list(sessions)
-
-            while True:
-                stdscr.clear()
-                max_y, max_x = stdscr.getmaxyx()
-                if max_y < 5 or max_x < 40:
-                    # Terminal too small
-                    try:
-                        stdscr.addstr(0, 0, "Terminal too small")
-                    except curses.error:
-                        pass
-                    stdscr.refresh()
-                    stdscr.getch()
-                    return
-
-                # Header line
-                if search_text:
-                    header = f"  Browse sessions — filter: {search_text}█"
-                    header_attr = curses.A_BOLD
-                    if curses.has_colors():
-                        header_attr |= curses.color_pair(3)
-                else:
-                    header = "  Browse sessions — ↑↓ navigate  Enter select  Type to filter  Esc quit"
-                    header_attr = curses.A_BOLD
-                    if curses.has_colors():
-                        header_attr |= curses.color_pair(2)
-                try:
-                    stdscr.addnstr(0, 0, header, max_x - 1, header_attr)
-                except curses.error:
-                    pass
-
-                # Column header line
-                fixed_cols = 3 + 12 + 6 + 18 + 6
-                name_width = max(20, max_x - fixed_cols)
-                col_header = f"   {'Title / Preview':<{name_width}}  {'Active':<10}  {'Src':<5} {'ID'}"
-                try:
-                    dim_attr = curses.color_pair(4) if curses.has_colors() else curses.A_DIM
-                    stdscr.addnstr(1, 0, col_header, max_x - 1, dim_attr)
-                except curses.error:
-                    pass
-
-                # Compute visible area
-                visible_rows = max_y - 4  # header + col header + blank + footer
-                if visible_rows < 1:
-                    visible_rows = 1
-
-                # Clamp cursor and scroll
-                if not filtered:
-                    try:
-                        msg = "  No sessions match the filter."
-                        stdscr.addnstr(3, 0, msg, max_x - 1, curses.A_DIM)
-                    except curses.error:
-                        pass
-                else:
-                    if cursor >= len(filtered):
-                        cursor = len(filtered) - 1
-                    if cursor < 0:
-                        cursor = 0
-                    if cursor < scroll_offset:
-                        scroll_offset = cursor
-                    elif cursor >= scroll_offset + visible_rows:
-                        scroll_offset = cursor - visible_rows + 1
-
-                    for draw_i, i in enumerate(range(
-                        scroll_offset,
-                        min(len(filtered), scroll_offset + visible_rows)
-                    )):
-                        y = draw_i + 3
-                        if y >= max_y - 1:
-                            break
-                        s = filtered[i]
-                        arrow = " → " if i == cursor else "   "
-                        row = arrow + _format_row(s, max_x - 3)
-                        attr = curses.A_NORMAL
-                        if i == cursor:
-                            attr = curses.A_BOLD
-                            if curses.has_colors():
-                                attr |= curses.color_pair(1)
-                        try:
-                            stdscr.addnstr(y, 0, row, max_x - 1, attr)
-                        except curses.error:
-                            pass
-
-                # Footer
-                footer_y = max_y - 1
-                if filtered:
-                    footer = f"  {cursor + 1}/{len(filtered)} sessions"
-                    if len(filtered) < len(sessions):
-                        footer += f" (filtered from {len(sessions)})"
-                else:
-                    footer = f"  0/{len(sessions)} sessions"
-                try:
-                    stdscr.addnstr(footer_y, 0, footer, max_x - 1,
-                                   curses.color_pair(4) if curses.has_colors() else curses.A_DIM)
-                except curses.error:
-                    pass
-
-                stdscr.refresh()
-                key = stdscr.getch()
-
-                if key in (curses.KEY_UP, ):
-                    if filtered:
-                        cursor = (cursor - 1) % len(filtered)
-                elif key in (curses.KEY_DOWN, ):
-                    if filtered:
-                        cursor = (cursor + 1) % len(filtered)
-                elif key in (curses.KEY_ENTER, 10, 13):
-                    if filtered:
-                        result_holder[0] = filtered[cursor]["id"]
-                    return
-                elif key == 27:  # Esc
-                    if search_text:
-                        # First Esc clears the search
-                        search_text = ""
-                        filtered = list(sessions)
-                        cursor = 0
-                        scroll_offset = 0
-                    else:
-                        # Second Esc exits
-                        return
-                elif key in (curses.KEY_BACKSPACE, 127, 8):
-                    if search_text:
-                        search_text = search_text[:-1]
-                        if search_text:
-                            filtered = [s for s in sessions if _match(s, search_text)]
-                        else:
-                            filtered = list(sessions)
-                        cursor = 0
-                        scroll_offset = 0
-                elif key == ord('q') and not search_text:
-                    return
-                elif 32 <= key <= 126:
-                    # Printable character → add to search filter
-                    search_text += chr(key)
-                    filtered = [s for s in sessions if _match(s, search_text)]
-                    cursor = 0
-                    scroll_offset = 0
-
-        curses.wrapper(_curses_browse)
-        return result_holder[0]
-
-    except Exception:
-        pass
-
-    # Fallback: numbered list (Windows without curses, etc.)
-    import time as _time
-    from datetime import datetime
-
-    def _relative_time_fb(ts):
-        if not ts:
-            return "?"
-        delta = _time.time() - ts
-        if delta < 60:
-            return "just now"
-        elif delta < 3600:
-            return f"{int(delta / 60)}m ago"
-        elif delta < 86400:
-            return f"{int(delta / 3600)}h ago"
-        elif delta < 172800:
-            return "yesterday"
-        elif delta < 604800:
-            return f"{int(delta / 86400)}d ago"
-        else:
-            return datetime.fromtimestamp(ts).strftime("%Y-%m-%d")
-
-    print("\n  Browse sessions  (enter number to resume, q to cancel)\n")
-    for i, s in enumerate(sessions):
-        title = (s.get("title") or "").strip()
-        preview = (s.get("preview") or "").strip()
-        label = title or preview or s["id"]
-        if len(label) > 50:
-            label = label[:47] + "..."
-        last_active = _relative_time_fb(s.get("last_active"))
-        src = s.get("source", "")[:6]
-        print(f"  {i + 1:>3}. {label:<50}  {last_active:<10}  {src}")
-
-    while True:
-        try:
-            val = input(f"\n  Select [1-{len(sessions)}]: ").strip()
-            if not val or val.lower() in ("q", "quit", "exit"):
-                return None
-            idx = int(val) - 1
-            if 0 <= idx < len(sessions):
-                return sessions[idx]["id"]
-            print(f"  Invalid selection. Enter 1-{len(sessions)} or q to cancel.")
-        except ValueError:
-            print("  Invalid input. Enter a number or q to cancel.")
-        except (KeyboardInterrupt, EOFError):
-            print()
-            return None
-
-
 def _resolve_last_cli_session() -> Optional[str]:
    """Look up the most recent CLI session ID from SQLite. Returns None if unavailable."""
    try:
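Review note: the numbered-list fallback removed in the hunk above labels each session with a relative timestamp. The bucketing can be sketched as a standalone function, shown below. The name `relative_time` and the injectable `now` parameter are assumptions added here so the logic can be checked without depending on the wall clock; the thresholds mirror the removed `_relative_time_fb`.

```python
import time
from datetime import datetime


def relative_time(ts, now=None):
    """Format a Unix timestamp as a short relative label.

    `now` is a hypothetical parameter (not in the removed code) that
    lets callers pin the reference time, e.g. in tests.
    """
    if not ts:
        return "?"
    now = time.time() if now is None else now
    delta = now - ts
    if delta < 60:
        return "just now"
    if delta < 3600:
        return f"{int(delta / 60)}m ago"
    if delta < 86400:
        return f"{int(delta / 3600)}h ago"
    if delta < 172800:
        return "yesterday"
    if delta < 604800:
        return f"{int(delta / 86400)}d ago"
    # Older than a week: fall back to an absolute date.
    return datetime.fromtimestamp(ts).strftime("%Y-%m-%d")
```

Note that a timestamp of `0` or `None` is treated as missing and rendered as `?`, which matches the truthiness check in the removed helper.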
@@ -1543,7 +1269,6 @@ Examples:
    hermes -w                     Start in isolated git worktree
    hermes gateway install        Install as system service
    hermes sessions list          List past sessions
-    hermes sessions browse        Interactive session picker
    hermes sessions rename ID T   Rename/title a session
    hermes update                 Update to latest version

@@ -2028,13 +1753,6 @@ For more help on a command:
    sessions_rename.add_argument("session_id", help="Session ID to rename")
    sessions_rename.add_argument("title", nargs="+", help="New title for the session")

-    sessions_browse = sessions_subparsers.add_parser(
-        "browse",
-        help="Interactive session picker — browse, search, and resume sessions",
-    )
-    sessions_browse.add_argument("--source", help="Filter by source (cli, telegram, discord, etc.)")
-    sessions_browse.add_argument("--limit", type=int, default=50, help="Max sessions to load (default: 50)")
-
    def cmd_sessions(args):
        import json as _json
        try:
@@ -2141,34 +1859,6 @@ For more help on a command:
            except ValueError as e:
                print(f"Error: {e}")

-        elif action == "browse":
-            limit = getattr(args, "limit", 50) or 50
-            source = getattr(args, "source", None)
-            sessions = db.list_sessions_rich(source=source, limit=limit)
-            db.close()
-            if not sessions:
-                print("No sessions found.")
-                return
-
-            selected_id = _session_browse_picker(sessions)
-            if not selected_id:
-                print("Cancelled.")
-                return
-
-            # Launch hermes --resume <id> by replacing the current process
-            print(f"Resuming session: {selected_id}")
-            import shutil
-            hermes_bin = shutil.which("hermes")
-            if hermes_bin:
-                os.execvp(hermes_bin, ["hermes", "--resume", selected_id])
-            else:
-                # Fallback: re-invoke via python -m
-                os.execvp(
-                    sys.executable,
-                    [sys.executable, "-m", "hermes_cli.main", "--resume", selected_id],
-                )
-            return  # won't reach here after execvp
-
        elif action == "stats":
            total = db.session_count()
            msgs = db.message_count()
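Review note: the removed `browse` action resumed a session by replacing the current process, preferring the `hermes` executable on `PATH` and falling back to `python -m hermes_cli.main`. A minimal sketch of that selection, separated from the `os.execvp` call so it can be tested, might look like this. The function name `build_resume_command` and the injectable `which` parameter are assumptions, not part of the codebase.

```python
import shutil
import sys


def build_resume_command(session_id, which=shutil.which):
    """Return (program, argv) suitable for os.execvp.

    `which` defaults to shutil.which; it is a parameter here only so
    the lookup can be stubbed out (an assumption of this sketch).
    """
    hermes_bin = which("hermes")
    if hermes_bin:
        # Installed entry point found on PATH.
        return hermes_bin, ["hermes", "--resume", session_id]
    # Fallback: re-invoke the CLI module with the current interpreter.
    return sys.executable, [
        sys.executable, "-m", "hermes_cli.main", "--resume", session_id,
    ]
```

A caller would then run `program, argv = build_resume_command(selected_id)` followed by `os.execvp(program, argv)`, which does not return on success.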
@@ -2178,6 +1868,7 @@ For more help on a command:
                c = db.session_count(source=src)
                if c > 0:
                    print(f"  {src}: {c} sessions")
+            import os
            db_path = db.db_path
            if db_path.exists():
                size_mb = os.path.getsize(db_path) / (1024 * 1024)
@@ -870,8 +870,8 @@ def setup_model_provider(config: dict):
                config['model'] = custom
                save_env_value("LLM_MODEL", custom)
        elif selected_provider == "openai-codex":
-            from hermes_cli.codex_models import get_codex_model_ids
-            codex_models = get_codex_model_ids()
+            from hermes_cli.codex_models import get_codex_models
+            codex_models = get_codex_models()
            model_choices = codex_models + [f"Keep current ({current_model})"]
            default_codex = 0
            if current_model in codex_models:
@@ -1264,7 +1264,7 @@ def setup_agent_settings(config: dict):
    # ── Max Iterations ──
    print_header("Agent Settings")

-    current_max = get_env_value('HERMES_MAX_ITERATIONS') or '90'
+    current_max = get_env_value('HERMES_MAX_ITERATIONS') or '60'
    print_info("Maximum tool-calling iterations per conversation.")
    print_info("Higher = more complex tasks, but costs more tokens.")
    print_info("Recommended: 30-60 for most tasks, 100+ for open exploration.")
@@ -1660,18 +1660,14 @@ def setup_gateway(config: dict):
 # Section 5: Tool Configuration (delegates to unified tools_config.py)
 # =============================================================================

-def setup_tools(config: dict, first_install: bool = False):
+def setup_tools(config: dict):
    """Configure tools — delegates to the unified tools_command() in tools_config.py.
    
    Both `hermes setup tools` and `hermes tools` use the same flow:
    platform selection → toolset toggles → provider/API key configuration.
-    
-    Args:
-        first_install: When True, uses the simplified first-install flow
-            (no platform menu, prompts for all unconfigured API keys).
    """
    from hermes_cli.tools_config import tools_command
-    tools_command(first_install=first_install, config=config)
+    tools_command()


 # =============================================================================
@@ -1824,7 +1820,7 @@ def run_setup_wizard(args):
    setup_gateway(config)

    # Section 5: Tools
-    setup_tools(config, first_install=not is_existing)
+    setup_tools(config)

    # Save and show summary
    save_config(config)
@@ -408,11 +408,10 @@ def do_inspect(identifier: str, console: Optional[Console] = None) -> None:

 def do_list(source_filter: str = "all", console: Optional[Console] = None) -> None:
    """List installed skills, distinguishing builtins from hub-installed."""
-    from tools.skills_hub import HubLockFile, ensure_hub_dirs
+    from tools.skills_hub import HubLockFile, SKILLS_DIR
    from tools.skills_tool import _find_all_skills

    c = console or _console
-    ensure_hub_dirs()
    lock = HubLockFile()
    hub_installed = {e["name"]: e for e in lock.list_installed()}

@@ -206,8 +206,6 @@ def show_status(args):
        "Telegram": ("TELEGRAM_BOT_TOKEN", "TELEGRAM_HOME_CHANNEL"),
        "Discord": ("DISCORD_BOT_TOKEN", "DISCORD_HOME_CHANNEL"),
        "WhatsApp": ("WHATSAPP_ENABLED", None),
-        "Signal": ("SIGNAL_HTTP_URL", "SIGNAL_HOME_CHANNEL"),
-        "Slack": ("SLACK_BOT_TOKEN", None),
    }
    
    for name, (token_var, home_var) in platforms.items():
@@ -96,11 +96,6 @@ CONFIGURABLE_TOOLSETS = [
    ("homeassistant",    "🏠 Home Assistant",           "smart home device control"),
 ]

-# Toolsets that are OFF by default for new installs.
-# They're still in _HERMES_CORE_TOOLS (available at runtime if enabled),
-# but the setup checklist won't pre-select them for first-time users.
-_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl"}
-
 # Platform display config
 PLATFORMS = {
    "cli":      {"label": "🖥️  CLI",       "default_toolset": "hermes-cli"},
@@ -147,8 +142,6 @@ TOOL_CATEGORIES = {
    },
    "web": {
        "name": "Web Search & Extract",
-        "setup_title": "Select Search Provider",
-        "setup_note": "A free DuckDuckGo search skill is also included — skip this if you don't need Firecrawl.",
        "icon": "🔍",
        "providers": [
            {
@@ -602,18 +595,11 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
        print(color(f"  --- {icon} {name} ({provider['name']}) ---", Colors.CYAN))
        if provider.get("tag"):
            _print_info(f"  {provider['tag']}")
-        # For single-provider tools, show a note if available
-        if cat.get("setup_note"):
-            _print_info(f"  {cat['setup_note']}")
        _configure_provider(provider, config)
    else:
        # Multiple providers - let user choose
        print()
-        # Use custom title if provided (e.g. "Select Search Provider")
-        title = cat.get("setup_title", "Choose a provider")
-        print(color(f"  --- {icon} {name} - {title} ---", Colors.CYAN))
-        if cat.get("setup_note"):
-            _print_info(f"  {cat['setup_note']}")
+        print(color(f"  --- {icon} {name} - Choose a provider ---", Colors.CYAN))
        print()

        # Plain text labels only (no ANSI codes in menu items)
@@ -631,9 +617,6 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
                    configured = " [configured]"
            provider_choices.append(f"{p['name']}{tag}{configured}")

-        # Add skip option
-        provider_choices.append("Skip — keep defaults / configure later")
-
        # Detect current provider as default
        default_idx = 0
        for i, p in enumerate(providers):
@@ -645,13 +628,7 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
                default_idx = i
                break

-        provider_idx = _prompt_choice(f"  {title}:", provider_choices, default_idx)
-
-        # Skip selected
-        if provider_idx >= len(providers):
-            _print_info(f"  Skipped {name}")
-            return
-
+        provider_idx = _prompt_choice("  Select provider:", provider_choices, default_idx)
        _configure_provider(providers[provider_idx], config)


@@ -858,19 +835,9 @@ def _reconfigure_simple_requirements(ts_key: str):

 # ─── Main Entry Point ─────────────────────────────────────────────────────────

-def tools_command(args=None, first_install: bool = False, config: dict = None):
-    """Entry point for `hermes tools` and `hermes setup tools`.
-
-    Args:
-        first_install: When True (set by the setup wizard on fresh installs),
-            skip the platform menu, go straight to the CLI checklist, and
-            prompt for API keys on all enabled tools that need them.
-        config: Optional config dict to use.  When called from the setup
-            wizard, the wizard passes its own dict so that platform_toolsets
-            are written into it and survive the wizard's final save_config().
-    """
-    if config is None:
-        config = load_config()
+def tools_command(args=None):
+    """Entry point for `hermes tools` and `hermes setup tools`."""
+    config = load_config()
    enabled_platforms = _get_enabled_platforms()

    print()
@@ -879,57 +846,6 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
    print(color("  Tools that need API keys will be configured when enabled.", Colors.DIM))
    print()

-    # ── First-time install: linear flow, no platform menu ──
-    if first_install:
-        for pkey in enabled_platforms:
-            pinfo = PLATFORMS[pkey]
-            current_enabled = _get_platform_tools(config, pkey)
-
-            # Uncheck toolsets that should be off by default
-            checklist_preselected = current_enabled - _DEFAULT_OFF_TOOLSETS
-
-            # Show checklist
-            new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected)
-
-            added = new_enabled - current_enabled
-            removed = current_enabled - new_enabled
-            if added:
-                for ts in sorted(added):
-                    label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts), ts)
-                    print(color(f"  + {label}", Colors.GREEN))
-            if removed:
-                for ts in sorted(removed):
-                    label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts), ts)
-                    print(color(f"  - {label}", Colors.RED))
-
-            # Walk through ALL selected tools that have provider options or
-            # need API keys.  This ensures browser (Local vs Browserbase),
-            # TTS (Edge vs OpenAI vs ElevenLabs), etc. are shown even when
-            # a free provider exists.
-            to_configure = [
-                ts_key for ts_key in sorted(new_enabled)
-                if TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key)
-            ]
-
-            if to_configure:
-                print()
-                print(color(f"  Configuring {len(to_configure)} tool(s):", Colors.YELLOW))
-                for ts_key in to_configure:
-                    label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts_key), ts_key)
-                    print(color(f"    • {label}", Colors.DIM))
-                print(color("  You can skip any tool you don't need right now.", Colors.DIM))
-                print()
-                for ts_key in to_configure:
-                    _configure_toolset(ts_key, config)
-
-            _save_platform_tools(config, pkey, new_enabled)
-            save_config(config)
-            print(color(f"  ✓ Saved {pinfo['label']} tool configuration", Colors.GREEN))
-            print()
-
-        return
-
-    # ── Returning user: platform menu loop ──
    # Build platform choices
    platform_choices = []
    platform_keys = []
@@ -980,10 +896,11 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
                    print(color(f"  - {label}", Colors.RED))

            # Configure newly enabled toolsets that need API keys
-            for ts_key in sorted(added):
-                if (TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key)):
-                    if not _toolset_has_keys(ts_key):
-                        _configure_toolset(ts_key, config)
+            if added:
+                for ts_key in sorted(added):
+                    if TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key):
+                        if not _toolset_has_keys(ts_key):
+                            _configure_toolset(ts_key, config)

            _save_platform_tools(config, pkey, new_enabled)
            save_config(config)
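Review note: both the removed first-install flow and the remaining platform loop report changes by taking set differences between the previously enabled toolsets and the new checklist selection, then resolving each key to its display label. A small sketch of that pattern follows. The helper names `diff_toolsets` and `label_for` are hypothetical; the `(key, label, description)` tuple shape follows `CONFIGURABLE_TOOLSETS` as it appears in this diff.

```python
def diff_toolsets(current_enabled, new_enabled):
    """Return (added, removed) toolset keys, each sorted for stable output."""
    current, new = set(current_enabled), set(new_enabled)
    return sorted(new - current), sorted(current - new)


def label_for(ts_key, configurable):
    """Look up the display label for a toolset key.

    `configurable` is a list of (key, label, description) tuples.
    Unknown keys fall back to the key itself.
    """
    return next((label for key, label, _ in configurable if key == ts_key), ts_key)
```

Only the keys in `added` need the follow-up provider or API-key configuration, which is why the retained code iterates over `sorted(added)` rather than the full selection.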
@@ -1,207 +0,0 @@
---
-name: solana
-description: Query Solana blockchain data with USD pricing — wallet balances, token portfolios with values, transaction details, NFTs, whale detection, and live network stats. Uses Solana RPC + CoinGecko. No API key required.
-version: 0.2.0
-author: Deniz Alagoz (gizdusum), enhanced by Hermes Agent
-license: MIT
-metadata:
-  hermes:
-    tags: [Solana, Blockchain, Crypto, Web3, RPC, DeFi, NFT]
-    related_skills: []
---
-
-# Solana Blockchain Skill
-
-Query Solana on-chain data enriched with USD pricing via CoinGecko.
-8 commands: wallet portfolio, token info, transactions, activity, NFTs,
-whale detection, network stats, and price lookup.
-
-No API key needed. Uses only Python standard library (urllib, json, argparse).
-
---
-
-## When to Use
-
- User asks for a Solana wallet balance, token holdings, or portfolio value
- User wants to inspect a specific transaction by signature
- User wants SPL token metadata, price, supply, or top holders
- User wants recent transaction history for an address
- User wants NFTs owned by a wallet
- User wants to find large SOL transfers (whale detection)
- User wants Solana network health, TPS, epoch, or SOL price
- User asks "what's the price of BONK/JUP/SOL?"
-
---
-
-## Prerequisites
-
-The helper script uses only Python standard library (urllib, json, argparse).
-No external packages required.
-
-Pricing data comes from CoinGecko's free API (no key needed, rate-limited
-to ~10-30 requests/minute). For faster lookups, use `--no-prices` flag.
-
---
-
-## Quick Reference
-
-RPC endpoint (default): https://api.mainnet-beta.solana.com
-Override: export SOLANA_RPC_URL=https://your-private-rpc.com
-
-Helper script path: ~/.hermes/skills/blockchain/solana/scripts/solana_client.py
-
-```
-python3 solana_client.py wallet   <address> [--limit N] [--all] [--no-prices]
-python3 solana_client.py tx       <signature>
-python3 solana_client.py token    <mint_address>
-python3 solana_client.py activity <address> [--limit N]
-python3 solana_client.py nft      <address>
-python3 solana_client.py whales   [--min-sol N]
-python3 solana_client.py stats
-python3 solana_client.py price    <mint_or_symbol>
-```
-
---
-
-## Procedure
-
-### 0. Setup Check
-
-```bash
-python3 --version
-
-# Optional: set a private RPC for better rate limits
-export SOLANA_RPC_URL="https://your-private-rpc.com"
-
-# Confirm connectivity
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py stats
-```
-
-### 1. Wallet Portfolio
-
-Get SOL balance, SPL token holdings with USD values, NFT count, and
-portfolio total. Tokens sorted by value, dust filtered, known tokens
-labeled by name (BONK, JUP, USDC, etc.).
-
-```bash
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
-  wallet 9WzDXwBbmkg8ZTbNMqUxvQRAyrZzDsGYdLVL9zYtAWWM
-```
-
-Flags:
- `--limit N` — show top N tokens (default: 20)
- `--all` — show all tokens, no dust filter, no limit
- `--no-prices` — skip CoinGecko price lookups (faster, RPC-only)
-
-Output includes: SOL balance + USD value, token list with prices sorted
-by value, dust count, NFT summary, total portfolio value in USD.
-
-### 2. Transaction Details
-
-Inspect a full transaction by its base58 signature. Shows balance changes
-in both SOL and USD.
-
-```bash
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
-  tx 5j7s8K...your_signature_here
-```
-
-Output: slot, timestamp, fee, status, balance changes (SOL + USD),
-program invocations.
-
-### 3. Token Info
-
-Get SPL token metadata, current price, market cap, supply, decimals,
-mint/freeze authorities, and top 5 holders.
-
-```bash
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
-  token DezXAZ8z7PnrnRJjz3wXBoRgixCa6xjnB7YaB1pPB263
-```
-
-Output: name, symbol, decimals, supply, price, market cap, top 5
-holders with percentages.
-
-### 4. Recent Activity
-
-List recent transactions for an address (default: last 10, max: 25).
-
-```bash
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
-  activity 9WzDXwBbmkg8ZTbNMqUxvQRAyrZzDsGYdLVL9zYtAWWM --limit 25
-```
-
-### 5. NFT Portfolio
-
-List NFTs owned by a wallet (heuristic: SPL tokens with amount=1, decimals=0).
-
-```bash
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
-  nft 9WzDXwBbmkg8ZTbNMqUxvQRAyrZzDsGYdLVL9zYtAWWM
-```
-
-Note: Compressed NFTs (cNFTs) are not detected by this heuristic.
-
-### 6. Whale Detector
-
-Scan the most recent block for large SOL transfers with USD values.
-
-```bash
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
-  whales --min-sol 500
-```
-
-Note: scans the latest block only — point-in-time snapshot, not historical.
-
-### 7. Network Stats
-
-Live Solana network health: current slot, epoch, TPS, supply, validator
-version, SOL price, and market cap.
-
-```bash
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py stats
-```
-
-### 8. Price Lookup
-
-Quick price check for any token by mint address or known symbol.
-
-```bash
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py price BONK
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py price JUP
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py price SOL
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py price DezXAZ8z7PnrnRJjz3wXBoRgixCa6xjnB7YaB1pPB263
-```
-
-Known symbols: SOL, USDC, USDT, BONK, JUP, WETH, JTO, mSOL, stSOL,
-PYTH, HNT, RNDR, WEN, W, TNSR, DRIFT, bSOL, JLP, WIF, MEW, BOME, PENGU.
-
---
-
-## Pitfalls
-
- **CoinGecko rate-limits** — free tier allows ~10-30 requests/minute.
-  Price lookups use 1 request per token. Wallets with many tokens may
-  not get prices for all of them. Use `--no-prices` for speed.
- **Public RPC rate-limits** — Solana mainnet public RPC limits requests.
-  For production use, set SOLANA_RPC_URL to a private endpoint
-  (Helius, QuickNode, Triton).
- **NFT detection is heuristic** — amount=1 + decimals=0. Compressed
-  NFTs (cNFTs) and Token-2022 NFTs won't appear.
- **Whale detector scans latest block only** — not historical. Results
-  vary by the moment you query.
- **Transaction history** — public RPC keeps ~2 days. Older transactions
-  may not be available.
- **Token names** — ~25 well-known tokens are labeled by name. Others
-  show abbreviated mint addresses. Use the `token` command for full info.
- **Retry on 429** — both RPC and CoinGecko calls retry up to 2 times
-  with a linearly increasing delay on rate-limit errors.
-
---
-
-## Verification
-
-```bash
-# Should print current Solana slot, TPS, and SOL price
-python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py stats
-```
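Review note: the removed skill's helper script retries rate-limited calls up to 2 times, sleeping a base delay multiplied by the attempt number (1.5 s for RPC calls, 2.0 s for CoinGecko). The sketch below isolates that retry loop under stated assumptions: `RateLimited`, `call_with_retry`, and the injectable `sleep` are hypothetical names, and unlike the script (which returns `None` or exits), this version re-raises once the retries are exhausted.

```python
import time


class RateLimited(Exception):
    """Hypothetical marker for an HTTP 429 response."""


def call_with_retry(fetch, retries=2, base_delay=1.5, sleep=time.sleep):
    """Call fetch(), retrying on RateLimited with a linearly growing delay.

    The delay before retry N (1-based) is base_delay * N.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * (attempt + 1))
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; production callers would leave it at the default.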
@@ -1,698 +0,0 @@
-#!/usr/bin/env python3
-"""
-Solana Blockchain CLI Tool for Hermes Agent
--------------------------------------------
-Queries the Solana JSON-RPC API and CoinGecko for enriched on-chain data.
-Uses only Python standard library — no external packages required.
-
-Usage:
-  python3 solana_client.py stats
-  python3 solana_client.py wallet   <address> [--limit N] [--all] [--no-prices]
-  python3 solana_client.py tx       <signature>
-  python3 solana_client.py token    <mint_address>
-  python3 solana_client.py activity <address> [--limit N]
-  python3 solana_client.py nft      <address>
-  python3 solana_client.py whales   [--min-sol N]
-  python3 solana_client.py price    <mint_address_or_symbol>
-
-Environment:
-  SOLANA_RPC_URL  Override the default RPC endpoint (default: mainnet-beta public)
-"""
-
-import argparse
-import json
-import os
-import sys
-import time
-import urllib.request
-import urllib.error
-from typing import Any, Dict, List, Optional
-
-RPC_URL = os.environ.get(
-    "SOLANA_RPC_URL",
-    "https://api.mainnet-beta.solana.com",
-)
-
-LAMPORTS_PER_SOL = 1_000_000_000
-
-# Well-known Solana token names — avoids API calls for common tokens.
-# Maps mint address → (symbol, name).
-KNOWN_TOKENS: Dict[str, tuple] = {
-    "So11111111111111111111111111111111111111112":  ("SOL",   "Solana"),
-    "EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v": ("USDC",  "USD Coin"),
-    "Es9vMFrzaCERmJfrF4H2FYD4KCoNkY11McCe8BenwNYB":  ("USDT",  "Tether"),
-    "DezXAZ8z7PnrnRJjz3wXBoRgixCa6xjnB7YaB1pPB263": ("BONK",  "Bonk"),
-    "JUPyiwrYJFskUPiHa7hkeR8VUtAeFoSYbKedZNsDvCN":  ("JUP",   "Jupiter"),
-    "7vfCXTUXx5WJV5JADk17DUJ4ksgau7utNKj4b963voxs": ("WETH",  "Wrapped Ether"),
-    "jtojtomepa8beP8AuQc6eXt5FriJwfFMwQx2v2f9mCL":  ("JTO",   "Jito"),
-    "mSoLzYCxHdYgdzU16g5QSh3i5K3z3KZK7ytfqcJm7So":  ("mSOL",  "Marinade Staked SOL"),
-    "7dHbWXmci3dT8UFYWYZweBLXgycu7Y3iL6trKn1Y7ARj": ("stSOL", "Lido Staked SOL"),
-    "HZ1JovNiVvGrGNiiYvEozEVgZ58xaU3RKwX8eACQBCt3": ("PYTH",  "Pyth Network"),
-    "RLBxxFkseAZ4RgJH3Sqn8jXxhmGoz9jWxDNJMh8pL7a":  ("RLBB",  "Rollbit"),
-    "hntyVP6YFm1Hg25TN9WGLqM12b8TQmcknKrdu1oxWux":  ("HNT",   "Helium"),
-    "rndrizKT3MK1iimdxRdWabcF7Zg7AR5T4nud4EkHBof":  ("RNDR",  "Render"),
-    "WENWENvqqNya429ubCdR81ZmD69brwQaaBYY6p91oHQQ":  ("WEN",   "Wen"),
-    "85VBFQZC9TZkfaptBWjvUw7YbZjy52A6mjtPGjstQAmQ": ("W",     "Wormhole"),
-    "TNSRxcUxoT9xBG3de7PiJyTDYu7kskLqcpddxnEJAS6":  ("TNSR",  "Tensor"),
-    "DriFtupJYLTosbwoN8koMbEYSx54aFAVLddWsbksjwg7":  ("DRIFT", "Drift"),
-    "bSo13r4TkiE4KumL71LsHTPpL2euBYLFx6h9HP3piy1":  ("bSOL",  "BlazeStake Staked SOL"),
-    "27G8MtK7VtTcCHkpASjSDdkWWYfoqT6ggEuKidVJidD4": ("JLP",   "Jupiter LP"),
-    "EKpQGSJtjMFqKZ9KQanSqYXRcF8fBopzLHYxdM65zcjm": ("WIF",   "dogwifhat"),
-    "MEW1gQWJ3nEXg2qgERiKu7FAFj79PHvQVREQUzScPP5":  ("MEW",   "cat in a dogs world"),
-    "ukHH6c7mMyiWCf1b9pnWe25TSpkDDt3H5pQZgZ74J82":  ("BOME",  "Book of Meme"),
-    "A8C3xuqscfmyLrte3VwJvtPHXvcSN3FjDbUaSMAkQrCS": ("PENGU", "Pudgy Penguins"),
-}
-
-# Reverse lookup: symbol → mint (for the `price` command).
-_SYMBOL_TO_MINT = {v[0].upper(): k for k, v in KNOWN_TOKENS.items()}
-
-
-# ---------------------------------------------------------------------------
-# HTTP / RPC helpers
-# ---------------------------------------------------------------------------
-
-def _http_get_json(url: str, timeout: int = 10, retries: int = 2) -> Any:
-    """GET JSON from a URL with retry on 429 rate-limit. Returns parsed JSON or None."""
-    for attempt in range(retries + 1):
-        req = urllib.request.Request(
-            url, headers={"Accept": "application/json", "User-Agent": "HermesAgent/1.0"},
-        )
-        try:
-            with urllib.request.urlopen(req, timeout=timeout) as resp:
-                return json.load(resp)
-        except urllib.error.HTTPError as exc:
-            if exc.code == 429 and attempt < retries:
-                time.sleep(2.0 * (attempt + 1))
-                continue
-            return None
-        except Exception:
-            return None
-    return None
-
-
-def _rpc_call(method: str, params: list = None, retries: int = 2) -> Any:
-    """Send a JSON-RPC request with retry on 429 rate-limit."""
-    payload = json.dumps({
-        "jsonrpc": "2.0", "id": 1,
-        "method": method, "params": params or [],
-    }).encode()
-
-    for attempt in range(retries + 1):
-        req = urllib.request.Request(
-            RPC_URL, data=payload,
-            headers={"Content-Type": "application/json"}, method="POST",
-        )
-        try:
-            with urllib.request.urlopen(req, timeout=20) as resp:
-                body = json.load(resp)
-            if "error" in body:
-                err = body["error"]
-                # Rate-limit: retry after delay
-                if isinstance(err, dict) and err.get("code") == 429:
-                    if attempt < retries:
-                        time.sleep(1.5 * (attempt + 1))
-                        continue
-                sys.exit(f"RPC error: {err}")
-            return body.get("result")
-        except urllib.error.HTTPError as exc:
-            if exc.code == 429 and attempt < retries:
-                time.sleep(1.5 * (attempt + 1))
-                continue
-            sys.exit(f"RPC HTTP error: {exc}")
-        except urllib.error.URLError as exc:
-            sys.exit(f"RPC connection error: {exc}")
-    return None
-
-
-# Keep backward compat — the rest of the code uses `rpc()`.
-rpc = _rpc_call
-
-
-def rpc_batch(calls: list) -> list:
-    """Send a batch of JSON-RPC requests (with retry on 429)."""
-    payload = json.dumps([
-        {"jsonrpc": "2.0", "id": i, "method": c["method"], "params": c.get("params", [])}
-        for i, c in enumerate(calls)
-    ]).encode()
-
-    for attempt in range(3):
-        req = urllib.request.Request(
-            RPC_URL, data=payload,
-            headers={"Content-Type": "application/json"}, method="POST",
-        )
-        try:
-            with urllib.request.urlopen(req, timeout=20) as resp:
-                return json.load(resp)
-        except urllib.error.HTTPError as exc:
-            if exc.code == 429 and attempt < 2:
-                time.sleep(1.5 * (attempt + 1))
-                continue
-            sys.exit(f"RPC batch HTTP error: {exc}")
-        except urllib.error.URLError as exc:
-            sys.exit(f"RPC batch error: {exc}")
-    return []
-
-
-def lamports_to_sol(lamports: int) -> float:
-    return lamports / LAMPORTS_PER_SOL
-
-
-def print_json(obj: Any) -> None:
-    print(json.dumps(obj, indent=2))
-
-
-def _short_mint(mint: str) -> str:
-    """Abbreviate a mint address for display: first 4 + last 4."""
-    if len(mint) <= 12:
-        return mint
-    return f"{mint[:4]}...{mint[-4:]}"
-
-
-# ---------------------------------------------------------------------------
-# Price & token name helpers (CoinGecko — free, no API key)
-# ---------------------------------------------------------------------------
-
-def fetch_prices(mints: List[str], max_lookups: int = 20) -> Dict[str, float]:
-    """Fetch USD prices for mint addresses via CoinGecko (one per request).
-
-    CoinGecko free tier doesn't support batch Solana token lookups,
-    so we do individual calls — capped at *max_lookups* to stay within
-    rate limits. Returns {mint: usd_price}.
-    """
-    prices: Dict[str, float] = {}
-    for i, mint in enumerate(mints[:max_lookups]):
-        url = (
-            f"https://api.coingecko.com/api/v3/simple/token_price/solana"
-            f"?contract_addresses={mint}&vs_currencies=usd"
-        )
-        data = _http_get_json(url, timeout=10)
-        if data and isinstance(data, dict):
-            for addr, info in data.items():
-                if isinstance(info, dict) and "usd" in info:
-                    prices[mint] = info["usd"]
-                    break
-        # Pause between calls to respect CoinGecko free-tier rate-limits
-        if i < len(mints[:max_lookups]) - 1:
-            time.sleep(1.0)
-    return prices
-
-
-def fetch_sol_price() -> Optional[float]:
-    """Fetch current SOL price in USD via CoinGecko."""
-    data = _http_get_json(
-        "https://api.coingecko.com/api/v3/simple/price?ids=solana&vs_currencies=usd"
-    )
-    if data and "solana" in data:
-        return data["solana"].get("usd")
-    return None
-
-
-def resolve_token_name(mint: str) -> Optional[Dict[str, str]]:
-    """Look up token name and symbol from CoinGecko by mint address.
-
-    Returns {"name": ..., "symbol": ...} or None.
-    """
-    if mint in KNOWN_TOKENS:
-        sym, name = KNOWN_TOKENS[mint]
-        return {"symbol": sym, "name": name}
-    url = f"https://api.coingecko.com/api/v3/coins/solana/contract/{mint}"
-    data = _http_get_json(url, timeout=10)
-    if data and "symbol" in data:
-        return {"symbol": data["symbol"].upper(), "name": data.get("name", "")}
-    return None
-
-
-def _token_label(mint: str) -> str:
-    """Return a human-readable label for a mint: symbol if known, else abbreviated address."""
-    if mint in KNOWN_TOKENS:
-        return KNOWN_TOKENS[mint][0]
-    return _short_mint(mint)
-
-
-# ---------------------------------------------------------------------------
-# 1. Network Stats
-# ---------------------------------------------------------------------------
-
-def cmd_stats(_args):
-    """Live Solana network: slot, epoch, TPS, supply, version, SOL price."""
-    results = rpc_batch([
-        {"method": "getSlot"},
-        {"method": "getEpochInfo"},
-        {"method": "getRecentPerformanceSamples", "params": [1]},
-        {"method": "getSupply"},
-        {"method": "getVersion"},
-    ])
-
-    by_id = {r["id"]: r.get("result") for r in results}
-
-    slot         = by_id.get(0)
-    epoch_info   = by_id.get(1)
-    perf_samples = by_id.get(2)
-    supply       = by_id.get(3)
-    version      = by_id.get(4)
-
-    tps = None
-    if perf_samples:
-        s = perf_samples[0]
-        tps = round(s["numTransactions"] / s["samplePeriodSecs"], 1)
-
-    total_supply = lamports_to_sol(supply["value"]["total"])      if supply else None
-    circ_supply  = lamports_to_sol(supply["value"]["circulating"]) if supply else None
-
-    sol_price = fetch_sol_price()
-
-    out = {
-        "slot":                   slot,
-        "epoch":                  epoch_info.get("epoch")     if epoch_info else None,
-        "slot_in_epoch":          epoch_info.get("slotIndex") if epoch_info else None,
-        "tps":                    tps,
-        "total_supply_SOL":       round(total_supply, 2) if total_supply else None,
-        "circulating_supply_SOL": round(circ_supply, 2)  if circ_supply  else None,
-        "validator_version":      version.get("solana-core")  if version   else None,
-    }
-    if sol_price is not None:
-        out["sol_price_usd"] = sol_price
-        if circ_supply:
-            out["market_cap_usd"] = round(sol_price * circ_supply, 0)
-    print_json(out)
-
-
-# ---------------------------------------------------------------------------
-# 2. Wallet Info (enhanced with prices, sorting, filtering)
-# ---------------------------------------------------------------------------
-
-def cmd_wallet(args):
-    """SOL balance + SPL token holdings with USD values."""
-    address = args.address
-    show_all = getattr(args, "all", False)
-    limit = getattr(args, "limit", 20) or 20
-    skip_prices = getattr(args, "no_prices", False)
-
-    # Fetch SOL balance
-    balance_result = rpc("getBalance", [address])
-    sol_balance = lamports_to_sol(balance_result["value"])
-
-    # Fetch all SPL token accounts
-    token_result = rpc("getTokenAccountsByOwner", [
-        address,
-        {"programId": "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"},
-        {"encoding": "jsonParsed"},
-    ])
-
-    raw_tokens = []
-    for acct in (token_result.get("value") or []):
-        info = acct["account"]["data"]["parsed"]["info"]
-        ta = info["tokenAmount"]
-        amount = float(ta.get("uiAmountString") or 0)
-        if amount > 0:
-            raw_tokens.append({
-                "mint":     info["mint"],
-                "amount":   amount,
-                "decimals": ta["decimals"],
-            })
-
-    # Separate NFTs (amount=1, decimals=0) from fungible tokens
-    nfts = [t for t in raw_tokens if t["decimals"] == 0 and t["amount"] == 1]
-    fungible = [t for t in raw_tokens if not (t["decimals"] == 0 and t["amount"] == 1)]
-
-    # Fetch prices for fungible tokens (cap lookups to avoid API abuse)
-    sol_price = None
-    prices: Dict[str, float] = {}
-    if not skip_prices and fungible:
-        sol_price = fetch_sol_price()
-        # Prioritize known tokens, then a small sample of unknowns.
-        # CoinGecko free tier = 1 request per mint, so we cap lookups.
-        known_mints = [t["mint"] for t in fungible if t["mint"] in KNOWN_TOKENS]
-        other_mints = [t["mint"] for t in fungible if t["mint"] not in KNOWN_TOKENS][:15]
-        mints_to_price = known_mints + other_mints
-        if mints_to_price:
-            prices = fetch_prices(mints_to_price, max_lookups=30)
-
-    # Enrich tokens with labels and USD values
-    enriched = []
-    dust_count = 0
-    dust_value = 0.0
-    for t in fungible:
-        mint = t["mint"]
-        label = _token_label(mint)
-        usd_price = prices.get(mint)
-        usd_value = round(usd_price * t["amount"], 2) if usd_price else None
-
-        # Filter dust (< $0.01) unless --all
-        if not show_all and usd_value is not None and usd_value < 0.01:
-            dust_count += 1
-            dust_value += usd_value
-            continue
-
-        entry = {"token": label, "mint": mint, "amount": t["amount"]}
-        if usd_price is not None:
-            entry["price_usd"] = usd_price
-            entry["value_usd"] = usd_value
-        enriched.append(entry)
-
-    # Sort: tokens with known USD value first (highest→lowest), then unknowns
-    enriched.sort(key=lambda x: (x.get("value_usd") is not None, x.get("value_usd") or 0), reverse=True)
-
-    # Apply limit unless --all
-    total_tokens = len(enriched)
-    if not show_all and len(enriched) > limit:
-        enriched = enriched[:limit]
-
-    # Compute portfolio total
-    total_usd = sum(t.get("value_usd", 0) for t in enriched)
-    sol_value_usd = round(sol_price * sol_balance, 2) if sol_price else None
-    if sol_value_usd:
-        total_usd += sol_value_usd
-    total_usd += dust_value
-
-    output = {
-        "address":     address,
-        "sol_balance":  round(sol_balance, 9),
-    }
-    if sol_price:
-        output["sol_price_usd"] = sol_price
-        output["sol_value_usd"] = sol_value_usd
-    output["tokens_shown"] = len(enriched)
-    if total_tokens > len(enriched):
-        output["tokens_hidden"] = total_tokens - len(enriched)
-    output["spl_tokens"] = enriched
-    if dust_count > 0:
-        output["dust_filtered"] = {"count": dust_count, "total_value_usd": round(dust_value, 4)}
-    output["nft_count"] = len(nfts)
-    if nfts:
-        output["nfts"] = [_token_label(n["mint"]) + f" ({_short_mint(n['mint'])})" for n in nfts[:10]]
-        if len(nfts) > 10:
-            output["nfts"].append(f"... and {len(nfts) - 10} more")
-    if total_usd > 0:
-        output["portfolio_total_usd"] = round(total_usd, 2)
-
-    print_json(output)
-
-
-# ---------------------------------------------------------------------------
-# 3. Transaction Details
-# ---------------------------------------------------------------------------
-
-def cmd_tx(args):
-    """Full transaction details by signature."""
-    result = rpc("getTransaction", [
-        args.signature,
-        {"encoding": "jsonParsed", "maxSupportedTransactionVersion": 0},
-    ])
-
-    if result is None:
-        sys.exit("Transaction not found (may be too old for public RPC history).")
-
-    meta         = result.get("meta", {}) or {}
-    msg          = result.get("transaction", {}).get("message", {})
-    account_keys = msg.get("accountKeys", [])
-
-    pre  = meta.get("preBalances",  [])
-    post = meta.get("postBalances", [])
-
-    balance_changes = []
-    for i, key in enumerate(account_keys):
-        acct_key = key["pubkey"] if isinstance(key, dict) else key
-        if i < len(pre) and i < len(post):
-            change = lamports_to_sol(post[i] - pre[i])
-            if change != 0:
-                balance_changes.append({"account": acct_key, "change_SOL": round(change, 9)})
-
-    programs = []
-    for ix in msg.get("instructions", []):
-        prog = ix.get("programId")
-        if prog is None and "programIdIndex" in ix:
-            k = account_keys[ix["programIdIndex"]]
-            prog = k["pubkey"] if isinstance(k, dict) else k
-        if prog:
-            programs.append(prog)
-
-    # Add USD value for SOL changes
-    sol_price = fetch_sol_price()
-    if sol_price and balance_changes:
-        for bc in balance_changes:
-            bc["change_USD"] = round(bc["change_SOL"] * sol_price, 2)
-
-    print_json({
-        "signature":        args.signature,
-        "slot":             result.get("slot"),
-        "block_time":       result.get("blockTime"),
-        "fee_SOL":          lamports_to_sol(meta.get("fee", 0)),
-        "status":           "success" if meta.get("err") is None else "failed",
-        "balance_changes":  balance_changes,
-        "programs_invoked": list(dict.fromkeys(programs)),
-    })
-
-
-# ---------------------------------------------------------------------------
-# 4. Token Info (enhanced with name + price)
-# ---------------------------------------------------------------------------
-
-def cmd_token(args):
-    """SPL token metadata, supply, decimals, price, top holders."""
-    mint = args.mint
-
-    mint_info = rpc("getAccountInfo", [mint, {"encoding": "jsonParsed"}])
-    if mint_info is None or mint_info.get("value") is None:
-        sys.exit("Mint account not found.")
-
-    parsed       = mint_info["value"]["data"]["parsed"]["info"]
-    decimals     = parsed.get("decimals", 0)
-    supply_raw   = int(parsed.get("supply", 0))
-    supply_human = supply_raw / (10 ** decimals) if decimals else supply_raw
-
-    largest = rpc("getTokenLargestAccounts", [mint])
-    holders = []
-    for acct in (largest.get("value") or [])[:5]:
-        amount = float(acct.get("uiAmountString") or 0)
-        pct = round((amount / supply_human * 100), 4) if supply_human > 0 else 0
-        holders.append({
-            "account": acct["address"],
-            "amount":  amount,
-            "percent": pct,
-        })
-
-    # Resolve name + price
-    token_meta = resolve_token_name(mint)
-    price_data = fetch_prices([mint])
-
-    out = {"mint": mint}
-    if token_meta:
-        out["name"] = token_meta["name"]
-        out["symbol"] = token_meta["symbol"]
-    out["decimals"] = decimals
-    out["supply"] = round(supply_human, min(decimals, 6))
-    out["mint_authority"] = parsed.get("mintAuthority")
-    out["freeze_authority"] = parsed.get("freezeAuthority")
-    if mint in price_data:
-        out["price_usd"] = price_data[mint]
-        out["market_cap_usd"] = round(price_data[mint] * supply_human, 0)
-    out["top_5_holders"] = holders
-
-    print_json(out)
-
-
-# ---------------------------------------------------------------------------
-# 5. Recent Activity
-# ---------------------------------------------------------------------------
-
-def cmd_activity(args):
-    """Recent transaction signatures for an address."""
-    limit  = min(args.limit, 25)
-    result = rpc("getSignaturesForAddress", [args.address, {"limit": limit}])
-
-    txs = [
-        {
-            "signature": item["signature"],
-            "slot":       item.get("slot"),
-            "block_time": item.get("blockTime"),
-            "err":        item.get("err"),
-        }
-        for item in (result or [])
-    ]
-
-    print_json({"address": args.address, "transactions": txs})
-
-
-# ---------------------------------------------------------------------------
-# 6. NFT Portfolio
-# ---------------------------------------------------------------------------
-
-def cmd_nft(args):
-    """NFTs owned by a wallet (amount=1 && decimals=0 heuristic)."""
-    result = rpc("getTokenAccountsByOwner", [
-        args.address,
-        {"programId": "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"},
-        {"encoding": "jsonParsed"},
-    ])
-
-    nfts = [
-        acct["account"]["data"]["parsed"]["info"]["mint"]
-        for acct in (result.get("value") or [])
-        if acct["account"]["data"]["parsed"]["info"]["tokenAmount"]["decimals"] == 0
-        and int(acct["account"]["data"]["parsed"]["info"]["tokenAmount"]["amount"]) == 1
-    ]
-
-    print_json({
-        "address":   args.address,
-        "nft_count": len(nfts),
-        "nfts":      nfts,
-        "note":      "Heuristic only. Compressed NFTs (cNFTs) are not detected.",
-    })
-
-
-# ---------------------------------------------------------------------------
-# 7. Whale Detector (enhanced with USD values)
-# ---------------------------------------------------------------------------
-
-def cmd_whales(args):
-    """Scan the latest block for large SOL transfers."""
-    min_lamports = int(args.min_sol * LAMPORTS_PER_SOL)
-
-    slot  = rpc("getSlot")
-    block = rpc("getBlock", [
-        slot,
-        {
-            "encoding": "jsonParsed",
-            "transactionDetails": "full",
-            "maxSupportedTransactionVersion": 0,
-            "rewards": False,
-        },
-    ])
-
-    if block is None:
-        sys.exit("Could not retrieve latest block.")
-
-    sol_price = fetch_sol_price()
-
-    whales = []
-    for tx in (block.get("transactions") or []):
-        meta = tx.get("meta", {}) or {}
-        if meta.get("err") is not None:
-            continue
-
-        msg          = tx["transaction"].get("message", {})
-        account_keys = msg.get("accountKeys", [])
-        pre          = meta.get("preBalances",  [])
-        post         = meta.get("postBalances", [])
-
-        for i in range(len(pre)):
-            change = post[i] - pre[i]
-            if change >= min_lamports:
-                k        = account_keys[i]
-                receiver = k["pubkey"] if isinstance(k, dict) else k
-                sender   = None
-                for j in range(len(pre)):
-                    if pre[j] - post[j] >= min_lamports:
-                        sk     = account_keys[j]
-                        sender = sk["pubkey"] if isinstance(sk, dict) else sk
-                        break
-                entry = {
-                    "sender":     sender,
-                    "receiver":   receiver,
-                    "amount_SOL": round(lamports_to_sol(change), 4),
-                }
-                if sol_price:
-                    entry["amount_USD"] = round(lamports_to_sol(change) * sol_price, 2)
-                whales.append(entry)
-
-    out = {
-        "slot":              slot,
-        "min_threshold_SOL": args.min_sol,
-        "large_transfers":   whales,
-        "note":              "Scans latest block only — point-in-time snapshot.",
-    }
-    if sol_price:
-        out["sol_price_usd"] = sol_price
-    print_json(out)
-
-
-# ---------------------------------------------------------------------------
-# 8. Price Lookup
-# ---------------------------------------------------------------------------
-
-def cmd_price(args):
-    """Quick price lookup for a token by mint address or known symbol."""
-    query = args.token
-
-    # Check if it's a known symbol
-    mint = _SYMBOL_TO_MINT.get(query.upper(), query)
-
-    # Try to resolve name
-    token_meta = resolve_token_name(mint)
-
-    # Fetch price
-    prices = fetch_prices([mint])
-
-    out = {"query": query, "mint": mint}
-    if token_meta:
-        out["name"] = token_meta["name"]
-        out["symbol"] = token_meta["symbol"]
-    if mint in prices:
-        out["price_usd"] = prices[mint]
-    else:
-        out["price_usd"] = None
-        out["note"] = "Price not available — token may not be listed on CoinGecko."
-    print_json(out)
-
-
-# ---------------------------------------------------------------------------
-# CLI
-# ---------------------------------------------------------------------------
-
-def main():
-    parser = argparse.ArgumentParser(
-        prog="solana_client.py",
-        description="Solana blockchain query tool for Hermes Agent",
-    )
-    sub = parser.add_subparsers(dest="command", required=True)
-
-    sub.add_parser("stats", help="Network stats: slot, epoch, TPS, supply, SOL price")
-
-    p_wallet = sub.add_parser("wallet", help="SOL balance + SPL tokens with USD values")
-    p_wallet.add_argument("address")
-    p_wallet.add_argument("--limit", type=int, default=20,
-                          help="Max tokens to display (default: 20)")
-    p_wallet.add_argument("--all", action="store_true",
-                          help="Show all tokens (no limit, no dust filter)")
-    p_wallet.add_argument("--no-prices", action="store_true",
-                          help="Skip price lookups (faster, RPC-only)")
-
-    p_tx = sub.add_parser("tx", help="Transaction details by signature")
-    p_tx.add_argument("signature")
-
-    p_token = sub.add_parser("token", help="SPL token metadata, price, and top holders")
-    p_token.add_argument("mint")
-
-    p_activity = sub.add_parser("activity", help="Recent transactions for an address")
-    p_activity.add_argument("address")
-    p_activity.add_argument("--limit", type=int, default=10,
-                            help="Number of transactions (max 25, default 10)")
-
-    p_nft = sub.add_parser("nft", help="NFT portfolio for a wallet")
-    p_nft.add_argument("address")
-
-    p_whales = sub.add_parser("whales", help="Large SOL transfers in the latest block")
-    p_whales.add_argument("--min-sol", type=float, default=1000.0,
-                          help="Minimum SOL transfer size (default: 1000)")
-
-    p_price = sub.add_parser("price", help="Quick price lookup by mint or symbol")
-    p_price.add_argument("token", help="Mint address or known symbol (SOL, BONK, JUP, ...)")
-
-    args = parser.parse_args()
-
-    dispatch = {
-        "stats":    cmd_stats,
-        "wallet":   cmd_wallet,
-        "tx":       cmd_tx,
-        "token":    cmd_token,
-        "activity": cmd_activity,
-        "nft":      cmd_nft,
-        "whales":   cmd_whales,
-        "price":    cmd_price,
-    }
-    dispatch[args.command](args)
-
-
-if __name__ == "__main__":
-    main()
@@ -1,125 +0,0 @@
----
-name: agentmail
-description: Give the agent its own dedicated email inbox via AgentMail. Send, receive, and manage email autonomously using agent-owned email addresses (e.g. hermes-agent@agentmail.to).
-version: 1.0.0
-metadata:
-  hermes:
-    tags: [email, communication, agentmail, mcp]
-    category: email
----
-
-# AgentMail — Agent-Owned Email Inboxes
-
-## Requirements
-
-- **AgentMail API key** (required) — sign up at https://console.agentmail.to (free tier: 3 inboxes, 3,000 emails/month; paid plans from $20/mo)
-- Node.js 18+ (for the MCP server)
-
-## When to Use
-Use this skill when you need to:
-- Give the agent its own dedicated email address
-- Send emails autonomously on behalf of the agent
-- Receive and read incoming emails
-- Manage email threads and conversations
-- Sign up for services or authenticate via email
-- Communicate with other agents or humans via email
-
-This is NOT for reading the user's personal email (use himalaya or Gmail for that).
-AgentMail gives the agent its own identity and inbox.
-
-## Setup
-
-### 1. Get an API Key
-- Go to https://console.agentmail.to
-- Create an account and generate an API key (starts with `am_`)
-
-### 2. Configure MCP Server
-Add to `~/.hermes/config.yaml` (paste your actual key — MCP env vars are not expanded from .env):
-```yaml
-mcp_servers:
-  agentmail:
-    command: "npx"
-    args: ["-y", "agentmail-mcp"]
-    env:
-      AGENTMAIL_API_KEY: "am_your_key_here"
-```
-
-### 3. Restart Hermes
-```bash
-hermes
-```
-All 11 AgentMail tools are now available automatically.
-
-## Available Tools (via MCP)
-
-| Tool | Description |
-|------|-------------|
-| `list_inboxes` | List all agent inboxes |
-| `get_inbox` | Get details of a specific inbox |
-| `create_inbox` | Create a new inbox (gets a real email address) |
-| `delete_inbox` | Delete an inbox |
-| `list_threads` | List email threads in an inbox |
-| `get_thread` | Get a specific email thread |
-| `send_message` | Send a new email |
-| `reply_to_message` | Reply to an existing email |
-| `forward_message` | Forward an email |
-| `update_message` | Update message labels/status |
-| `get_attachment` | Download an email attachment |
-
-## Procedure
-
-### Create an inbox and send an email
-1. Create a dedicated inbox:
-   - Use `create_inbox` with a username (e.g. `hermes-agent`)
-   - The agent gets address: `hermes-agent@agentmail.to`
-2. Send an email:
-   - Use `send_message` with `inbox_id`, `to`, `subject`, `text`
-3. Check for replies:
-   - Use `list_threads` to see incoming conversations
-   - Use `get_thread` to read a specific thread
-
-### Check incoming email
-1. Use `list_inboxes` to find your inbox ID
-2. Use `list_threads` with the inbox ID to see conversations
-3. Use `get_thread` to read a thread and its messages
-
-### Reply to an email
-1. Get the thread with `get_thread`
-2. Use `reply_to_message` with the message ID and your reply text
-
-## Example Workflows
-
-**Sign up for a service:**
-```
-1. create_inbox (username: "signup-bot")
-2. Use the inbox address to register on the service
-3. list_threads to check for verification email
-4. get_thread to read the verification code
-```
-
-**Agent-to-human outreach:**
-```
-1. create_inbox (username: "hermes-outreach")
-2. send_message (to: user@example.com, subject: "Hello", text: "...")
-3. list_threads to check for replies
-```
-
-## Pitfalls
-- Free tier limited to 3 inboxes and 3,000 emails/month
-- Emails come from `@agentmail.to` domain on free tier (custom domains on paid plans)
-- Node.js (18+) is required for the MCP server (`npx -y agentmail-mcp`)
-- The `mcp` Python package must be installed: `pip install mcp`
-- Real-time inbound email (webhooks) requires a public server — use `list_threads` polling via cronjob instead for personal use
-
-## Verification
-After setup, test with:
-```
-hermes --toolsets mcp -q "Create an AgentMail inbox called test-agent and tell me its email address"
-```
-You should see the new inbox address returned.
-
-## References
-- AgentMail docs: https://docs.agentmail.to/
-- AgentMail console: https://console.agentmail.to
-- AgentMail MCP repo: https://github.com/agentmail-to/agentmail-mcp
-- Pricing: https://www.agentmail.to/pricing
@@ -183,7 +183,6 @@ class AIAgent:
        session_db=None,
        honcho_session_key: str = None,
        iteration_budget: "IterationBudget" = None,
-        fallback_model: Dict[str, Any] = None,
    ):
        """
        Initialize the AI Agent.
@@ -407,17 +406,6 @@ class AIAgent:
        except Exception as e:
            raise RuntimeError(f"Failed to initialize OpenAI client: {e}")
        
-        # Provider fallback — a single backup model/provider tried when the
-        # primary is exhausted (rate-limit, overload, connection failure).
-        # Config shape: {"provider": "openrouter", "model": "anthropic/claude-sonnet-4"}
-        self._fallback_model = fallback_model if isinstance(fallback_model, dict) else None
-        self._fallback_activated = False
-        if self._fallback_model:
-            fb_p = self._fallback_model.get("provider", "")
-            fb_m = self._fallback_model.get("model", "")
-            if fb_p and fb_m and not self.quiet_mode:
-                print(f"🔄 Fallback model: {fb_m} ({fb_p})")
-
        # Get available tools with filtering
        self.tools = get_tool_definitions(
            enabled_toolsets=enabled_toolsets,
@@ -2158,141 +2146,6 @@ class AIAgent:
            raise result["error"]
        return result["response"]

-    # ── Provider fallback ──────────────────────────────────────────────────
-
-    # API-key providers: provider → (base_url, [env_var_names])
-    _FALLBACK_API_KEY_PROVIDERS = {
-        "openrouter": (OPENROUTER_BASE_URL, ["OPENROUTER_API_KEY"]),
-        "zai": ("https://api.z.ai/api/paas/v4", ["ZAI_API_KEY", "Z_AI_API_KEY"]),
-        "kimi-coding": ("https://api.moonshot.ai/v1", ["KIMI_API_KEY"]),
-        "minimax": ("https://api.minimax.io/v1", ["MINIMAX_API_KEY"]),
-        "minimax-cn": ("https://api.minimaxi.com/v1", ["MINIMAX_CN_API_KEY"]),
-    }
-
-    # OAuth providers: provider → (resolver_import_path, api_mode)
-    # Each resolver returns {"api_key": ..., "base_url": ...}.
-    _FALLBACK_OAUTH_PROVIDERS = {
-        "openai-codex": ("resolve_codex_runtime_credentials", "codex_responses"),
-        "nous": ("resolve_nous_runtime_credentials", "chat_completions"),
-    }
-
-    def _resolve_fallback_credentials(
-        self, fb_provider: str, fb_config: dict
-    ) -> Optional[tuple]:
-        """Resolve credentials for a fallback provider.
-
-        Returns (api_key, base_url, api_mode) on success, or None on failure.
-        Handles three cases:
-          1. OAuth providers (openai-codex, nous) — call credential resolver
-          2. API-key providers (openrouter, zai, etc.) — read env var
-          3. Custom endpoints — use base_url + api_key_env from config
-        """
-        # ── 1. OAuth providers ────────────────────────────────────────
-        if fb_provider in self._FALLBACK_OAUTH_PROVIDERS:
-            resolver_name, api_mode = self._FALLBACK_OAUTH_PROVIDERS[fb_provider]
-            try:
-                import hermes_cli.auth as _auth
-                resolver = getattr(_auth, resolver_name)
-                creds = resolver()
-                return creds["api_key"], creds["base_url"], api_mode
-            except Exception as e:
-                logging.warning(
-                    "Fallback to %s failed (credential resolution): %s",
-                    fb_provider, e,
-                )
-                return None
-
-        # ── 2. API-key providers ──────────────────────────────────────
-        fb_key = (fb_config.get("api_key") or "").strip()
-        if not fb_key:
-            key_env = (fb_config.get("api_key_env") or "").strip()
-            if key_env:
-                fb_key = os.getenv(key_env, "")
-            elif fb_provider in self._FALLBACK_API_KEY_PROVIDERS:
-                for env_var in self._FALLBACK_API_KEY_PROVIDERS[fb_provider][1]:
-                    fb_key = os.getenv(env_var, "")
-                    if fb_key:
-                        break
-        if not fb_key:
-            logging.warning(
-                "Fallback model configured but no API key found for provider '%s'",
-                fb_provider,
-            )
-            return None
-
-        # ── 3. Resolve base URL ───────────────────────────────────────
-        fb_base_url = (fb_config.get("base_url") or "").strip()
-        if not fb_base_url and fb_provider in self._FALLBACK_API_KEY_PROVIDERS:
-            fb_base_url = self._FALLBACK_API_KEY_PROVIDERS[fb_provider][0]
-        if not fb_base_url:
-            fb_base_url = OPENROUTER_BASE_URL
-
-        return fb_key, fb_base_url, "chat_completions"
-
-    def _try_activate_fallback(self) -> bool:
-        """Switch to the configured fallback model/provider.
-
-        Called when the primary model is failing after retries.  Swaps the
-        OpenAI client, model slug, and provider in-place so the retry loop
-        can continue with the new backend.  One-shot: returns False if
-        already activated or not configured.
-        """
-        if self._fallback_activated or not self._fallback_model:
-            return False
-
-        fb = self._fallback_model
-        fb_provider = (fb.get("provider") or "").strip().lower()
-        fb_model = (fb.get("model") or "").strip()
-        if not fb_provider or not fb_model:
-            return False
-
-        resolved = self._resolve_fallback_credentials(fb_provider, fb)
-        if resolved is None:
-            return False
-        fb_key, fb_base_url, fb_api_mode = resolved
-
-        # Build new client
-        try:
-            client_kwargs = {"api_key": fb_key, "base_url": fb_base_url}
-            if "openrouter" in fb_base_url.lower():
-                client_kwargs["default_headers"] = {
-                    "HTTP-Referer": "https://github.com/NousResearch/hermes-agent",
-                    "X-OpenRouter-Title": "Hermes Agent",
-                    "X-OpenRouter-Categories": "productivity,cli-agent",
-                }
-            elif "api.kimi.com" in fb_base_url.lower():
-                client_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
-
-            self.client = OpenAI(**client_kwargs)
-            self._client_kwargs = client_kwargs
-            old_model = self.model
-            self.model = fb_model
-            self.provider = fb_provider
-            self.base_url = fb_base_url
-            self.api_mode = fb_api_mode
-            self._fallback_activated = True
-
-            # Re-evaluate prompt caching for the new provider/model
-            self._use_prompt_caching = (
-                "openrouter" in fb_base_url.lower()
-                and "claude" in fb_model.lower()
-            )
-
-            print(
-                f"{self.log_prefix}🔄 Primary model failed — switching to fallback: "
-                f"{fb_model} via {fb_provider}"
-            )
-            logging.info(
-                "Fallback activated: %s → %s (%s)",
-                old_model, fb_model, fb_provider,
-            )
-            return True
-        except Exception as e:
-            logging.error("Failed to activate fallback model: %s", e)
-            return False
-
-    # ── End provider fallback ──────────────────────────────────────────────
-
    def _build_api_kwargs(self, api_messages: list) -> dict:
        """Build the keyword arguments dict for the active API mode."""
        if self.api_mode == "codex_responses":
@@ -2666,10 +2519,9 @@ class AIAgent:
                if remaining_calls:
                    print(f"{self.log_prefix}⚡ Interrupt: skipping {len(remaining_calls)} tool call(s)")
                for skipped_tc in remaining_calls:
-                    skipped_name = skipped_tc.function.name
                    skip_msg = {
                        "role": "tool",
-                        "content": f"[Tool execution cancelled — {skipped_name} was skipped due to user interrupt]",
+                        "content": "[Tool execution cancelled - user interrupted]",
                        "tool_call_id": skipped_tc.id,
                    }
                    messages.append(skip_msg)
@@ -2872,10 +2724,9 @@ class AIAgent:
                remaining = len(assistant_message.tool_calls) - i
                print(f"{self.log_prefix}⚡ Interrupt: skipping {remaining} remaining tool call(s)")
                for skipped_tc in assistant_message.tool_calls[i:]:
-                    skipped_name = skipped_tc.function.name
                    skip_msg = {
                        "role": "tool",
-                        "content": f"[Tool execution skipped — {skipped_name} was not started. User sent a new message]",
+                        "content": "[Tool execution skipped - user sent a new message]",
                        "tool_call_id": skipped_tc.id
                    }
                    messages.append(skip_msg)
@@ -3092,14 +2943,9 @@ class AIAgent:
            )
            self._iters_since_skill = 0

-        # Honcho prefetch: retrieve user context for system prompt injection.
-        # Only on the FIRST turn of a session (empty history).  On subsequent
-        # turns the model already has all prior context in its conversation
-        # history, and the Honcho context is baked into the stored system
-        # prompt — re-fetching it would change the system message and break
-        # Anthropic prompt caching.
+        # Honcho prefetch: retrieve user context for system prompt injection
        self._honcho_context = ""
-        if self._honcho and self._honcho_session_key and not conversation_history:
+        if self._honcho and self._honcho_session_key:
            try:
                self._honcho_context = self._honcho_prefetch(user_message)
            except Exception as e:
@@ -3117,42 +2963,14 @@ class AIAgent:
        # Built once on first call, reused for all subsequent calls.
        # Only rebuilt after context compression events (which invalidate
        # the cache and reload memory from disk).
-        #
-        # For continuing sessions (gateway creates a fresh AIAgent per
-        # message), we load the stored system prompt from the session DB
-        # instead of rebuilding.  Rebuilding would pick up memory changes
-        # from disk that the model already knows about (it wrote them!),
-        # producing a different system prompt and breaking the Anthropic
-        # prefix cache.
        if self._cached_system_prompt is None:
-            stored_prompt = None
-            if conversation_history and self._session_db:
+            self._cached_system_prompt = self._build_system_prompt(system_message)
+            # Store the system prompt snapshot in SQLite
+            if self._session_db:
                try:
-                    session_row = self._session_db.get_session(self.session_id)
-                    if session_row:
-                        stored_prompt = session_row.get("system_prompt") or None
-                except Exception:
-                    pass  # Fall through to build fresh
-
-            if stored_prompt:
-                # Continuing session — reuse the exact system prompt from
-                # the previous turn so the Anthropic cache prefix matches.
-                self._cached_system_prompt = stored_prompt
-            else:
-                # First turn of a new session — build from scratch.
-                self._cached_system_prompt = self._build_system_prompt(system_message)
-                # Bake Honcho context into the prompt so it's stable for
-                # the entire session (not re-fetched per turn).
-                if self._honcho_context:
-                    self._cached_system_prompt = (
-                        self._cached_system_prompt + "\n\n" + self._honcho_context
-                    ).strip()
-                # Store the system prompt snapshot in SQLite
-                if self._session_db:
-                    try:
-                        self._session_db.update_system_prompt(self.session_id, self._cached_system_prompt)
-                    except Exception as e:
-                        logger.debug("Session DB update_system_prompt failed: %s", e)
+                    self._session_db.update_system_prompt(self.session_id, self._cached_system_prompt)
+                except Exception as e:
+                    logger.debug("Session DB update_system_prompt failed: %s", e)

        active_system_prompt = self._cached_system_prompt

@@ -3277,13 +3095,11 @@ class AIAgent:
            # Build the final system message: cached prompt + ephemeral system prompt.
            # The ephemeral part is appended here (not baked into the cached prompt)
            # so it stays out of the session DB and logs.
-            # Note: Honcho context is baked into _cached_system_prompt on the first
-            # turn and stored in the session DB, so it does NOT need to be injected
-            # here.  This keeps the system message identical across all turns in a
-            # session, maximizing Anthropic prompt cache hits.
            effective_system = active_system_prompt or ""
            if self.ephemeral_system_prompt:
                effective_system = (effective_system + "\n\n" + self.ephemeral_system_prompt).strip()
+            if self._honcho_context:
+                effective_system = (effective_system + "\n\n" + self._honcho_context).strip()
            if effective_system:
                api_messages = [{"role": "system", "content": effective_system}] + api_messages
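
Note on the hunk above: Honcho context is no longer baked into the cached prompt; it is appended per call, after the ephemeral prompt. A minimal standalone sketch of that assembly order, assuming plain strings. `build_effective_system` and `prepend_system` are hypothetical helper names used only for illustration, not functions in `run_agent.py`:

```python
def build_effective_system(cached_prompt, ephemeral_prompt=None, honcho_context=None):
    """Join the cached prompt with per-call extras, in the order the hunk uses."""
    effective = cached_prompt or ""
    # Ephemeral prompt first, then Honcho context; empty values are skipped.
    for extra in (ephemeral_prompt, honcho_context):
        if extra:
            effective = (effective + "\n\n" + extra).strip()
    return effective


def prepend_system(api_messages, effective_system):
    """Return a new message list with the system message first, if non-empty."""
    if not effective_system:
        return list(api_messages)
    return [{"role": "system", "content": effective_system}] + list(api_messages)
```

Because the extras are joined at call time, only the cached part is what gets written to the session DB.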
            
@@ -3434,10 +3250,6 @@ class AIAgent:
                        print(f"{self.log_prefix}   ⏱️  Response time: {api_duration:.2f}s (fast response often indicates rate limiting)")
                        
                        if retry_count >= max_retries:
-                            # Try fallback before giving up
-                            if self._try_activate_fallback():
-                                retry_count = 0
-                                continue
                            print(f"{self.log_prefix}❌ Max retries ({max_retries}) exceeded for invalid responses. Giving up.")
                            logging.error(f"{self.log_prefix}Invalid API response after {max_retries} retries.")
                            self._persist_session(messages, conversation_history)
@@ -3462,7 +3274,7 @@ class AIAgent:
                                self._persist_session(messages, conversation_history)
                                self.clear_interrupt()
                                return {
-                                    "final_response": f"Operation interrupted: retrying API call after rate limit (retry {retry_count}/{max_retries}).",
+                                    "final_response": "Operation interrupted.",
                                    "messages": messages,
                                    "api_calls": api_call_count,
                                    "completed": False,
@@ -3571,11 +3383,10 @@ class AIAgent:
                    if thinking_spinner:
                        thinking_spinner.stop("")
                        thinking_spinner = None
-                    api_elapsed = time.time() - api_start_time
                    print(f"{self.log_prefix}⚡ Interrupted during API call.")
                    self._persist_session(messages, conversation_history)
                    interrupted = True
-                    final_response = f"Operation interrupted: waiting for model response ({api_elapsed:.1f}s elapsed)."
+                    final_response = "Operation interrupted."
                    break

                except Exception as api_error:
@@ -3624,7 +3435,7 @@ class AIAgent:
                        self._persist_session(messages, conversation_history)
                        self.clear_interrupt()
                        return {
-                            "final_response": f"Operation interrupted: handling API error ({error_type}: {str(api_error)[:80]}).",
+                            "final_response": "Operation interrupted.",
                            "messages": messages,
                            "api_calls": api_call_count,
                            "completed": False,
@@ -3762,11 +3573,6 @@ class AIAgent:
                    ])) and not is_context_length_error

                    if is_client_error:
-                        # Try fallback before aborting — a different provider
-                        # may not have the same issue (rate limit, auth, etc.)
-                        if self._try_activate_fallback():
-                            retry_count = 0
-                            continue
                        self._dump_api_request_debug(
                            api_kwargs, reason="non_retryable_client_error", error=api_error,
                        )
@@ -3784,10 +3590,6 @@ class AIAgent:
                        }

                    if retry_count >= max_retries:
-                        # Try fallback before giving up entirely
-                        if self._try_activate_fallback():
-                            retry_count = 0
-                            continue
                        print(f"{self.log_prefix}❌ Max retries ({max_retries}) exceeded. Giving up.")
                        logging.error(f"{self.log_prefix}API call failed after {max_retries} retries. Last error: {api_error}")
                        logging.error(f"{self.log_prefix}Request details - Messages: {len(api_messages)}, Approx tokens: {approx_tokens:,}")
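
Note on the hunks above: with the `_try_activate_fallback()` branches removed, exhausting `max_retries` ends the attempt directly. A rough sketch of that control flow, under stated assumptions: `call_with_retries` is a hypothetical helper, the counter placement is a guess, and it re-raises the last error, whereas the real loop logs, persists the session, and returns a result dict.

```python
import time


def call_with_retries(api_call, max_retries=3, base_delay=0.0):
    """Hypothetical helper: retry api_call, then give up with no provider fallback."""
    retry_count = 0
    while True:
        try:
            return api_call()
        except Exception:
            retry_count += 1
            if retry_count >= max_retries:
                # No fallback provider is tried here; the last error propagates.
                raise
            # Simple linear backoff; the delay policy is an assumption.
            time.sleep(base_delay * retry_count)
```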
@@ -3808,7 +3610,7 @@ class AIAgent:
                            self._persist_session(messages, conversation_history)
                            self.clear_interrupt()
                            return {
-                                "final_response": f"Operation interrupted: retrying API call after error (retry {retry_count}/{max_retries}).",
+                                "final_response": "Operation interrupted.",
                                "messages": messages,
                                "api_calls": api_call_count,
                                "completed": False,
@@ -492,23 +492,9 @@ install_system_packages() {
                        return 0
                    fi
                fi
-            elif [ -e /dev/tty ]; then
-                # Non-interactive (e.g. curl | bash) but a terminal is available.
-                # Read the prompt from /dev/tty (same approach the setup wizard uses).
-                echo ""
-                log_info "Installing ${description} requires sudo."
-                read -p "Install? [Y/n] " -n 1 -r < /dev/tty
-                echo
-                if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
-                    if sudo DEBIAN_FRONTEND=noninteractive NEEDRESTART_MODE=a $install_cmd < /dev/tty; then
-                        [ "$need_ripgrep" = true ] && HAS_RIPGREP=true && log_success "ripgrep installed"
-                        [ "$need_ffmpeg" = true ]  && HAS_FFMPEG=true  && log_success "ffmpeg installed"
-                        return 0
-                    fi
-                fi
            else
-                log_warn "Non-interactive mode and no terminal available — cannot install system packages"
-                log_info "Install manually after setup completes: sudo $install_cmd"
+                log_warn "Non-interactive mode: cannot prompt for sudo password"
+                log_info "Install missing packages manually: sudo $install_cmd"
            fi
        fi
    fi
@@ -1,7 +1,7 @@
 ---
 name: ascii-art
-description: Generate ASCII art using pyfiglet (571 fonts), cowsay, boxes, toilet, image-to-ascii, remote APIs (asciified, ascii.co.uk), and LLM fallback. No API keys required.
-version: 4.0.0
+description: Generate ASCII art using pyfiglet (571 fonts), cowsay, boxes, toilet, image-to-ascii conversion, and search curated art from emojicombos.com and asciiart.eu (11,000+ artworks). Falls back to LLM-generated art.
+version: 3.1.0
 author: 0xbyt4, Hermes Agent
 license: MIT
 dependencies: []
@@ -14,9 +14,9 @@ metadata:

 # ASCII Art Skill

-Multiple tools for different ASCII art needs. All tools are local CLI programs or free REST APIs — no API keys required.
+Multiple tools for different ASCII art needs. All tools are local CLI programs — no API keys required.

-## Tool 1: Text Banners (pyfiglet — local)
+## Tool 1: Text Banners (pyfiglet)

 Render text as large ASCII art banners. 571 built-in fonts.

@@ -53,35 +53,7 @@ python3 -m pyfiglet --list_fonts             # List all 571 fonts
 - Short text (1-8 chars) works best with detailed fonts like `doom` or `block`
 - Long text works better with compact fonts like `small` or `mini`

-## Tool 2: Text Banners (asciified API — remote, no install)
-
-Free REST API that converts text to ASCII art. 250+ FIGlet fonts. Returns plain text directly — no parsing needed. Use this when pyfiglet is not installed or as a quick alternative.
-
-### Usage (via terminal curl)
-
-```bash
-# Basic text banner (default font)
-curl -s "https://asciified.thelicato.io/api/v2/ascii?text=Hello+World"
-
-# With a specific font
-curl -s "https://asciified.thelicato.io/api/v2/ascii?text=Hello&font=Slant"
-curl -s "https://asciified.thelicato.io/api/v2/ascii?text=Hello&font=Doom"
-curl -s "https://asciified.thelicato.io/api/v2/ascii?text=Hello&font=Star+Wars"
-curl -s "https://asciified.thelicato.io/api/v2/ascii?text=Hello&font=3-D"
-curl -s "https://asciified.thelicato.io/api/v2/ascii?text=Hello&font=Banner3"
-
-# List all available fonts (returns JSON array)
-curl -s "https://asciified.thelicato.io/api/v2/fonts"
-```
-
-### Tips
-
-- URL-encode spaces as `+` in the text parameter
-- The response is plain text ASCII art — no JSON wrapping, ready to display
-- Font names are case-sensitive; use the fonts endpoint to get exact names
-- Works from any terminal with curl — no Python or pip needed
-
-## Tool 3: Cowsay (Message Art)
+## Tool 2: Cowsay (Message Art)

 Classic tool that wraps text in a speech bubble with an ASCII character.

@@ -125,7 +97,7 @@ cowsay -e "OO" "Msg"   # Custom eyes
 cowsay -T "U " "Msg"   # Custom tongue
 ```

-## Tool 4: Boxes (Decorative Borders)
+## Tool 3: Boxes (Decorative Borders)

 Draw decorative ASCII art borders/frames around any text. 70+ built-in designs.

@@ -152,15 +124,13 @@ echo "Hello World" | boxes -a c               # Center text
 boxes -l                                       # List all 70+ designs
 ```

-### Combine with pyfiglet or asciified
+### Combine with pyfiglet

 ```bash
 python3 -m pyfiglet "HERMES" -f slant | boxes -d stone
-# Or without pyfiglet installed:
-curl -s "https://asciified.thelicato.io/api/v2/ascii?text=HERMES&font=Slant" | boxes -d stone
 ```

-## Tool 5: TOIlet (Colored Text Art)
+## Tool 4: TOIlet (Colored Text Art)

 Like pyfiglet but with ANSI color effects and visual filters. Great for terminal eye candy.

@@ -190,14 +160,14 @@ toilet -F list                          # List available filters

 **Note**: toilet outputs ANSI escape codes for colors — works in terminals but may not render in all contexts (e.g., plain text files, some chat platforms).

-## Tool 6: Image to ASCII Art
+## Tool 5: Image to ASCII Art

 Convert images (PNG, JPEG, GIF, WEBP) to ASCII art.

 ### Option A: ascii-image-converter (recommended, modern)

 ```bash
-# Install
+# Install via snap or Go
 sudo snap install ascii-image-converter
 # OR: go install github.com/TheZoraiz/ascii-image-converter@latest
 ```
@@ -220,77 +190,63 @@ jp2a --width=80 image.jpg
 jp2a --colors image.jpg              # Colorized
 ```

-## Tool 7: Search Pre-Made ASCII Art
+## Tool 6: Search Pre-Made ASCII Art (Web APIs)

-Search curated ASCII art from the web. Use `terminal` with `curl`.
+Search curated ASCII art databases via `web_extract`. No API keys needed.

-### Source A: ascii.co.uk (recommended for pre-made art)
+### Source A: emojicombos.com (recommended first)

-Large collection of classic ASCII art organized by subject. Art is inside HTML `<pre>` tags. Fetch the page with curl, then extract art with a small Python snippet.
+Huge collection of ASCII art, dot art, kaomoji, and emoji combos. Modern, meme-aware, user-submitted content. Great for pop culture, animals, objects, aesthetics.

-**URL pattern:** `https://ascii.co.uk/art/{subject}`
+**URL pattern:** `https://emojicombos.com/{term}-ascii-art`

-**Step 1 — Fetch the page:**
-
-```bash
-curl -s 'https://ascii.co.uk/art/cat' -o /tmp/ascii_art.html
 ```
-
-**Step 2 — Extract art from pre tags:**
-
-```python
-import re, html
-with open('/tmp/ascii_art.html') as f:
-    text = f.read()
-arts = re.findall(r'<pre[^>]*>(.*?)</pre>', text, re.DOTALL)
-for art in arts:
-    clean = re.sub(r'<[^>]+>', '', art)
-    clean = html.unescape(clean).strip()
-    if len(clean) > 30:
-        print(clean)
-        print('\n---\n')
+web_extract(urls=["https://emojicombos.com/cat-ascii-art"])
+web_extract(urls=["https://emojicombos.com/rocket-ascii-art"])
+web_extract(urls=["https://emojicombos.com/dragon-ascii-art"])
+web_extract(urls=["https://emojicombos.com/skull-ascii-art"])
+web_extract(urls=["https://emojicombos.com/heart-ascii-art"])
 ```

-**Available subjects** (use as URL path):
-- Animals: `cat`, `dog`, `horse`, `bird`, `fish`, `dragon`, `snake`, `rabbit`, `elephant`, `dolphin`, `butterfly`, `owl`, `wolf`, `bear`, `penguin`, `turtle`
-- Objects: `car`, `ship`, `airplane`, `rocket`, `guitar`, `computer`, `coffee`, `beer`, `cake`, `house`, `castle`, `sword`, `crown`, `key`
-- Nature: `tree`, `flower`, `sun`, `moon`, `star`, `mountain`, `ocean`, `rainbow`
-- Characters: `skull`, `robot`, `angel`, `wizard`, `pirate`, `ninja`, `alien`
-- Holidays: `christmas`, `halloween`, `valentine`
-
 **Tips:**
-- Preserve artist signatures/initials — important etiquette
-- Multiple art pieces per page — pick the best one for the user
-- Works reliably via curl, no JavaScript needed
+- Use hyphenated search terms: `hello-kitty-ascii-art`, `star-wars-ascii-art`
+- Returns a mix of classic ASCII, Braille dot art, and kaomoji — pick the best style for the user
+- Includes modern meme art and pop culture references
+- Great for kaomoji/emoticons too: `https://emojicombos.com/cat-kaomoji`
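
The URL patterns above (`{term}-ascii-art` and `{term}-kaomoji`, with hyphenated terms) can be sketched as a small helper. This is an illustration only: `emojicombos_url` is a hypothetical name, and lowercasing the term is an assumption rather than a documented requirement of the site.

```python
import re


def emojicombos_url(term, kind="ascii-art"):
    """Build an emojicombos.com page URL from a free-text search term."""
    # Collapse anything that is not a letter or digit into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")
    return f"https://emojicombos.com/{slug}-{kind}"
```

The resulting string is what would be passed in the `urls` list of `web_extract`.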

-### Source B: GitHub Octocat API (fun easter egg)
+### Source B: asciiart.eu (classic archive)

-Returns a random GitHub Octocat with a wise quote. No auth needed.
+11,000+ classic ASCII artworks organized by category. More traditional/vintage art.
+
+**Browse by category** (use as URL paths):
+- `animals/cats`, `animals/dogs`, `animals/birds`, `animals/horses`
+- `animals/dolphins`, `animals/dragons`, `animals/insects`
+- `space/rockets`, `space/stars`, `space/planets`
+- `vehicles/cars`, `vehicles/ships`, `vehicles/airplanes`
+- `food-and-drinks/coffee`, `food-and-drinks/beer`
+- `computers/computers`, `electronics/robots`
+- `art-and-design/hearts`, `art-and-design/skulls`
+- `plants/flowers`, `plants/trees`
+- `mythology/dragons`, `mythology/unicorns`
+
+```
+web_extract(urls=["https://www.asciiart.eu/animals/cats"])
+web_extract(urls=["https://www.asciiart.eu/search?q=rocket"])
+```
+
+**Tips:**
+- Preserve artist initials/signatures (e.g., `jgs`, `hjw`) — this is important etiquette
+- Better for classic/vintage ASCII art style
+
+### Source C: GitHub Octocat API (fun easter egg)
+
+Returns a random GitHub Octocat with a quote. No auth needed.

 ```bash
 curl -s https://api.github.com/octocat
 ```

-## Tool 8: Fun ASCII Utilities (via curl)
-
-These free services return ASCII art directly — great for fun extras.
-
-### QR Codes as ASCII Art
-
-```bash
-curl -s "qrenco.de/Hello+World"
-curl -s "qrenco.de/https://example.com"
-```
-
-### Weather as ASCII Art
-
-```bash
-curl -s "wttr.in/London"          # Full weather report with ASCII graphics
-curl -s "wttr.in/Moon"            # Moon phase in ASCII art
-curl -s "v2.wttr.in/London"       # Detailed version
-```
-
-## Tool 9: LLM-Generated Custom Art (Fallback)
+## Tool 7: LLM-Generated Custom Art (Fallback)

 When tools above don't have what's needed, generate ASCII art directly using these Unicode characters:

@@ -308,14 +264,28 @@ When tools above don't have what's needed, generate ASCII art directly using the
 - Max height: 15 lines for banners, 25 for scenes
 - Monospace only: output must render correctly in fixed-width fonts

+## Fun Extras
+
+### Star Wars in ASCII (via telnet)
+
+```bash
+telnet towel.blinkenlights.nl
+```
+
+### Useful Resources
+
+- [asciiart.eu](https://www.asciiart.eu/) — 11,000+ artworks, searchable
+- [patorjk.com/software/taag](http://patorjk.com/software/taag/) — Web-based text-to-ASCII with font preview
+- [asciiflow.com](http://asciiflow.com/) — Interactive ASCII diagram editor (browser)
+- [awesome-ascii-art](https://github.com/moul/awesome-ascii-art) — Curated resource list
+
 ## Decision Flow

-1. **Text as a banner** → pyfiglet if installed, otherwise asciified API via curl
+1. **Text as a banner** → pyfiglet (or toilet for colored output)
 2. **Wrap a message in fun character art** → cowsay
-3. **Add decorative border/frame** → boxes (can combine with pyfiglet/asciified)
-4. **Art of a specific thing** (cat, rocket, dragon) → ascii.co.uk via curl + parsing
-5. **Convert an image to ASCII** → ascii-image-converter or jp2a
-6. **QR code** → qrenco.de via curl
-7. **Weather/moon art** → wttr.in via curl
-8. **Something custom/creative** → LLM generation with Unicode palette
-9. **Any tool not installed** → install it, or fall back to next option
+3. **Add decorative border/frame** → boxes (can combine with pyfiglet)
+4. **Art of a thing** (cat, rocket, dragon) → emojicombos.com first, then asciiart.eu
+5. **Kaomoji / emoticons** → emojicombos.com (`{term}-kaomoji`)
+6. **Convert an image to ASCII** → ascii-image-converter or jp2a
+7. **Something custom/creative** → LLM generation with Unicode palette
+8. **Any tool not installed** → install it, or fall back to next option
@@ -1,162 +0,0 @@
----
-name: dogfood
-description: Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports
-version: 1.0.0
-metadata:
-  hermes:
-    tags: [qa, testing, browser, web, dogfood]
-    related_skills: []
----
-
-# Dogfood: Systematic Web Application QA Testing
-
-## Overview
-
-This skill guides you through systematic exploratory QA testing of web applications using the browser toolset. You will navigate the application, interact with elements, capture evidence of issues, and produce a structured bug report.
-
-## Prerequisites
-
-- Browser toolset must be available (`browser_navigate`, `browser_snapshot`, `browser_click`, `browser_type`, `browser_vision`, `browser_console`, `browser_scroll`, `browser_back`, `browser_press`, `browser_close`)
-- A target URL and testing scope from the user
-
-## Inputs
-
-The user provides:
-1. **Target URL** — the entry point for testing
-2. **Scope** — what areas/features to focus on (or "full site" for comprehensive testing)
-3. **Output directory** (optional) — where to save screenshots and the report (default: `./dogfood-output`)
-
-## Workflow
-
-Follow this 5-phase systematic workflow:
-
-### Phase 1: Plan
-
-1. Create the output directory structure:
-   ```
-   {output_dir}/
-   ├── screenshots/       # Evidence screenshots
-   └── report.md          # Final report (generated in Phase 5)
-   ```
-2. Identify the testing scope based on user input.
-3. Build a rough sitemap by planning which pages and features to test:
-   - Landing/home page
-   - Navigation links (header, footer, sidebar)
-   - Key user flows (sign up, login, search, checkout, etc.)
-   - Forms and interactive elements
-   - Edge cases (empty states, error pages, 404s)
-
-### Phase 2: Explore
-
-For each page or feature in your plan:
-
-1. **Navigate** to the page:
-   ```
-   browser_navigate(url="https://example.com/page")
-   ```
-
-2. **Take a snapshot** to understand the DOM structure:
-   ```
-   browser_snapshot()
-   ```
-
-3. **Check the console** for JavaScript errors:
-   ```
-   browser_console(clear=true)
-   ```
-   Do this after every navigation and after every significant interaction. Silent JS errors are high-value findings.
-
-4. **Take an annotated screenshot** to visually assess the page and identify interactive elements:
-   ```
-   browser_vision(question="Describe the page layout, identify any visual issues, broken elements, or accessibility concerns", annotate=true)
-   ```
-   The `annotate=true` flag overlays numbered `[N]` labels on interactive elements. Each `[N]` maps to ref `@eN` for subsequent browser commands.
-
-5. **Test interactive elements** systematically:
-   - Click buttons and links: `browser_click(ref="@eN")`
-   - Fill forms: `browser_type(ref="@eN", text="test input")`
-   - Test keyboard navigation: `browser_press(key="Tab")`, `browser_press(key="Enter")`
-   - Scroll through content: `browser_scroll(direction="down")`
-   - Test form validation with invalid inputs
-   - Test empty submissions
-
-6. **After each interaction**, check for:
-   - Console errors: `browser_console()`
-   - Visual changes: `browser_vision(question="What changed after the interaction?")`
-   - Expected vs actual behavior
-
-### Phase 3: Collect Evidence
-
-For every issue found:
-
-1. **Take a screenshot** showing the issue:
-   ```
-   browser_vision(question="Capture and describe the issue visible on this page", annotate=false)
-   ```
-   Save the `screenshot_path` from the response — you will reference it in the report.
-
-2. **Record the details**:
-   - URL where the issue occurs
-   - Steps to reproduce
-   - Expected behavior
-   - Actual behavior
-   - Console errors (if any)
-   - Screenshot path
-
-3. **Classify the issue** using the issue taxonomy (see `references/issue-taxonomy.md`):
-   - Severity: Critical / High / Medium / Low
-   - Category: Functional / Visual / Accessibility / Console / UX / Content
-
-### Phase 4: Categorize
-
-1. Review all collected issues.
-2. De-duplicate — merge issues that are the same bug manifesting in different places.
-3. Assign final severity and category to each issue.
-4. Sort by severity (Critical first, then High, Medium, Low).
-5. Count issues by severity and category for the executive summary.
-
-### Phase 5: Report
-
-Generate the final report using the template at `templates/dogfood-report-template.md`.
-
-The report must include:
-1. **Executive summary** with total issue count, breakdown by severity, and testing scope
-2. **Per-issue sections** with:
-   - Issue number and title
-   - Severity and category badges
-   - URL where observed
-   - Description of the issue
-   - Steps to reproduce
-   - Expected vs actual behavior
-   - Screenshot references (use `MEDIA:<screenshot_path>` for inline images)
-   - Console errors if relevant
-3. **Summary table** of all issues
-4. **Testing notes** — what was tested, what was not, any blockers
-
-Save the report to `{output_dir}/report.md`.
-
-## Tools Reference
-
-| Tool | Purpose |
-|------|---------|
-| `browser_navigate` | Go to a URL |
-| `browser_snapshot` | Get DOM text snapshot (accessibility tree) |
-| `browser_click` | Click an element by ref (`@eN`) or text |
-| `browser_type` | Type into an input field |
-| `browser_scroll` | Scroll up/down on the page |
-| `browser_back` | Go back in browser history |
-| `browser_press` | Press a keyboard key |
-| `browser_vision` | Screenshot + AI analysis; use `annotate=true` for element labels |
-| `browser_console` | Get JS console output and errors |
-| `browser_close` | Close the browser session |
-
-## Tips
-
-- **Always check `browser_console()` after navigating and after significant interactions.** Silent JS errors are among the most valuable findings.
-- **Use `annotate=true` with `browser_vision`** when you need to reason about interactive element positions or when the snapshot refs are unclear.
-- **Test with both valid and invalid inputs** — form validation bugs are common.
-- **Scroll through long pages** — content below the fold may have rendering issues.
-- **Test navigation flows** — click through multi-step processes end-to-end.
-- **Check responsive behavior** by noting any layout issues visible in screenshots.
-- **Don't forget edge cases**: empty states, very long text, special characters, rapid clicking.
-- When reporting screenshots to the user, include `MEDIA:<screenshot_path>` so they can see the evidence inline.
@@ -1,109 +0,0 @@
-# Issue Taxonomy
-
-Use this taxonomy to classify issues found during dogfood QA testing.
-
-## Severity Levels
-
-### Critical
-The issue makes a core feature completely unusable or causes data loss.
-
-**Examples:**
-- Application crashes or shows a blank white page
-- Form submission silently loses user data
-- Authentication is completely broken (can't log in at all)
-- Payment flow fails and charges the user without completing the order
-- Security vulnerability (e.g., XSS, exposed credentials in console)
-
-### High
-The issue significantly impairs functionality but a workaround may exist.
-
-**Examples:**
-- A key button does nothing when clicked (but refreshing fixes it)
-- Search returns no results for valid queries
-- Form validation rejects valid input
-- Page loads but critical content is missing or garbled
-- Navigation link leads to a 404 or wrong page
-- Uncaught JavaScript exceptions in the console on core pages
-
-### Medium
-The issue is noticeable and affects user experience but doesn't block core functionality.
-
-**Examples:**
-- Layout is misaligned or overlapping on certain screen sections
-- Images fail to load (broken image icons)
-- Slow performance (visible loading delays > 3 seconds)
-- Form field lacks proper validation feedback (no error message on bad input)
-- Console warnings that suggest deprecated or misconfigured features
-- Inconsistent styling between similar pages
-
-### Low
-Minor polish issues that don't affect functionality.
-
-**Examples:**
-- Typos or grammatical errors in text content
-- Minor spacing or alignment inconsistencies
-- Placeholder text left in production ("Lorem ipsum")
-- Favicon missing
-- Console info/debug messages that shouldn't be in production
-- Subtle color contrast issues that don't fail WCAG requirements
-
-## Categories
-
-### Functional
-Issues where features don't work as expected.
-
-- Buttons/links that don't respond
-- Forms that don't submit or submit incorrectly
-- Broken user flows (can't complete a multi-step process)
-- Incorrect data displayed
-- Features that work partially
-
-### Visual
-Issues with the visual presentation of the page.
-
-- Layout problems (overlapping elements, broken grids)
-- Broken images or missing media
-- Styling inconsistencies
-- Responsive design failures
-- Z-index issues (elements hidden behind others)
-- Text overflow or truncation
-
-### Accessibility
-Issues that prevent or hinder access for users with disabilities.
-
-- Missing alt text on meaningful images
-- Poor color contrast (fails WCAG AA)
-- Elements not reachable via keyboard navigation
-- Missing form labels or ARIA attributes
-- Focus indicators missing or unclear
-- Screen reader incompatible content
-
-### Console
-Issues detected through JavaScript console output.
-
-- Uncaught exceptions and unhandled promise rejections
-- Failed network requests (4xx, 5xx errors in console)
-- Deprecation warnings
-- CORS errors
-- Mixed content warnings (HTTP resources on HTTPS page)
-- Excessive console.log output left from development
-
-### UX (User Experience)
-Issues where functionality works but the experience is poor.
-
-- Confusing navigation or information architecture
-- Missing loading indicators (user doesn't know something is happening)
-- No feedback after user actions (e.g., button click with no visible result)
-- Inconsistent interaction patterns
-- Missing confirmation dialogs for destructive actions
-- Poor error messages that don't help the user recover
-
-### Content
-Issues with the text, media, or information on the page.
-
-- Typos and grammatical errors
-- Placeholder/dummy content in production
-- Outdated information
-- Missing content (empty sections)
-- Broken or dead links to external resources
-- Incorrect or misleading labels
@@ -1,86 +0,0 @@
-# Dogfood QA Report
-
-**Target:** {target_url}
-**Date:** {date}
-**Scope:** {scope_description}
-**Tester:** Hermes Agent (automated exploratory QA)
-
----
-
-## Executive Summary
-
-| Severity | Count |
-|----------|-------|
-| 🔴 Critical | {critical_count} |
-| 🟠 High | {high_count} |
-| 🟡 Medium | {medium_count} |
-| 🔵 Low | {low_count} |
-| **Total** | **{total_count}** |
-
-**Overall Assessment:** {one_sentence_assessment}
-
----
-
-## Issues
-
-<!-- Repeat this section for each issue found, sorted by severity (Critical first) -->
-
-### Issue #{issue_number}: {issue_title}
-
-| Field | Value |
-|-------|-------|
-| **Severity** | {severity} |
-| **Category** | {category} |
-| **URL** | {url_where_found} |
-
-**Description:**
-{detailed_description_of_the_issue}
-
-**Steps to Reproduce:**
-1. {step_1}
-2. {step_2}
-3. {step_3}
-
-**Expected Behavior:**
-{what_should_happen}
-
-**Actual Behavior:**
-{what_actually_happens}
-
-**Screenshot:**
-MEDIA:{screenshot_path}
-
-**Console Errors** (if applicable):
-```
-{console_error_output}
-```
-
----
-
-<!-- End of per-issue section -->
-
-## Issues Summary Table
-
-| # | Title | Severity | Category | URL |
-|---|-------|----------|----------|-----|
-| {n} | {title} | {severity} | {category} | {url} |
-
-## Testing Coverage
-
-### Pages Tested
-- {list_of_pages_visited}
-
-### Features Tested
-- {list_of_features_exercised}
-
-### Not Tested / Out of Scope
-- {areas_not_covered_and_why}
-
-### Blockers
-- {any_issues_that_prevented_testing_certain_areas}
-
----
-
-## Notes
-
-{any_additional_observations_or_recommendations}
@@ -1,4 +1,4 @@
-"""Tests for agent.auxiliary_client resolution chain, provider overrides, and model overrides."""
+"""Tests for agent.auxiliary_client resolution chain, especially the Codex fallback."""

 import json
 import os
@@ -12,9 +12,6 @@ from agent.auxiliary_client import (
    get_vision_auxiliary_client,
    auxiliary_max_tokens_param,
    _read_codex_access_token,
-    _get_auxiliary_provider,
-    _resolve_forced_provider,
-    _resolve_auto,
 )


@@ -24,10 +21,6 @@ def _clean_env(monkeypatch):
    for key in (
        "OPENROUTER_API_KEY", "OPENAI_BASE_URL", "OPENAI_API_KEY",
        "OPENAI_MODEL", "LLM_MODEL", "NOUS_INFERENCE_BASE_URL",
-        # Per-task provider/model overrides
-        "AUXILIARY_VISION_PROVIDER", "AUXILIARY_VISION_MODEL",
-        "AUXILIARY_WEB_EXTRACT_PROVIDER", "AUXILIARY_WEB_EXTRACT_MODEL",
-        "CONTEXT_COMPRESSION_PROVIDER", "CONTEXT_COMPRESSION_MODEL",
    ):
        monkeypatch.delenv(key, raising=False)

@@ -158,230 +151,15 @@ class TestGetTextAuxiliaryClient:
        assert model is None


-class TestVisionClientFallback:
-    """Vision client auto mode only tries OpenRouter + Nous (multimodal-capable)."""
+class TestCodexNotInVisionClient:
+    """Codex fallback should NOT apply to vision tasks."""

-    def test_vision_returns_none_without_any_credentials(self):
+    def test_vision_returns_none_without_openrouter_nous(self):
        with patch("agent.auxiliary_client._read_nous_auth", return_value=None):
            client, model = get_vision_auxiliary_client()
        assert client is None
        assert model is None

-    def test_vision_auto_includes_codex(self, codex_auth_dir):
-        """Codex supports vision (gpt-5.3-codex), so auto mode should use it."""
-        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
-             patch("agent.auxiliary_client.OpenAI"):
-            client, model = get_vision_auxiliary_client()
-        from agent.auxiliary_client import CodexAuxiliaryClient
-        assert isinstance(client, CodexAuxiliaryClient)
-        assert model == "gpt-5.3-codex"
-
-    def test_vision_auto_skips_custom_endpoint(self, monkeypatch):
-        """Custom endpoint is skipped in vision auto mode."""
-        monkeypatch.setenv("OPENAI_BASE_URL", "http://localhost:1234/v1")
-        monkeypatch.setenv("OPENAI_API_KEY", "local-key")
-        with patch("agent.auxiliary_client._read_nous_auth", return_value=None):
-            client, model = get_vision_auxiliary_client()
-        assert client is None
-        assert model is None
-
-    def test_vision_uses_openrouter_when_available(self, monkeypatch):
-        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
-        with patch("agent.auxiliary_client.OpenAI") as mock_openai:
-            client, model = get_vision_auxiliary_client()
-        assert model == "google/gemini-3-flash-preview"
-        assert client is not None
-
-    def test_vision_uses_nous_when_available(self, monkeypatch):
-        with patch("agent.auxiliary_client._read_nous_auth") as mock_nous, \
-             patch("agent.auxiliary_client.OpenAI"):
-            mock_nous.return_value = {"access_token": "nous-tok"}
-            client, model = get_vision_auxiliary_client()
-        assert model == "gemini-3-flash"
-        assert client is not None
-
-    def test_vision_forced_main_uses_custom_endpoint(self, monkeypatch):
-        """When explicitly forced to 'main', vision CAN use custom endpoint."""
-        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "main")
-        monkeypatch.setenv("OPENAI_BASE_URL", "http://localhost:1234/v1")
-        monkeypatch.setenv("OPENAI_API_KEY", "local-key")
-        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
-             patch("agent.auxiliary_client.OpenAI") as mock_openai:
-            client, model = get_vision_auxiliary_client()
-        assert client is not None
-        assert model == "gpt-4o-mini"
-
-    def test_vision_forced_main_returns_none_without_creds(self, monkeypatch):
-        """Forced main with no credentials still returns None."""
-        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "main")
-        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
-             patch("agent.auxiliary_client._read_codex_access_token", return_value=None):
-            client, model = get_vision_auxiliary_client()
-        assert client is None
-        assert model is None
-
-    def test_vision_forced_codex(self, monkeypatch, codex_auth_dir):
-        """When forced to 'codex', vision uses Codex OAuth."""
-        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "codex")
-        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
-             patch("agent.auxiliary_client.OpenAI"):
-            client, model = get_vision_auxiliary_client()
-        from agent.auxiliary_client import CodexAuxiliaryClient
-        assert isinstance(client, CodexAuxiliaryClient)
-        assert model == "gpt-5.3-codex"
-
-
-class TestGetAuxiliaryProvider:
-    """Tests for _get_auxiliary_provider env var resolution."""
-
-    def test_no_task_returns_auto(self):
-        assert _get_auxiliary_provider() == "auto"
-        assert _get_auxiliary_provider("") == "auto"
-
-    def test_auxiliary_prefix_takes_priority(self, monkeypatch):
-        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "openrouter")
-        assert _get_auxiliary_provider("vision") == "openrouter"
-
-    def test_context_prefix_fallback(self, monkeypatch):
-        monkeypatch.setenv("CONTEXT_COMPRESSION_PROVIDER", "nous")
-        assert _get_auxiliary_provider("compression") == "nous"
-
-    def test_auxiliary_prefix_over_context_prefix(self, monkeypatch):
-        monkeypatch.setenv("AUXILIARY_COMPRESSION_PROVIDER", "openrouter")
-        monkeypatch.setenv("CONTEXT_COMPRESSION_PROVIDER", "nous")
-        assert _get_auxiliary_provider("compression") == "openrouter"
-
-    def test_auto_value_treated_as_auto(self, monkeypatch):
-        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "auto")
-        assert _get_auxiliary_provider("vision") == "auto"
-
-    def test_whitespace_stripped(self, monkeypatch):
-        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "  openrouter  ")
-        assert _get_auxiliary_provider("vision") == "openrouter"
-
-    def test_case_insensitive(self, monkeypatch):
-        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "OpenRouter")
-        assert _get_auxiliary_provider("vision") == "openrouter"
-
-    def test_main_provider(self, monkeypatch):
-        monkeypatch.setenv("AUXILIARY_WEB_EXTRACT_PROVIDER", "main")
-        assert _get_auxiliary_provider("web_extract") == "main"
-
-
-class TestResolveForcedProvider:
-    """Tests for _resolve_forced_provider with explicit provider selection."""
-
-    def test_forced_openrouter(self, monkeypatch):
-        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
-        with patch("agent.auxiliary_client.OpenAI") as mock_openai:
-            client, model = _resolve_forced_provider("openrouter")
-        assert model == "google/gemini-3-flash-preview"
-        assert client is not None
-
-    def test_forced_openrouter_no_key(self, monkeypatch):
-        with patch("agent.auxiliary_client._read_nous_auth", return_value=None):
-            client, model = _resolve_forced_provider("openrouter")
-        assert client is None
-        assert model is None
-
-    def test_forced_nous(self, monkeypatch):
-        with patch("agent.auxiliary_client._read_nous_auth") as mock_nous, \
-             patch("agent.auxiliary_client.OpenAI"):
-            mock_nous.return_value = {"access_token": "nous-tok"}
-            client, model = _resolve_forced_provider("nous")
-        assert model == "gemini-3-flash"
-        assert client is not None
-
-    def test_forced_nous_not_configured(self, monkeypatch):
-        with patch("agent.auxiliary_client._read_nous_auth", return_value=None):
-            client, model = _resolve_forced_provider("nous")
-        assert client is None
-        assert model is None
-
-    def test_forced_main_uses_custom(self, monkeypatch):
-        monkeypatch.setenv("OPENAI_BASE_URL", "http://local:8080/v1")
-        monkeypatch.setenv("OPENAI_API_KEY", "local-key")
-        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
-             patch("agent.auxiliary_client.OpenAI") as mock_openai:
-            client, model = _resolve_forced_provider("main")
-        assert model == "gpt-4o-mini"
-
-    def test_forced_main_skips_openrouter_nous(self, monkeypatch):
-        """Even if OpenRouter key is set, 'main' skips it."""
-        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
-        monkeypatch.setenv("OPENAI_BASE_URL", "http://local:8080/v1")
-        monkeypatch.setenv("OPENAI_API_KEY", "local-key")
-        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
-             patch("agent.auxiliary_client.OpenAI") as mock_openai:
-            client, model = _resolve_forced_provider("main")
-        # Should use custom endpoint, not OpenRouter
-        assert model == "gpt-4o-mini"
-
-    def test_forced_main_falls_to_codex(self, codex_auth_dir, monkeypatch):
-        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
-             patch("agent.auxiliary_client.OpenAI"):
-            client, model = _resolve_forced_provider("main")
-        from agent.auxiliary_client import CodexAuxiliaryClient
-        assert isinstance(client, CodexAuxiliaryClient)
-        assert model == "gpt-5.3-codex"
-
-    def test_forced_codex(self, codex_auth_dir, monkeypatch):
-        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
-             patch("agent.auxiliary_client.OpenAI"):
-            client, model = _resolve_forced_provider("codex")
-        from agent.auxiliary_client import CodexAuxiliaryClient
-        assert isinstance(client, CodexAuxiliaryClient)
-        assert model == "gpt-5.3-codex"
-
-    def test_forced_codex_no_token(self, monkeypatch):
-        with patch("agent.auxiliary_client._read_codex_access_token", return_value=None):
-            client, model = _resolve_forced_provider("codex")
-        assert client is None
-        assert model is None
-
-    def test_forced_unknown_returns_none(self, monkeypatch):
-        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
-             patch("agent.auxiliary_client._read_codex_access_token", return_value=None):
-            client, model = _resolve_forced_provider("invalid-provider")
-        assert client is None
-        assert model is None
-
-
-class TestTaskSpecificOverrides:
-    """Integration tests for per-task provider routing via get_text_auxiliary_client(task=...)."""
-
-    def test_text_with_vision_provider_override(self, monkeypatch):
-        """AUXILIARY_VISION_PROVIDER should not affect text tasks."""
-        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "nous")
-        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
-        with patch("agent.auxiliary_client.OpenAI"):
-            client, model = get_text_auxiliary_client()  # no task → auto
-        assert model == "google/gemini-3-flash-preview"  # OpenRouter, not Nous
-
-    def test_compression_task_reads_context_prefix(self, monkeypatch):
-        """Compression task should check CONTEXT_COMPRESSION_PROVIDER."""
-        monkeypatch.setenv("CONTEXT_COMPRESSION_PROVIDER", "nous")
-        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")  # would win in auto
-        with patch("agent.auxiliary_client._read_nous_auth") as mock_nous, \
-             patch("agent.auxiliary_client.OpenAI"):
-            mock_nous.return_value = {"access_token": "nous-tok"}
-            client, model = get_text_auxiliary_client("compression")
-        assert model == "gemini-3-flash"  # forced to Nous, not OpenRouter
-
-    def test_web_extract_task_override(self, monkeypatch):
-        monkeypatch.setenv("AUXILIARY_WEB_EXTRACT_PROVIDER", "openrouter")
-        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
-        with patch("agent.auxiliary_client.OpenAI"):
-            client, model = get_text_auxiliary_client("web_extract")
-        assert model == "google/gemini-3-flash-preview"
-
-    def test_task_without_override_uses_auto(self, monkeypatch):
-        """A task with no provider env var falls through to auto chain."""
-        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
-        with patch("agent.auxiliary_client.OpenAI"):
-            client, model = get_text_auxiliary_client("compression")
-        assert model == "google/gemini-3-flash-preview"  # auto → OpenRouter
-

 class TestAuxiliaryMaxTokensParam:
    def test_codex_fallback_uses_max_tokens(self, monkeypatch):
@@ -224,60 +224,6 @@ class TestCompressWithClient:
                for tc in msg["tool_calls"]:
                    assert tc["id"] in answered_ids

-    def test_summary_role_avoids_consecutive_user_messages(self):
-        """Summary role should alternate with the last head message to avoid consecutive same-role messages."""
-        mock_client = MagicMock()
-        mock_response = MagicMock()
-        mock_response.choices = [MagicMock()]
-        mock_response.choices[0].message.content = "[CONTEXT SUMMARY]: stuff happened"
-        mock_client.chat.completions.create.return_value = mock_response
-
-        with patch("agent.context_compressor.get_model_context_length", return_value=100000), \
-             patch("agent.context_compressor.get_text_auxiliary_client", return_value=(mock_client, "test-model")):
-            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)
-
-        # Last head message (index 1) is "assistant" → summary should be "user"
-        msgs = [
-            {"role": "user", "content": "msg 0"},
-            {"role": "assistant", "content": "msg 1"},
-            {"role": "user", "content": "msg 2"},
-            {"role": "assistant", "content": "msg 3"},
-            {"role": "user", "content": "msg 4"},
-            {"role": "assistant", "content": "msg 5"},
-        ]
-        result = c.compress(msgs)
-        summary_msg = [m for m in result if "CONTEXT SUMMARY" in (m.get("content") or "")]
-        assert len(summary_msg) == 1
-        assert summary_msg[0]["role"] == "user"
-
-    def test_summary_role_avoids_consecutive_user_when_head_ends_with_user(self):
-        """When last head message is 'user', summary must be 'assistant' to avoid two consecutive user messages."""
-        mock_client = MagicMock()
-        mock_response = MagicMock()
-        mock_response.choices = [MagicMock()]
-        mock_response.choices[0].message.content = "[CONTEXT SUMMARY]: stuff happened"
-        mock_client.chat.completions.create.return_value = mock_response
-
-        with patch("agent.context_compressor.get_model_context_length", return_value=100000), \
-             patch("agent.context_compressor.get_text_auxiliary_client", return_value=(mock_client, "test-model")):
-            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=3, protect_last_n=2)
-
-        # Last head message (index 2) is "user" → summary should be "assistant"
-        msgs = [
-            {"role": "system", "content": "system prompt"},
-            {"role": "user", "content": "msg 1"},
-            {"role": "user", "content": "msg 2"},  # last head — user
-            {"role": "assistant", "content": "msg 3"},
-            {"role": "user", "content": "msg 4"},
-            {"role": "assistant", "content": "msg 5"},
-            {"role": "user", "content": "msg 6"},
-            {"role": "assistant", "content": "msg 7"},
-        ]
-        result = c.compress(msgs)
-        summary_msg = [m for m in result if "CONTEXT SUMMARY" in (m.get("content") or "")]
-        assert len(summary_msg) == 1
-        assert summary_msg[0]["role"] == "assistant"
-
    def test_summarization_does_not_start_tail_with_tool_outputs(self):
        mock_client = MagicMock()
        mock_response = MagicMock()
@@ -1,200 +0,0 @@
-"""Tests for /resume gateway slash command.
-
-Tests the _handle_resume_command handler (switch to a previously-named session)
-across gateway messenger platforms.
-"""
-
-from unittest.mock import MagicMock, AsyncMock
-
-import pytest
-
-from gateway.config import Platform
-from gateway.platforms.base import MessageEvent
-from gateway.session import SessionSource, build_session_key
-
-
-def _make_event(text="/resume", platform=Platform.TELEGRAM,
-                user_id="12345", chat_id="67890"):
-    """Build a MessageEvent for testing."""
-    source = SessionSource(
-        platform=platform,
-        user_id=user_id,
-        chat_id=chat_id,
-        user_name="testuser",
-    )
-    return MessageEvent(text=text, source=source)
-
-
-def _session_key_for_event(event):
-    """Get the session key that build_session_key produces for an event."""
-    return build_session_key(event.source)
-
-
-def _make_runner(session_db=None, current_session_id="current_session_001",
-                 event=None):
-    """Create a bare GatewayRunner with a mock session_store and optional session_db."""
-    from gateway.run import GatewayRunner
-    runner = object.__new__(GatewayRunner)
-    runner.adapters = {}
-    runner._session_db = session_db
-    runner._running_agents = {}
-
-    # Compute the real session key if an event is provided
-    session_key = build_session_key(event.source) if event else "agent:main:telegram:dm"
-
-    # Mock session_store that returns a session entry with a known session_id
-    mock_session_entry = MagicMock()
-    mock_session_entry.session_id = current_session_id
-    mock_session_entry.session_key = session_key
-    mock_store = MagicMock()
-    mock_store.get_or_create_session.return_value = mock_session_entry
-    mock_store.load_transcript.return_value = []
-    mock_store.switch_session.return_value = mock_session_entry
-    runner.session_store = mock_store
-
-    # Stub out memory flushing
-    runner._async_flush_memories = AsyncMock()
-
-    return runner
-
-
-# ---------------------------------------------------------------------------
-# _handle_resume_command
-# ---------------------------------------------------------------------------
-
-
-class TestHandleResumeCommand:
-    """Tests for GatewayRunner._handle_resume_command."""
-
-    @pytest.mark.asyncio
-    async def test_no_session_db(self):
-        """Returns error when session database is unavailable."""
-        runner = _make_runner(session_db=None)
-        event = _make_event(text="/resume My Project")
-        result = await runner._handle_resume_command(event)
-        assert "not available" in result.lower()
-
-    @pytest.mark.asyncio
-    async def test_list_named_sessions_when_no_arg(self, tmp_path):
-        """With no argument, lists recently titled sessions."""
-        from hermes_state import SessionDB
-        db = SessionDB(db_path=tmp_path / "state.db")
-        db.create_session("sess_001", "telegram")
-        db.create_session("sess_002", "telegram")
-        db.set_session_title("sess_001", "Research")
-        db.set_session_title("sess_002", "Coding")
-
-        event = _make_event(text="/resume")
-        runner = _make_runner(session_db=db, event=event)
-        result = await runner._handle_resume_command(event)
-        assert "Research" in result
-        assert "Coding" in result
-        assert "Named Sessions" in result
-        db.close()
-
-    @pytest.mark.asyncio
-    async def test_list_shows_usage_when_no_titled(self, tmp_path):
-        """With no arg and no titled sessions, shows instructions."""
-        from hermes_state import SessionDB
-        db = SessionDB(db_path=tmp_path / "state.db")
-        db.create_session("sess_001", "telegram")  # No title
-
-        event = _make_event(text="/resume")
-        runner = _make_runner(session_db=db, event=event)
-        result = await runner._handle_resume_command(event)
-        assert "No named sessions" in result
-        assert "/title" in result
-        db.close()
-
-    @pytest.mark.asyncio
-    async def test_resume_by_name(self, tmp_path):
-        """Resolves a title and switches to that session."""
-        from hermes_state import SessionDB
-        db = SessionDB(db_path=tmp_path / "state.db")
-        db.create_session("old_session_abc", "telegram")
-        db.set_session_title("old_session_abc", "My Project")
-        db.create_session("current_session_001", "telegram")
-
-        event = _make_event(text="/resume My Project")
-        runner = _make_runner(session_db=db, current_session_id="current_session_001",
-                              event=event)
-        result = await runner._handle_resume_command(event)
-
-        assert "Resumed" in result
-        assert "My Project" in result
-        # Verify switch_session was called with the old session ID
-        runner.session_store.switch_session.assert_called_once()
-        call_args = runner.session_store.switch_session.call_args
-        assert call_args[0][1] == "old_session_abc"
-        db.close()
-
-    @pytest.mark.asyncio
-    async def test_resume_nonexistent_name(self, tmp_path):
-        """Returns error for unknown session name."""
-        from hermes_state import SessionDB
-        db = SessionDB(db_path=tmp_path / "state.db")
-        db.create_session("current_session_001", "telegram")
-
-        event = _make_event(text="/resume Nonexistent Session")
-        runner = _make_runner(session_db=db, event=event)
-        result = await runner._handle_resume_command(event)
-        assert "No session found" in result
-        db.close()
-
-    @pytest.mark.asyncio
-    async def test_resume_already_on_session(self, tmp_path):
-        """Returns friendly message when already on the requested session."""
-        from hermes_state import SessionDB
-        db = SessionDB(db_path=tmp_path / "state.db")
-        db.create_session("current_session_001", "telegram")
-        db.set_session_title("current_session_001", "Active Project")
-
-        event = _make_event(text="/resume Active Project")
-        runner = _make_runner(session_db=db, current_session_id="current_session_001",
-                              event=event)
-        result = await runner._handle_resume_command(event)
-        assert "Already on session" in result
-        db.close()
-
-    @pytest.mark.asyncio
-    async def test_resume_auto_lineage(self, tmp_path):
-        """Asking for 'My Project' when 'My Project #2' exists gets the latest."""
-        from hermes_state import SessionDB
-        db = SessionDB(db_path=tmp_path / "state.db")
-        db.create_session("sess_v1", "telegram")
-        db.set_session_title("sess_v1", "My Project")
-        db.create_session("sess_v2", "telegram")
-        db.set_session_title("sess_v2", "My Project #2")
-        db.create_session("current_session_001", "telegram")
-
-        event = _make_event(text="/resume My Project")
-        runner = _make_runner(session_db=db, current_session_id="current_session_001",
-                              event=event)
-        result = await runner._handle_resume_command(event)
-
-        assert "Resumed" in result
-        # Should resolve to #2 (latest in lineage)
-        call_args = runner.session_store.switch_session.call_args
-        assert call_args[0][1] == "sess_v2"
-        db.close()
-
-    @pytest.mark.asyncio
-    async def test_resume_clears_running_agent(self, tmp_path):
-        """Switching sessions clears any cached running agent."""
-        from hermes_state import SessionDB
-        db = SessionDB(db_path=tmp_path / "state.db")
-        db.create_session("old_session", "telegram")
-        db.set_session_title("old_session", "Old Work")
-        db.create_session("current_session_001", "telegram")
-
-        event = _make_event(text="/resume Old Work")
-        runner = _make_runner(session_db=db, current_session_id="current_session_001",
-                              event=event)
-        # Simulate a running agent using the real session key
-        real_key = _session_key_for_event(event)
-        runner._running_agents[real_key] = MagicMock()
-
-        await runner._handle_resume_command(event)
-
-        assert real_key not in runner._running_agents
-        db.close()
@@ -2,10 +2,6 @@

 Verifies that the gateway detects pathologically large transcripts and
 triggers auto-compression before running the agent.  (#628)
-
-The hygiene system uses the SAME compression config as the agent:
-  compression.threshold × model context length
-so CLI and messaging platforms behave identically.
 """

 import pytest
@@ -42,113 +38,75 @@ def _make_large_history_tokens(target_tokens: int) -> list:


 # ---------------------------------------------------------------------------
-# Detection threshold tests (model-aware, unified with compression config)
+# Detection threshold tests
 # ---------------------------------------------------------------------------

 class TestSessionHygieneThresholds:
-    """Test that the threshold logic correctly identifies large sessions.
-
-    Thresholds are derived from model context length × compression threshold,
-    matching what the agent's ContextCompressor uses.
-    """
+    """Test that the threshold logic correctly identifies large sessions."""

    def test_small_session_below_thresholds(self):
        """A 10-message session should not trigger compression."""
        history = _make_history(10)
+        msg_count = len(history)
        approx_tokens = estimate_messages_tokens_rough(history)

-        # For a 200k-context model at 85% threshold = 170k
-        context_length = 200_000
-        threshold_pct = 0.85
-        compress_token_threshold = int(context_length * threshold_pct)
+        compress_token_threshold = 100_000
+        compress_msg_threshold = 200

-        needs_compress = approx_tokens >= compress_token_threshold
+        needs_compress = (
+            approx_tokens >= compress_token_threshold
+            or msg_count >= compress_msg_threshold
+        )
        assert not needs_compress

+    def test_large_message_count_triggers(self):
+        """200+ messages should trigger compression even if tokens are low."""
+        history = _make_history(250, content_size=10)
+        msg_count = len(history)
+
+        compress_msg_threshold = 200
+        needs_compress = msg_count >= compress_msg_threshold
+        assert needs_compress
+
    def test_large_token_count_triggers(self):
-        """High token count should trigger compression when exceeding model threshold."""
-        # Build a history that exceeds 85% of a 200k model (170k tokens)
-        history = _make_large_history_tokens(180_000)
+        """High token count should trigger compression even if message count is low."""
+        # 50 messages with huge content to exceed 100K tokens
+        history = _make_history(50, content_size=10_000)
        approx_tokens = estimate_messages_tokens_rough(history)

-        context_length = 200_000
-        threshold_pct = 0.85
-        compress_token_threshold = int(context_length * threshold_pct)
-
+        compress_token_threshold = 100_000
        needs_compress = approx_tokens >= compress_token_threshold
        assert needs_compress

-    def test_under_threshold_no_trigger(self):
-        """Session under threshold should not trigger, even with many messages."""
-        # 250 short messages — lots of messages but well under token threshold
-        history = _make_history(250, content_size=10)
+    def test_under_both_thresholds_no_trigger(self):
+        """Session under both thresholds should not trigger."""
+        history = _make_history(100, content_size=100)
+        msg_count = len(history)
        approx_tokens = estimate_messages_tokens_rough(history)

-        # 200k model at 85% = 170k token threshold
-        context_length = 200_000
-        threshold_pct = 0.85
-        compress_token_threshold = int(context_length * threshold_pct)
+        compress_token_threshold = 100_000
+        compress_msg_threshold = 200

-        needs_compress = approx_tokens >= compress_token_threshold
-        assert not needs_compress, (
-            f"250 short messages (~{approx_tokens} tokens) should NOT trigger "
-            f"compression at {compress_token_threshold} token threshold"
+        needs_compress = (
+            approx_tokens >= compress_token_threshold
+            or msg_count >= compress_msg_threshold
        )
-
-    def test_message_count_alone_does_not_trigger(self):
-        """Message count alone should NOT trigger — only token count matters.
-
-        The old system used an OR of token-count and message-count thresholds,
-        which caused premature compression in tool-heavy sessions with 200+
-        messages but low total tokens.
-        """
-        # 300 very short messages — old system would compress, new should not
-        history = _make_history(300, content_size=10)
-        approx_tokens = estimate_messages_tokens_rough(history)
-
-        context_length = 200_000
-        threshold_pct = 0.85
-        compress_token_threshold = int(context_length * threshold_pct)
-
-        # Token-based check only
-        needs_compress = approx_tokens >= compress_token_threshold
        assert not needs_compress

-    def test_threshold_scales_with_model(self):
-        """Different models should have different compression thresholds."""
-        # 128k model at 85% = 108,800 tokens
-        small_model_threshold = int(128_000 * 0.85)
-        # 200k model at 85% = 170,000 tokens
-        large_model_threshold = int(200_000 * 0.85)
-        # 1M model at 85% = 850,000 tokens
-        huge_model_threshold = int(1_000_000 * 0.85)
+    def test_custom_thresholds(self):
+        """Custom thresholds from config should be respected."""
+        history = _make_history(60, content_size=100)
+        msg_count = len(history)

-        # A session at ~120k tokens:
-        history = _make_large_history_tokens(120_000)
-        approx_tokens = estimate_messages_tokens_rough(history)
+        # Custom lower threshold
+        compress_msg_threshold = 50
+        needs_compress = msg_count >= compress_msg_threshold
+        assert needs_compress

-        # Should trigger for 128k model
-        assert approx_tokens >= small_model_threshold
-        # Should NOT trigger for 200k model
-        assert approx_tokens < large_model_threshold
-        # Should NOT trigger for 1M model
-        assert approx_tokens < huge_model_threshold
-
-    def test_custom_threshold_percentage(self):
-        """Custom threshold percentage from config should be respected."""
-        context_length = 200_000
-
-        # At 50% threshold = 100k
-        low_threshold = int(context_length * 0.50)
-        # At 90% threshold = 180k
-        high_threshold = int(context_length * 0.90)
-
-        history = _make_large_history_tokens(150_000)
-        approx_tokens = estimate_messages_tokens_rough(history)
-
-        # Should trigger at 50% but not at 90%
-        assert approx_tokens >= low_threshold
-        assert approx_tokens < high_threshold
+        # Custom higher threshold
+        compress_msg_threshold = 100
+        needs_compress = msg_count >= compress_msg_threshold
+        assert not needs_compress

    def test_minimum_message_guard(self):
        """Sessions with fewer than 4 messages should never trigger."""
@@ -159,19 +117,18 @@ class TestSessionHygieneThresholds:


 class TestSessionHygieneWarnThreshold:
-    """Test the post-compression warning threshold (95% of context)."""
+    """Test the post-compression warning threshold."""

    def test_warn_when_still_large(self):
-        """If compressed result is still above 95% of context, should warn."""
-        context_length = 200_000
-        warn_threshold = int(context_length * 0.95)  # 190k
-        post_compress_tokens = 195_000
+        """If compressed result is still above warn_tokens, should warn."""
+        # Simulate post-compression tokens
+        warn_threshold = 200_000
+        post_compress_tokens = 250_000
        assert post_compress_tokens >= warn_threshold

    def test_no_warn_when_under(self):
-        """If compressed result is under 95% of context, no warning."""
-        context_length = 200_000
-        warn_threshold = int(context_length * 0.95)  # 190k
+        """If compressed result is under warn_tokens, no warning."""
+        warn_threshold = 200_000
        post_compress_tokens = 150_000
        assert post_compress_tokens < warn_threshold

@@ -193,12 +150,10 @@ class TestTokenEstimation:
        assert estimate_messages_tokens_rough(many) > estimate_messages_tokens_rough(few)

    def test_pathological_session_detected(self):
-        """The reported pathological case: 648 messages, ~299K tokens.
-
-        With a 200k model at 85% threshold (170k), this should trigger.
-        """
+        """The reported pathological case: 648 messages, ~299K tokens."""
+        # Simulate a 648-message session averaging ~460 tokens per message
        history = _make_history(648, content_size=1800)
        tokens = estimate_messages_tokens_rough(history)
-        # Should be well above the 170K threshold for a 200k model
-        threshold = int(200_000 * 0.85)
-        assert tokens > threshold
+        # Should be well above the 100K default threshold
+        assert tokens > 100_000
+        assert len(history) > 200
@@ -1,294 +0,0 @@
-"""Tests for Signal messenger platform adapter."""
-import json
-import pytest
-from unittest.mock import MagicMock, patch, AsyncMock
-
-from gateway.config import Platform, PlatformConfig
-
-
-# ---------------------------------------------------------------------------
-# Platform & Config
-# ---------------------------------------------------------------------------
-
-class TestSignalPlatformEnum:
-    def test_signal_enum_exists(self):
-        assert Platform.SIGNAL.value == "signal"
-
-    def test_signal_in_platform_list(self):
-        platforms = [p.value for p in Platform]
-        assert "signal" in platforms
-
-
-class TestSignalConfigLoading:
-    def test_apply_env_overrides_signal(self, monkeypatch):
-        monkeypatch.setenv("SIGNAL_HTTP_URL", "http://localhost:9090")
-        monkeypatch.setenv("SIGNAL_ACCOUNT", "+15551234567")
-
-        from gateway.config import GatewayConfig, _apply_env_overrides
-        config = GatewayConfig()
-        _apply_env_overrides(config)
-
-        assert Platform.SIGNAL in config.platforms
-        sc = config.platforms[Platform.SIGNAL]
-        assert sc.enabled is True
-        assert sc.extra["http_url"] == "http://localhost:9090"
-        assert sc.extra["account"] == "+15551234567"
-
-    def test_signal_not_loaded_without_both_vars(self, monkeypatch):
-        monkeypatch.setenv("SIGNAL_HTTP_URL", "http://localhost:9090")
-        # No SIGNAL_ACCOUNT
-
-        from gateway.config import GatewayConfig, _apply_env_overrides
-        config = GatewayConfig()
-        _apply_env_overrides(config)
-
-        assert Platform.SIGNAL not in config.platforms
-
-    def test_connected_platforms_includes_signal(self, monkeypatch):
-        monkeypatch.setenv("SIGNAL_HTTP_URL", "http://localhost:8080")
-        monkeypatch.setenv("SIGNAL_ACCOUNT", "+15551234567")
-
-        from gateway.config import GatewayConfig, _apply_env_overrides
-        config = GatewayConfig()
-        _apply_env_overrides(config)
-
-        connected = config.get_connected_platforms()
-        assert Platform.SIGNAL in connected
-
-
-# ---------------------------------------------------------------------------
-# Adapter Init & Helpers
-# ---------------------------------------------------------------------------
-
-class TestSignalAdapterInit:
-    def _make_config(self, **extra):
-        config = PlatformConfig()
-        config.enabled = True
-        config.extra = {
-            "http_url": "http://localhost:8080",
-            "account": "+15551234567",
-            **extra,
-        }
-        return config
-
-    def test_init_parses_config(self, monkeypatch):
-        monkeypatch.setenv("SIGNAL_GROUP_ALLOWED_USERS", "group123,group456")
-
-        from gateway.platforms.signal import SignalAdapter
-        adapter = SignalAdapter(self._make_config())
-
-        assert adapter.http_url == "http://localhost:8080"
-        assert adapter.account == "+15551234567"
-        assert "group123" in adapter.group_allow_from
-
-    def test_init_empty_allowlist(self, monkeypatch):
-        monkeypatch.setenv("SIGNAL_GROUP_ALLOWED_USERS", "")
-
-        from gateway.platforms.signal import SignalAdapter
-        adapter = SignalAdapter(self._make_config())
-
-        assert len(adapter.group_allow_from) == 0
-
-    def test_init_strips_trailing_slash(self, monkeypatch):
-        monkeypatch.setenv("SIGNAL_GROUP_ALLOWED_USERS", "")
-
-        from gateway.platforms.signal import SignalAdapter
-        adapter = SignalAdapter(self._make_config(http_url="http://localhost:8080/"))
-
-        assert adapter.http_url == "http://localhost:8080"
-
-    def test_self_message_filtering(self, monkeypatch):
-        monkeypatch.setenv("SIGNAL_GROUP_ALLOWED_USERS", "")
-
-        from gateway.platforms.signal import SignalAdapter
-        adapter = SignalAdapter(self._make_config())
-
-        assert adapter._account_normalized == "+15551234567"
-
-
-class TestSignalHelpers:
-    def test_redact_phone_long(self):
-        from gateway.platforms.signal import _redact_phone
-        assert _redact_phone("+15551234567") == "+155****4567"
-
-    def test_redact_phone_short(self):
-        from gateway.platforms.signal import _redact_phone
-        assert _redact_phone("+12345") == "+1****45"
-
-    def test_redact_phone_empty(self):
-        from gateway.platforms.signal import _redact_phone
-        assert _redact_phone("") == "<none>"
-
-    def test_parse_comma_list(self):
-        from gateway.platforms.signal import _parse_comma_list
-        assert _parse_comma_list("+1234, +5678 , +9012") == ["+1234", "+5678", "+9012"]
-        assert _parse_comma_list("") == []
-        assert _parse_comma_list("  ,  ,  ") == []
-
-    def test_guess_extension_png(self):
-        from gateway.platforms.signal import _guess_extension
-        assert _guess_extension(b"\x89PNG\r\n\x1a\n" + b"\x00" * 100) == ".png"
-
-    def test_guess_extension_jpeg(self):
-        from gateway.platforms.signal import _guess_extension
-        assert _guess_extension(b"\xff\xd8\xff\xe0" + b"\x00" * 100) == ".jpg"
-
-    def test_guess_extension_pdf(self):
-        from gateway.platforms.signal import _guess_extension
-        assert _guess_extension(b"%PDF-1.4" + b"\x00" * 100) == ".pdf"
-
-    def test_guess_extension_zip(self):
-        from gateway.platforms.signal import _guess_extension
-        assert _guess_extension(b"PK\x03\x04" + b"\x00" * 100) == ".zip"
-
-    def test_guess_extension_mp4(self):
-        from gateway.platforms.signal import _guess_extension
-        assert _guess_extension(b"\x00\x00\x00\x18ftypisom" + b"\x00" * 100) == ".mp4"
-
-    def test_guess_extension_unknown(self):
-        from gateway.platforms.signal import _guess_extension
-        assert _guess_extension(b"\x00\x01\x02\x03" * 10) == ".bin"
-
-    def test_is_image_ext(self):
-        from gateway.platforms.signal import _is_image_ext
-        assert _is_image_ext(".png") is True
-        assert _is_image_ext(".jpg") is True
-        assert _is_image_ext(".gif") is True
-        assert _is_image_ext(".pdf") is False
-
-    def test_is_audio_ext(self):
-        from gateway.platforms.signal import _is_audio_ext
-        assert _is_audio_ext(".mp3") is True
-        assert _is_audio_ext(".ogg") is True
-        assert _is_audio_ext(".png") is False
-
-    def test_check_requirements(self, monkeypatch):
-        from gateway.platforms.signal import check_signal_requirements
-        monkeypatch.setenv("SIGNAL_HTTP_URL", "http://localhost:8080")
-        monkeypatch.setenv("SIGNAL_ACCOUNT", "+15551234567")
-        assert check_signal_requirements() is True
-
-    def test_render_mentions(self):
-        from gateway.platforms.signal import _render_mentions
-        text = "Hello \uFFFC, how are you?"
-        mentions = [{"start": 6, "length": 1, "number": "+15559999999"}]
-        result = _render_mentions(text, mentions)
-        assert "@+15559999999" in result
-        assert "\uFFFC" not in result
-
-    def test_render_mentions_no_mentions(self):
-        from gateway.platforms.signal import _render_mentions
-        text = "Hello world"
-        result = _render_mentions(text, [])
-        assert result == "Hello world"
-
-    def test_check_requirements_missing(self, monkeypatch):
-        from gateway.platforms.signal import check_signal_requirements
-        monkeypatch.delenv("SIGNAL_HTTP_URL", raising=False)
-        monkeypatch.delenv("SIGNAL_ACCOUNT", raising=False)
-        assert check_signal_requirements() is False
-
-
-# ---------------------------------------------------------------------------
-# Session Source
-# ---------------------------------------------------------------------------
-
-class TestSignalSessionSource:
-    def test_session_source_alt_fields(self):
-        from gateway.session import SessionSource
-        source = SessionSource(
-            platform=Platform.SIGNAL,
-            chat_id="+15551234567",
-            user_id="+15551234567",
-            user_id_alt="uuid:abc-123",
-            chat_id_alt=None,
-        )
-        d = source.to_dict()
-        assert d["user_id_alt"] == "uuid:abc-123"
-        assert "chat_id_alt" not in d  # None fields excluded
-
-    def test_session_source_roundtrip(self):
-        from gateway.session import SessionSource
-        source = SessionSource(
-            platform=Platform.SIGNAL,
-            chat_id="group:xyz",
-            chat_type="group",
-            user_id="+15551234567",
-            user_id_alt="uuid:abc",
-            chat_id_alt="xyz",
-        )
-        d = source.to_dict()
-        restored = SessionSource.from_dict(d)
-        assert restored.user_id_alt == "uuid:abc"
-        assert restored.chat_id_alt == "xyz"
-        assert restored.platform == Platform.SIGNAL
-
-
-# ---------------------------------------------------------------------------
-# Phone Redaction in agent/redact.py
-# ---------------------------------------------------------------------------
-
-class TestSignalPhoneRedaction:
-    def test_us_number(self):
-        from agent.redact import redact_sensitive_text
-        result = redact_sensitive_text("Call +15551234567 now")
-        assert "+15551234567" not in result
-        assert "+155" in result  # Prefix preserved
-        assert "4567" in result  # Suffix preserved
-
-    def test_uk_number(self):
-        from agent.redact import redact_sensitive_text
-        result = redact_sensitive_text("UK: +442071838750")
-        assert "+442071838750" not in result
-        assert "****" in result
-
-    def test_multiple_numbers(self):
-        from agent.redact import redact_sensitive_text
-        text = "From +15551234567 to +442071838750"
-        result = redact_sensitive_text(text)
-        assert "+15551234567" not in result
-        assert "+442071838750" not in result
-
-    def test_short_number_not_matched(self):
-        from agent.redact import redact_sensitive_text
-        result = redact_sensitive_text("Code: +12345")
-        # 5 digits after + is below the 7-digit minimum
-        assert "+12345" in result  # Too short to redact
-
-
-# ---------------------------------------------------------------------------
-# Authorization in run.py
-# ---------------------------------------------------------------------------
-
-class TestSignalAuthorization:
-    def test_signal_in_allowlist_maps(self):
-        """Signal should be in the platform auth maps."""
-        from gateway.run import GatewayRunner
-        from gateway.config import GatewayConfig
-
-        gw = GatewayRunner.__new__(GatewayRunner)
-        gw.config = GatewayConfig()
-        gw.pairing_store = MagicMock()
-        gw.pairing_store.is_approved.return_value = False
-
-        source = MagicMock()
-        source.platform = Platform.SIGNAL
-        source.user_id = "+15559999999"
-
-        # No allowlists set — should check GATEWAY_ALLOW_ALL_USERS
-        with patch.dict("os.environ", {}, clear=True):
-            result = gw._is_user_authorized(source)
-            assert result is False
-
-
-# ---------------------------------------------------------------------------
-# Send Message Tool
-# ---------------------------------------------------------------------------
-
-class TestSignalSendMessage:
-    def test_signal_in_platform_map(self):
-        """Signal should be in the send_message tool's platform map."""
-        from tools.send_message_tool import send_message_tool
-        # Just verify the import works and Signal is a valid platform
-        from gateway.config import Platform
-        assert Platform.SIGNAL.value == "signal"
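The `TestSignalHelpers` cases removed above pin down the phone-redaction behaviour fairly precisely. As a reading aid, here is a minimal sketch of a function that would satisfy those three assertions. The name `redact_phone` and the length cutoff of 8 are hypothetical; the removed `_redact_phone` in `gateway.platforms.signal` is not shown in this diff and may have worked differently.

```python
# Hypothetical reconstruction from the test expectations only; the
# length cutoff (8) is an assumption chosen to satisfy both examples.
def redact_phone(phone: str) -> str:
    """Mask the middle of a phone number, keeping a short prefix and suffix."""
    if not phone:
        return "<none>"
    if len(phone) <= 8:
        # Short numbers: keep "+" plus one digit, and the last two digits.
        return phone[:2] + "****" + phone[-2:]
    # Longer numbers: keep "+" plus three digits, and the last four digits.
    return phone[:4] + "****" + phone[-4:]


print(redact_phone("+15551234567"))  # +155****4567
print(redact_phone("+12345"))        # +1****45
print(redact_phone(""))              # <none>
```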
@@ -1,542 +0,0 @@
-"""Tests for the interactive session browser (`hermes sessions browse`).
-
-Covers:
- _session_browse_picker logic (curses mocked, fallback tested)
- cmd_sessions 'browse' action integration
- Argument parser registration
-"""
-
-import os
-import time
-from unittest.mock import MagicMock, patch, call
-
-import pytest
-
-from hermes_cli.main import _session_browse_picker
-
-
-# ─── Sample session data ──────────────────────────────────────────────────────
-
-def _make_sessions(n=5):
-    """Generate a list of fake rich-session dicts."""
-    now = time.time()
-    sessions = []
-    for i in range(n):
-        sessions.append({
-            "id": f"20260308_{i:06d}_abcdef",
-            "source": "cli" if i % 2 == 0 else "telegram",
-            "model": "test/model",
-            "title": f"Session {i}" if i % 3 != 0 else None,
-            "preview": f"Hello from session {i}",
-            "last_active": now - i * 3600,
-            "started_at": now - i * 3600 - 60,
-            "message_count": (i + 1) * 5,
-        })
-    return sessions
-
-
-SAMPLE_SESSIONS = _make_sessions(5)
-
-
-# ─── _session_browse_picker ──────────────────────────────────────────────────
-
-class TestSessionBrowsePicker:
-    """Tests for the _session_browse_picker function."""
-
-    def test_empty_sessions_returns_none(self, capsys):
-        result = _session_browse_picker([])
-        assert result is None
-        assert "No sessions found" in capsys.readouterr().out
-
-    def test_returns_none_when_no_sessions(self, capsys):
-        result = _session_browse_picker([])
-        assert result is None
-
-    def test_fallback_mode_valid_selection(self):
-        """When curses is unavailable, fallback numbered list should work."""
-        sessions = _make_sessions(3)
-
-        # Mock curses import to fail, forcing fallback
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", return_value="2"):
-                result = _session_browse_picker(sessions)
-
-        assert result == sessions[1]["id"]
-
-    def test_fallback_mode_cancel_q(self):
-        """Entering 'q' in fallback mode cancels."""
-        sessions = _make_sessions(3)
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", return_value="q"):
-                result = _session_browse_picker(sessions)
-
-        assert result is None
-
-    def test_fallback_mode_cancel_empty(self):
-        """Entering empty string in fallback mode cancels."""
-        sessions = _make_sessions(3)
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", return_value=""):
-                result = _session_browse_picker(sessions)
-
-        assert result is None
-
-    def test_fallback_mode_invalid_then_valid(self):
-        """Invalid selection followed by valid one works."""
-        sessions = _make_sessions(3)
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", side_effect=["99", "1"]):
-                result = _session_browse_picker(sessions)
-
-        assert result == sessions[0]["id"]
-
-    def test_fallback_mode_keyboard_interrupt(self):
-        """KeyboardInterrupt in fallback mode returns None."""
-        sessions = _make_sessions(3)
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", side_effect=KeyboardInterrupt):
-                result = _session_browse_picker(sessions)
-
-        assert result is None
-
-    def test_fallback_displays_all_sessions(self, capsys):
-        """Fallback mode should display all session entries."""
-        sessions = _make_sessions(4)
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", return_value="q"):
-                _session_browse_picker(sessions)
-
-        output = capsys.readouterr().out
-        # All 4 entries should be shown
-        assert "1." in output
-        assert "2." in output
-        assert "3." in output
-        assert "4." in output
-
-    def test_fallback_shows_title_over_preview(self, capsys):
-        """When a session has a title, show it instead of the preview."""
-        sessions = [{
-            "id": "test_001",
-            "source": "cli",
-            "title": "My Cool Project",
-            "preview": "some preview text",
-            "last_active": time.time(),
-        }]
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", return_value="q"):
-                _session_browse_picker(sessions)
-
-        output = capsys.readouterr().out
-        assert "My Cool Project" in output
-
-    def test_fallback_shows_preview_when_no_title(self, capsys):
-        """When no title, show preview."""
-        sessions = [{
-            "id": "test_002",
-            "source": "cli",
-            "title": None,
-            "preview": "Hello world test message",
-            "last_active": time.time(),
-        }]
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", return_value="q"):
-                _session_browse_picker(sessions)
-
-        output = capsys.readouterr().out
-        assert "Hello world test message" in output
-
-    def test_fallback_shows_id_when_no_title_or_preview(self, capsys):
-        """When neither title nor preview, show session ID."""
-        sessions = [{
-            "id": "test_003_fallback",
-            "source": "cli",
-            "title": None,
-            "preview": "",
-            "last_active": time.time(),
-        }]
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", return_value="q"):
-                _session_browse_picker(sessions)
-
-        output = capsys.readouterr().out
-        assert "test_003_fallback" in output
-
-
-# ─── Curses-based picker (mocked curses) ────────────────────────────────────
-
-class TestCursesBrowse:
-    """Tests for the curses-based interactive picker via simulated key sequences."""
-
-    def _run_with_keys(self, sessions, key_sequence):
-        """Simulate running the curses picker with a given key sequence."""
-        import curses
-
-        # Build a mock stdscr that returns keys from the sequence
-        mock_stdscr = MagicMock()
-        mock_stdscr.getmaxyx.return_value = (30, 120)
-        mock_stdscr.getch.side_effect = key_sequence
-
-        # Capture what curses.wrapper receives and call it with our mock
-        with patch("curses.wrapper") as mock_wrapper:
-            # When wrapper is called, invoke the function with our mock stdscr
-            def run_inner(func):
-                try:
-                    func(mock_stdscr)
-                except StopIteration:
-                    pass  # key sequence exhausted
-
-            mock_wrapper.side_effect = run_inner
-            with patch("curses.curs_set"):
-                with patch("curses.has_colors", return_value=False):
-                    return _session_browse_picker(sessions)
-
-    def test_enter_selects_first_session(self):
-        sessions = _make_sessions(3)
-        result = self._run_with_keys(sessions, [10])  # Enter key
-        assert result == sessions[0]["id"]
-
-    def test_down_then_enter_selects_second(self):
-        import curses
-        sessions = _make_sessions(3)
-        result = self._run_with_keys(sessions, [curses.KEY_DOWN, 10])
-        assert result == sessions[1]["id"]
-
-    def test_down_down_enter_selects_third(self):
-        import curses
-        sessions = _make_sessions(5)
-        result = self._run_with_keys(sessions, [curses.KEY_DOWN, curses.KEY_DOWN, 10])
-        assert result == sessions[2]["id"]
-
-    def test_up_wraps_to_last(self):
-        import curses
-        sessions = _make_sessions(3)
-        result = self._run_with_keys(sessions, [curses.KEY_UP, 10])
-        assert result == sessions[2]["id"]
-
-    def test_escape_cancels(self):
-        sessions = _make_sessions(3)
-        result = self._run_with_keys(sessions, [27])  # Esc
-        assert result is None
-
-    def test_q_cancels(self):
-        sessions = _make_sessions(3)
-        result = self._run_with_keys(sessions, [ord('q')])
-        assert result is None
-
-    def test_type_to_filter_then_enter(self):
-        """Typing characters filters the list, Enter selects from filtered."""
-        import curses
-        sessions = [
-            {"id": "s1", "source": "cli", "title": "Alpha project", "preview": "", "last_active": time.time()},
-            {"id": "s2", "source": "cli", "title": "Beta project", "preview": "", "last_active": time.time()},
-            {"id": "s3", "source": "cli", "title": "Gamma project", "preview": "", "last_active": time.time()},
-        ]
-        # Type "Beta" then Enter — should select s2
-        keys = [ord(c) for c in "Beta"] + [10]
-        result = self._run_with_keys(sessions, keys)
-        assert result == "s2"
-
-    def test_filter_no_match_enter_does_nothing(self):
-        """When filter produces no results, Enter shouldn't select."""
-        sessions = _make_sessions(3)
-        keys = [ord(c) for c in "zzzznonexistent"] + [10]
-        result = self._run_with_keys(sessions, keys)
-        assert result is None
-
-    def test_backspace_removes_filter_char(self):
-        """Backspace removes the last character from the filter."""
-        import curses
-        sessions = [
-            {"id": "s1", "source": "cli", "title": "Alpha", "preview": "", "last_active": time.time()},
-            {"id": "s2", "source": "cli", "title": "Beta", "preview": "", "last_active": time.time()},
-        ]
-        # Type "Bet", backspace, backspace, backspace (clears filter), then Enter (selects first)
-        keys = [ord('B'), ord('e'), ord('t'), 127, 127, 127, 10]
-        result = self._run_with_keys(sessions, keys)
-        assert result == "s1"
-
-    def test_escape_clears_filter_first(self):
-        """First Esc clears the search text, second Esc exits."""
-        import curses
-        sessions = _make_sessions(3)
-        # Type "ab" then Esc (clears filter) then Enter (selects first)
-        keys = [ord('a'), ord('b'), 27, 10]
-        result = self._run_with_keys(sessions, keys)
-        assert result == sessions[0]["id"]
-
-    def test_filter_matches_preview(self):
-        """Typing should match against session preview text."""
-        sessions = [
-            {"id": "s1", "source": "cli", "title": None, "preview": "Set up Minecraft server", "last_active": time.time()},
-            {"id": "s2", "source": "cli", "title": None, "preview": "Review PR 438", "last_active": time.time()},
-        ]
-        keys = [ord(c) for c in "Mine"] + [10]
-        result = self._run_with_keys(sessions, keys)
-        assert result == "s1"
-
-    def test_filter_matches_source(self):
-        """Typing a source name should filter by source."""
-        sessions = [
-            {"id": "s1", "source": "telegram", "title": "TG session", "preview": "", "last_active": time.time()},
-            {"id": "s2", "source": "cli", "title": "CLI session", "preview": "", "last_active": time.time()},
-        ]
-        keys = [ord(c) for c in "telegram"] + [10]
-        result = self._run_with_keys(sessions, keys)
-        assert result == "s1"
-
-    def test_q_quits_when_no_filter_active(self):
-        """When no search text is active, 'q' should quit (not filter)."""
-        sessions = _make_sessions(3)
-        result = self._run_with_keys(sessions, [ord('q')])
-        assert result is None
-
-    def test_q_types_into_filter_when_filter_active(self):
-        """When search text is already active, 'q' should add to filter, not quit."""
-        sessions = [
-            {"id": "s1", "source": "cli", "title": "the sequel", "preview": "", "last_active": time.time()},
-            {"id": "s2", "source": "cli", "title": "other thing", "preview": "", "last_active": time.time()},
-        ]
-        # Type "se" first (activates filter, matches "the sequel")
-        # Then type "q" — should add 'q' to filter (filter="seq"), NOT quit
-        # "seq" still matches "the sequel" → Enter selects it
-        keys = [ord('s'), ord('e'), ord('q'), 10]
-        result = self._run_with_keys(sessions, keys)
-        assert result == "s1"  # "the sequel" matches "seq"
-
-
-# ─── Argument parser registration ──────────────────────────────────────────
-
-class TestSessionBrowseArgparse:
-    """Verify the 'browse' subcommand is properly registered."""
-
-    def test_browse_subcommand_exists(self):
-        """hermes sessions browse should be parseable."""
-        from hermes_cli.main import main as _main_entry
-
-        # We can't run main(), but we can import and test the parser setup
-        # by checking that argparse doesn't error on "sessions browse"
-        import argparse
-        # Re-create the parser portion
-        # Instead, let's just verify the import works and the function exists
-        from hermes_cli.main import _session_browse_picker
-        assert callable(_session_browse_picker)
-
-    def test_browse_default_limit_is_50(self):
-        """The default --limit for browse should be 50."""
-        # This test verifies at the argparse level
-        # We test by running the parse on "sessions browse" args
-        # Since we can't easily extract the subparser, verify via the
-        # _session_browse_picker accepting large lists
-        sessions = _make_sessions(50)
-        assert len(sessions) == 50
-
-
-# ─── Integration: cmd_sessions browse action ────────────────────────────────
-
-class TestCmdSessionsBrowse:
-    """Integration tests for the 'browse' action in cmd_sessions."""
-
-    def test_browse_no_sessions_prints_message(self, capsys):
-        """When no sessions exist, _session_browse_picker returns None and prints message."""
-        result = _session_browse_picker([])
-        assert result is None
-        output = capsys.readouterr().out
-        assert "No sessions found" in output
-
-    def test_browse_with_source_filter(self):
-        """The --source flag should be passed to list_sessions_rich."""
-        sessions = [
-            {"id": "s1", "source": "cli", "title": "CLI only", "preview": "", "last_active": time.time()},
-        ]
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", return_value="1"):
-                result = _session_browse_picker(sessions)
-
-        assert result == "s1"
-
-
-# ─── Edge cases ──────────────────────────────────────────────────────────────
-
-class TestEdgeCases:
-    """Edge case handling for the session browser."""
-
-    def test_sessions_with_missing_fields(self):
-        """Sessions with missing optional fields should not crash."""
-        sessions = [
-            {"id": "minimal_001", "source": "cli"},  # No title, preview, last_active
-        ]
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", return_value="1"):
-                result = _session_browse_picker(sessions)
-
-        assert result == "minimal_001"
-
-    def test_single_session(self):
-        """A single session in the list should work fine."""
-        sessions = [
-            {"id": "only_one", "source": "cli", "title": "Solo", "preview": "", "last_active": time.time()},
-        ]
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", return_value="1"):
-                result = _session_browse_picker(sessions)
-
-        assert result == "only_one"
-
-    def test_long_title_truncated_in_fallback(self, capsys):
-        """Very long titles should be truncated in fallback mode."""
-        sessions = [{
-            "id": "long_title_001",
-            "source": "cli",
-            "title": "A" * 100,
-            "preview": "",
-            "last_active": time.time(),
-        }]
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", return_value="q"):
-                _session_browse_picker(sessions)
-
-        output = capsys.readouterr().out
-        # Title should be truncated to 50 chars with "..."
-        assert "..." in output
-
-    def test_relative_time_formatting(self, capsys):
-        """Verify various time deltas format correctly."""
-        now = time.time()
-        sessions = [
-            {"id": "recent", "source": "cli", "title": None, "preview": "just now test", "last_active": now},
-            {"id": "hour_ago", "source": "cli", "title": None, "preview": "hour ago test", "last_active": now - 7200},
-            {"id": "days_ago", "source": "cli", "title": None, "preview": "days ago test", "last_active": now - 259200},
-        ]
-
-        import builtins
-        original_import = builtins.__import__
-
-        def mock_import(name, *args, **kwargs):
-            if name == "curses":
-                raise ImportError("no curses")
-            return original_import(name, *args, **kwargs)
-
-        with patch.object(builtins, "__import__", side_effect=mock_import):
-            with patch("builtins.input", return_value="q"):
-                _session_browse_picker(sessions)
-
-        output = capsys.readouterr().out
-        assert "just now" in output
-        assert "2h ago" in output
-        assert "3d ago" in output
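The removed session-browser tests repeat the same `mock_import` closure in nearly every fallback test. If tests like these are reintroduced, that setup could be factored into one helper. The sketch below is a suggestion, not code from the repository: `block_import` is a hypothetical name, and it relies only on `builtins.__import__` and `unittest.mock.patch.object`, the same mechanisms the tests above already use.

```python
import builtins
from contextlib import contextmanager
from unittest.mock import patch


@contextmanager
def block_import(module_name):
    """Make `import <module_name>` raise ImportError inside the with-block."""
    original_import = builtins.__import__

    def fake_import(name, *args, **kwargs):
        # Block the module itself and any of its submodules.
        if name == module_name or name.startswith(module_name + "."):
            raise ImportError(f"{module_name} blocked for test")
        return original_import(name, *args, **kwargs)

    # Replace __import__ directly; patch.object restores it on exit.
    with patch.object(builtins, "__import__", fake_import):
        yield


# Usage: force the non-curses fallback path in a test body.
with block_import("curses"):
    try:
        import curses  # noqa: F401
        curses_blocked = False
    except ImportError:
        curses_blocked = True
```

Because the `import` statement always goes through `builtins.__import__`, the block takes effect even when the module is already cached in `sys.modules`. Each fallback test would then shrink to a `with block_import("curses"):` wrapper around the `builtins.input` patch.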
@@ -38,6 +38,7 @@ class TestExplicitAllowlist:
        "OPENROUTER_API_KEY",
        "OPENAI_API_KEY",
        "ANTHROPIC_API_KEY",
+        "NOUS_API_KEY",
        "WANDB_API_KEY",
        "TINKER_API_KEY",
        "HONCHO_API_KEY",
@@ -1,31 +0,0 @@
-from io import StringIO
-
-from rich.console import Console
-
-from hermes_cli.skills_hub import do_list
-
-
-def test_do_list_initializes_hub_dir(monkeypatch, tmp_path):
-    import tools.skills_hub as hub
-    import tools.skills_tool as skills_tool
-
-    hub_dir = tmp_path / "skills" / ".hub"
-    monkeypatch.setattr(hub, "SKILLS_DIR", tmp_path / "skills")
-    monkeypatch.setattr(hub, "HUB_DIR", hub_dir)
-    monkeypatch.setattr(hub, "LOCK_FILE", hub_dir / "lock.json")
-    monkeypatch.setattr(hub, "QUARANTINE_DIR", hub_dir / "quarantine")
-    monkeypatch.setattr(hub, "AUDIT_LOG", hub_dir / "audit.log")
-    monkeypatch.setattr(hub, "TAPS_FILE", hub_dir / "taps.json")
-    monkeypatch.setattr(hub, "INDEX_CACHE_DIR", hub_dir / "index-cache")
-    monkeypatch.setattr(skills_tool, "_find_all_skills", lambda: [])
-
-    console = Console(file=StringIO(), force_terminal=False, color_system=None)
-
-    assert not hub_dir.exists()
-
-    do_list(console=console)
-
-    assert hub_dir.exists()
-    assert (hub_dir / "lock.json").exists()
-    assert (hub_dir / "quarantine").is_dir()
-    assert (hub_dir / "index-cache").is_dir()
@@ -12,7 +12,7 @@ Usage:

 Requirements:
    - FIRECRAWL_API_KEY environment variable must be set
-    - An auxiliary LLM provider (OPENROUTER_API_KEY or Nous Portal auth) (optional, for LLM tests)
+    - NOUS_API_KEY environment variable (optional, for LLM tests)
 """

 import pytest
@@ -128,12 +128,12 @@ class WebToolsTester:
        else:
            self.log_result("Firecrawl API Key", "passed", "Found")
        
-        # Check auxiliary LLM provider (optional)
+        # Check Nous API key (optional)
        if not check_auxiliary_model():
-            self.log_result("Auxiliary LLM", "skipped", "No auxiliary LLM provider available (LLM tests will be skipped)")
+            self.log_result("Nous API Key", "skipped", "NOUS_API_KEY not set (LLM tests will be skipped)")
            self.test_llm = False
        else:
-            self.log_result("Auxiliary LLM", "passed", "Found")
+            self.log_result("Nous API Key", "passed", "Found")
        
        # Check debug mode
        debug_info = get_debug_session_info()
@@ -0,0 +1,486 @@
+"""
+Tests for environments/agent_loop.py — HermesAgentLoop.
+
+Tests the multi-turn agent engine using mocked servers, without needing
+real API keys or running servers.
+"""
+
+import asyncio
+import json
+import sys
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+from unittest.mock import MagicMock
+
+import pytest
+
+# Ensure repo root is importable
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+
+try:
+    from environments.agent_loop import (
+        AgentResult,
+        HermesAgentLoop,
+        ToolError,
+        _extract_reasoning_from_message,
+        resize_tool_pool,
+    )
+except ImportError:
+    pytest.skip("atroposlib not installed", allow_module_level=True)
+
+
+# ─── Mock server infrastructure ─────────────────────────────────────────
+
+
+@dataclass
+class MockFunction:
+    name: str
+    arguments: str
+
+
+@dataclass
+class MockToolCall:
+    id: str
+    function: MockFunction
+    type: str = "function"
+
+
+@dataclass
+class MockMessage:
+    content: Optional[str]
+    role: str = "assistant"
+    tool_calls: Optional[List[MockToolCall]] = None
+    reasoning_content: Optional[str] = None
+    reasoning: Optional[str] = None
+    reasoning_details: Optional[list] = None
+
+
+@dataclass
+class MockChoice:
+    message: MockMessage
+    finish_reason: str = "stop"
+    index: int = 0
+
+
+@dataclass
+class MockChatCompletion:
+    choices: List[MockChoice]
+    id: str = "chatcmpl-mock"
+    model: str = "mock-model"
+
+
+class MockServer:
+    """
+    Mock server that returns pre-configured responses in sequence.
+    Mimics the chat_completion() interface.
+    """
+
+    def __init__(self, responses: List[MockChatCompletion]):
+        self.responses = responses
+        self.call_count = 0
+        self.call_history: List[Dict[str, Any]] = []
+
+    async def chat_completion(self, **kwargs) -> MockChatCompletion:
+        self.call_history.append(kwargs)
+        if self.call_count >= len(self.responses):
+            # Return a simple text response if we run out
+            return MockChatCompletion(
+                choices=[MockChoice(message=MockMessage(content="Done."))]
+            )
+        resp = self.responses[self.call_count]
+        self.call_count += 1
+        return resp
+
+
+def make_text_response(content: str) -> MockChatCompletion:
+    """Create a simple text-only response (no tool calls)."""
+    return MockChatCompletion(
+        choices=[MockChoice(message=MockMessage(content=content))]
+    )
+
+
+def make_tool_response(
+    tool_name: str,
+    arguments: dict,
+    content: str = "",
+    tool_call_id: str = "call_001",
+) -> MockChatCompletion:
+    """Create a response with a single tool call."""
+    return MockChatCompletion(
+        choices=[
+            MockChoice(
+                message=MockMessage(
+                    content=content,
+                    tool_calls=[
+                        MockToolCall(
+                            id=tool_call_id,
+                            function=MockFunction(
+                                name=tool_name,
+                                arguments=json.dumps(arguments),
+                            ),
+                        )
+                    ],
+                ),
+                finish_reason="tool_calls",
+            )
+        ]
+    )
+
+
+# ─── Tests ───────────────────────────────────────────────────────────────
+
+
+class TestAgentResult:
+    def test_defaults(self):
+        result = AgentResult(messages=[])
+        assert result.messages == []
+        assert result.managed_state is None
+        assert result.turns_used == 0
+        assert result.finished_naturally is False
+        assert result.reasoning_per_turn == []
+        assert result.tool_errors == []
+
+
+class TestExtractReasoning:
+    def test_reasoning_content_field(self):
+        msg = MockMessage(content="hello", reasoning_content="I think...")
+        assert _extract_reasoning_from_message(msg) == "I think..."
+
+    def test_reasoning_field(self):
+        msg = MockMessage(content="hello", reasoning="Let me consider...")
+        assert _extract_reasoning_from_message(msg) == "Let me consider..."
+
+    def test_reasoning_details(self):
+        detail = MagicMock()
+        detail.text = "Detail reasoning"
+        msg = MockMessage(content="hello", reasoning_details=[detail])
+        assert _extract_reasoning_from_message(msg) == "Detail reasoning"
+
+    def test_reasoning_details_dict_format(self):
+        msg = MockMessage(
+            content="hello",
+            reasoning_details=[{"text": "Dict reasoning"}],
+        )
+        assert _extract_reasoning_from_message(msg) == "Dict reasoning"
+
+    def test_no_reasoning(self):
+        msg = MockMessage(content="hello")
+        assert _extract_reasoning_from_message(msg) is None
+
+    def test_reasoning_content_takes_priority(self):
+        msg = MockMessage(
+            content="hello",
+            reasoning_content="First",
+            reasoning="Second",
+        )
+        assert _extract_reasoning_from_message(msg) == "First"
+
+
+class TestHermesAgentLoop:
+    """Test the agent loop with mock servers."""
+
+    @pytest.fixture
+    def basic_tools(self):
+        """Minimal tool schema for testing."""
+        return [
+            {
+                "type": "function",
+                "function": {
+                    "name": "terminal",
+                    "description": "Run a command",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "command": {
+                                "type": "string",
+                                "description": "Command to run",
+                            }
+                        },
+                        "required": ["command"],
+                    },
+                },
+            },
+            {
+                "type": "function",
+                "function": {
+                    "name": "read_file",
+                    "description": "Read a file",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "path": {"type": "string"},
+                        },
+                        "required": ["path"],
+                    },
+                },
+            },
+        ]
+
+    @pytest.fixture
+    def valid_names(self):
+        return {"terminal", "read_file", "todo"}
+
+    @pytest.mark.asyncio
+    async def test_simple_text_response(self, basic_tools, valid_names):
+        """Model responds with text only, no tool calls."""
+        server = MockServer([make_text_response("Hello! How can I help?")])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        result = await agent.run(messages)
+
+        assert result.finished_naturally is True
+        assert result.turns_used == 1
+        assert len(result.messages) >= 2  # user + assistant
+        assert result.messages[-1]["role"] == "assistant"
+        assert result.messages[-1]["content"] == "Hello! How can I help?"
+
+    @pytest.mark.asyncio
+    async def test_tool_call_then_text(self, basic_tools, valid_names):
+        """Model calls a tool, then responds with text."""
+        server = MockServer([
+            make_tool_response("todo", {"todos": [{"id": "1", "content": "test", "status": "pending"}]}),
+            make_text_response("I created a todo for you."),
+        ])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Create a todo"}]
+        result = await agent.run(messages)
+
+        assert result.finished_naturally is True
+        assert result.turns_used == 2
+        # Should have: user, assistant (tool_call), tool (result), assistant (text)
+        roles = [m["role"] for m in result.messages]
+        assert roles == ["user", "assistant", "tool", "assistant"]
+
+    @pytest.mark.asyncio
+    async def test_max_turns_reached(self, basic_tools, valid_names):
+        """Model keeps calling tools until max_turns is hit."""
+        # Create responses that always call a tool
+        responses = [
+            make_tool_response("todo", {"todos": [{"id": str(i), "content": f"task {i}", "status": "pending"}]}, tool_call_id=f"call_{i}")
+            for i in range(10)
+        ]
+        server = MockServer(responses)
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=3,
+        )
+        messages = [{"role": "user", "content": "Keep going"}]
+        result = await agent.run(messages)
+
+        assert result.finished_naturally is False
+        assert result.turns_used == 3
+
+    @pytest.mark.asyncio
+    async def test_unknown_tool_name(self, basic_tools, valid_names):
+        """Model calls a tool not in valid_tool_names."""
+        server = MockServer([
+            make_tool_response("nonexistent_tool", {"arg": "val"}),
+            make_text_response("OK, that didn't work."),
+        ])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Call something weird"}]
+        result = await agent.run(messages)
+
+        # Should record a tool error
+        assert len(result.tool_errors) >= 1
+        assert result.tool_errors[0].tool_name == "nonexistent_tool"
+
+    @pytest.mark.asyncio
+    async def test_empty_response(self, basic_tools, valid_names):
+        """Server returns empty response."""
+        server = MockServer([MockChatCompletion(choices=[])])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        result = await agent.run(messages)
+
+        assert result.finished_naturally is False
+        assert result.turns_used == 1
+
+    @pytest.mark.asyncio
+    async def test_api_error_handling(self, basic_tools, valid_names):
+        """Server raises an exception."""
+
+        class FailingServer:
+            async def chat_completion(self, **kwargs):
+                raise ConnectionError("Server unreachable")
+
+        agent = HermesAgentLoop(
+            server=FailingServer(),
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        result = await agent.run(messages)
+
+        assert result.finished_naturally is False
+        assert result.turns_used == 1
+
+    @pytest.mark.asyncio
+    async def test_tools_passed_to_server(self, basic_tools, valid_names):
+        """Verify tools are passed in the chat_completion kwargs."""
+        server = MockServer([make_text_response("OK")])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        await agent.run(messages)
+
+        assert len(server.call_history) == 1
+        assert "tools" in server.call_history[0]
+        assert server.call_history[0]["tools"] == basic_tools
+
+    @pytest.mark.asyncio
+    async def test_extra_body_forwarded(self, basic_tools, valid_names):
+        """extra_body should be forwarded to server."""
+        extra = {"provider": {"ignore": ["DeepInfra"]}}
+        server = MockServer([make_text_response("OK")])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+            extra_body=extra,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        await agent.run(messages)
+
+        assert server.call_history[0].get("extra_body") == extra
+
+    @pytest.mark.asyncio
+    async def test_managed_state_returned(self, basic_tools, valid_names):
+        """If server has get_state(), result should include managed_state."""
+        server = MockServer([make_text_response("OK")])
+        server.get_state = lambda: {"nodes": [{"test": True}]}
+
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        result = await agent.run(messages)
+
+        assert result.managed_state is not None
+        assert "nodes" in result.managed_state
+
+    @pytest.mark.asyncio
+    async def test_no_managed_state_without_get_state(self, basic_tools, valid_names):
+        """Regular server without get_state() should return None managed_state."""
+        server = MockServer([make_text_response("OK")])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        result = await agent.run(messages)
+
+        assert result.managed_state is None
+
+    @pytest.mark.asyncio
+    async def test_memory_tool_blocked(self, basic_tools):
+        """Memory tool should return error in RL environments."""
+        valid = {"terminal", "read_file", "todo", "memory"}
+        server = MockServer([
+            make_tool_response("memory", {"action": "add", "target": "user", "content": "test"}),
+            make_text_response("Done"),
+        ])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Remember this"}]
+        result = await agent.run(messages)
+
+        # Find the tool response
+        tool_msgs = [m for m in result.messages if m["role"] == "tool"]
+        assert len(tool_msgs) >= 1
+        tool_result = json.loads(tool_msgs[0]["content"])
+        assert "error" in tool_result
+        assert "not available" in tool_result["error"].lower()
+
+    @pytest.mark.asyncio
+    async def test_session_search_blocked(self, basic_tools):
+        """session_search should return error in RL environments."""
+        valid = {"terminal", "read_file", "todo", "session_search"}
+        server = MockServer([
+            make_tool_response("session_search", {"query": "test"}),
+            make_text_response("Done"),
+        ])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Search sessions"}]
+        result = await agent.run(messages)
+
+        tool_msgs = [m for m in result.messages if m["role"] == "tool"]
+        assert len(tool_msgs) >= 1
+        tool_result = json.loads(tool_msgs[0]["content"])
+        assert "error" in tool_result
+
+    @pytest.mark.asyncio
+    async def test_reasoning_content_preserved(self, basic_tools, valid_names):
+        """Reasoning content should be extracted and preserved."""
+        resp = MockChatCompletion(
+            choices=[
+                MockChoice(
+                    message=MockMessage(
+                        content="The answer is 42.",
+                        reasoning_content="Let me think about this step by step...",
+                    )
+                )
+            ]
+        )
+        server = MockServer([resp])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "What is the meaning of life?"}]
+        result = await agent.run(messages)
+
+        assert len(result.reasoning_per_turn) == 1
+        assert result.reasoning_per_turn[0] == "Let me think about this step by step..."
+
+
+class TestResizeToolPool:
+    def test_resize_works(self):
+        """resize_tool_pool should not raise."""
+        resize_tool_pool(16)  # Small pool for testing
+        resize_tool_pool(128)  # Restore default
@@ -0,0 +1,550 @@
+"""Integration tests for HermesAgentLoop tool calling.
+
+Tests the full agent loop with real LLM calls via OpenRouter.
+Uses stepfun/step-3.5-flash:free by default (zero cost), then falls back to
+google/gemini-2.0-flash-001 and anthropic/claude-sonnet-4 when rate limited.
+
+These tests verify:
+1. Single tool call: model calls a tool, gets result, responds
+2. Multi-tool call: model calls multiple tools in one turn
+3. Multi-turn: model calls tools across multiple turns
+4. Unknown tool rejection: model calling a non-existent tool gets an error
+5. Max turns: loop stops when max_turns is reached
+6. No tools: model responds without calling any tools
+7. Tool error handling: tool execution errors are captured
+
+Run:
+    pytest tests/test_agent_loop_tool_calling.py -v
+    pytest tests/test_agent_loop_tool_calling.py -v -k "single"  # run one test
+"""
+
+import asyncio
+import json
+import os
+import sys
+from pathlib import Path
+from typing import Any, Dict, List, Set
+from unittest.mock import patch
+
+import pytest
+
+# Ensure repo root is importable
+_repo_root = Path(__file__).resolve().parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+try:
+    from environments.agent_loop import AgentResult, HermesAgentLoop
+    from atroposlib.envs.server_handling.openai_server import OpenAIServer  # noqa: F401
+except ImportError:
+    pytest.skip("atroposlib not installed", allow_module_level=True)
+
+
+# =========================================================================
+# Test infrastructure
+# =========================================================================
+
+# Models to try, in order of preference (free first)
+_MODELS = [
+    "stepfun/step-3.5-flash:free",
+    "google/gemini-2.0-flash-001",
+    "anthropic/claude-sonnet-4",
+]
+
+def _get_api_key():
+    key = os.getenv("OPENROUTER_API_KEY", "")
+    if not key:
+        pytest.skip("OPENROUTER_API_KEY not set")
+    return key
+
+
+def _make_server(model: str = None):
+    """Create an OpenAI server for testing."""
+    from atroposlib.envs.server_handling.openai_server import OpenAIServer
+    from atroposlib.envs.server_handling.server_manager import APIServerConfig
+
+    config = APIServerConfig(
+        base_url="https://openrouter.ai/api/v1",
+        model_name=model or _MODELS[0],
+        server_type="openai",
+        api_key=_get_api_key(),
+        health_check=False,
+    )
+    return OpenAIServer(config)
+
+
+async def _try_models(test_fn):
+    """Run test_fn against each model in order; fall back only on rate-limit errors."""
+    last_error = None
+    for model in _MODELS:
+        try:
+            server = _make_server(model)
+            return await test_fn(server, model)
+        except Exception as e:
+            last_error = e
+            if "rate" in str(e).lower() or "limit" in str(e).lower():
+                continue  # Rate limited, try next model
+            raise  # Real error
+    pytest.skip(f"All models failed. Last error: {last_error}")
+
+
+# =========================================================================
+# Fake tools for testing
+# =========================================================================
+
+# Simple calculator tool
+CALC_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "calculate",
+        "description": "Calculate a math expression. Returns the numeric result.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "expression": {
+                    "type": "string",
+                    "description": "Math expression to evaluate, e.g. '2 + 3'"
+                }
+            },
+            "required": ["expression"],
+        },
+    },
+}
+
+# Weather lookup tool
+WEATHER_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "get_weather",
+        "description": "Get the current weather for a city. Returns temperature and conditions.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "city": {
+                    "type": "string",
+                    "description": "City name, e.g. 'Tokyo'"
+                }
+            },
+            "required": ["city"],
+        },
+    },
+}
+
+# Lookup tool (always succeeds)
+LOOKUP_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "lookup",
+        "description": "Look up a fact. Returns a short answer string.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "query": {
+                    "type": "string",
+                    "description": "What to look up"
+                }
+            },
+            "required": ["query"],
+        },
+    },
+}
+
+# Error tool (always fails)
+ERROR_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "failing_tool",
+        "description": "A tool that always fails with an error.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "input": {"type": "string"}
+            },
+            "required": ["input"],
+        },
+    },
+}
+
+
+def _fake_tool_handler(tool_name: str, args: Dict[str, Any], **kwargs) -> str:
+    """Handle fake tool calls for testing."""
+    if tool_name == "calculate":
+        expr = args.get("expression", "0")
+        try:
+            # Restricted eval for test-only math (empty builtins; not a real sandbox)
+            result = eval(expr, {"__builtins__": {}}, {})
+            return json.dumps({"result": result})
+        except Exception as e:
+            return json.dumps({"error": str(e)})
+
+    elif tool_name == "get_weather":
+        city = args.get("city", "Unknown")
+        # Return canned weather
+        return json.dumps({
+            "city": city,
+            "temperature": 22,
+            "conditions": "sunny",
+            "humidity": 45,
+        })
+
+    elif tool_name == "lookup":
+        query = args.get("query", "")
+        return json.dumps({"answer": f"The answer to '{query}' is 42."})
+
+    elif tool_name == "failing_tool":
+        raise RuntimeError("This tool always fails!")
+
+    return json.dumps({"error": f"Unknown tool: {tool_name}"})
+
+
+# =========================================================================
+# Tests
+# =========================================================================
+
+@pytest.mark.asyncio
+async def test_single_tool_call():
+    """Model should call a single tool, get the result, and respond."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[WEATHER_TOOL],
+            valid_tool_names={"get_weather"},
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": "What's the weather in Tokyo? Use the get_weather tool."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        assert isinstance(result, AgentResult)
+        assert result.turns_used >= 2, f"Expected at least 2 turns (tool call + response), got {result.turns_used}"
+
+        # Verify a tool call happened
+        tool_calls_found = False
+        for msg in result.messages:
+            if msg.get("role") == "assistant" and msg.get("tool_calls"):
+                for tc in msg["tool_calls"]:
+                    if tc["function"]["name"] == "get_weather":
+                        tool_calls_found = True
+                        args = json.loads(tc["function"]["arguments"])
+                        assert "city" in args
+        assert tool_calls_found, "Model should have called get_weather"
+
+        # Verify tool result is in conversation
+        tool_results = [m for m in result.messages if m.get("role") == "tool"]
+        assert len(tool_results) >= 1, "Should have at least one tool result"
+
+        # Verify the final message is a non-empty assistant response
+        final_msg = result.messages[-1]
+        assert final_msg["role"] == "assistant"
+        assert final_msg["content"], "Final response should have content"
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_multi_tool_single_turn():
+    """Model should call multiple tools in a single turn."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[WEATHER_TOOL, CALC_TOOL],
+            valid_tool_names={"get_weather", "calculate"},
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": (
+                "I need two things at once: "
+                "1) What's the weather in Paris? Use get_weather. "
+                "2) What is 15 * 7? Use calculate. "
+                "Call BOTH tools in a single response."
+            )},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # Count distinct tools called
+        tools_called = set()
+        for msg in result.messages:
+            if msg.get("role") == "assistant" and msg.get("tool_calls"):
+                for tc in msg["tool_calls"]:
+                    tools_called.add(tc["function"]["name"])
+
+        # At minimum, both tools should have been called (maybe in different turns)
+        assert "get_weather" in tools_called, f"get_weather not called. Called: {tools_called}"
+        assert "calculate" in tools_called, f"calculate not called. Called: {tools_called}"
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_multi_turn_conversation():
+    """Agent should handle multiple turns of tool calls."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[LOOKUP_TOOL, CALC_TOOL],
+            valid_tool_names={"lookup", "calculate"},
+            max_turns=10,
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": (
+                "First, use the lookup tool to look up 'meaning of life'. "
+                "Then use calculate to compute 6 * 7. "
+                "Do these in separate tool calls, one at a time."
+            )},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # Should have used both tools
+        tools_called = set()
+        for msg in result.messages:
+            if msg.get("role") == "assistant" and msg.get("tool_calls"):
+                for tc in msg["tool_calls"]:
+                    tools_called.add(tc["function"]["name"])
+
+        assert "lookup" in tools_called, f"lookup not called. Called: {tools_called}"
+        assert "calculate" in tools_called, f"calculate not called. Called: {tools_called}"
+
+        # Should finish naturally
+        assert result.finished_naturally, "Should finish naturally after answering"
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_unknown_tool_rejected():
+    """If the model calls a tool not in valid_tool_names, it gets an error."""
+
+    async def _run(server, model):
+        # Only allow "calculate" but give schema for both
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[CALC_TOOL, WEATHER_TOOL],
+            valid_tool_names={"calculate"},  # weather NOT allowed
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": "What's the weather in London? Use get_weather."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # The model may decline to call get_weather; assert only if a tool error was recorded
+        if result.tool_errors:
+            weather_errors = [e for e in result.tool_errors if e.tool_name == "get_weather"]
+            assert len(weather_errors) > 0, "get_weather should have been rejected"
+            assert "Unknown tool" in weather_errors[0].error
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_max_turns_limit():
+    """Agent should stop after max_turns even if model keeps calling tools."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[LOOKUP_TOOL],
+            valid_tool_names={"lookup"},
+            max_turns=2,  # Very low limit
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": (
+                "Keep looking up facts. Look up 'fact 1', then 'fact 2', "
+                "then 'fact 3', then 'fact 4'. Do them one at a time."
+            )},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        assert result.turns_used <= 2, f"Should stop at max_turns=2, used {result.turns_used}"
+        assert not result.finished_naturally, "Should NOT finish naturally (hit max_turns)"
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_no_tools_direct_response():
+    """When no tools are useful, model should respond directly."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[WEATHER_TOOL],
+            valid_tool_names={"get_weather"},
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=200,
+        )
+
+        messages = [
+            {"role": "user", "content": "What is 2 + 2? Just answer directly, no tools needed."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        assert result.finished_naturally, "Should finish naturally with a direct response"
+        assert result.turns_used == 1, f"Should take exactly 1 turn for a direct answer, took {result.turns_used}"
+
+        final = result.messages[-1]
+        assert final["role"] == "assistant"
+        assert final["content"], "Should have text content"
+        assert "4" in final["content"], "Should contain the answer '4'"
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_tool_error_handling():
+    """Tool execution errors should be captured and reported to the model."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[ERROR_TOOL],
+            valid_tool_names={"failing_tool"},
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": "Please call the failing_tool with input 'test'."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # The tool error should be recorded
+        assert len(result.tool_errors) >= 1, "Should have at least one tool error"
+        assert "RuntimeError" in result.tool_errors[0].error or "always fails" in result.tool_errors[0].error
+
+        # The error should be in the conversation as a tool result
+        tool_results = [m for m in result.messages if m.get("role") == "tool"]
+        assert len(tool_results) >= 1
+        error_result = json.loads(tool_results[0]["content"])
+        assert "error" in error_result
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_agent_result_structure():
+    """Verify the AgentResult has all expected fields populated."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[CALC_TOOL],
+            valid_tool_names={"calculate"},
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=300,
+        )
+
+        messages = [
+            {"role": "user", "content": "What is 3 + 4? Use the calculate tool."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # Structural checks
+        assert isinstance(result, AgentResult)
+        assert isinstance(result.messages, list)
+        assert len(result.messages) >= 3, "Should have at least user + assistant(tool call) + tool result"
+        assert isinstance(result.turns_used, int)
+        assert result.turns_used > 0
+        assert isinstance(result.finished_naturally, bool)
+        assert isinstance(result.tool_errors, list)
+        assert isinstance(result.reasoning_per_turn, list)
+
+        # Messages should follow OpenAI format
+        for msg in result.messages:
+            assert "role" in msg, f"Message missing 'role': {msg}"
+            assert msg["role"] in ("system", "user", "assistant", "tool"), f"Invalid role: {msg['role']}"
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_conversation_history_preserved():
+    """The full conversation history should be in result.messages."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[WEATHER_TOOL],
+            valid_tool_names={"get_weather"},
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "system", "content": "You are a helpful weather assistant."},
+            {"role": "user", "content": "What's the weather in Berlin? Use get_weather."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # System message should be preserved
+        assert result.messages[0]["role"] == "system"
+        assert "weather assistant" in result.messages[0]["content"]
+
+        # User message should be preserved
+        assert result.messages[1]["role"] == "user"
+        assert "Berlin" in result.messages[1]["content"]
+
+        # Should have assistant + tool + assistant sequence
+        roles = [m["role"] for m in result.messages]
+        assert "tool" in roles, "Should have tool results in conversation"
+
+        return result
+
+    await _try_models(_run)
@@ -0,0 +1,359 @@
+"""Integration tests for HermesAgentLoop with a local vLLM server.
+
+Tests the full Phase 2 flow: ManagedServer + tool calling with a real
+vLLM backend, producing actual token IDs and logprobs for RL training.
+
+Requires a running vLLM server. Start one from the atropos directory:
+
+    python -m example_trainer.vllm_api_server \
+        --model Qwen/Qwen3-4B-Thinking-2507 \
+        --port 9001 \
+        --gpu-memory-utilization 0.8 \
+        --max-model-len=32000
+
+Tests are automatically skipped if the server is not reachable.
+
+Run:
+    pytest tests/test_agent_loop_vllm.py -v
+    pytest tests/test_agent_loop_vllm.py -v -k "single"
+"""
+
+import asyncio
+import json
+import os
+import sys
+from pathlib import Path
+from typing import Any, Dict
+from unittest.mock import patch
+
+import pytest
+import requests
+
+# Ensure repo root is importable
+_repo_root = Path(__file__).resolve().parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+try:
+    from environments.agent_loop import AgentResult, HermesAgentLoop
+except ImportError:
+    pytest.skip("atroposlib not installed", allow_module_level=True)
+
+
+# =========================================================================
+# Configuration
+# =========================================================================
+
+VLLM_HOST = "localhost"
+VLLM_PORT = 9001
+VLLM_BASE_URL = f"http://{VLLM_HOST}:{VLLM_PORT}"
+VLLM_MODEL = "Qwen/Qwen3-4B-Thinking-2507"
+
+
+def _vllm_is_running() -> bool:
+    """Check if the vLLM server is reachable."""
+    try:
+        r = requests.get(f"{VLLM_BASE_URL}/health", timeout=3)
+        return r.status_code == 200
+    except Exception:
+        return False
+
+
+# Skip all tests in this module if vLLM is not running
+pytestmark = pytest.mark.skipif(
+    not _vllm_is_running(),
+    reason=(
+        f"vLLM server not reachable at {VLLM_BASE_URL}. "
+        "Start it with: python -m example_trainer.vllm_api_server "
+        f"--model {VLLM_MODEL} --port {VLLM_PORT} "
+        "--gpu-memory-utilization 0.8 --max-model-len=32000"
+    ),
+)
+
+
+# =========================================================================
+# Server setup
+# =========================================================================
+
+def _make_server_manager():
+    """Create a ServerManager pointing to the local vLLM server."""
+    from atroposlib.envs.server_handling.server_manager import (
+        ServerManager,
+        APIServerConfig,
+    )
+
+    config = APIServerConfig(
+        base_url=VLLM_BASE_URL,
+        model_name=VLLM_MODEL,
+        server_type="vllm",
+        health_check=False,
+    )
+    sm = ServerManager([config], tool_parser="hermes")
+    sm.servers[0].server_healthy = True
+    return sm
+
+
+def _get_tokenizer():
+    """Load the tokenizer for the model."""
+    from transformers import AutoTokenizer
+    return AutoTokenizer.from_pretrained(VLLM_MODEL)
+
+
+# =========================================================================
+# Fake tools
+# =========================================================================
+
+WEATHER_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "get_weather",
+        "description": "Get the current weather for a city. Returns temperature and conditions.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "city": {
+                    "type": "string",
+                    "description": "City name, e.g. 'Tokyo'",
+                }
+            },
+            "required": ["city"],
+        },
+    },
+}
+
+CALC_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "calculate",
+        "description": "Calculate a math expression. Returns the numeric result.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "expression": {
+                    "type": "string",
+                    "description": "Math expression, e.g. '2 + 3'",
+                }
+            },
+            "required": ["expression"],
+        },
+    },
+}
+
+
+def _fake_tool_handler(tool_name: str, args: Dict[str, Any], **kwargs) -> str:
+    """Handle fake tool calls for testing."""
+    if tool_name == "get_weather":
+        city = args.get("city", "Unknown")
+        return json.dumps({
+            "city": city,
+            "temperature": 22,
+            "conditions": "sunny",
+            "humidity": 45,
+        })
+    elif tool_name == "calculate":
+        expr = args.get("expression", "0")
+        try:
+            result = eval(expr, {"__builtins__": {}}, {})  # test-only: builtins stripped, not a real sandbox
+            return json.dumps({"result": result})
+        except Exception as e:
+            return json.dumps({"error": str(e)})
+    return json.dumps({"error": f"Unknown tool: {tool_name}"})
+
+
+# =========================================================================
+# Tests
+# =========================================================================
+
+@pytest.mark.asyncio
+async def test_vllm_single_tool_call():
+    """vLLM model calls a tool, gets result, responds — full Phase 2 flow."""
+    sm = _make_server_manager()
+    tokenizer = _get_tokenizer()
+
+    async with sm.managed_server(tokenizer=tokenizer) as managed:
+        agent = HermesAgentLoop(
+            server=managed,
+            tool_schemas=[WEATHER_TOOL],
+            valid_tool_names={"get_weather"},
+            max_turns=5,
+            temperature=0.6,
+            max_tokens=1000,
+        )
+
+        messages = [
+            {"role": "user", "content": "What's the weather in Tokyo? Use the get_weather tool."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+    assert isinstance(result, AgentResult)
+    assert result.turns_used >= 2, f"Expected at least 2 turns, got {result.turns_used}"
+
+    # Verify tool call happened
+    tool_calls_found = False
+    for msg in result.messages:
+        if msg.get("role") == "assistant" and msg.get("tool_calls"):
+            for tc in msg["tool_calls"]:
+                if tc["function"]["name"] == "get_weather":
+                    tool_calls_found = True
+                    args = json.loads(tc["function"]["arguments"])
+                    assert "city" in args
+    assert tool_calls_found, "Model should have called get_weather"
+
+    # Verify tool results in conversation
+    tool_results = [m for m in result.messages if m.get("role") == "tool"]
+    assert len(tool_results) >= 1
+
+
+@pytest.mark.asyncio
+async def test_vllm_multi_tool_calls():
+    """vLLM model calls multiple tools across turns."""
+    sm = _make_server_manager()
+    tokenizer = _get_tokenizer()
+
+    async with sm.managed_server(tokenizer=tokenizer) as managed:
+        agent = HermesAgentLoop(
+            server=managed,
+            tool_schemas=[WEATHER_TOOL, CALC_TOOL],
+            valid_tool_names={"get_weather", "calculate"},
+            max_turns=10,
+            temperature=0.6,
+            max_tokens=1000,
+        )
+
+        messages = [
+            {"role": "user", "content": (
+                "I need two things: "
+                "1) What's the weather in Paris? Use get_weather. "
+                "2) What is 15 * 7? Use calculate."
+            )},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+    # Both tools should be called
+    tools_called = set()
+    for msg in result.messages:
+        if msg.get("role") == "assistant" and msg.get("tool_calls"):
+            for tc in msg["tool_calls"]:
+                tools_called.add(tc["function"]["name"])
+
+    assert "get_weather" in tools_called, f"get_weather not called. Called: {tools_called}"
+    assert "calculate" in tools_called, f"calculate not called. Called: {tools_called}"
+
+
+@pytest.mark.asyncio
+async def test_vllm_managed_server_produces_nodes():
+    """ManagedServer should produce SequenceNodes with tokens and logprobs."""
+    sm = _make_server_manager()
+    tokenizer = _get_tokenizer()
+
+    async with sm.managed_server(tokenizer=tokenizer) as managed:
+        agent = HermesAgentLoop(
+            server=managed,
+            tool_schemas=[WEATHER_TOOL],
+            valid_tool_names={"get_weather"},
+            max_turns=5,
+            temperature=0.6,
+            max_tokens=1000,
+        )
+
+        messages = [
+            {"role": "user", "content": "What's the weather in Berlin? Use get_weather."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # Get the managed state — should have SequenceNodes
+        state = managed.get_state()
+
+    assert state is not None, "ManagedServer should return state"
+    nodes = state.get("nodes", [])
+    assert len(nodes) >= 1, f"Should have at least 1 node, got {len(nodes)}"
+
+    node = nodes[0]
+    assert hasattr(node, "tokens"), "Node should have tokens"
+    assert hasattr(node, "logprobs"), "Node should have logprobs"
+    assert len(node.tokens) > 0, "Tokens should not be empty"
+    assert len(node.logprobs) > 0, "Logprobs should not be empty"
+    assert len(node.tokens) == len(node.logprobs), (
+        f"Tokens ({len(node.tokens)}) and logprobs ({len(node.logprobs)}) should have same length"
+    )
+
+
+@pytest.mark.asyncio
+async def test_vllm_no_tools_direct_response():
+    """vLLM model should respond directly when no tools are needed."""
+    sm = _make_server_manager()
+    tokenizer = _get_tokenizer()
+
+    async with sm.managed_server(tokenizer=tokenizer) as managed:
+        agent = HermesAgentLoop(
+            server=managed,
+            tool_schemas=[WEATHER_TOOL],
+            valid_tool_names={"get_weather"},
+            max_turns=5,
+            temperature=0.6,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": "What is 2 + 2? Answer directly, no tools."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+    assert result.finished_naturally, "Should finish naturally"
+    assert result.turns_used == 1, f"Should take 1 turn, took {result.turns_used}"
+
+    final = result.messages[-1]
+    assert final["role"] == "assistant"
+    assert final["content"], "Should have content"
+
+
+@pytest.mark.asyncio
+async def test_vllm_thinking_content_extracted():
+    """Qwen3-Thinking model should produce reasoning content."""
+    sm = _make_server_manager()
+    tokenizer = _get_tokenizer()
+
+    async with sm.managed_server(
+        tokenizer=tokenizer,
+        preserve_think_blocks=True,
+    ) as managed:
+        agent = HermesAgentLoop(
+            server=managed,
+            tool_schemas=[CALC_TOOL],
+            valid_tool_names={"calculate"},
+            max_turns=5,
+            temperature=0.6,
+            max_tokens=1000,
+        )
+
+        messages = [
+            {"role": "user", "content": "What is 123 * 456? Use the calculate tool."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+    # Qwen3-Thinking should generate <think> blocks
+    # Check if any content contains thinking markers
+    has_thinking = False
+    for msg in result.messages:
+        content = msg.get("content", "") or ""
+        if "<think>" in content or "</think>" in content:
+            has_thinking = True
+            break
+
+    # Also check reasoning_per_turn
+    has_reasoning = any(r for r in result.reasoning_per_turn if r)
+
+    # At least one of these should be true for a thinking model
+    assert has_thinking or has_reasoning, (
+        "Qwen3-Thinking should produce <think> blocks or reasoning content"
+    )
@@ -1,292 +0,0 @@
-"""Tests for auxiliary model config bridging — verifies that config.yaml values
-are properly mapped to environment variables by both CLI and gateway loaders.
-
-Also tests the vision_tools and browser_tool model override env vars.
-"""
-
-import json
-import os
-import sys
-from pathlib import Path
-from unittest.mock import patch, MagicMock
-
-import pytest
-import yaml
-
-sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
-
-
-def _run_auxiliary_bridge(config_dict, monkeypatch):
-    """Simulate the auxiliary config → env var bridging logic shared by CLI and gateway.
-
-    This mirrors the code in cli.py load_cli_config() and gateway/run.py.
-    Both use the same pattern; we test it once here.
-    """
-    # Clear env vars
-    for key in (
-        "AUXILIARY_VISION_PROVIDER", "AUXILIARY_VISION_MODEL",
-        "AUXILIARY_WEB_EXTRACT_PROVIDER", "AUXILIARY_WEB_EXTRACT_MODEL",
-        "CONTEXT_COMPRESSION_PROVIDER", "CONTEXT_COMPRESSION_MODEL",
-    ):
-        monkeypatch.delenv(key, raising=False)
-
-    # Compression bridge
-    compression_cfg = config_dict.get("compression", {})
-    if compression_cfg and isinstance(compression_cfg, dict):
-        compression_env_map = {
-            "enabled": "CONTEXT_COMPRESSION_ENABLED",
-            "threshold": "CONTEXT_COMPRESSION_THRESHOLD",
-            "summary_model": "CONTEXT_COMPRESSION_MODEL",
-            "summary_provider": "CONTEXT_COMPRESSION_PROVIDER",
-        }
-        for cfg_key, env_var in compression_env_map.items():
-            if cfg_key in compression_cfg:
-                os.environ[env_var] = str(compression_cfg[cfg_key])
-
-    # Auxiliary bridge
-    auxiliary_cfg = config_dict.get("auxiliary", {})
-    if auxiliary_cfg and isinstance(auxiliary_cfg, dict):
-        aux_task_env = {
-            "vision":      ("AUXILIARY_VISION_PROVIDER",      "AUXILIARY_VISION_MODEL"),
-            "web_extract": ("AUXILIARY_WEB_EXTRACT_PROVIDER",  "AUXILIARY_WEB_EXTRACT_MODEL"),
-        }
-        for task_key, (prov_env, model_env) in aux_task_env.items():
-            task_cfg = auxiliary_cfg.get(task_key, {})
-            if not isinstance(task_cfg, dict):
-                continue
-            prov = str(task_cfg.get("provider", "")).strip()
-            model = str(task_cfg.get("model", "")).strip()
-            if prov and prov != "auto":
-                os.environ[prov_env] = prov
-            if model:
-                os.environ[model_env] = model
-
-
-# ── Config bridging tests ────────────────────────────────────────────────────
-
-
-class TestAuxiliaryConfigBridge:
-    """Verify the config.yaml → env var bridging logic used by CLI and gateway."""
-
-    def test_vision_provider_bridged(self, monkeypatch):
-        config = {
-            "auxiliary": {
-                "vision": {"provider": "openrouter", "model": ""},
-                "web_extract": {"provider": "auto", "model": ""},
-            }
-        }
-        _run_auxiliary_bridge(config, monkeypatch)
-        assert os.environ.get("AUXILIARY_VISION_PROVIDER") == "openrouter"
-        # auto should not be set
-        assert os.environ.get("AUXILIARY_WEB_EXTRACT_PROVIDER") is None
-
-    def test_vision_model_bridged(self, monkeypatch):
-        config = {
-            "auxiliary": {
-                "vision": {"provider": "auto", "model": "openai/gpt-4o"},
-            }
-        }
-        _run_auxiliary_bridge(config, monkeypatch)
-        assert os.environ.get("AUXILIARY_VISION_MODEL") == "openai/gpt-4o"
-        # auto provider should not be set
-        assert os.environ.get("AUXILIARY_VISION_PROVIDER") is None
-
-    def test_web_extract_bridged(self, monkeypatch):
-        config = {
-            "auxiliary": {
-                "web_extract": {"provider": "nous", "model": "gemini-2.5-flash"},
-            }
-        }
-        _run_auxiliary_bridge(config, monkeypatch)
-        assert os.environ.get("AUXILIARY_WEB_EXTRACT_PROVIDER") == "nous"
-        assert os.environ.get("AUXILIARY_WEB_EXTRACT_MODEL") == "gemini-2.5-flash"
-
-    def test_compression_provider_bridged(self, monkeypatch):
-        config = {
-            "compression": {
-                "summary_provider": "nous",
-                "summary_model": "gemini-3-flash",
-            }
-        }
-        _run_auxiliary_bridge(config, monkeypatch)
-        assert os.environ.get("CONTEXT_COMPRESSION_PROVIDER") == "nous"
-        assert os.environ.get("CONTEXT_COMPRESSION_MODEL") == "gemini-3-flash"
-
-    def test_empty_values_not_bridged(self, monkeypatch):
-        config = {
-            "auxiliary": {
-                "vision": {"provider": "auto", "model": ""},
-            }
-        }
-        _run_auxiliary_bridge(config, monkeypatch)
-        assert os.environ.get("AUXILIARY_VISION_PROVIDER") is None
-        assert os.environ.get("AUXILIARY_VISION_MODEL") is None
-
-    def test_missing_auxiliary_section_safe(self, monkeypatch):
-        """Config without auxiliary section should not crash."""
-        config = {"model": {"default": "test-model"}}
-        _run_auxiliary_bridge(config, monkeypatch)
-        assert os.environ.get("AUXILIARY_VISION_PROVIDER") is None
-
-    def test_non_dict_task_config_ignored(self, monkeypatch):
-        """Malformed task config (e.g. string instead of dict) is safely ignored."""
-        config = {
-            "auxiliary": {
-                "vision": "openrouter",  # should be a dict
-            }
-        }
-        _run_auxiliary_bridge(config, monkeypatch)
-        assert os.environ.get("AUXILIARY_VISION_PROVIDER") is None
-
-    def test_mixed_tasks(self, monkeypatch):
-        config = {
-            "auxiliary": {
-                "vision": {"provider": "openrouter", "model": ""},
-                "web_extract": {"provider": "auto", "model": "custom-llm"},
-            }
-        }
-        _run_auxiliary_bridge(config, monkeypatch)
-        assert os.environ.get("AUXILIARY_VISION_PROVIDER") == "openrouter"
-        assert os.environ.get("AUXILIARY_VISION_MODEL") is None
-        assert os.environ.get("AUXILIARY_WEB_EXTRACT_PROVIDER") is None
-        assert os.environ.get("AUXILIARY_WEB_EXTRACT_MODEL") == "custom-llm"
-
-    def test_all_tasks_with_overrides(self, monkeypatch):
-        config = {
-            "compression": {
-                "summary_provider": "main",
-                "summary_model": "local-model",
-            },
-            "auxiliary": {
-                "vision": {"provider": "openrouter", "model": "google/gemini-2.5-flash"},
-                "web_extract": {"provider": "nous", "model": "gemini-3-flash"},
-            }
-        }
-        _run_auxiliary_bridge(config, monkeypatch)
-        assert os.environ.get("CONTEXT_COMPRESSION_PROVIDER") == "main"
-        assert os.environ.get("CONTEXT_COMPRESSION_MODEL") == "local-model"
-        assert os.environ.get("AUXILIARY_VISION_PROVIDER") == "openrouter"
-        assert os.environ.get("AUXILIARY_VISION_MODEL") == "google/gemini-2.5-flash"
-        assert os.environ.get("AUXILIARY_WEB_EXTRACT_PROVIDER") == "nous"
-        assert os.environ.get("AUXILIARY_WEB_EXTRACT_MODEL") == "gemini-3-flash"
-
-    def test_whitespace_in_values_stripped(self, monkeypatch):
-        config = {
-            "auxiliary": {
-                "vision": {"provider": "  openrouter  ", "model": "  my-model  "},
-            }
-        }
-        _run_auxiliary_bridge(config, monkeypatch)
-        assert os.environ.get("AUXILIARY_VISION_PROVIDER") == "openrouter"
-        assert os.environ.get("AUXILIARY_VISION_MODEL") == "my-model"
-
-    def test_empty_auxiliary_dict_safe(self, monkeypatch):
-        config = {"auxiliary": {}}
-        _run_auxiliary_bridge(config, monkeypatch)
-        assert os.environ.get("AUXILIARY_VISION_PROVIDER") is None
-        assert os.environ.get("AUXILIARY_WEB_EXTRACT_PROVIDER") is None
-
-
-# ── Gateway bridge parity test ───────────────────────────────────────────────
-
-
-class TestGatewayBridgeCodeParity:
-    """Verify the gateway/run.py config bridge contains the auxiliary section."""
-
-    def test_gateway_has_auxiliary_bridge(self):
-        """The gateway config bridge must include auxiliary.* bridging."""
-        gateway_path = Path(__file__).parent.parent / "gateway" / "run.py"
-        content = gateway_path.read_text()
-        # Check for key patterns that indicate the bridge is present
-        assert "AUXILIARY_VISION_PROVIDER" in content
-        assert "AUXILIARY_VISION_MODEL" in content
-        assert "AUXILIARY_WEB_EXTRACT_PROVIDER" in content
-        assert "AUXILIARY_WEB_EXTRACT_MODEL" in content
-
-    def test_gateway_has_compression_provider(self):
-        """Gateway must bridge compression.summary_provider."""
-        gateway_path = Path(__file__).parent.parent / "gateway" / "run.py"
-        content = gateway_path.read_text()
-        assert "summary_provider" in content
-        assert "CONTEXT_COMPRESSION_PROVIDER" in content
-
-
-# ── Vision model override tests ──────────────────────────────────────────────
-
-
-class TestVisionModelOverride:
-    """Test that AUXILIARY_VISION_MODEL env var overrides the default model in the handler."""
-
-    def test_env_var_overrides_default(self, monkeypatch):
-        monkeypatch.setenv("AUXILIARY_VISION_MODEL", "openai/gpt-4o")
-        from tools.vision_tools import _handle_vision_analyze
-        with patch("tools.vision_tools.vision_analyze_tool", new_callable=MagicMock) as mock_tool:
-            mock_tool.return_value = '{"success": true}'
-            _handle_vision_analyze({"image_url": "http://test.jpg", "question": "test"})
-            call_args = mock_tool.call_args
-            # 3rd positional arg = model
-            assert call_args[0][2] == "openai/gpt-4o"
-
-    def test_default_model_when_no_override(self, monkeypatch):
-        monkeypatch.delenv("AUXILIARY_VISION_MODEL", raising=False)
-        from tools.vision_tools import _handle_vision_analyze, DEFAULT_VISION_MODEL
-        with patch("tools.vision_tools.vision_analyze_tool", new_callable=MagicMock) as mock_tool:
-            mock_tool.return_value = '{"success": true}'
-            _handle_vision_analyze({"image_url": "http://test.jpg", "question": "test"})
-            call_args = mock_tool.call_args
-            expected = DEFAULT_VISION_MODEL or "google/gemini-3-flash-preview"
-            assert call_args[0][2] == expected
-
-
-# ── DEFAULT_CONFIG shape tests ───────────────────────────────────────────────
-
-
-class TestDefaultConfigShape:
-    """Verify the DEFAULT_CONFIG in hermes_cli/config.py has correct auxiliary structure."""
-
-    def test_auxiliary_section_exists(self):
-        from hermes_cli.config import DEFAULT_CONFIG
-        assert "auxiliary" in DEFAULT_CONFIG
-
-    def test_vision_task_structure(self):
-        from hermes_cli.config import DEFAULT_CONFIG
-        vision = DEFAULT_CONFIG["auxiliary"]["vision"]
-        assert "provider" in vision
-        assert "model" in vision
-        assert vision["provider"] == "auto"
-        assert vision["model"] == ""
-
-    def test_web_extract_task_structure(self):
-        from hermes_cli.config import DEFAULT_CONFIG
-        web = DEFAULT_CONFIG["auxiliary"]["web_extract"]
-        assert "provider" in web
-        assert "model" in web
-        assert web["provider"] == "auto"
-        assert web["model"] == ""
-
-    def test_compression_provider_default(self):
-        from hermes_cli.config import DEFAULT_CONFIG
-        compression = DEFAULT_CONFIG["compression"]
-        assert "summary_provider" in compression
-        assert compression["summary_provider"] == "auto"
-
-
-# ── CLI defaults parity ─────────────────────────────────────────────────────
-
-
-class TestCLIDefaultsHaveAuxiliaryKeys:
-    """Verify cli.py load_cli_config() defaults dict does NOT include auxiliary
-    (it comes from config.yaml deep merge, not hardcoded defaults)."""
-
-    def test_cli_defaults_can_merge_auxiliary(self):
-        """The load_cli_config deep merge logic handles keys not in defaults.
-        Verify auxiliary would be picked up from config.yaml."""
-        # This is a structural assertion: cli.py's second-pass loop
-        # carries over keys from file_config that aren't in defaults.
-        # So auxiliary config from config.yaml gets merged even though
-        # cli.py's defaults dict doesn't define it.
-        import cli as _cli_mod
-        source = Path(_cli_mod.__file__).read_text()
-        assert "auxiliary_config = defaults.get(\"auxiliary\"" in source
-        assert "AUXILIARY_VISION_PROVIDER" in source
-        assert "AUXILIARY_VISION_MODEL" in source
@@ -162,124 +162,6 @@ def test_runtime_resolution_rebuilds_agent_on_routing_change(monkeypatch):
    assert shell.api_mode == "codex_responses"


-def test_codex_provider_replaces_incompatible_default_model(monkeypatch):
-    """When provider resolves to openai-codex and no model was explicitly
-    chosen, the global config default (e.g. anthropic/claude-opus-4.6) must
-    be replaced with a Codex-compatible model.  Fixes #651."""
-    cli = _import_cli()
-
-    monkeypatch.delenv("LLM_MODEL", raising=False)
-    monkeypatch.delenv("OPENAI_MODEL", raising=False)
-
-    def _runtime_resolve(**kwargs):
-        return {
-            "provider": "openai-codex",
-            "api_mode": "codex_responses",
-            "base_url": "https://chatgpt.com/backend-api/codex",
-            "api_key": "test-key",
-            "source": "env/config",
-        }
-
-    monkeypatch.setattr("hermes_cli.runtime_provider.resolve_runtime_provider", _runtime_resolve)
-    monkeypatch.setattr("hermes_cli.runtime_provider.format_runtime_provider_error", lambda exc: str(exc))
-    monkeypatch.setattr(
-        "hermes_cli.codex_models.get_codex_model_ids",
-        lambda access_token=None: ["gpt-5.2-codex", "gpt-5.1-codex-mini"],
-    )
-
-    shell = cli.HermesCLI(compact=True, max_turns=1)
-
-    assert shell._model_is_default is True
-    assert shell._ensure_runtime_credentials() is True
-    assert shell.provider == "openai-codex"
-    assert "anthropic" not in shell.model
-    assert "claude" not in shell.model
-    assert shell.model == "gpt-5.2-codex"
-
-
-def test_codex_provider_trusts_explicit_envvar_model(monkeypatch):
-    """When the user explicitly sets LLM_MODEL, we trust their choice and
-    let the API be the judge — even if it's a non-OpenAI model.  Only
-    provider prefixes are stripped; the bare model passes through."""
-    cli = _import_cli()
-
-    monkeypatch.setenv("LLM_MODEL", "claude-opus-4-6")
-    monkeypatch.delenv("OPENAI_MODEL", raising=False)
-
-    def _runtime_resolve(**kwargs):
-        return {
-            "provider": "openai-codex",
-            "api_mode": "codex_responses",
-            "base_url": "https://chatgpt.com/backend-api/codex",
-            "api_key": "test-key",
-            "source": "env/config",
-        }
-
-    monkeypatch.setattr("hermes_cli.runtime_provider.resolve_runtime_provider", _runtime_resolve)
-    monkeypatch.setattr("hermes_cli.runtime_provider.format_runtime_provider_error", lambda exc: str(exc))
-
-    shell = cli.HermesCLI(compact=True, max_turns=1)
-
-    assert shell._model_is_default is False
-    assert shell._ensure_runtime_credentials() is True
-    assert shell.provider == "openai-codex"
-    # User explicitly chose this model — it passes through untouched
-    assert shell.model == "claude-opus-4-6"
-
-
-def test_codex_provider_preserves_explicit_codex_model(monkeypatch):
-    """If the user explicitly passes a Codex-compatible model, it must be
-    preserved even when the provider resolves to openai-codex."""
-    cli = _import_cli()
-
-    monkeypatch.delenv("LLM_MODEL", raising=False)
-    monkeypatch.delenv("OPENAI_MODEL", raising=False)
-
-    def _runtime_resolve(**kwargs):
-        return {
-            "provider": "openai-codex",
-            "api_mode": "codex_responses",
-            "base_url": "https://chatgpt.com/backend-api/codex",
-            "api_key": "test-key",
-            "source": "env/config",
-        }
-
-    monkeypatch.setattr("hermes_cli.runtime_provider.resolve_runtime_provider", _runtime_resolve)
-    monkeypatch.setattr("hermes_cli.runtime_provider.format_runtime_provider_error", lambda exc: str(exc))
-
-    shell = cli.HermesCLI(model="gpt-5.1-codex-mini", compact=True, max_turns=1)
-
-    assert shell._model_is_default is False
-    assert shell._ensure_runtime_credentials() is True
-    assert shell.model == "gpt-5.1-codex-mini"
-
-
-def test_codex_provider_strips_provider_prefix_from_model(monkeypatch):
-    """openai/gpt-5.3-codex should become gpt-5.3-codex — the Codex
-    Responses API does not accept provider-prefixed model slugs."""
-    cli = _import_cli()
-
-    monkeypatch.delenv("LLM_MODEL", raising=False)
-    monkeypatch.delenv("OPENAI_MODEL", raising=False)
-
-    def _runtime_resolve(**kwargs):
-        return {
-            "provider": "openai-codex",
-            "api_mode": "codex_responses",
-            "base_url": "https://chatgpt.com/backend-api/codex",
-            "api_key": "test-key",
-            "source": "env/config",
-        }
-
-    monkeypatch.setattr("hermes_cli.runtime_provider.resolve_runtime_provider", _runtime_resolve)
-    monkeypatch.setattr("hermes_cli.runtime_provider.format_runtime_provider_error", lambda exc: str(exc))
-
-    shell = cli.HermesCLI(model="openai/gpt-5.3-codex", compact=True, max_turns=1)
-
-    assert shell._ensure_runtime_credentials() is True
-    assert shell.model == "gpt-5.3-codex"
-
-
 def test_cmd_model_falls_back_to_auto_on_invalid_provider(monkeypatch, capsys):
    monkeypatch.setattr(
        "hermes_cli.config.load_config",
@@ -149,7 +149,6 @@ def test_gateway_run_agent_codex_path_handles_internal_401_refresh(monkeypatch):
    runner._prefill_messages = []
    runner._reasoning_config = None
    runner._provider_routing = {}
-    runner._fallback_model = None
    runner._running_agents = {}
    from unittest.mock import MagicMock, AsyncMock
    runner.hooks = MagicMock()
@@ -1,9 +1,4 @@
 import json
-import os
-import sys
-from unittest.mock import patch
-
-sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

 from hermes_cli.codex_models import DEFAULT_CODEX_MODELS, get_codex_model_ids

@@ -18,7 +13,7 @@ def test_get_codex_model_ids_prioritizes_default_and_cache(tmp_path, monkeypatch
                "models": [
                    {"slug": "gpt-5.3-codex", "priority": 20, "supported_in_api": True},
                    {"slug": "gpt-5.1-codex", "priority": 5, "supported_in_api": True},
-                    {"slug": "gpt-5.4", "priority": 1, "supported_in_api": True},
+                    {"slug": "gpt-4o", "priority": 1, "supported_in_api": True},
                    {"slug": "gpt-5-hidden-codex", "priority": 2, "visibility": "hidden"},
                ]
            }
@@ -31,19 +26,10 @@ def test_get_codex_model_ids_prioritizes_default_and_cache(tmp_path, monkeypatch
    assert models[0] == "gpt-5.2-codex"
    assert "gpt-5.1-codex" in models
    assert "gpt-5.3-codex" in models
-    # Non-codex-suffixed models are included when the cache says they're available
-    assert "gpt-5.4" in models
+    assert "gpt-4o" not in models
    assert "gpt-5-hidden-codex" not in models


-def test_setup_wizard_codex_import_resolves():
-    """Regression test for #712: setup.py must import the correct function name."""
-    # This mirrors the exact import used in hermes_cli/setup.py line 873.
-    # A prior bug had 'get_codex_models' (wrong) instead of 'get_codex_model_ids'.
-    from hermes_cli.codex_models import get_codex_model_ids as setup_import
-    assert callable(setup_import)
-
-
 def test_get_codex_model_ids_falls_back_to_curated_defaults(tmp_path, monkeypatch):
    codex_home = tmp_path / "codex-home"
    codex_home.mkdir(parents=True, exist_ok=True)
@@ -52,144 +38,3 @@ def test_get_codex_model_ids_falls_back_to_curated_defaults(tmp_path, monkeypatc
    models = get_codex_model_ids()

    assert models[: len(DEFAULT_CODEX_MODELS)] == DEFAULT_CODEX_MODELS
-
-
-# ── Tests for _normalize_model_for_provider ──────────────────────────
-
-
-def _make_cli(model="anthropic/claude-opus-4.6", **kwargs):
-    """Create a HermesCLI with minimal mocking."""
-    import cli as _cli_mod
-    from cli import HermesCLI
-
-    _clean_config = {
-        "model": {
-            "default": "anthropic/claude-opus-4.6",
-            "base_url": "https://openrouter.ai/api/v1",
-            "provider": "auto",
-        },
-        "display": {"compact": False, "tool_progress": "all", "resume_display": "full"},
-        "agent": {},
-        "terminal": {"env_type": "local"},
-    }
-    clean_env = {"LLM_MODEL": "", "HERMES_MAX_ITERATIONS": ""}
-    with (
-        patch("cli.get_tool_definitions", return_value=[]),
-        patch.dict("os.environ", clean_env, clear=False),
-        patch.dict(_cli_mod.__dict__, {"CLI_CONFIG": _clean_config}),
-    ):
-        cli = HermesCLI(model=model, **kwargs)
-    return cli
-
-
-class TestNormalizeModelForProvider:
-    """_normalize_model_for_provider() trusts user-selected models.
-
-    Only two things happen:
-    1. Provider prefixes are stripped (API needs bare slugs)
-    2. The *untouched default* model is swapped for a Codex model
-    Everything else passes through — the API is the judge.
-    """
-
-    def test_non_codex_provider_is_noop(self):
-        cli = _make_cli(model="gpt-5.4")
-        changed = cli._normalize_model_for_provider("openrouter")
-        assert changed is False
-        assert cli.model == "gpt-5.4"
-
-    def test_bare_codex_model_passes_through(self):
-        cli = _make_cli(model="gpt-5.3-codex")
-        changed = cli._normalize_model_for_provider("openai-codex")
-        assert changed is False
-        assert cli.model == "gpt-5.3-codex"
-
-    def test_bare_non_codex_model_passes_through(self):
-        """gpt-5.4 (no 'codex' suffix) passes through — user chose it."""
-        cli = _make_cli(model="gpt-5.4")
-        changed = cli._normalize_model_for_provider("openai-codex")
-        assert changed is False
-        assert cli.model == "gpt-5.4"
-
-    def test_any_bare_model_trusted(self):
-        """Even a non-OpenAI bare model passes through — user explicitly set it."""
-        cli = _make_cli(model="claude-opus-4-6")
-        changed = cli._normalize_model_for_provider("openai-codex")
-        # User explicitly chose this model — we trust them, API will error if wrong
-        assert changed is False
-        assert cli.model == "claude-opus-4-6"
-
-    def test_provider_prefix_stripped(self):
-        """openai/gpt-5.4 → gpt-5.4 (strip prefix, keep model)."""
-        cli = _make_cli(model="openai/gpt-5.4")
-        changed = cli._normalize_model_for_provider("openai-codex")
-        assert changed is True
-        assert cli.model == "gpt-5.4"
-
-    def test_any_provider_prefix_stripped(self):
-        """anthropic/claude-opus-4.6 → claude-opus-4.6 (strip prefix only).
-        User explicitly chose this — let the API decide if it works."""
-        cli = _make_cli(model="anthropic/claude-opus-4.6")
-        changed = cli._normalize_model_for_provider("openai-codex")
-        assert changed is True
-        assert cli.model == "claude-opus-4.6"
-
-    def test_default_model_replaced(self):
-        """The untouched default (anthropic/claude-opus-4.6) gets swapped."""
-        import cli as _cli_mod
-        _clean_config = {
-            "model": {
-                "default": "anthropic/claude-opus-4.6",
-                "base_url": "https://openrouter.ai/api/v1",
-                "provider": "auto",
-            },
-            "display": {"compact": False, "tool_progress": "all", "resume_display": "full"},
-            "agent": {},
-            "terminal": {"env_type": "local"},
-        }
-        # Don't pass model= so _model_is_default is True
-        with (
-            patch("cli.get_tool_definitions", return_value=[]),
-            patch.dict("os.environ", {"LLM_MODEL": "", "HERMES_MAX_ITERATIONS": ""}, clear=False),
-            patch.dict(_cli_mod.__dict__, {"CLI_CONFIG": _clean_config}),
-        ):
-            from cli import HermesCLI
-            cli = HermesCLI()
-
-        assert cli._model_is_default is True
-        with patch(
-            "hermes_cli.codex_models.get_codex_model_ids",
-            return_value=["gpt-5.3-codex", "gpt-5.4"],
-        ):
-            changed = cli._normalize_model_for_provider("openai-codex")
-        assert changed is True
-        # Uses first from available list
-        assert cli.model == "gpt-5.3-codex"
-
-    def test_default_fallback_when_api_fails(self):
-        """Default model falls back to gpt-5.3-codex when API unreachable."""
-        import cli as _cli_mod
-        _clean_config = {
-            "model": {
-                "default": "anthropic/claude-opus-4.6",
-                "base_url": "https://openrouter.ai/api/v1",
-                "provider": "auto",
-            },
-            "display": {"compact": False, "tool_progress": "all", "resume_display": "full"},
-            "agent": {},
-            "terminal": {"env_type": "local"},
-        }
-        with (
-            patch("cli.get_tool_definitions", return_value=[]),
-            patch.dict("os.environ", {"LLM_MODEL": "", "HERMES_MAX_ITERATIONS": ""}, clear=False),
-            patch.dict(_cli_mod.__dict__, {"CLI_CONFIG": _clean_config}),
-        ):
-            from cli import HermesCLI
-            cli = HermesCLI()
-
-        with patch(
-            "hermes_cli.codex_models.get_codex_model_ids",
-            side_effect=Exception("offline"),
-        ):
-            changed = cli._normalize_model_for_provider("openai-codex")
-        assert changed is True
-        assert cli.model == "gpt-5.3-codex"
@@ -1,339 +0,0 @@
-"""Tests for the provider fallback model feature.
-
-Verifies that AIAgent can switch to a configured fallback model/provider
-when the primary fails after retries.
-"""
-
-import os
-from types import SimpleNamespace
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-from run_agent import AIAgent
-
-
-def _make_tool_defs(*names: str) -> list:
-    return [
-        {
-            "type": "function",
-            "function": {
-                "name": n,
-                "description": f"{n} tool",
-                "parameters": {"type": "object", "properties": {}},
-            },
-        }
-        for n in names
-    ]
-
-
-def _make_agent(fallback_model=None):
-    """Create a minimal AIAgent with optional fallback config."""
-    with (
-        patch("run_agent.get_tool_definitions", return_value=_make_tool_defs("web_search")),
-        patch("run_agent.check_toolset_requirements", return_value={}),
-        patch("run_agent.OpenAI"),
-    ):
-        agent = AIAgent(
-            api_key="test-key-primary",
-            quiet_mode=True,
-            skip_context_files=True,
-            skip_memory=True,
-            fallback_model=fallback_model,
-        )
-        agent.client = MagicMock()
-        return agent
-
-
-# =============================================================================
-# _try_activate_fallback()
-# =============================================================================
-
-class TestTryActivateFallback:
-    def test_returns_false_when_not_configured(self):
-        agent = _make_agent(fallback_model=None)
-        assert agent._try_activate_fallback() is False
-        assert agent._fallback_activated is False
-
-    def test_returns_false_for_empty_config(self):
-        agent = _make_agent(fallback_model={"provider": "", "model": ""})
-        assert agent._try_activate_fallback() is False
-
-    def test_returns_false_for_missing_provider(self):
-        agent = _make_agent(fallback_model={"model": "gpt-4.1"})
-        assert agent._try_activate_fallback() is False
-
-    def test_returns_false_for_missing_model(self):
-        agent = _make_agent(fallback_model={"provider": "openrouter"})
-        assert agent._try_activate_fallback() is False
-
-    def test_activates_openrouter_fallback(self):
-        agent = _make_agent(
-            fallback_model={"provider": "openrouter", "model": "anthropic/claude-sonnet-4"},
-        )
-        with (
-            patch.dict("os.environ", {"OPENROUTER_API_KEY": "sk-or-fallback-key"}),
-            patch("run_agent.OpenAI") as mock_openai,
-        ):
-            result = agent._try_activate_fallback()
-            assert result is True
-            assert agent._fallback_activated is True
-            assert agent.model == "anthropic/claude-sonnet-4"
-            assert agent.provider == "openrouter"
-            assert agent.api_mode == "chat_completions"
-            mock_openai.assert_called_once()
-            call_kwargs = mock_openai.call_args[1]
-            assert call_kwargs["api_key"] == "sk-or-fallback-key"
-            assert "openrouter" in call_kwargs["base_url"].lower()
-            # OpenRouter should get attribution headers
-            assert "default_headers" in call_kwargs
-
-    def test_activates_zai_fallback(self):
-        agent = _make_agent(
-            fallback_model={"provider": "zai", "model": "glm-5"},
-        )
-        with (
-            patch.dict("os.environ", {"ZAI_API_KEY": "sk-zai-key"}),
-            patch("run_agent.OpenAI") as mock_openai,
-        ):
-            result = agent._try_activate_fallback()
-            assert result is True
-            assert agent.model == "glm-5"
-            assert agent.provider == "zai"
-            call_kwargs = mock_openai.call_args[1]
-            assert call_kwargs["api_key"] == "sk-zai-key"
-            assert "z.ai" in call_kwargs["base_url"].lower()
-
-    def test_activates_kimi_fallback(self):
-        agent = _make_agent(
-            fallback_model={"provider": "kimi-coding", "model": "kimi-k2.5"},
-        )
-        with (
-            patch.dict("os.environ", {"KIMI_API_KEY": "sk-kimi-key"}),
-            patch("run_agent.OpenAI"),
-        ):
-            assert agent._try_activate_fallback() is True
-            assert agent.model == "kimi-k2.5"
-            assert agent.provider == "kimi-coding"
-
-    def test_activates_minimax_fallback(self):
-        agent = _make_agent(
-            fallback_model={"provider": "minimax", "model": "MiniMax-M2.5"},
-        )
-        with (
-            patch.dict("os.environ", {"MINIMAX_API_KEY": "sk-mm-key"}),
-            patch("run_agent.OpenAI") as mock_openai,
-        ):
-            assert agent._try_activate_fallback() is True
-            assert agent.model == "MiniMax-M2.5"
-            assert agent.provider == "minimax"
-            call_kwargs = mock_openai.call_args[1]
-            assert "minimax.io" in call_kwargs["base_url"]
-
-    def test_only_fires_once(self):
-        agent = _make_agent(
-            fallback_model={"provider": "openrouter", "model": "anthropic/claude-sonnet-4"},
-        )
-        with (
-            patch.dict("os.environ", {"OPENROUTER_API_KEY": "sk-or-key"}),
-            patch("run_agent.OpenAI"),
-        ):
-            assert agent._try_activate_fallback() is True
-            # Second attempt should return False
-            assert agent._try_activate_fallback() is False
-
-    def test_returns_false_when_no_api_key(self):
-        """Fallback should fail gracefully when the API key env var is unset."""
-        agent = _make_agent(
-            fallback_model={"provider": "minimax", "model": "MiniMax-M2.5"},
-        )
-        # Ensure MINIMAX_API_KEY is not in the environment
-        env = {k: v for k, v in os.environ.items() if k != "MINIMAX_API_KEY"}
-        with patch.dict("os.environ", env, clear=True):
-            assert agent._try_activate_fallback() is False
-            assert agent._fallback_activated is False
-
-    def test_custom_base_url(self):
-        """Custom base_url in config should override the provider default."""
-        agent = _make_agent(
-            fallback_model={
-                "provider": "custom",
-                "model": "my-model",
-                "base_url": "http://localhost:8080/v1",
-                "api_key_env": "MY_CUSTOM_KEY",
-            },
-        )
-        with (
-            patch.dict("os.environ", {"MY_CUSTOM_KEY": "custom-secret"}),
-            patch("run_agent.OpenAI") as mock_openai,
-        ):
-            assert agent._try_activate_fallback() is True
-            call_kwargs = mock_openai.call_args[1]
-            assert call_kwargs["base_url"] == "http://localhost:8080/v1"
-            assert call_kwargs["api_key"] == "custom-secret"
-
-    def test_prompt_caching_enabled_for_claude_on_openrouter(self):
-        agent = _make_agent(
-            fallback_model={"provider": "openrouter", "model": "anthropic/claude-sonnet-4"},
-        )
-        with (
-            patch.dict("os.environ", {"OPENROUTER_API_KEY": "sk-or-key"}),
-            patch("run_agent.OpenAI"),
-        ):
-            agent._try_activate_fallback()
-            assert agent._use_prompt_caching is True
-
-    def test_prompt_caching_disabled_for_non_claude(self):
-        agent = _make_agent(
-            fallback_model={"provider": "openrouter", "model": "google/gemini-2.5-flash"},
-        )
-        with (
-            patch.dict("os.environ", {"OPENROUTER_API_KEY": "sk-or-key"}),
-            patch("run_agent.OpenAI"),
-        ):
-            agent._try_activate_fallback()
-            assert agent._use_prompt_caching is False
-
-    def test_prompt_caching_disabled_for_non_openrouter(self):
-        agent = _make_agent(
-            fallback_model={"provider": "zai", "model": "glm-5"},
-        )
-        with (
-            patch.dict("os.environ", {"ZAI_API_KEY": "sk-zai-key"}),
-            patch("run_agent.OpenAI"),
-        ):
-            agent._try_activate_fallback()
-            assert agent._use_prompt_caching is False
-
-    def test_zai_alt_env_var(self):
-        """Z.AI should also check Z_AI_API_KEY as fallback env var."""
-        agent = _make_agent(
-            fallback_model={"provider": "zai", "model": "glm-5"},
-        )
-        with (
-            patch.dict("os.environ", {"Z_AI_API_KEY": "sk-alt-key"}),
-            patch("run_agent.OpenAI") as mock_openai,
-        ):
-            assert agent._try_activate_fallback() is True
-            call_kwargs = mock_openai.call_args[1]
-            assert call_kwargs["api_key"] == "sk-alt-key"
-
-    def test_activates_codex_fallback(self):
-        """OpenAI Codex fallback should use OAuth credentials and codex_responses mode."""
-        agent = _make_agent(
-            fallback_model={"provider": "openai-codex", "model": "gpt-5.3-codex"},
-        )
-        mock_creds = {
-            "api_key": "codex-oauth-token",
-            "base_url": "https://chatgpt.com/backend-api/codex",
-        }
-        with (
-            patch("hermes_cli.auth.resolve_codex_runtime_credentials", return_value=mock_creds),
-            patch("run_agent.OpenAI") as mock_openai,
-        ):
-            result = agent._try_activate_fallback()
-            assert result is True
-            assert agent.model == "gpt-5.3-codex"
-            assert agent.provider == "openai-codex"
-            assert agent.api_mode == "codex_responses"
-            call_kwargs = mock_openai.call_args[1]
-            assert call_kwargs["api_key"] == "codex-oauth-token"
-            assert "chatgpt.com" in call_kwargs["base_url"]
-
-    def test_codex_fallback_fails_gracefully_without_credentials(self):
-        """Codex fallback should return False if no OAuth credentials available."""
-        agent = _make_agent(
-            fallback_model={"provider": "openai-codex", "model": "gpt-5.3-codex"},
-        )
-        with patch(
-            "hermes_cli.auth.resolve_codex_runtime_credentials",
-            side_effect=Exception("No Codex credentials"),
-        ):
-            assert agent._try_activate_fallback() is False
-            assert agent._fallback_activated is False
-
-    def test_activates_nous_fallback(self):
-        """Nous Portal fallback should use OAuth credentials and chat_completions mode."""
-        agent = _make_agent(
-            fallback_model={"provider": "nous", "model": "nous-hermes-3"},
-        )
-        mock_creds = {
-            "api_key": "nous-agent-key-abc",
-            "base_url": "https://inference-api.nousresearch.com/v1",
-        }
-        with (
-            patch("hermes_cli.auth.resolve_nous_runtime_credentials", return_value=mock_creds),
-            patch("run_agent.OpenAI") as mock_openai,
-        ):
-            result = agent._try_activate_fallback()
-            assert result is True
-            assert agent.model == "nous-hermes-3"
-            assert agent.provider == "nous"
-            assert agent.api_mode == "chat_completions"
-            call_kwargs = mock_openai.call_args[1]
-            assert call_kwargs["api_key"] == "nous-agent-key-abc"
-            assert "nousresearch.com" in call_kwargs["base_url"]
-
-    def test_nous_fallback_fails_gracefully_without_login(self):
-        """Nous fallback should return False if not logged in."""
-        agent = _make_agent(
-            fallback_model={"provider": "nous", "model": "nous-hermes-3"},
-        )
-        with patch(
-            "hermes_cli.auth.resolve_nous_runtime_credentials",
-            side_effect=Exception("Not logged in to Nous Portal"),
-        ):
-            assert agent._try_activate_fallback() is False
-            assert agent._fallback_activated is False
-
-
-# =============================================================================
-# Fallback config init
-# =============================================================================
-
-class TestFallbackInit:
-    def test_fallback_stored_when_configured(self):
-        agent = _make_agent(
-            fallback_model={"provider": "openrouter", "model": "anthropic/claude-sonnet-4"},
-        )
-        assert agent._fallback_model is not None
-        assert agent._fallback_model["provider"] == "openrouter"
-        assert agent._fallback_activated is False
-
-    def test_fallback_none_when_not_configured(self):
-        agent = _make_agent(fallback_model=None)
-        assert agent._fallback_model is None
-        assert agent._fallback_activated is False
-
-    def test_fallback_none_for_non_dict(self):
-        agent = _make_agent(fallback_model="not-a-dict")
-        assert agent._fallback_model is None
-
-
-# =============================================================================
-# Provider credential resolution
-# =============================================================================
-
-class TestProviderCredentials:
-    """Verify that each supported provider resolves its API key correctly."""
-
-    @pytest.mark.parametrize("provider,env_var,base_url_fragment", [
-        ("openrouter", "OPENROUTER_API_KEY", "openrouter"),
-        ("zai", "ZAI_API_KEY", "z.ai"),
-        ("kimi-coding", "KIMI_API_KEY", "moonshot.ai"),
-        ("minimax", "MINIMAX_API_KEY", "minimax.io"),
-        ("minimax-cn", "MINIMAX_CN_API_KEY", "minimaxi.com"),
-    ])
-    def test_provider_resolves(self, provider, env_var, base_url_fragment):
-        agent = _make_agent(
-            fallback_model={"provider": provider, "model": "test-model"},
-        )
-        with (
-            patch.dict("os.environ", {env_var: "test-key-123"}),
-            patch("run_agent.OpenAI") as mock_openai,
-        ):
-            result = agent._try_activate_fallback()
-            assert result is True, f"Failed to activate fallback for {provider}"
-            call_kwargs = mock_openai.call_args[1]
-            assert call_kwargs["api_key"] == "test-key-123"
-            assert base_url_fragment in call_kwargs["base_url"].lower()
@@ -0,0 +1,178 @@
+"""
+Tests for ManagedServer tool_call_parser integration.
+
+Validates that:
+1. ManagedServer accepts tool_call_parser parameter (tool_call_support branch)
+2. ServerManager.managed_server() passes tool_call_parser through
+3. The parser's parse() output is correctly attached to ChatCompletion responses
+4. hermes-agent's tool_call_parsers are compatible with ManagedServer's expectations
+
+These tests verify the contract between hermes-agent's environments/ code
+and atroposlib's ManagedServer. They detect API incompatibilities early.
+"""
+
+import inspect
+import sys
+from pathlib import Path
+
+import pytest
+
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+
+try:
+    import atroposlib  # noqa: F401
+except ImportError:
+    pytest.skip("atroposlib not installed", allow_module_level=True)
+
+
+class TestManagedServerAPI:
+    """Test that ManagedServer's API matches what hermes-agent expects."""
+
+    def test_managed_server_init_signature(self):
+        """ManagedServer should accept tool_call_parser parameter."""
+        from atroposlib.envs.server_handling.managed_server import ManagedServer
+
+        sig = inspect.signature(ManagedServer.__init__)
+        params = list(sig.parameters.keys())
+
+        # Core params that must exist
+        assert "self" in params
+        assert "server" in params
+        assert "tokenizer" in params
+        assert "track_tree" in params
+
+        # tool_call_parser — required for tool_call_support branch
+        # If this fails, atroposlib hasn't been updated to tool_call_support
+        has_tool_parser = "tool_call_parser" in params
+        if not has_tool_parser:
+            pytest.skip(
+                "ManagedServer does not have tool_call_parser param — "
+                "baseline atroposlib (pre tool_call_support branch)"
+            )
+
+    def test_server_manager_managed_server_signature(self):
+        """ServerManager.managed_server() should accept tool_call_parser."""
+        from atroposlib.envs.server_handling.server_manager import ServerManager
+
+        sig = inspect.signature(ServerManager.managed_server)
+        params = list(sig.parameters.keys())
+
+        assert "self" in params
+        assert "tokenizer" in params
+
+        has_tool_parser = "tool_call_parser" in params
+        if not has_tool_parser:
+            pytest.skip(
+                "ServerManager.managed_server() does not have tool_call_parser param — "
+                "baseline atroposlib (pre tool_call_support branch)"
+            )
+
+    def test_managed_server_chat_template_kwargs(self):
+        """ManagedServer should have CHAT_TEMPLATE_KWARGS for forwarding tools/thinking."""
+        from atroposlib.envs.server_handling.managed_server import ManagedServer
+
+        if not hasattr(ManagedServer, "CHAT_TEMPLATE_KWARGS"):
+            pytest.skip(
+                "ManagedServer does not have CHAT_TEMPLATE_KWARGS — "
+                "baseline atroposlib (pre tool_call_support branch)"
+            )
+
+        kwargs = ManagedServer.CHAT_TEMPLATE_KWARGS
+        assert "tools" in kwargs, "tools must be in CHAT_TEMPLATE_KWARGS"
+
+    def test_no_get_logprobs_method(self):
+        """get_logprobs should be removed in tool_call_support branch."""
+        from atroposlib.envs.server_handling.managed_server import ManagedServer
+
+        # In baseline, get_logprobs exists. In tool_call_support, it's removed.
+        # We just note the state — not a hard fail either way.
+        has_get_logprobs = hasattr(ManagedServer, "get_logprobs")
+        if has_get_logprobs:
+            pytest.skip(
+                "ManagedServer still has get_logprobs — baseline atroposlib"
+            )
+
+
+class TestParserCompatibility:
+    """Test that hermes-agent's parsers match ManagedServer's expectations."""
+
+    def test_parser_parse_returns_correct_format(self):
+        """
+        ManagedServer expects parser.parse(text) -> (content, tool_calls)
+        where tool_calls is a list of objects with .id, .function.name, .function.arguments
+        """
+        from environments.tool_call_parsers import get_parser
+
+        parser = get_parser("hermes")
+        text = '<tool_call>{"name": "terminal", "arguments": {"command": "ls"}}</tool_call>'
+        content, tool_calls = parser.parse(text)
+
+        assert tool_calls is not None
+        assert len(tool_calls) == 1
+
+        tc = tool_calls[0]
+        # ManagedServer accesses these attrs directly
+        assert hasattr(tc, "id")
+        assert hasattr(tc, "function")
+        assert hasattr(tc.function, "name")
+        assert hasattr(tc.function, "arguments")
+
+    def test_parser_no_tools_returns_none(self):
+        """ManagedServer checks `if parsed_tool_calls:` — None should be falsy."""
+        from environments.tool_call_parsers import get_parser
+
+        parser = get_parser("hermes")
+        content, tool_calls = parser.parse("Just text, no tools")
+        assert tool_calls is None
+
+    def test_parser_content_is_string_or_none(self):
+        """ManagedServer uses `parsed_content or ""` — must be str or None."""
+        from environments.tool_call_parsers import get_parser
+
+        parser = get_parser("hermes")
+
+        # With tool calls
+        text = '<tool_call>{"name": "terminal", "arguments": {"command": "ls"}}</tool_call>'
+        content, _ = parser.parse(text)
+        assert content is None or isinstance(content, str)
+
+        # Without tool calls
+        content2, _ = parser.parse("Just text")
+        assert isinstance(content2, str)
+
+
+class TestBaseEnvCompatibility:
+    """Test that hermes_base_env.py's managed_server() call matches the API."""
+
+    def test_hermes_base_env_managed_server_call_pattern(self):
+        """
+        Verify that hermes_base_env.py passes tool_call_parser to managed_server().
+        This is a source-level check — the actual managed_server() call must match.
+        """
+        import ast
+
+        base_env_path = Path(__file__).parent.parent / "environments" / "hermes_base_env.py"
+        source = base_env_path.read_text()
+        tree = ast.parse(source)
+
+        # Find the managed_server() call
+        found_tool_call_parser_kwarg = False
+        for node in ast.walk(tree):
+            if isinstance(node, ast.Call):
+                # Look for self.server.managed_server(...)
+                if isinstance(node.func, ast.Attribute) and node.func.attr == "managed_server":
+                    for kw in node.keywords:
+                        if kw.arg == "tool_call_parser":
+                            found_tool_call_parser_kwarg = True
+
+        assert found_tool_call_parser_kwarg, (
+            "hermes_base_env.py should pass tool_call_parser= to managed_server()"
+        )
+
+    def test_hermes_base_env_uses_get_parser(self):
+        """Verify hermes_base_env imports and uses get_parser from tool_call_parsers."""
+        base_env_path = Path(__file__).parent.parent / "environments" / "hermes_base_env.py"
+        source = base_env_path.read_text()
+
+        assert "from environments.tool_call_parsers import get_parser" in source
+        assert "get_parser(" in source
@@ -1,488 +0,0 @@
-"""Tests for session resume history display — _display_resumed_history() and
-_preload_resumed_session().
-
-Verifies that resuming a session shows a compact recap of the previous
-conversation with correct formatting, truncation, and config behavior.
-"""
-
-import os
-import sys
-from io import StringIO
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
-
-
-def _make_cli(config_overrides=None, env_overrides=None, **kwargs):
-    """Create a HermesCLI instance with minimal mocking."""
-    import cli as _cli_mod
-    from cli import HermesCLI
-
-    _clean_config = {
-        "model": {
-            "default": "anthropic/claude-opus-4.6",
-            "base_url": "https://openrouter.ai/api/v1",
-            "provider": "auto",
-        },
-        "display": {"compact": False, "tool_progress": "all", "resume_display": "full"},
-        "agent": {},
-        "terminal": {"env_type": "local"},
-    }
-    if config_overrides:
-        for k, v in config_overrides.items():
-            if isinstance(v, dict) and k in _clean_config and isinstance(_clean_config[k], dict):
-                _clean_config[k].update(v)
-            else:
-                _clean_config[k] = v
-
-    clean_env = {"LLM_MODEL": "", "HERMES_MAX_ITERATIONS": ""}
-    if env_overrides:
-        clean_env.update(env_overrides)
-    with (
-        patch("cli.get_tool_definitions", return_value=[]),
-        patch.dict("os.environ", clean_env, clear=False),
-        patch.dict(_cli_mod.__dict__, {"CLI_CONFIG": _clean_config}),
-    ):
-        return HermesCLI(**kwargs)
-
-
-# ── Sample conversation histories for tests ──────────────────────────
-
-
-def _simple_history():
-    """Two-turn conversation: user → assistant → user → assistant."""
-    return [
-        {"role": "system", "content": "You are a helpful assistant."},
-        {"role": "user", "content": "What is Python?"},
-        {"role": "assistant", "content": "Python is a high-level programming language."},
-        {"role": "user", "content": "How do I install it?"},
-        {"role": "assistant", "content": "You can install Python from python.org."},
-    ]
-
-
-def _tool_call_history():
-    """Conversation with tool calls and tool results."""
-    return [
-        {"role": "system", "content": "system prompt"},
-        {"role": "user", "content": "Search for Python tutorials"},
-        {
-            "role": "assistant",
-            "content": None,
-            "tool_calls": [
-                {
-                    "id": "call_1",
-                    "type": "function",
-                    "function": {"name": "web_search", "arguments": '{"query":"python tutorials"}'},
-                },
-                {
-                    "id": "call_2",
-                    "type": "function",
-                    "function": {"name": "web_extract", "arguments": '{"urls":["https://example.com"]}'},
-                },
-            ],
-        },
-        {"role": "tool", "tool_call_id": "call_1", "content": "Found 5 results..."},
-        {"role": "tool", "tool_call_id": "call_2", "content": "Page content..."},
-        {"role": "assistant", "content": "Here are some great Python tutorials I found."},
-    ]
-
-
-def _large_history(n_exchanges=15):
-    """Build a history with many exchanges to test truncation."""
-    msgs = [{"role": "system", "content": "system prompt"}]
-    for i in range(n_exchanges):
-        msgs.append({"role": "user", "content": f"Question #{i + 1}: What is item {i + 1}?"})
-        msgs.append({"role": "assistant", "content": f"Answer #{i + 1}: Item {i + 1} is great."})
-    return msgs
-
-
-def _multimodal_history():
-    """Conversation with multimodal (image) content."""
-    return [
-        {"role": "system", "content": "system prompt"},
-        {
-            "role": "user",
-            "content": [
-                {"type": "text", "text": "What's in this image?"},
-                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
-            ],
-        },
-        {"role": "assistant", "content": "I see a cat in the image."},
-    ]
-
-
-# ── Tests for _display_resumed_history ───────────────────────────────
-
-
-class TestDisplayResumedHistory:
-    """_display_resumed_history() renders a Rich panel with conversation recap."""
-
-    def _capture_display(self, cli_obj):
-        """Run _display_resumed_history and capture the Rich console output."""
-        buf = StringIO()
-        cli_obj.console.file = buf
-        cli_obj._display_resumed_history()
-        return buf.getvalue()
-
-    def test_simple_history_shows_user_and_assistant(self):
-        cli = _make_cli()
-        cli.conversation_history = _simple_history()
-        output = self._capture_display(cli)
-
-        assert "You:" in output
-        assert "Hermes:" in output
-        assert "What is Python?" in output
-        assert "Python is a high-level programming language." in output
-        assert "How do I install it?" in output
-
-    def test_system_messages_hidden(self):
-        cli = _make_cli()
-        cli.conversation_history = _simple_history()
-        output = self._capture_display(cli)
-
-        assert "You are a helpful assistant" not in output
-
-    def test_tool_messages_hidden(self):
-        cli = _make_cli()
-        cli.conversation_history = _tool_call_history()
-        output = self._capture_display(cli)
-
-        # Tool result content should NOT appear
-        assert "Found 5 results" not in output
-        assert "Page content" not in output
-
-    def test_tool_calls_shown_as_summary(self):
-        cli = _make_cli()
-        cli.conversation_history = _tool_call_history()
-        output = self._capture_display(cli)
-
-        assert "2 tool calls" in output
-        assert "web_search" in output
-        assert "web_extract" in output
-
-    def test_long_user_message_truncated(self):
-        cli = _make_cli()
-        long_text = "A" * 500
-        cli.conversation_history = [
-            {"role": "user", "content": long_text},
-            {"role": "assistant", "content": "OK."},
-        ]
-        output = self._capture_display(cli)
-
-        # Should have truncation indicator and NOT contain the full 500 chars
-        assert "..." in output
-        assert "A" * 500 not in output
-        # The 300-char truncated text is present but may be line-wrapped by
-        # Rich's panel renderer, so check the total A count in the output
-        a_count = output.count("A")
-        assert 200 <= a_count <= 310  # roughly 300 chars (±panel padding)
-
-    def test_long_assistant_message_truncated(self):
-        cli = _make_cli()
-        long_text = "B" * 400
-        cli.conversation_history = [
-            {"role": "user", "content": "Tell me a lot."},
-            {"role": "assistant", "content": long_text},
-        ]
-        output = self._capture_display(cli)
-
-        assert "..." in output
-        assert "B" * 400 not in output
-
-    def test_multiline_assistant_truncated(self):
-        cli = _make_cli()
-        multi = "\n".join([f"Line {i}" for i in range(20)])
-        cli.conversation_history = [
-            {"role": "user", "content": "Show me lines."},
-            {"role": "assistant", "content": multi},
-        ]
-        output = self._capture_display(cli)
-
-        # First 3 lines should be there
-        assert "Line 0" in output
-        assert "Line 1" in output
-        assert "Line 2" in output
-        # Line 19 should NOT be there (truncated after 3 lines)
-        assert "Line 19" not in output
-
-    def test_large_history_shows_truncation_indicator(self):
-        cli = _make_cli()
-        cli.conversation_history = _large_history(n_exchanges=15)
-        output = self._capture_display(cli)
-
-        # Should show "earlier messages" indicator
-        assert "earlier messages" in output
-        # Last question should still be visible
-        assert "Question #15" in output
-
-    def test_multimodal_content_handled(self):
-        cli = _make_cli()
-        cli.conversation_history = _multimodal_history()
-        output = self._capture_display(cli)
-
-        assert "What's in this image?" in output
-        assert "[image]" in output
-
-    def test_empty_history_no_output(self):
-        cli = _make_cli()
-        cli.conversation_history = []
-        output = self._capture_display(cli)
-
-        assert output.strip() == ""
-
-    def test_minimal_config_suppresses_display(self):
-        cli = _make_cli(config_overrides={"display": {"resume_display": "minimal"}})
-        # resume_display is captured as an instance variable during __init__
-        assert cli.resume_display == "minimal"
-        cli.conversation_history = _simple_history()
-        output = self._capture_display(cli)
-
-        assert output.strip() == ""
-
-    def test_panel_has_title(self):
-        cli = _make_cli()
-        cli.conversation_history = _simple_history()
-        output = self._capture_display(cli)
-
-        assert "Previous Conversation" in output
-
-    def test_assistant_with_no_content_no_tools_skipped(self):
-        """Assistant messages with no visible output (e.g. pure reasoning)
-        are skipped in the recap."""
-        cli = _make_cli()
-        cli.conversation_history = [
-            {"role": "user", "content": "Hello"},
-            {"role": "assistant", "content": None},
-        ]
-        output = self._capture_display(cli)
-
-        # The assistant entry should be skipped, only the user message shown
-        assert "You:" in output
-        assert "Hermes:" not in output
-
-    def test_only_system_messages_no_output(self):
-        cli = _make_cli()
-        cli.conversation_history = [
-            {"role": "system", "content": "You are helpful."},
-        ]
-        output = self._capture_display(cli)
-
-        assert output.strip() == ""
-
-    def test_reasoning_scratchpad_stripped(self):
-        """<REASONING_SCRATCHPAD> blocks should be stripped from display."""
-        cli = _make_cli()
-        cli.conversation_history = [
-            {"role": "user", "content": "Think about this"},
-            {
-                "role": "assistant",
-                "content": (
-                    "<REASONING_SCRATCHPAD>\nLet me think step by step.\n"
-                    "</REASONING_SCRATCHPAD>\n\nThe answer is 42."
-                ),
-            },
-        ]
-        output = self._capture_display(cli)
-
-        assert "REASONING_SCRATCHPAD" not in output
-        assert "Let me think step by step" not in output
-        assert "The answer is 42" in output
-
-    def test_pure_reasoning_message_skipped(self):
-        """Assistant messages that are only reasoning should be skipped."""
-        cli = _make_cli()
-        cli.conversation_history = [
-            {"role": "user", "content": "Hello"},
-            {
-                "role": "assistant",
-                "content": "<REASONING_SCRATCHPAD>\nJust thinking...\n</REASONING_SCRATCHPAD>",
-            },
-            {"role": "assistant", "content": "Hi there!"},
-        ]
-        output = self._capture_display(cli)
-
-        assert "Just thinking" not in output
-        assert "Hi there!" in output
-
-    def test_assistant_with_text_and_tool_calls(self):
-        """When an assistant message has both text content AND tool_calls."""
-        cli = _make_cli()
-        cli.conversation_history = [
-            {"role": "user", "content": "Do something complex"},
-            {
-                "role": "assistant",
-                "content": "Let me search for that.",
-                "tool_calls": [
-                    {
-                        "id": "call_1",
-                        "type": "function",
-                        "function": {"name": "terminal", "arguments": '{"command":"ls"}'},
-                    }
-                ],
-            },
-        ]
-        output = self._capture_display(cli)
-
-        assert "Let me search for that." in output
-        assert "1 tool call" in output
-        assert "terminal" in output
-
-
-# ── Tests for _preload_resumed_session ──────────────────────────────
-
-
-class TestPreloadResumedSession:
-    """_preload_resumed_session() loads session from DB early."""
-
-    def test_returns_false_when_not_resumed(self):
-        cli = _make_cli()
-        assert cli._preload_resumed_session() is False
-
-    def test_returns_false_when_no_session_db(self):
-        cli = _make_cli(resume="test_session_id")
-        cli._session_db = None
-        assert cli._preload_resumed_session() is False
-
-    def test_returns_false_when_session_not_found(self):
-        cli = _make_cli(resume="nonexistent_session")
-        mock_db = MagicMock()
-        mock_db.get_session.return_value = None
-        cli._session_db = mock_db
-
-        buf = StringIO()
-        cli.console.file = buf
-        result = cli._preload_resumed_session()
-
-        assert result is False
-        output = buf.getvalue()
-        assert "Session not found" in output
-
-    def test_returns_false_when_session_has_no_messages(self):
-        cli = _make_cli(resume="empty_session")
-        mock_db = MagicMock()
-        mock_db.get_session.return_value = {"id": "empty_session", "title": None}
-        mock_db.get_messages_as_conversation.return_value = []
-        cli._session_db = mock_db
-
-        buf = StringIO()
-        cli.console.file = buf
-        result = cli._preload_resumed_session()
-
-        assert result is False
-        output = buf.getvalue()
-        assert "no messages" in output
-
-    def test_loads_session_successfully(self):
-        cli = _make_cli(resume="good_session")
-        messages = _simple_history()
-        mock_db = MagicMock()
-        mock_db.get_session.return_value = {"id": "good_session", "title": "Test Session"}
-        mock_db.get_messages_as_conversation.return_value = messages
-        cli._session_db = mock_db
-
-        buf = StringIO()
-        cli.console.file = buf
-        result = cli._preload_resumed_session()
-
-        assert result is True
-        assert cli.conversation_history == messages
-        output = buf.getvalue()
-        assert "Resumed session" in output
-        assert "good_session" in output
-        assert "Test Session" in output
-        assert "2 user messages" in output
-
-    def test_reopens_session_in_db(self):
-        cli = _make_cli(resume="reopen_session")
-        messages = [{"role": "user", "content": "hi"}]
-        mock_db = MagicMock()
-        mock_db.get_session.return_value = {"id": "reopen_session", "title": None}
-        mock_db.get_messages_as_conversation.return_value = messages
-        mock_conn = MagicMock()
-        mock_db._conn = mock_conn
-        cli._session_db = mock_db
-
-        buf = StringIO()
-        cli.console.file = buf
-        cli._preload_resumed_session()
-
-        # Should have executed UPDATE to clear ended_at
-        mock_conn.execute.assert_called_once()
-        call_args = mock_conn.execute.call_args
-        assert "ended_at = NULL" in call_args[0][0]
-        mock_conn.commit.assert_called_once()
-
-    def test_singular_user_message_grammar(self):
-        """1 user message should say 'message' not 'messages'."""
-        cli = _make_cli(resume="one_msg_session")
-        messages = [
-            {"role": "user", "content": "hello"},
-            {"role": "assistant", "content": "hi"},
-        ]
-        mock_db = MagicMock()
-        mock_db.get_session.return_value = {"id": "one_msg_session", "title": None}
-        mock_db.get_messages_as_conversation.return_value = messages
-        mock_db._conn = MagicMock()
-        cli._session_db = mock_db
-
-        buf = StringIO()
-        cli.console.file = buf
-        cli._preload_resumed_session()
-
-        output = buf.getvalue()
-        assert "1 user message," in output
-        assert "1 user messages" not in output
-
-
-# ── Integration: _init_agent skips when preloaded ────────────────────
-
-
-class TestInitAgentSkipsPreloaded:
-    """_init_agent() should skip DB load when history is already populated."""
-
-    def test_init_agent_skips_db_when_preloaded(self):
-        """If conversation_history is already set, _init_agent should not
-        reload from the DB."""
-        cli = _make_cli(resume="preloaded_session")
-        cli.conversation_history = _simple_history()
-
-        mock_db = MagicMock()
-        cli._session_db = mock_db
-
-        # _init_agent will fail at credential resolution (no real API key),
-        # but the session-loading block should be skipped entirely
-        with patch.object(cli, "_ensure_runtime_credentials", return_value=False):
-            cli._init_agent()
-
-        # get_messages_as_conversation should NOT have been called
-        mock_db.get_messages_as_conversation.assert_not_called()
-
-
-# ── Config default tests ─────────────────────────────────────────────
-
-
-class TestResumeDisplayConfig:
-    """resume_display config option defaults and behavior."""
-
-    def test_default_config_has_resume_display(self):
-        """DEFAULT_CONFIG in hermes_cli/config.py includes resume_display."""
-        from hermes_cli.config import DEFAULT_CONFIG
-        display = DEFAULT_CONFIG.get("display", {})
-        assert "resume_display" in display
-        assert display["resume_display"] == "full"
-
-    def test_cli_defaults_have_resume_display(self):
-        """cli.py load_cli_config defaults include resume_display."""
-        import cli as _cli_mod
-        from cli import load_cli_config
-
-        with (
-            patch("pathlib.Path.exists", return_value=False),
-            patch.dict("os.environ", {"LLM_MODEL": ""}, clear=False),
-        ):
-            config = load_cli_config()
-
-        display = config.get("display", {})
-        assert display.get("resume_display") == "full"
@@ -1040,136 +1040,3 @@ class TestMaxTokensParam:
        agent.base_url = "https://openrouter.ai/api/v1/api.openai.com"
        result = agent._max_tokens_param(4096)
        assert result == {"max_tokens": 4096}
-
-
-# ---------------------------------------------------------------------------
-# System prompt stability for prompt caching
-# ---------------------------------------------------------------------------
-
-class TestSystemPromptStability:
-    """Verify that the system prompt stays stable across turns for cache hits."""
-
-    def test_stored_prompt_reused_for_continuing_session(self, agent):
-        """When conversation_history is non-empty and session DB has a stored
-        prompt, it should be reused instead of rebuilding from disk."""
-        stored = "You are helpful. [stored from turn 1]"
-        mock_db = MagicMock()
-        mock_db.get_session.return_value = {"system_prompt": stored}
-        agent._session_db = mock_db
-
-        # Simulate a continuing session with history
-        history = [
-            {"role": "user", "content": "hello"},
-            {"role": "assistant", "content": "hi"},
-        ]
-
-        # First call — _cached_system_prompt is None, history is non-empty
-        agent._cached_system_prompt = None
-
-        # Patch run_conversation internals to just test the system prompt logic.
-        # We'll call the prompt caching block directly by simulating what
-        # run_conversation does.
-        conversation_history = history
-
-        # The block under test (from run_conversation):
-        if agent._cached_system_prompt is None:
-            stored_prompt = None
-            if conversation_history and agent._session_db:
-                try:
-                    session_row = agent._session_db.get_session(agent.session_id)
-                    if session_row:
-                        stored_prompt = session_row.get("system_prompt") or None
-                except Exception:
-                    pass
-
-            if stored_prompt:
-                agent._cached_system_prompt = stored_prompt
-
-        assert agent._cached_system_prompt == stored
-        mock_db.get_session.assert_called_once_with(agent.session_id)
-
-    def test_fresh_build_when_no_history(self, agent):
-        """On the first turn (no history), system prompt should be built fresh."""
-        mock_db = MagicMock()
-        agent._session_db = mock_db
-
-        agent._cached_system_prompt = None
-        conversation_history = []
-
-        # The block under test:
-        if agent._cached_system_prompt is None:
-            stored_prompt = None
-            if conversation_history and agent._session_db:
-                session_row = agent._session_db.get_session(agent.session_id)
-                if session_row:
-                    stored_prompt = session_row.get("system_prompt") or None
-
-            if stored_prompt:
-                agent._cached_system_prompt = stored_prompt
-            else:
-                agent._cached_system_prompt = agent._build_system_prompt()
-
-        # Should have built fresh, not queried the DB
-        mock_db.get_session.assert_not_called()
-        assert agent._cached_system_prompt is not None
-        assert "Hermes Agent" in agent._cached_system_prompt
-
-    def test_fresh_build_when_db_has_no_prompt(self, agent):
-        """If the session DB has no stored prompt, build fresh even with history."""
-        mock_db = MagicMock()
-        mock_db.get_session.return_value = {"system_prompt": ""}
-        agent._session_db = mock_db
-
-        agent._cached_system_prompt = None
-        conversation_history = [{"role": "user", "content": "hi"}]
-
-        if agent._cached_system_prompt is None:
-            stored_prompt = None
-            if conversation_history and agent._session_db:
-                try:
-                    session_row = agent._session_db.get_session(agent.session_id)
-                    if session_row:
-                        stored_prompt = session_row.get("system_prompt") or None
-                except Exception:
-                    pass
-
-            if stored_prompt:
-                agent._cached_system_prompt = stored_prompt
-            else:
-                agent._cached_system_prompt = agent._build_system_prompt()
-
-        # Empty string is falsy, so should fall through to fresh build
-        assert "Hermes Agent" in agent._cached_system_prompt
-
-    def test_honcho_context_baked_into_prompt_on_first_turn(self, agent):
-        """Honcho context should be baked into _cached_system_prompt on
-        the first turn, not injected separately per API call."""
-        agent._honcho_context = "User prefers Python over JavaScript."
-        agent._cached_system_prompt = None
-
-        # Simulate first turn: build fresh and bake in Honcho
-        agent._cached_system_prompt = agent._build_system_prompt()
-        if agent._honcho_context:
-            agent._cached_system_prompt = (
-                agent._cached_system_prompt + "\n\n" + agent._honcho_context
-            ).strip()
-
-        assert "User prefers Python over JavaScript" in agent._cached_system_prompt
-
-    def test_honcho_prefetch_skipped_on_continuing_session(self):
-        """Honcho prefetch should not be called when conversation_history
-        is non-empty (continuing session)."""
-        conversation_history = [
-            {"role": "user", "content": "hello"},
-            {"role": "assistant", "content": "hi there"},
-        ]
-
-        # The guard: `not conversation_history` is False when history exists
-        should_prefetch = not conversation_history
-        assert should_prefetch is False
-
-    def test_honcho_prefetch_runs_on_first_turn(self):
-        """Honcho prefetch should run when conversation_history is empty."""
-        conversation_history = []
-        should_prefetch = not conversation_history
-        assert should_prefetch is True
@@ -0,0 +1,159 @@
+"""
+Tests for environments/tool_call_parsers/ — client-side tool call parsers.
+
+These parsers extract structured tool_calls from raw model output text.
+Used in Phase 2 (VLLM/generate) where the server returns raw tokens.
+"""
+
+import json
+import sys
+from pathlib import Path
+
+import pytest
+
+# Ensure repo root is importable
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+
+try:
+    from environments.tool_call_parsers import (
+        ParseResult,
+        ToolCallParser,
+        get_parser,
+        list_parsers,
+    )
+except ImportError:
+    pytest.skip("environments.tool_call_parsers not importable (atroposlib likely missing)", allow_module_level=True)
+
+
+# ─── Registry tests ─────────────────────────────────────────────────────
+
+class TestParserRegistry:
+    def test_list_parsers_returns_nonempty(self):
+        parsers = list_parsers()
+        assert len(parsers) > 0
+
+    def test_hermes_parser_registered(self):
+        parsers = list_parsers()
+        assert "hermes" in parsers
+
+    def test_get_parser_returns_instance(self):
+        parser = get_parser("hermes")
+        assert isinstance(parser, ToolCallParser)
+
+    def test_get_parser_unknown_raises(self):
+        with pytest.raises(KeyError):
+            get_parser("nonexistent_parser_xyz")
+
+    def test_all_registered_parsers_instantiate(self):
+        """Every registered parser should be importable and instantiable."""
+        for name in list_parsers():
+            parser = get_parser(name)
+            assert isinstance(parser, ToolCallParser)
+            assert hasattr(parser, "parse")
+
+
+# ─── Hermes parser tests ────────────────────────────────────────────────
+
+class TestHermesParser:
+    @pytest.fixture
+    def parser(self):
+        return get_parser("hermes")
+
+    def test_no_tool_call(self, parser):
+        text = "Hello, I can help you with that."
+        content, tool_calls = parser.parse(text)
+        assert content == text
+        assert tool_calls is None
+
+    def test_single_tool_call(self, parser):
+        text = '<tool_call>{"name": "terminal", "arguments": {"command": "ls -la"}}</tool_call>'
+        content, tool_calls = parser.parse(text)
+        assert tool_calls is not None
+        assert len(tool_calls) == 1
+        assert tool_calls[0].function.name == "terminal"
+        args = json.loads(tool_calls[0].function.arguments)
+        assert args["command"] == "ls -la"
+
+    def test_tool_call_with_surrounding_text(self, parser):
+        text = 'Let me check that for you.\n<tool_call>{"name": "terminal", "arguments": {"command": "pwd"}}</tool_call>'
+        content, tool_calls = parser.parse(text)
+        assert tool_calls is not None
+        assert len(tool_calls) == 1
+        assert tool_calls[0].function.name == "terminal"
+        # If content is returned, it should be non-empty (ideally the surrounding text)
+        if content is not None:
+            assert "check that" in content or content.strip() != ""
+
+    def test_multiple_tool_calls(self, parser):
+        text = (
+            '<tool_call>{"name": "terminal", "arguments": {"command": "ls"}}</tool_call>\n'
+            '<tool_call>{"name": "read_file", "arguments": {"path": "test.py"}}</tool_call>'
+        )
+        content, tool_calls = parser.parse(text)
+        assert tool_calls is not None
+        assert len(tool_calls) == 2
+        names = {tc.function.name for tc in tool_calls}
+        assert "terminal" in names
+        assert "read_file" in names
+
+    def test_tool_call_ids_are_unique(self, parser):
+        text = (
+            '<tool_call>{"name": "terminal", "arguments": {"command": "ls"}}</tool_call>\n'
+            '<tool_call>{"name": "terminal", "arguments": {"command": "pwd"}}</tool_call>'
+        )
+        _, tool_calls = parser.parse(text)
+        assert tool_calls is not None
+        ids = [tc.id for tc in tool_calls]
+        assert len(ids) == len(set(ids)), "Tool call IDs must be unique"
+
+    def test_empty_string(self, parser):
+        content, tool_calls = parser.parse("")
+        assert tool_calls is None
+
+    def test_malformed_json_in_tool_call(self, parser):
+        text = '<tool_call>not valid json</tool_call>'
+        content, tool_calls = parser.parse(text)
+        # Implementations vary: malformed JSON may yield no tool calls or an error tool call.
+        assert tool_calls is None or isinstance(tool_calls, list)
+
+    def test_truncated_tool_call(self, parser):
+        """Test handling of unclosed tool_call tag (model truncated mid-generation)."""
+        text = '<tool_call>{"name": "terminal", "arguments": {"command": "ls -la"}'
+        content, tool_calls = parser.parse(text)
+        # A truncated call may be parsed or dropped; either way parse() must not raise.
+        assert tool_calls is None or isinstance(tool_calls, list)
+
+
+# ─── Parse result contract tests (applies to ALL parsers) ───────────────
+
+class TestParseResultContract:
+    """Ensure all parsers conform to the ParseResult contract."""
+
+    @pytest.fixture(params=["hermes"])  # Add more as needed
+    def parser(self, request):
+        return get_parser(request.param)
+
+    def test_returns_tuple_of_two(self, parser):
+        result = parser.parse("hello world")
+        assert isinstance(result, tuple)
+        assert len(result) == 2
+
+    def test_no_tools_returns_none_tool_calls(self, parser):
+        content, tool_calls = parser.parse("Just plain text, no tools.")
+        assert tool_calls is None
+        assert content is not None
+
+    def test_tool_calls_are_proper_objects(self, parser):
+        """When tool calls are found, they should be ChatCompletionMessageToolCall objects."""
+        # Uses the hermes <tool_call> format, matching the only parser the fixture currently parametrizes
+        text = '<tool_call>{"name": "terminal", "arguments": {"command": "echo hi"}}</tool_call>'
+        content, tool_calls = parser.parse(text)
+        if tool_calls is not None:
+            for tc in tool_calls:
+                assert hasattr(tc, "id")
+                assert hasattr(tc, "function")
+                assert hasattr(tc.function, "name")
+                assert hasattr(tc.function, "arguments")
+                assert tc.id is not None
+                assert isinstance(tc.function.name, str)
+                assert isinstance(tc.function.arguments, str)
@@ -1,276 +0,0 @@
-"""Tests for browser_console tool and browser_vision annotate param."""
-
-import json
-import os
-import sys
-from unittest.mock import patch, MagicMock
-
-import pytest
-
-sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
-
-
-# ── browser_console ──────────────────────────────────────────────────
-
-
-class TestBrowserConsole:
-    """browser_console() returns console messages + JS errors in one call."""
-
-    def test_returns_console_messages_and_errors(self):
-        from tools.browser_tool import browser_console
-
-        console_response = {
-            "success": True,
-            "data": {
-                "messages": [
-                    {"text": "hello", "type": "log", "timestamp": 1},
-                    {"text": "oops", "type": "error", "timestamp": 2},
-                ]
-            },
-        }
-        errors_response = {
-            "success": True,
-            "data": {
-                "errors": [
-                    {"message": "Uncaught TypeError", "timestamp": 3},
-                ]
-            },
-        }
-
-        with patch("tools.browser_tool._run_browser_command") as mock_cmd:
-            mock_cmd.side_effect = [console_response, errors_response]
-            result = json.loads(browser_console(task_id="test"))
-
-        assert result["success"] is True
-        assert result["total_messages"] == 2
-        assert result["total_errors"] == 1
-        assert result["console_messages"][0]["text"] == "hello"
-        assert result["console_messages"][1]["text"] == "oops"
-        assert result["js_errors"][0]["message"] == "Uncaught TypeError"
-
-    def test_passes_clear_flag(self):
-        from tools.browser_tool import browser_console
-
-        empty = {"success": True, "data": {"messages": [], "errors": []}}
-        with patch("tools.browser_tool._run_browser_command", return_value=empty) as mock_cmd:
-            browser_console(clear=True, task_id="test")
-
-        calls = mock_cmd.call_args_list
-        # Both console and errors should get --clear
-        assert calls[0][0] == ("test", "console", ["--clear"])
-        assert calls[1][0] == ("test", "errors", ["--clear"])
-
-    def test_no_clear_by_default(self):
-        from tools.browser_tool import browser_console
-
-        empty = {"success": True, "data": {"messages": [], "errors": []}}
-        with patch("tools.browser_tool._run_browser_command", return_value=empty) as mock_cmd:
-            browser_console(task_id="test")
-
-        calls = mock_cmd.call_args_list
-        assert calls[0][0] == ("test", "console", [])
-        assert calls[1][0] == ("test", "errors", [])
-
-    def test_empty_console_and_errors(self):
-        from tools.browser_tool import browser_console
-
-        empty = {"success": True, "data": {"messages": [], "errors": []}}
-        with patch("tools.browser_tool._run_browser_command", return_value=empty):
-            result = json.loads(browser_console(task_id="test"))
-
-        assert result["total_messages"] == 0
-        assert result["total_errors"] == 0
-        assert result["console_messages"] == []
-        assert result["js_errors"] == []
-
-    def test_handles_failed_commands(self):
-        from tools.browser_tool import browser_console
-
-        failed = {"success": False, "error": "No session"}
-        with patch("tools.browser_tool._run_browser_command", return_value=failed):
-            result = json.loads(browser_console(task_id="test"))
-
-        # Should still return success with empty data
-        assert result["success"] is True
-        assert result["total_messages"] == 0
-        assert result["total_errors"] == 0
-
-
-# ── browser_console schema ───────────────────────────────────────────
-
-
-class TestBrowserConsoleSchema:
-    """browser_console is properly registered in the tool registry."""
-
-    def test_schema_in_browser_schemas(self):
-        from tools.browser_tool import BROWSER_TOOL_SCHEMAS
-
-        names = [s["name"] for s in BROWSER_TOOL_SCHEMAS]
-        assert "browser_console" in names
-
-    def test_schema_has_clear_param(self):
-        from tools.browser_tool import BROWSER_TOOL_SCHEMAS
-
-        schema = next(s for s in BROWSER_TOOL_SCHEMAS if s["name"] == "browser_console")
-        props = schema["parameters"]["properties"]
-        assert "clear" in props
-        assert props["clear"]["type"] == "boolean"
-
-
-# ── browser_vision annotate ──────────────────────────────────────────
-
-
-class TestBrowserVisionAnnotate:
-    """browser_vision supports annotate parameter."""
-
-    def test_schema_has_annotate_param(self):
-        from tools.browser_tool import BROWSER_TOOL_SCHEMAS
-
-        schema = next(s for s in BROWSER_TOOL_SCHEMAS if s["name"] == "browser_vision")
-        props = schema["parameters"]["properties"]
-        assert "annotate" in props
-        assert props["annotate"]["type"] == "boolean"
-
-    def test_annotate_false_no_flag(self):
-        """Without annotate, screenshot command has no --annotate flag."""
-        from tools.browser_tool import browser_vision
-
-        with (
-            patch("tools.browser_tool._run_browser_command") as mock_cmd,
-            patch("tools.browser_tool._aux_vision_client") as mock_client,
-            patch("tools.browser_tool._DEFAULT_VISION_MODEL", "test-model"),
-            patch("tools.browser_tool._get_vision_model", return_value="test-model"),
-        ):
-            mock_cmd.return_value = {"success": True, "data": {}}
-            # Will fail at screenshot file read, but we can check the command
-            try:
-                browser_vision("test", annotate=False, task_id="test")
-            except Exception:
-                pass
-
-            if mock_cmd.called:
-                args = mock_cmd.call_args[0]
-                cmd_args = args[2] if len(args) > 2 else []
-                assert "--annotate" not in cmd_args
-
-    def test_annotate_true_adds_flag(self):
-        """With annotate=True, screenshot command includes --annotate."""
-        from tools.browser_tool import browser_vision
-
-        with (
-            patch("tools.browser_tool._run_browser_command") as mock_cmd,
-            patch("tools.browser_tool._aux_vision_client") as mock_client,
-            patch("tools.browser_tool._DEFAULT_VISION_MODEL", "test-model"),
-            patch("tools.browser_tool._get_vision_model", return_value="test-model"),
-        ):
-            mock_cmd.return_value = {"success": True, "data": {}}
-            try:
-                browser_vision("test", annotate=True, task_id="test")
-            except Exception:
-                pass
-
-            if mock_cmd.called:
-                args = mock_cmd.call_args[0]
-                cmd_args = args[2] if len(args) > 2 else []
-                assert "--annotate" in cmd_args
-
-
-# ── auto-recording config ────────────────────────────────────────────
-
-
-class TestRecordSessionsConfig:
-    """browser.record_sessions config option."""
-
-    def test_default_config_has_record_sessions(self):
-        from hermes_cli.config import DEFAULT_CONFIG
-
-        browser_cfg = DEFAULT_CONFIG.get("browser", {})
-        assert "record_sessions" in browser_cfg
-        assert browser_cfg["record_sessions"] is False
-
-    def test_maybe_start_recording_disabled(self):
-        """Recording doesn't start when config says record_sessions: false."""
-        from tools.browser_tool import _maybe_start_recording, _recording_sessions
-
-        with (
-            patch("tools.browser_tool._run_browser_command") as mock_cmd,
-            patch("builtins.open", side_effect=FileNotFoundError),
-        ):
-            _maybe_start_recording("test-task")
-
-        mock_cmd.assert_not_called()
-        assert "test-task" not in _recording_sessions
-
-    def test_maybe_stop_recording_noop_when_not_recording(self):
-        """Stopping when not recording is a no-op."""
-        from tools.browser_tool import _maybe_stop_recording, _recording_sessions
-
-        _recording_sessions.discard("test-task")  # ensure not in set
-        with patch("tools.browser_tool._run_browser_command") as mock_cmd:
-            _maybe_stop_recording("test-task")
-
-        mock_cmd.assert_not_called()
-
-
-# ── dogfood skill files ──────────────────────────────────────────────
-
-
-class TestDogfoodSkill:
-    """Dogfood skill files exist and have correct structure."""
-
-    @pytest.fixture(autouse=True)
-    def _skill_dir(self):
-        # Use the actual repo skills dir (not temp)
-        self.skill_dir = os.path.join(
-            os.path.dirname(__file__), "..", "..", "skills", "dogfood"
-        )
-
-    def test_skill_md_exists(self):
-        assert os.path.exists(os.path.join(self.skill_dir, "SKILL.md"))
-
-    def test_taxonomy_exists(self):
-        assert os.path.exists(
-            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
-        )
-
-    def test_report_template_exists(self):
-        assert os.path.exists(
-            os.path.join(self.skill_dir, "templates", "dogfood-report-template.md")
-        )
-
-    def test_skill_md_has_frontmatter(self):
-        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
-            content = f.read()
-        assert content.startswith("---")
-        assert "name: dogfood" in content
-        assert "description:" in content
-
-    def test_skill_references_browser_console(self):
-        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
-            content = f.read()
-        assert "browser_console" in content
-
-    def test_skill_references_annotate(self):
-        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
-            content = f.read()
-        assert "annotate" in content
-
-    def test_taxonomy_has_severity_levels(self):
-        with open(
-            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
-        ) as f:
-            content = f.read()
-        assert "Critical" in content
-        assert "High" in content
-        assert "Medium" in content
-        assert "Low" in content
-
-    def test_taxonomy_has_categories(self):
-        with open(
-            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
-        ) as f:
-            content = f.read()
-        assert "Functional" in content
-        assert "Visual" in content
-        assert "Accessibility" in content
-        assert "Console" in content
@@ -550,13 +550,14 @@ class TestConvertToPng:
        """BMP file should still be reported as success if no converter available."""
        dest = tmp_path / "img.png"
        dest.write_bytes(FAKE_BMP)  # it's a BMP but named .png
-        # Both Pillow and ImageMagick unavailable
-        with patch.dict(sys.modules, {"PIL": None, "PIL.Image": None}):
-            with patch("hermes_cli.clipboard.subprocess.run", side_effect=FileNotFoundError):
-                result = _convert_to_png(dest)
-                # Raw BMP is better than nothing — function should return True
-                assert result is True
-                assert dest.exists() and dest.stat().st_size > 0
+        # Simulate both Pillow and ImageMagick being unavailable
+        with patch.dict("sys.modules", {"PIL": None, "PIL.Image": None}):
+            with patch("hermes_cli.clipboard.subprocess.run", side_effect=FileNotFoundError):
+                result = _convert_to_png(dest)
+        # Raw BMP is better than nothing, so the function should report success
+        assert result is True
+        # The original file must be left in place with its content
+        assert dest.exists() and dest.stat().st_size > 0


 # ── has_clipboard_image dispatch ─────────────────────────────────────────
@@ -393,56 +393,5 @@ class TestStubSchemaDrift(unittest.TestCase):
        self.assertIn("mode", src)


-class TestHeadTailTruncation(unittest.TestCase):
-    """Tests for head+tail truncation of large stdout in execute_code."""
-
-    def _run(self, code):
-        with patch("model_tools.handle_function_call", side_effect=_mock_handle_function_call):
-            result = execute_code(
-                code=code,
-                task_id="test-task",
-                enabled_tools=list(SANDBOX_ALLOWED_TOOLS),
-            )
-        return json.loads(result)
-
-    def test_short_output_not_truncated(self):
-        """Output under MAX_STDOUT_BYTES should not be truncated."""
-        result = self._run('print("small output")')
-        self.assertEqual(result["status"], "success")
-        self.assertIn("small output", result["output"])
-        self.assertNotIn("TRUNCATED", result["output"])
-
-    def test_large_output_preserves_head_and_tail(self):
-        """Output exceeding MAX_STDOUT_BYTES keeps both head and tail."""
-        code = '''
-# Print HEAD marker, then filler, then TAIL marker
-print("HEAD_MARKER_START")
-for i in range(15000):
-    print(f"filler_line_{i:06d}_padding_to_fill_buffer")
-print("TAIL_MARKER_END")
-'''
-        result = self._run(code)
-        self.assertEqual(result["status"], "success")
-        output = result["output"]
-        # Head should be preserved
-        self.assertIn("HEAD_MARKER_START", output)
-        # Tail should be preserved (this is the key improvement)
-        self.assertIn("TAIL_MARKER_END", output)
-        # Truncation notice should be present
-        self.assertIn("TRUNCATED", output)
-
-    def test_truncation_notice_format(self):
-        """Truncation notice includes character counts."""
-        code = '''
-for i in range(15000):
-    print(f"padding_line_{i:06d}_xxxxxxxxxxxxxxxxxxxxxxxxxx")
-'''
-        result = self._run(code)
-        output = result["output"]
-        if "TRUNCATED" in output:
-            self.assertIn("chars omitted", output)
-            self.assertIn("total", output)
-
-
 if __name__ == "__main__":
    unittest.main()
@@ -259,70 +259,6 @@ class TestShellFileOpsHelpers:
        assert ops.cwd == "/"


-class TestSearchPathValidation:
-    """Test that search() returns an error for non-existent paths."""
-
-    def test_search_nonexistent_path_returns_error(self, mock_env):
-        """search() should return an error when the path doesn't exist."""
-        def side_effect(command, **kwargs):
-            if "test -e" in command:
-                return {"output": "not_found", "returncode": 1}
-            if "command -v" in command:
-                return {"output": "yes", "returncode": 0}
-            return {"output": "", "returncode": 0}
-        mock_env.execute.side_effect = side_effect
-        ops = ShellFileOperations(mock_env)
-        result = ops.search("pattern", path="/nonexistent/path")
-        assert result.error is not None
-        assert "not found" in result.error.lower() or "Path not found" in result.error
-
-    def test_search_nonexistent_path_files_mode(self, mock_env):
-        """search(target='files') should also return error for bad paths."""
-        def side_effect(command, **kwargs):
-            if "test -e" in command:
-                return {"output": "not_found", "returncode": 1}
-            if "command -v" in command:
-                return {"output": "yes", "returncode": 0}
-            return {"output": "", "returncode": 0}
-        mock_env.execute.side_effect = side_effect
-        ops = ShellFileOperations(mock_env)
-        result = ops.search("*.py", path="/nonexistent/path", target="files")
-        assert result.error is not None
-        assert "not found" in result.error.lower() or "Path not found" in result.error
-
-    def test_search_existing_path_proceeds(self, mock_env):
-        """search() should proceed normally when the path exists."""
-        def side_effect(command, **kwargs):
-            if "test -e" in command:
-                return {"output": "exists", "returncode": 0}
-            if "command -v" in command:
-                return {"output": "yes", "returncode": 0}
-            # rg returns exit 1 (no matches) with empty output
-            return {"output": "", "returncode": 1}
-        mock_env.execute.side_effect = side_effect
-        ops = ShellFileOperations(mock_env)
-        result = ops.search("pattern", path="/existing/path")
-        assert result.error is None
-        assert result.total_count == 0  # No matches but no error
-
-    def test_search_rg_error_exit_code(self, mock_env):
-        """search() should report error when rg returns exit code 2."""
-        call_count = {"n": 0}
-        def side_effect(command, **kwargs):
-            call_count["n"] += 1
-            if "test -e" in command:
-                return {"output": "exists", "returncode": 0}
-            if "command -v" in command:
-                return {"output": "yes", "returncode": 0}
-            # rg returns exit 2 (error) with empty output
-            return {"output": "", "returncode": 2}
-        mock_env.execute.side_effect = side_effect
-        ops = ShellFileOperations(mock_env)
-        result = ops.search("pattern", path="/some/path")
-        assert result.error is not None
-        assert "search failed" in result.error.lower() or "Search error" in result.error
-
-
 class TestShellFileOpsWriteDenied:
    def test_write_file_denied_path(self, file_ops):
        result = file_ops.write_file("~/.ssh/authorized_keys", "evil key")
@@ -38,7 +38,6 @@ class TestReadFileHandler:
    def test_returns_file_content(self, mock_get):
        mock_ops = MagicMock()
        result_obj = MagicMock()
-        result_obj.content = "line1\nline2"
        result_obj.to_dict.return_value = {"content": "line1\nline2", "total_lines": 2}
        mock_ops.read_file.return_value = result_obj
        mock_get.return_value = mock_ops
@@ -53,7 +52,6 @@ class TestReadFileHandler:
    def test_custom_offset_and_limit(self, mock_get):
        mock_ops = MagicMock()
        result_obj = MagicMock()
-        result_obj.content = "line10"
        result_obj.to_dict.return_value = {"content": "line10", "total_lines": 50}
        mock_ops.read_file.return_value = result_obj
        mock_get.return_value = mock_ops
@@ -202,91 +200,3 @@ class TestSearchHandler:
        from tools.file_tools import search_tool
        result = json.loads(search_tool(pattern="x"))
        assert "error" in result
-
-
-# ---------------------------------------------------------------------------
-# Tool result hint tests (#722)
-# ---------------------------------------------------------------------------
-
-class TestPatchHints:
-    """Patch tool should hint when old_string is not found."""
-
-    @patch("tools.file_tools._get_file_ops")
-    def test_no_match_includes_hint(self, mock_get):
-        mock_ops = MagicMock()
-        result_obj = MagicMock()
-        result_obj.to_dict.return_value = {
-            "error": "Could not find match for old_string in foo.py"
-        }
-        mock_ops.patch_replace.return_value = result_obj
-        mock_get.return_value = mock_ops
-
-        from tools.file_tools import patch_tool
-        raw = patch_tool(mode="replace", path="foo.py", old_string="x", new_string="y")
-        assert "[Hint:" in raw
-        assert "read_file" in raw
-
-    @patch("tools.file_tools._get_file_ops")
-    def test_success_no_hint(self, mock_get):
-        mock_ops = MagicMock()
-        result_obj = MagicMock()
-        result_obj.to_dict.return_value = {"success": True, "diff": "--- a\n+++ b"}
-        mock_ops.patch_replace.return_value = result_obj
-        mock_get.return_value = mock_ops
-
-        from tools.file_tools import patch_tool
-        raw = patch_tool(mode="replace", path="foo.py", old_string="x", new_string="y")
-        assert "[Hint:" not in raw
-
-
-class TestSearchHints:
-    """Search tool should hint when results are truncated."""
-
-    @patch("tools.file_tools._get_file_ops")
-    def test_truncated_results_hint(self, mock_get):
-        mock_ops = MagicMock()
-        result_obj = MagicMock()
-        result_obj.to_dict.return_value = {
-            "total_count": 100,
-            "matches": [{"path": "a.py", "line": 1, "content": "x"}] * 50,
-            "truncated": True,
-        }
-        mock_ops.search.return_value = result_obj
-        mock_get.return_value = mock_ops
-
-        from tools.file_tools import search_tool
-        raw = search_tool(pattern="foo", offset=0, limit=50)
-        assert "[Hint:" in raw
-        assert "offset=50" in raw
-
-    @patch("tools.file_tools._get_file_ops")
-    def test_non_truncated_no_hint(self, mock_get):
-        mock_ops = MagicMock()
-        result_obj = MagicMock()
-        result_obj.to_dict.return_value = {
-            "total_count": 3,
-            "matches": [{"path": "a.py", "line": 1, "content": "x"}] * 3,
-        }
-        mock_ops.search.return_value = result_obj
-        mock_get.return_value = mock_ops
-
-        from tools.file_tools import search_tool
-        raw = search_tool(pattern="foo")
-        assert "[Hint:" not in raw
-
-    @patch("tools.file_tools._get_file_ops")
-    def test_truncated_hint_with_nonzero_offset(self, mock_get):
-        mock_ops = MagicMock()
-        result_obj = MagicMock()
-        result_obj.to_dict.return_value = {
-            "total_count": 150,
-            "matches": [{"path": "a.py", "line": 1, "content": "x"}] * 50,
-            "truncated": True,
-        }
-        mock_ops.search.return_value = result_obj
-        mock_get.return_value = mock_ops
-
-        from tools.file_tools import search_tool
-        raw = search_tool(pattern="foo", offset=50, limit=50)
-        assert "[Hint:" in raw
-        assert "offset=100" in raw
@@ -0,0 +1,271 @@
+"""Tests for Modal sandbox infrastructure fixes (TBLite baseline).
+
+Covers the following bugs discovered while setting up TBLite evaluation:
+1. Tool resolution — terminal + file tools load with minisweagent
+2. CWD fix — host paths get replaced with /root for container backends
+3. ephemeral_disk version check
+4. Tilde ~ replaced with /root for container backends
+5. ensurepip fix in patches.py for Modal image builder
+6. install_pipx stays True for swerex-remote
+7. /home/ added to host prefix check
+"""
+
+import os
+import sys
+from pathlib import Path
+from unittest.mock import patch, MagicMock
+
+import pytest
+
+# Ensure repo root is importable
+_repo_root = Path(__file__).resolve().parent.parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+try:
+    import tools.terminal_tool  # noqa: F401
+    _tt_mod = sys.modules["tools.terminal_tool"]
+except ImportError:
+    pytest.skip("hermes-agent tools not importable (missing deps)", allow_module_level=True)
+
+
+# =========================================================================
+# Test 1: Tool resolution includes terminal + file tools
+# =========================================================================
+
+class TestToolResolution:
+    """Verify get_tool_definitions returns all expected tools for eval."""
+
+    def _has_minisweagent(self):
+        try:
+            import minisweagent  # noqa: F401
+            return True
+        except ImportError:
+            return False
+
+    def test_terminal_and_file_toolsets_resolve_all_tools(self):
+        """enabled_toolsets=['terminal', 'file'] should produce 6 tools."""
+        if not self._has_minisweagent():
+            pytest.skip("minisweagent not installed (git submodule update --init)")
+        from model_tools import get_tool_definitions
+        tools = get_tool_definitions(
+            enabled_toolsets=["terminal", "file"],
+            quiet_mode=True,
+        )
+        names = {t["function"]["name"] for t in tools}
+        expected = {"terminal", "process", "read_file", "write_file", "search_files", "patch"}
+        assert expected == names, f"Expected {expected}, got {names}"
+
+    def test_terminal_tool_present(self):
+        """The terminal tool must be present (not silently dropped)."""
+        if not self._has_minisweagent():
+            pytest.skip("minisweagent not installed (git submodule update --init)")
+        from model_tools import get_tool_definitions
+        tools = get_tool_definitions(
+            enabled_toolsets=["terminal", "file"],
+            quiet_mode=True,
+        )
+        names = [t["function"]["name"] for t in tools]
+        assert "terminal" in names, (
+            f"terminal tool missing! Only got: {names}. "
+            "Check that minisweagent is installed (git submodule update --init)."
+        )
+
+
+# =========================================================================
+# Test 2-4: CWD handling for container backends
+# =========================================================================
+
+class TestCwdHandling:
+    """Verify host paths are sanitized for container backends."""
+
+    def test_home_path_replaced_for_modal(self):
+        """TERMINAL_CWD=/home/user/... should be replaced with /root for modal."""
+        with patch.dict(os.environ, {
+            "TERMINAL_ENV": "modal",
+            "TERMINAL_CWD": "/home/dakota/github/hermes-agent",
+        }):
+            config = _tt_mod._get_env_config()
+            assert config["cwd"] == "/root", (
+                f"Expected /root, got {config['cwd']}. "
+                "/home/ paths should be replaced for modal backend."
+            )
+
+    def test_users_path_replaced_for_docker(self):
+        """TERMINAL_CWD=/Users/... should be replaced with /root for docker."""
+        with patch.dict(os.environ, {
+            "TERMINAL_ENV": "docker",
+            "TERMINAL_CWD": "/Users/someone/projects",
+        }):
+            config = _tt_mod._get_env_config()
+            assert config["cwd"] == "/root", (
+                f"Expected /root, got {config['cwd']}. "
+                "/Users/ paths should be replaced for docker backend."
+            )
+
+    def test_windows_path_replaced_for_modal(self):
+        """TERMINAL_CWD=C:\\Users\\... should be replaced for modal."""
+        with patch.dict(os.environ, {
+            "TERMINAL_ENV": "modal",
+            "TERMINAL_CWD": "C:\\Users\\someone\\projects",
+        }):
+            config = _tt_mod._get_env_config()
+            assert config["cwd"] == "/root"
+
+    def test_default_cwd_is_root_for_container_backends(self):
+        """Container backends should default to /root, not ~."""
+        for backend in ("modal", "docker", "singularity", "daytona"):
+            with patch.dict(os.environ, {"TERMINAL_ENV": backend}, clear=False):
+                # Remove TERMINAL_CWD so it uses default
+                env = os.environ.copy()
+                env.pop("TERMINAL_CWD", None)
+                with patch.dict(os.environ, env, clear=True):
+                    config = _tt_mod._get_env_config()
+                    assert config["cwd"] == "/root", (
+                        f"Backend {backend}: expected /root default, got {config['cwd']}"
+                    )
+
+    def test_local_backend_uses_getcwd(self):
+        """Local backend should use os.getcwd(), not /root."""
+        with patch.dict(os.environ, {"TERMINAL_ENV": "local"}, clear=False):
+            env = os.environ.copy()
+            env.pop("TERMINAL_CWD", None)
+            with patch.dict(os.environ, env, clear=True):
+                config = _tt_mod._get_env_config()
+                assert config["cwd"] == os.getcwd()
+
+    def test_ssh_preserves_home_paths(self):
+        """SSH backend should NOT replace /home/ paths (they're valid remotely)."""
+        with patch.dict(os.environ, {
+            "TERMINAL_ENV": "ssh",
+            "TERMINAL_CWD": "/home/remote-user/work",
+            "TERMINAL_SSH_HOST": "example.com",
+            "TERMINAL_SSH_USER": "user",
+        }):
+            config = _tt_mod._get_env_config()
+            assert config["cwd"] == "/home/remote-user/work", (
+                "SSH backend should preserve /home/ paths"
+            )
+
+
+# =========================================================================
+# Test 5: ephemeral_disk version check
+# =========================================================================
+
+class TestEphemeralDiskCheck:
+    """Verify ephemeral_disk is only passed when modal supports it."""
+
+    def test_ephemeral_disk_skipped_when_unsupported(self):
+        """If modal.Sandbox.create doesn't have ephemeral_disk param, skip it."""
+        # Mock the modal import and Sandbox.create signature
+        mock_modal = MagicMock()
+        mock_sandbox_create = MagicMock()
+        # Simulate a signature WITHOUT ephemeral_disk
+        import inspect
+        mock_params = {
+            "args": inspect.Parameter("args", inspect.Parameter.VAR_POSITIONAL),
+            "image": inspect.Parameter("image", inspect.Parameter.KEYWORD_ONLY),
+            "timeout": inspect.Parameter("timeout", inspect.Parameter.KEYWORD_ONLY),
+            "cpu": inspect.Parameter("cpu", inspect.Parameter.KEYWORD_ONLY),
+            "memory": inspect.Parameter("memory", inspect.Parameter.KEYWORD_ONLY),
+        }
+        mock_sig = inspect.Signature(parameters=list(mock_params.values()))
+
+        with patch.dict(os.environ, {"TERMINAL_ENV": "modal"}):
+            config = _tt_mod._get_env_config()
+            # The config has container_disk default of 51200
+            disk = config.get("container_disk", 51200)
+            assert disk > 0, "disk should default to > 0"
+
+            # Simulate the version check logic from terminal_tool.py
+            sandbox_kwargs = {}
+            if disk > 0:
+                try:
+                    if "ephemeral_disk" in mock_sig.parameters:
+                        sandbox_kwargs["ephemeral_disk"] = disk
+                except Exception:
+                    pass
+
+            assert "ephemeral_disk" not in sandbox_kwargs, (
+                "ephemeral_disk should not be set when Sandbox.create doesn't support it"
+            )
+
+
+# =========================================================================
+# Test 6: ModalEnvironment defaults
+# =========================================================================
+
+class TestModalEnvironmentDefaults:
+    """Verify ModalEnvironment has correct defaults."""
+
+    def test_default_cwd_is_root(self):
+        """ModalEnvironment default cwd should be /root, not ~."""
+        from tools.environments.modal import ModalEnvironment
+        import inspect
+        sig = inspect.signature(ModalEnvironment.__init__)
+        cwd_default = sig.parameters["cwd"].default
+        assert cwd_default == "/root", (
+            f"ModalEnvironment cwd default should be /root, got {cwd_default!r}. "
+            "Tilde ~ is not expanded by subprocess.run(cwd=...)."
+        )
+
+
+# =========================================================================
+# Test 7: ensurepip fix in patches.py
+# =========================================================================
+
+class TestEnsurepipFix:
+    """Verify the pip fix is applied in the patched Modal init."""
+
+    def test_patched_init_creates_image_with_setup_commands(self):
+        """The patched __init__ should create a modal.Image with pip fix."""
+        try:
+            from environments.patches import _patch_swerex_modal
+        except ImportError:
+            pytest.skip("environments.patches not importable")
+
+        # Check that the patch code references ensurepip
+        import inspect
+        source = inspect.getsource(_patch_swerex_modal)
+        assert "ensurepip" in source, (
+            "patches._patch_swerex_modal should include ensurepip fix "
+            "for Modal's legacy image builder"
+        )
+        assert "setup_dockerfile_commands" in source, (
+            "patches._patch_swerex_modal should use setup_dockerfile_commands "
+            "to fix pip before Modal's bootstrap"
+        )
+
+    def test_patched_init_uses_install_pipx_from_config(self):
+        """The patched init should respect install_pipx from config."""
+        try:
+            from environments.patches import _patch_swerex_modal
+        except ImportError:
+            pytest.skip("environments.patches not importable")
+
+        import inspect
+        source = inspect.getsource(_patch_swerex_modal)
+        assert "install_pipx" in source, (
+            "patches._patch_swerex_modal should pass install_pipx to ModalDeployment"
+        )
+
+
+# =========================================================================
+# Test 8: Host prefix list completeness
+# =========================================================================
+
+class TestHostPrefixList:
+    """Verify the host prefix list catches common host-only paths."""
+
+    def test_all_common_host_prefixes_caught(self):
+        """The host prefix check should catch /Users/, /home/, C:\\, C:/."""
+        # Read the actual source to verify the prefixes
+        import inspect
+        source = inspect.getsource(_tt_mod._get_env_config)
+        for prefix in ["/Users/", "/home/", 'C:\\\\"', "C:/"]:
+            # Strip any trailing quote so the bare prefix can also match the source text
+            check = prefix.rstrip('"')
+            assert check in source or prefix in source, (
+                f"Host prefix {prefix!r} not found in _get_env_config. "
+                "Container backends need this to avoid using host paths."
+            )
@@ -63,7 +63,7 @@ import time
 import requests
 from typing import Dict, Any, Optional, List
 from pathlib import Path
-from agent.auxiliary_client import get_vision_auxiliary_client, get_text_auxiliary_client
+from agent.auxiliary_client import get_vision_auxiliary_client

 logger = logging.getLogger(__name__)

@@ -80,38 +80,8 @@ DEFAULT_SESSION_TIMEOUT = 300
 # Max tokens for snapshot content before summarization
 SNAPSHOT_SUMMARIZE_THRESHOLD = 8000

-# Vision client — for browser_vision (screenshot analysis)
-# Wrapped in try/except so a broken auxiliary config doesn't prevent the entire
-# browser_tool module from importing (which would disable all 10 browser tools).
-try:
-    _aux_vision_client, _DEFAULT_VISION_MODEL = get_vision_auxiliary_client()
-except Exception as _init_err:
-    logger.debug("Could not initialise vision auxiliary client: %s", _init_err)
-    _aux_vision_client, _DEFAULT_VISION_MODEL = None, None
-
-# Text client — for page snapshot summarization (same config as web_extract)
-try:
-    _aux_text_client, _DEFAULT_TEXT_MODEL = get_text_auxiliary_client("web_extract")
-except Exception as _init_err:
-    logger.debug("Could not initialise text auxiliary client: %s", _init_err)
-    _aux_text_client, _DEFAULT_TEXT_MODEL = None, None
-
-# Module-level alias for availability checks
-EXTRACTION_MODEL = _DEFAULT_TEXT_MODEL or _DEFAULT_VISION_MODEL
-
-
-def _get_vision_model() -> str:
-    """Model for browser_vision (screenshot analysis — multimodal)."""
-    return (os.getenv("AUXILIARY_VISION_MODEL", "").strip()
-            or _DEFAULT_VISION_MODEL
-            or "google/gemini-3-flash-preview")
-
-
-def _get_extraction_model() -> str:
-    """Model for page snapshot text summarization — same as web_extract."""
-    return (os.getenv("AUXILIARY_WEB_EXTRACT_MODEL", "").strip()
-            or _DEFAULT_TEXT_MODEL
-            or "google/gemini-3-flash-preview")
+# Resolve vision auxiliary client for extraction/vision tasks
+_aux_vision_client, EXTRACTION_MODEL = get_vision_auxiliary_client()


 def _is_local_mode() -> bool:
@@ -124,27 +94,9 @@ def _is_local_mode() -> bool:
    return not (os.environ.get("BROWSERBASE_API_KEY") and os.environ.get("BROWSERBASE_PROJECT_ID"))


-def _socket_safe_tmpdir() -> str:
-    """Return a short temp directory path suitable for Unix domain sockets.
-
-    macOS sets ``TMPDIR`` to ``/var/folders/xx/.../T/`` (~51 chars).  When we
-    append ``agent-browser-hermes_…`` the resulting socket path exceeds the
-    104-byte macOS limit for ``AF_UNIX`` addresses, causing agent-browser to
-    fail with "Failed to create socket directory" or silent screenshot failures.
-
-    Linux ``tempfile.gettempdir()`` already returns ``/tmp``, so this is a
-    no-op there.  On macOS we bypass ``TMPDIR`` and use ``/tmp`` directly
-    (symlink to ``/private/tmp``, sticky-bit protected, always available).
-    """
-    if sys.platform == "darwin":
-        return "/tmp"
-    return tempfile.gettempdir()
-
-
 # Track active sessions per task
 # Stores: session_name (always), bb_session_id + cdp_url (cloud mode only)
 _active_sessions: Dict[str, Dict[str, str]] = {}  # task_id -> {session_name, ...}
-_recording_sessions: set = set()  # task_ids with active recordings

 # Flag to track if cleanup has been done
 _cleanup_done = False
@@ -193,7 +145,7 @@ def _emergency_cleanup_all_sessions():
                    try:
                        browser_cmd = _find_agent_browser()
                        task_socket_dir = os.path.join(
-                            _socket_safe_tmpdir(),
+                            tempfile.gettempdir(),
                            f"agent-browser-{session_name}"
                        )
                        env = {**os.environ, "AGENT_BROWSER_SOCKET_DIR": task_socket_dir}
@@ -479,31 +431,11 @@ BROWSER_TOOL_SCHEMAS = [
                "question": {
                    "type": "string",
                    "description": "What you want to know about the page visually. Be specific about what you're looking for."
-                },
-                "annotate": {
-                    "type": "boolean",
-                    "default": False,
-                    "description": "If true, overlay numbered [N] labels on interactive elements. Each [N] maps to ref @eN for subsequent browser commands. Useful for QA and spatial reasoning about page layout."
                }
            },
            "required": ["question"]
        }
    },
-    {
-        "name": "browser_console",
-        "description": "Get browser console output and JavaScript errors from the current page. Returns console.log/warn/error/info messages and uncaught JS exceptions. Use this to detect silent JavaScript errors, failed API calls, and application warnings. Requires browser_navigate to be called first.",
-        "parameters": {
-            "type": "object",
-            "properties": {
-                "clear": {
-                    "type": "boolean",
-                    "default": False,
-                    "description": "If true, clear the message buffers after reading"
-                }
-            },
-            "required": []
-        }
-    },
 ]


@@ -823,7 +755,6 @@ def _run_browser_command(
    try:
        browser_cmd = _find_agent_browser()
    except FileNotFoundError as e:
-        logger.warning("agent-browser CLI not found: %s", e)
        return {"success": False, "error": str(e)}
    
    from tools.interrupt import is_interrupted
@@ -834,7 +765,6 @@ def _run_browser_command(
    try:
        session_info = _get_session_info(task_id)
    except Exception as e:
-        logger.warning("Failed to create browser session for task=%s: %s", task_id, e)
        return {"success": False, "error": f"Failed to create browser session: {str(e)}"}
    
    # Build the command with the appropriate backend flag.
@@ -860,12 +790,10 @@ def _run_browser_command(
        # Without this, parallel workers fight over the same default socket path,
        # causing "Failed to create socket directory: Permission denied" errors.
        task_socket_dir = os.path.join(
-            _socket_safe_tmpdir(),
+            tempfile.gettempdir(), 
            f"agent-browser-{session_info['session_name']}"
        )
-        os.makedirs(task_socket_dir, mode=0o700, exist_ok=True)
-        logger.debug("browser cmd=%s task=%s socket_dir=%s (%d chars)",
-                     command, task_id, task_socket_dir, len(task_socket_dir))
+        os.makedirs(task_socket_dir, exist_ok=True)
        
        browser_env = {**os.environ}
        # Ensure PATH includes standard dirs (systemd services may have minimal PATH)
@@ -907,29 +835,22 @@ def _run_browser_command(
                                       "returncode=%s", result.returncode)
                return parsed
            except json.JSONDecodeError:
-                # Non-JSON output indicates agent-browser crash or version mismatch
-                raw = result.stdout.strip()[:500]
-                logger.warning("browser '%s' returned non-JSON output (rc=%s): %s",
-                               command, result.returncode, raw[:200])
+                # If not valid JSON, return as raw output
                return {
                    "success": True,
-                    "data": {"raw": raw}
+                    "data": {"raw": result.stdout.strip()}
                }
        
        # Check for errors
        if result.returncode != 0:
            error_msg = result.stderr.strip() if result.stderr else f"Command failed with code {result.returncode}"
-            logger.warning("browser '%s' failed (rc=%s): %s", command, result.returncode, error_msg[:300])
            return {"success": False, "error": error_msg}
        
        return {"success": True, "data": {}}
        
    except subprocess.TimeoutExpired:
-        logger.warning("browser '%s' timed out after %ds (task=%s, socket_dir=%s)",
-                       command, timeout, task_id, task_socket_dir)
        return {"success": False, "error": f"Command timed out after {timeout} seconds"}
    except Exception as e:
-        logger.warning("browser '%s' exception: %s", command, e, exc_info=True)
        return {"success": False, "error": str(e)}


@@ -939,9 +860,9 @@ def _extract_relevant_content(
 ) -> str:
    """Use LLM to extract relevant content from a snapshot based on the user's task.

-    Falls back to simple truncation when no auxiliary text model is configured.
+    Falls back to simple truncation when no auxiliary vision model is configured.
    """
-    if _aux_text_client is None:
+    if _aux_vision_client is None or EXTRACTION_MODEL is None:
        return _truncate_snapshot(snapshot_text)

    if user_task:
@@ -969,8 +890,8 @@ def _extract_relevant_content(

    try:
        from agent.auxiliary_client import auxiliary_max_tokens_param
-        response = _aux_text_client.chat.completions.create(
-            model=_get_extraction_model(),
+        response = _aux_vision_client.chat.completions.create(
+            model=EXTRACTION_MODEL,
            messages=[{"role": "user", "content": extraction_prompt}],
            **auxiliary_max_tokens_param(4000),
            temperature=0.1,
@@ -1019,10 +940,9 @@ def browser_navigate(url: str, task_id: Optional[str] = None) -> str:
    session_info = _get_session_info(effective_task_id)
    is_first_nav = session_info.get("_first_nav", True)
    
-    # Auto-start recording if configured and this is first navigation
+    # Mark that we've done at least one navigation
    if is_first_nav:
        session_info["_first_nav"] = False
-        _maybe_start_recording(effective_task_id)
    
    result = _run_browser_command(effective_task_id, "open", [url], timeout=60)
    
@@ -1286,10 +1206,6 @@ def browser_close(task_id: Optional[str] = None) -> str:
        JSON string with close result
    """
    effective_task_id = task_id or "default"
-    
-    # Stop auto-recording before closing
-    _maybe_stop_recording(effective_task_id)
-    
    result = _run_browser_command(effective_task_id, "close", [])
    
    # Close the backend session (Browserbase API in cloud mode, nothing extra in local mode)
@@ -1320,103 +1236,6 @@ def browser_close(task_id: Optional[str] = None) -> str:
        }, ensure_ascii=False)


-def browser_console(clear: bool = False, task_id: Optional[str] = None) -> str:
-    """Get browser console messages and JavaScript errors.
-    
-    Returns both console output (log/warn/error/info from the page's JS)
-    and uncaught exceptions (crashes, unhandled promise rejections).
-    
-    Args:
-        clear: If True, clear the message/error buffers after reading
-        task_id: Task identifier for session isolation
-        
-    Returns:
-        JSON string with console messages and JS errors
-    """
-    effective_task_id = task_id or "default"
-    
-    console_args = ["--clear"] if clear else []
-    error_args = ["--clear"] if clear else []
-    
-    console_result = _run_browser_command(effective_task_id, "console", console_args)
-    errors_result = _run_browser_command(effective_task_id, "errors", error_args)
-    
-    messages = []
-    if console_result.get("success"):
-        for msg in console_result.get("data", {}).get("messages", []):
-            messages.append({
-                "type": msg.get("type", "log"),
-                "text": msg.get("text", ""),
-                "source": "console",
-            })
-    
-    errors = []
-    if errors_result.get("success"):
-        for err in errors_result.get("data", {}).get("errors", []):
-            errors.append({
-                "message": err.get("message", ""),
-                "source": "exception",
-            })
-    
-    return json.dumps({
-        "success": True,
-        "console_messages": messages,
-        "js_errors": errors,
-        "total_messages": len(messages),
-        "total_errors": len(errors),
-    }, ensure_ascii=False)
-
-
-def _maybe_start_recording(task_id: str):
-    """Start recording if browser.record_sessions is enabled in config."""
-    if task_id in _recording_sessions:
-        return
-    try:
-        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
-        config_path = hermes_home / "config.yaml"
-        record_enabled = False
-        if config_path.exists():
-            import yaml
-            with open(config_path) as f:
-                cfg = yaml.safe_load(f) or {}
-            record_enabled = cfg.get("browser", {}).get("record_sessions", False)
-        
-        if not record_enabled:
-            return
-        
-        recordings_dir = hermes_home / "browser_recordings"
-        recordings_dir.mkdir(parents=True, exist_ok=True)
-        _cleanup_old_recordings(max_age_hours=72)
-        
-        import time
-        timestamp = time.strftime("%Y%m%d_%H%M%S")
-        recording_path = recordings_dir / f"session_{timestamp}_{task_id[:16]}.webm"
-        
-        result = _run_browser_command(task_id, "record", ["start", str(recording_path)])
-        if result.get("success"):
-            _recording_sessions.add(task_id)
-            logger.info("Auto-recording browser session %s to %s", task_id, recording_path)
-        else:
-            logger.debug("Could not start auto-recording: %s", result.get("error"))
-    except Exception as e:
-        logger.debug("Auto-recording setup failed: %s", e)
-
-
-def _maybe_stop_recording(task_id: str):
-    """Stop recording if one is active for this session."""
-    if task_id not in _recording_sessions:
-        return
-    try:
-        result = _run_browser_command(task_id, "record", ["stop"])
-        if result.get("success"):
-            path = result.get("data", {}).get("path", "")
-            logger.info("Saved browser recording for session %s: %s", task_id, path)
-    except Exception as e:
-        logger.debug("Could not stop recording for %s: %s", task_id, e)
-    finally:
-        _recording_sessions.discard(task_id)
-
-
 def browser_get_images(task_id: Optional[str] = None) -> str:
    """
    Get all images on the current page.
@@ -1471,7 +1290,7 @@ def browser_get_images(task_id: Optional[str] = None) -> str:
        }, ensure_ascii=False)


-def browser_vision(question: str, annotate: bool = False, task_id: Optional[str] = None) -> str:
+def browser_vision(question: str, task_id: Optional[str] = None) -> str:
    """
    Take a screenshot of the current page and analyze it with vision AI.
    
@@ -1485,7 +1304,6 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
    
    Args:
        question: What you want to know about the page visually
-        annotate: If True, overlay numbered [N] labels on interactive elements
        task_id: Task identifier for session isolation
        
    Returns:
@@ -1498,7 +1316,7 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
    effective_task_id = task_id or "default"
    
    # Check auxiliary vision client
-    if _aux_vision_client is None or _DEFAULT_VISION_MODEL is None:
+    if _aux_vision_client is None or EXTRACTION_MODEL is None:
        return json.dumps({
            "success": False,
            "error": "Browser vision unavailable: no auxiliary vision model configured. "
@@ -1517,35 +1335,24 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
        _cleanup_old_screenshots(screenshots_dir, max_age_hours=24)
        
        # Take screenshot using agent-browser
-        screenshot_args = [str(screenshot_path)]
-        if annotate:
-            screenshot_args.insert(0, "--annotate")
        result = _run_browser_command(
            effective_task_id, 
            "screenshot", 
-            screenshot_args,
+            [str(screenshot_path)],
            timeout=30
        )
        
        if not result.get("success"):
-            error_detail = result.get("error", "Unknown error")
-            mode = "local" if _is_local_mode() else "cloud"
            return json.dumps({
                "success": False,
-                "error": f"Failed to take screenshot ({mode} mode): {error_detail}"
+                "error": f"Failed to take screenshot: {result.get('error', 'Unknown error')}"
            }, ensure_ascii=False)
        
        # Check if screenshot file was created
        if not screenshot_path.exists():
-            mode = "local" if _is_local_mode() else "cloud"
            return json.dumps({
                "success": False,
-                "error": (
-                    f"Screenshot file was not created at {screenshot_path} ({mode} mode). "
-                    f"This may indicate a socket path issue (macOS /var/folders/), "
-                    f"a missing Chromium install ('agent-browser install'), "
-                    f"or a stale daemon process."
-                ),
+                "error": "Screenshot file was not created"
            }, ensure_ascii=False)
        
        # Read and convert to base64
@@ -1564,11 +1371,8 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]

        # Use the sync auxiliary vision client directly
        from agent.auxiliary_client import auxiliary_max_tokens_param
-        vision_model = _get_vision_model()
-        logger.debug("browser_vision: analysing screenshot (%d bytes) with model=%s",
-                     len(image_data), vision_model)
        response = _aux_vision_client.chat.completions.create(
-            model=vision_model,
+            model=EXTRACTION_MODEL,
            messages=[
                {
                    "role": "user",
@@ -1583,27 +1387,23 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
        )
        
        analysis = response.choices[0].message.content
-        response_data = {
+        return json.dumps({
            "success": True,
            "analysis": analysis,
            "screenshot_path": str(screenshot_path),
-        }
-        # Include annotation data if annotated screenshot was taken
-        if annotate and result.get("data", {}).get("annotations"):
-            response_data["annotations"] = result["data"]["annotations"]
-        return json.dumps(response_data, ensure_ascii=False)
+        }, ensure_ascii=False)
    
    except Exception as e:
-        # Keep the screenshot if it was captured successfully — the failure is
-        # in the LLM vision analysis, not the capture.  Deleting a valid
-        # screenshot loses evidence the user might need.  The 24-hour cleanup
-        # in _cleanup_old_screenshots prevents unbounded disk growth.
-        logger.warning("browser_vision failed: %s", e, exc_info=True)
-        error_info = {"success": False, "error": f"Error during vision analysis: {str(e)}"}
+        # Clean up screenshot on failure
        if screenshot_path.exists():
-            error_info["screenshot_path"] = str(screenshot_path)
-            error_info["note"] = "Screenshot was captured but vision analysis failed. You can still share it via MEDIA:<path>."
-        return json.dumps(error_info, ensure_ascii=False)
+            try:
+                screenshot_path.unlink()
+            except Exception:
+                pass
+        return json.dumps({
+            "success": False,
+            "error": f"Error during vision analysis: {str(e)}"
+        }, ensure_ascii=False)


 def _cleanup_old_screenshots(screenshots_dir, max_age_hours=24):
@@ -1621,25 +1421,6 @@ def _cleanup_old_screenshots(screenshots_dir, max_age_hours=24):
        pass  # Non-critical — don't fail the screenshot operation


-def _cleanup_old_recordings(max_age_hours=72):
-    """Remove browser recordings older than max_age_hours to prevent disk bloat."""
-    import time
-    try:
-        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
-        recordings_dir = hermes_home / "browser_recordings"
-        if not recordings_dir.exists():
-            return
-        cutoff = time.time() - (max_age_hours * 3600)
-        for f in recordings_dir.glob("session_*.webm"):
-            try:
-                if f.stat().st_mtime < cutoff:
-                    f.unlink()
-            except Exception:
-                pass
-    except Exception:
-        pass
-
-
 # ============================================================================
 # Cleanup and Management Functions
 # ============================================================================
@@ -1711,9 +1492,6 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
        bb_session_id = session_info.get("bb_session_id", "unknown")
        logger.debug("Found session for task %s: bb_session_id=%s", task_id, bb_session_id)
        
-        # Stop auto-recording before closing (saves the file)
-        _maybe_stop_recording(task_id)
-        
        # Try to close via agent-browser first (needs session in _active_sessions)
        try:
            _run_browser_command(task_id, "close", [], timeout=10)
@@ -1739,7 +1517,7 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
        # Kill the daemon process and clean up socket directory
        session_name = session_info.get("session_name", "")
        if session_name:
-            socket_dir = os.path.join(_socket_safe_tmpdir(), f"agent-browser-{session_name}")
+            socket_dir = os.path.join(tempfile.gettempdir(), f"agent-browser-{session_name}")
            if os.path.exists(socket_dir):
                # agent-browser writes {session}.pid in the socket dir
                pid_file = os.path.join(socket_dir, f"{session_name}.pid")
@@ -1929,13 +1707,6 @@ registry.register(
    name="browser_vision",
    toolset="browser",
    schema=_BROWSER_SCHEMA_MAP["browser_vision"],
-    handler=lambda args, **kw: browser_vision(question=args.get("question", ""), annotate=args.get("annotate", False), task_id=kw.get("task_id")),
-    check_fn=check_browser_requirements,
-)
-registry.register(
-    name="browser_console",
-    toolset="browser",
-    schema=_BROWSER_SCHEMA_MAP["browser_console"],
-    handler=lambda args, **kw: browser_console(clear=args.get("clear", False), task_id=kw.get("task_id")),
+    handler=lambda args, **kw: browser_vision(question=args.get("question", ""), task_id=kw.get("task_id")),
    check_fn=check_browser_requirements,
 )
@@ -385,11 +385,7 @@ def execute_code(

    # --- Set up temp directory with hermes_tools.py and script.py ---
    tmpdir = tempfile.mkdtemp(prefix="hermes_sandbox_")
-    # Use /tmp on macOS to avoid the long /var/folders/... path that pushes
-    # Unix domain socket paths past the 104-byte macOS AF_UNIX limit.
-    # On Linux, tempfile.gettempdir() already returns /tmp.
-    _sock_tmpdir = "/tmp" if sys.platform == "darwin" else tempfile.gettempdir()
-    sock_path = os.path.join(_sock_tmpdir, f"hermes_rpc_{uuid.uuid4().hex}.sock")
+    sock_path = os.path.join(tempfile.gettempdir(), f"hermes_rpc_{uuid.uuid4().hex}.sock")

    tool_call_log: list = []
    tool_call_counter = [0]  # mutable so the RPC thread can increment
@@ -457,17 +453,11 @@ def execute_code(

        # --- Poll loop: watch for exit, timeout, and interrupt ---
        deadline = time.monotonic() + timeout
+        stdout_chunks: list = []
        stderr_chunks: list = []

-        # Background readers to avoid pipe buffer deadlocks.
-        # For stdout we use a head+tail strategy: keep the first HEAD_BYTES
-        # and a rolling window of the last TAIL_BYTES so the final print()
-        # output is never lost.  Stderr keeps head-only (errors appear early).
-        _STDOUT_HEAD_BYTES = int(MAX_STDOUT_BYTES * 0.4)   # 40% head
-        _STDOUT_TAIL_BYTES = MAX_STDOUT_BYTES - _STDOUT_HEAD_BYTES  # 60% tail
-
+        # Background readers to avoid pipe buffer deadlocks
        def _drain(pipe, chunks, max_bytes):
-            """Simple head-only drain (used for stderr)."""
            total = 0
            try:
                while True:
@@ -481,48 +471,8 @@ def execute_code(
            except (ValueError, OSError):
                pass

-        stdout_total_bytes = [0]  # mutable ref for total bytes seen
-
-        def _drain_head_tail(pipe, head_chunks, tail_chunks, head_bytes, tail_bytes, total_ref):
-            """Drain stdout keeping both head and tail data."""
-            head_collected = 0
-            from collections import deque
-            tail_buf = deque()
-            tail_collected = 0
-            try:
-                while True:
-                    data = pipe.read(4096)
-                    if not data:
-                        break
-                    total_ref[0] += len(data)
-                    # Fill head buffer first
-                    if head_collected < head_bytes:
-                        keep = min(len(data), head_bytes - head_collected)
-                        head_chunks.append(data[:keep])
-                        head_collected += keep
-                        data = data[keep:]  # remaining goes to tail
-                        if not data:
-                            continue
-                    # Everything past head goes into rolling tail buffer
-                    tail_buf.append(data)
-                    tail_collected += len(data)
-                    # Evict old tail data to stay within tail_bytes budget
-                    while tail_collected > tail_bytes and tail_buf:
-                        oldest = tail_buf.popleft()
-                        tail_collected -= len(oldest)
-            except (ValueError, OSError):
-                pass
-            # Transfer final tail to output list
-            tail_chunks.extend(tail_buf)
-
-        stdout_head_chunks: list = []
-        stdout_tail_chunks: list = []
-
        stdout_reader = threading.Thread(
-            target=_drain_head_tail,
-            args=(proc.stdout, stdout_head_chunks, stdout_tail_chunks,
-                  _STDOUT_HEAD_BYTES, _STDOUT_TAIL_BYTES, stdout_total_bytes),
-            daemon=True
+            target=_drain, args=(proc.stdout, stdout_chunks, MAX_STDOUT_BYTES), daemon=True
        )
        stderr_reader = threading.Thread(
            target=_drain, args=(proc.stderr, stderr_chunks, MAX_STDERR_BYTES), daemon=True
@@ -546,21 +496,12 @@ def execute_code(
        stdout_reader.join(timeout=3)
        stderr_reader.join(timeout=3)

-        stdout_head = b"".join(stdout_head_chunks).decode("utf-8", errors="replace")
-        stdout_tail = b"".join(stdout_tail_chunks).decode("utf-8", errors="replace")
+        stdout_text = b"".join(stdout_chunks).decode("utf-8", errors="replace")
        stderr_text = b"".join(stderr_chunks).decode("utf-8", errors="replace")

-        # Assemble stdout with head+tail truncation
-        total_stdout = stdout_total_bytes[0]
-        if total_stdout > MAX_STDOUT_BYTES and stdout_tail:
-            omitted = total_stdout - len(stdout_head) - len(stdout_tail)
-            truncated_notice = (
-                f"\n\n... [OUTPUT TRUNCATED - {omitted:,} chars omitted "
-                f"out of {total_stdout:,} total] ...\n\n"
-            )
-            stdout_text = stdout_head + truncated_notice + stdout_tail
-        else:
-            stdout_text = stdout_head + stdout_tail
+        # Truncation notice
+        if len(stdout_text) >= MAX_STDOUT_BYTES:
+            stdout_text = stdout_text[:MAX_STDOUT_BYTES] + "\n[output truncated at 50KB]"

        exit_code = proc.returncode if proc.returncode is not None else -1
        duration = round(time.monotonic() - exec_start, 2)
@@ -102,9 +102,7 @@ def schedule_cronjob(
                 - "local": Save to local files only (~/.hermes/cron/output/)
                 - "telegram": Send to Telegram home channel
                 - "discord": Send to Discord home channel
-                 - "signal": Send to Signal home channel
                 - "telegram:123456": Send to specific chat ID
-                 - "signal:+15551234567": Send to specific Signal number
    
    Returns:
        JSON with job_id, next_run time, and confirmation
@@ -218,7 +216,7 @@ Use for: reminders, periodic checks, scheduled reports, automated maintenance.""
            },
            "deliver": {
                "type": "string",
-                "description": "Where to send output: 'origin' (back to this chat), 'local' (files only), 'telegram', 'discord', 'signal', or 'platform:chat_id'"
+                "description": "Where to send output: 'origin' (back to this chat), 'local' (files only), 'telegram', 'discord', or 'platform:chat_id'"
            }
        },
        "required": ["prompt", "schedule"]
@@ -50,7 +50,7 @@ class ModalEnvironment(BaseEnvironment):
    def __init__(
        self,
        image: str,
-        cwd: str = "~",
+        cwd: str = "/root",
        timeout: int = 60,
        modal_sandbox_kwargs: Optional[Dict[str, Any]] = None,
        persistent_filesystem: bool = True,
@@ -95,6 +95,7 @@ class ModalEnvironment(BaseEnvironment):
            startup_timeout=180.0,
            runtime_timeout=3600.0,
            modal_sandbox_kwargs=sandbox_kwargs,
+            install_pipx=True,  # Required: installs pipx + swe-rex runtime (swerex-remote)
        )

    def execute(self, command: str, cwd: str = "", *,
@@ -819,14 +819,6 @@ class ShellFileOperations(FileOperations):
        # Expand ~ and other shell paths
        path = self._expand_path(path)
        
-        # Validate that the path exists before searching
-        check = self._exec(f"test -e {self._escape_shell_arg(path)} && echo exists || echo not_found")
-        if "not_found" in check.stdout:
-            return SearchResult(
-                error=f"Path not found: {path}. Verify the path exists (use 'terminal' to check).",
-                total_count=0
-            )
-        
        if target == "files":
            return self._search_files(pattern, path, limit, offset)
        else:
@@ -927,11 +919,6 @@ class ShellFileOperations(FileOperations):
        cmd = " ".join(cmd_parts)
        result = self._exec(cmd, timeout=60)
        
-        # rg exit codes: 0=matches found, 1=no matches, 2=error
-        if result.exit_code == 2 and not result.stdout.strip():
-            error_msg = result.stderr.strip() if hasattr(result, 'stderr') and result.stderr else "Search error"
-            return SearchResult(error=f"Search failed: {error_msg}", total_count=0)
-        
        # Parse results based on output mode
        if output_mode == "files_only":
            all_files = [f for f in result.stdout.strip().split('\n') if f]
@@ -1026,11 +1013,6 @@ class ShellFileOperations(FileOperations):
        cmd = " ".join(cmd_parts)
        result = self._exec(cmd, timeout=60)
        
-        # grep exit codes: 0=matches found, 1=no matches, 2=error
-        if result.exit_code == 2 and not result.stdout.strip():
-            error_msg = result.stderr.strip() if hasattr(result, 'stderr') and result.stderr else "Search error"
-            return SearchResult(error=f"Search failed: {error_msg}", total_count=0)
-        
        if output_mode == "files_only":
            all_files = [f for f in result.stdout.strip().split('\n') if f]
            total = len(all_files)
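
Note on the two search hunks above: they drop the check that separated "no matches" (exit code 1) from a genuine failure (exit code 2) for both `rg` and `grep`. A minimal sketch of that distinction follows. The helper name `classify_search_exit` is hypothetical and does not exist in the repository; it only restates the removed logic as a pure function.

```python
from typing import Optional


def classify_search_exit(exit_code: int, stdout: str, stderr: str = "") -> Optional[str]:
    """Return an error message for a failed search, or None if the run is usable.

    Follows the convention stated in the removed comments:
    0 = matches found, 1 = no matches, 2 = error.
    """
    # Exit code 2 with partial output is still treated as usable,
    # matching the removed `not result.stdout.strip()` condition.
    if exit_code == 2 and not stdout.strip():
        return f"Search failed: {stderr.strip() or 'Search error'}"
    return None
```

Without a check of this kind, a malformed pattern and an empty result set both surface to the caller as zero matches.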
@@ -7,7 +7,6 @@ import os
 import threading
 from typing import Optional
 from tools.file_operations import ShellFileOperations
-from agent.redact import redact_sensitive_text

 logger = logging.getLogger(__name__)

@@ -129,8 +128,6 @@ def read_file_tool(path: str, offset: int = 1, limit: int = 500, task_id: str =
    try:
        file_ops = _get_file_ops(task_id)
        result = file_ops.read_file(path, offset, limit)
-        if result.content:
-            result.content = redact_sensitive_text(result.content)
        return json.dumps(result.to_dict(), ensure_ascii=False)
    except Exception as e:
        return json.dumps({"error": str(e)}, ensure_ascii=False)
@@ -167,13 +164,7 @@ def patch_tool(mode: str = "replace", path: str = None, old_string: str = None,
        else:
            return json.dumps({"error": f"Unknown mode: {mode}"})
        
-        result_dict = result.to_dict()
-        result_json = json.dumps(result_dict, ensure_ascii=False)
-        # Hint when old_string not found — saves iterations where the agent
-        # retries with stale content instead of re-reading the file.
-        if result_dict.get("error") and "Could not find" in str(result_dict["error"]):
-            result_json += "\n\n[Hint: old_string not found. Use read_file to verify the current content, or search_files to locate the text.]"
-        return result_json
+        return json.dumps(result.to_dict(), ensure_ascii=False)
    except Exception as e:
        return json.dumps({"error": str(e)}, ensure_ascii=False)

@@ -189,18 +180,7 @@ def search_tool(pattern: str, target: str = "content", path: str = ".",
            pattern=pattern, path=path, target=target, file_glob=file_glob,
            limit=limit, offset=offset, output_mode=output_mode, context=context
        )
-        if hasattr(result, 'matches'):
-            for m in result.matches:
-                if hasattr(m, 'content') and m.content:
-                    m.content = redact_sensitive_text(m.content)
-        result_dict = result.to_dict()
-        result_json = json.dumps(result_dict, ensure_ascii=False)
-        # Hint when results were truncated — explicit next offset is clearer
-        # than relying on the model to infer it from total_count vs match count.
-        if result_dict.get("truncated"):
-            next_offset = offset + limit
-            result_json += f"\n\n[Hint: Results truncated. Use offset={next_offset} to see more, or narrow with a more specific pattern or file_glob.]"
-        return result_json
+        return json.dumps(result.to_dict(), ensure_ascii=False)
    except Exception as e:
        return json.dumps({"error": str(e)}, ensure_ascii=False)
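
Note on the `search_tool` hunk above: the removed code appended a plain-text hint with the next offset whenever the result was truncated. A small sketch of that pagination hint, assuming a hypothetical `with_truncation_hint` helper and a result that has already been converted to a dict:

```python
import json


def with_truncation_hint(result: dict, offset: int, limit: int) -> str:
    """Serialize a search result and append a next-page hint if it was truncated."""
    payload = json.dumps(result, ensure_ascii=False)
    if result.get("truncated"):
        # The next page starts right after the last item of the current page.
        next_offset = offset + limit
        payload += f"\n\n[Hint: Results truncated. Use offset={next_offset} to see more.]"
    return payload
```

Note that the hint makes the return value JSON followed by trailing text, so a caller that parses the whole string as JSON would need to strip it first.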

@@ -8,7 +8,6 @@ human-friendly channel names to IDs. Works in both CLI and gateway contexts.
 import json
 import logging
 import os
-import time

 logger = logging.getLogger(__name__)

@@ -33,7 +32,7 @@ SEND_MESSAGE_SCHEMA = {
            },
            "target": {
                "type": "string",
-                "description": "Delivery target. Format: 'platform' (uses home channel), 'platform:#channel-name', or 'platform:chat_id'. Examples: 'telegram', 'discord:#bot-home', 'slack:#engineering', 'signal:+15551234567'"
+                "description": "Delivery target. Format: 'platform' (uses home channel), 'platform:#channel-name', or 'platform:chat_id'. Examples: 'telegram', 'discord:#bot-home', 'slack:#engineering'"
            },
            "message": {
                "type": "string",
@@ -108,7 +107,6 @@ def _handle_send(args):
        "discord": Platform.DISCORD,
        "slack": Platform.SLACK,
        "whatsapp": Platform.WHATSAPP,
-        "signal": Platform.SIGNAL,
    }
    platform = platform_map.get(platform_name)
    if not platform:
@@ -162,8 +160,6 @@ async def _send_to_platform(platform, pconfig, chat_id, message):
        return await _send_discord(pconfig.token, chat_id, message)
    elif platform == Platform.SLACK:
        return await _send_slack(pconfig.token, chat_id, message)
-    elif platform == Platform.SIGNAL:
-        return await _send_signal(pconfig.extra, chat_id, message)
    return {"error": f"Direct sending not yet implemented for {platform.value}"}


@@ -223,42 +219,6 @@ async def _send_slack(token, chat_id, message):
        return {"error": f"Slack send failed: {e}"}


-async def _send_signal(extra, chat_id, message):
-    """Send via signal-cli JSON-RPC API."""
-    try:
-        import httpx
-    except ImportError:
-        return {"error": "httpx not installed"}
-    try:
-        http_url = extra.get("http_url", "http://127.0.0.1:8080").rstrip("/")
-        account = extra.get("account", "")
-        if not account:
-            return {"error": "Signal account not configured"}
-
-        params = {"account": account, "message": message}
-        if chat_id.startswith("group:"):
-            params["groupId"] = chat_id[6:]
-        else:
-            params["recipient"] = [chat_id]
-
-        payload = {
-            "jsonrpc": "2.0",
-            "method": "send",
-            "params": params,
-            "id": f"send_{int(time.time() * 1000)}",
-        }
-
-        async with httpx.AsyncClient(timeout=30.0) as client:
-            resp = await client.post(f"{http_url}/api/v1/rpc", json=payload)
-            resp.raise_for_status()
-            data = resp.json()
-            if "error" in data:
-                return {"error": f"Signal RPC error: {data['error']}"}
-            return {"success": True, "platform": "signal", "chat_id": chat_id}
-    except Exception as e:
-        return {"error": f"Signal send failed: {e}"}
-
-
 def _check_send_message():
    """Gate send_message on gateway running (always available on messaging platforms)."""
    platform = os.getenv("HERMES_SESSION_PLATFORM", "")
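
Note on the removed `_send_signal` function above: its core was building a JSON-RPC 2.0 `send` request in which group targets use `groupId` and direct targets use a one-element `recipient` list. The sketch below isolates only that payload construction. The function name `build_signal_payload` and the explicit `request_id` parameter are illustrative assumptions, not repository code, and no network call is made.

```python
from typing import Any, Dict


def build_signal_payload(account: str, chat_id: str, message: str,
                         request_id: str) -> Dict[str, Any]:
    """Build the JSON-RPC body the removed code posted to the signal-cli endpoint."""
    if not account:
        raise ValueError("Signal account not configured")

    params: Dict[str, Any] = {"account": account, "message": message}
    if chat_id.startswith("group:"):
        # Group targets carry the id after the 'group:' prefix.
        params["groupId"] = chat_id[len("group:"):]
    else:
        # Direct targets are passed as a list of recipients.
        params["recipient"] = [chat_id]

    return {"jsonrpc": "2.0", "method": "send", "params": params, "id": request_id}
```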
@@ -69,36 +69,10 @@ def _read_manifest() -> Dict[str, str]:


 def _write_manifest(entries: Dict[str, str]):
-    """Write the manifest file atomically in v2 format (name:hash).
-
-    Uses a temp file + os.replace() to avoid corruption if the process
-    crashes or is interrupted mid-write.
-    """
-    import tempfile
-
+    """Write the manifest file in v2 format (name:hash)."""
    MANIFEST_FILE.parent.mkdir(parents=True, exist_ok=True)
-    data = "\n".join(f"{name}:{hash_val}" for name, hash_val in sorted(entries.items())) + "\n"
-
-    try:
-        fd, tmp_path = tempfile.mkstemp(
-            dir=str(MANIFEST_FILE.parent),
-            prefix=".bundled_manifest_",
-            suffix=".tmp",
-        )
-        try:
-            with os.fdopen(fd, "w", encoding="utf-8") as f:
-                f.write(data)
-                f.flush()
-                os.fsync(f.fileno())
-            os.replace(tmp_path, MANIFEST_FILE)
-        except BaseException:
-            try:
-                os.unlink(tmp_path)
-            except OSError:
-                pass
-            raise
-    except Exception as e:
-        logger.debug("Failed to write skills manifest %s: %s", MANIFEST_FILE, e, exc_info=True)
+    lines = [f"{name}:{hash_val}" for name, hash_val in sorted(entries.items())]
+    MANIFEST_FILE.write_text("\n".join(lines) + "\n", encoding="utf-8")


 def _discover_bundled_skills(bundled_dir: Path) -> List[Tuple[str, Path]]:
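
Note on the `_write_manifest` hunk above: the change replaces a temp-file-plus-`os.replace()` write with a direct `write_text()`. For comparison, here is a self-contained sketch of the atomic pattern the removed lines used. The name `write_text_atomic` is hypothetical, and unlike the removed code this version lets errors propagate instead of logging them.

```python
import os
import tempfile
from pathlib import Path


def write_text_atomic(path: Path, data: str) -> None:
    """Write `data` to `path` so readers never observe a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory so os.replace() stays
    # on one filesystem and can swap the file in a single step.
    fd, tmp_path = tempfile.mkstemp(dir=str(path.parent),
                                    prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)
    except BaseException:
        # Best-effort cleanup of the temp file, then re-raise the original error.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

The trade-off is simplicity against crash safety: the direct `write_text()` is shorter, but an interruption mid-write can leave a truncated manifest.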
@@ -415,7 +415,7 @@ def _get_env_config() -> Dict[str, Any]:
    if env_type == "local":
        default_cwd = os.getcwd()
    else:
-        default_cwd = "~"
+        default_cwd = "/root"
    
    # Read TERMINAL_CWD but sanity-check it for container backends.
    # If the CWD looks like a host-local path that can't exist inside a
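
Note on the `TERMINAL_CWD` sanity check in this part of the diff: container backends ignore a working directory that looks like a host path, and the change adds `/home/` to the list of host prefixes. A minimal sketch of that rule as a pure function follows. The name `resolve_cwd` is hypothetical, and the fallback to the default is inferred from the log message ("Using %r instead") rather than shown in the diff.

```python
HOST_PREFIXES = ("/Users/", "/home/", "C:\\", "C:/")
CONTAINER_BACKENDS = ("modal", "docker", "singularity", "daytona")


def resolve_cwd(env_type: str, requested: str, default_cwd: str = "/root") -> str:
    """Return the cwd to use, dropping host-looking paths for container backends."""
    if env_type in CONTAINER_BACKENDS and requested:
        # str.startswith accepts a tuple and matches any of the prefixes.
        if requested.startswith(HOST_PREFIXES) and requested != default_cwd:
            return default_cwd
    return requested
```

One consequence worth checking in review: with `/home/` in the list, a container image whose user really lives under `/home/` can no longer be targeted through `TERMINAL_CWD`.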
@@ -424,7 +424,7 @@ def _get_env_config() -> Dict[str, Any]:
    # SSH is excluded since /home/ paths are valid on remote machines.
    cwd = os.getenv("TERMINAL_CWD", default_cwd)
    if env_type in ("modal", "docker", "singularity", "daytona") and cwd:
-        host_prefixes = ("/Users/", "C:\\", "C:/")
+        host_prefixes = ("/Users/", "/home/", "C:\\", "C:/")
        if any(cwd.startswith(p) for p in host_prefixes) and cwd != default_cwd:
            logger.info("Ignoring TERMINAL_CWD=%r for %s backend "
                        "(host path won't exist in sandbox). Using %r instead.",
@@ -504,7 +504,12 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
        if memory > 0:
            sandbox_kwargs["memory"] = memory
        if disk > 0:
-            sandbox_kwargs["ephemeral_disk"] = disk
+            try:
+                import inspect, modal
+                if "ephemeral_disk" in inspect.signature(modal.Sandbox.create).parameters:
+                    sandbox_kwargs["ephemeral_disk"] = disk
+            except Exception:
+                pass
        
        return _ModalEnvironment(
            image=image, cwd=cwd, timeout=timeout,
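
Note on the `ephemeral_disk` hunk above: the change only passes the keyword when `inspect.signature(modal.Sandbox.create)` lists it. The same feature-detection idea can be sketched without Modal installed. The helper `supported_kwargs` and the two stand-in `create_*` functions below are hypothetical and exist only for illustration.

```python
import inspect
from typing import Any, Callable, Dict


def supported_kwargs(func: Callable[..., Any], candidates: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that `func` declares in its signature."""
    params = inspect.signature(func).parameters
    # A function that takes **kwargs accepts any keyword, so keep everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(candidates)
    return {name: value for name, value in candidates.items() if name in params}


def create_old(image: str, memory: int = 0) -> str:  # stand-in for an older API
    return f"{image}:{memory}"


def create_new(image: str, memory: int = 0, ephemeral_disk: int = 0) -> str:  # newer API
    return f"{image}:{memory}:{ephemeral_disk}"
```

The sketch differs from the diff in one respect: a bare name-in-parameters test, as in the hunk, reports a keyword as unsupported when the target only accepts it through `**kwargs`.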
@@ -468,9 +468,7 @@ def _handle_vision_analyze(args, **kw):
    image_url = args.get("image_url", "")
    question = args.get("question", "")
    full_prompt = f"Fully describe and explain everything about this image, then answer the following question:\n\n{question}"
-    model = (os.getenv("AUXILIARY_VISION_MODEL", "").strip()
-             or DEFAULT_VISION_MODEL
-             or "google/gemini-3-flash-preview")
+    model = DEFAULT_VISION_MODEL or "google/gemini-3-flash-preview"
    return vision_analyze_tool(image_url, full_prompt, model)


@@ -85,13 +85,7 @@ DEFAULT_MIN_LENGTH_FOR_SUMMARIZATION = 5000

 # Resolve async auxiliary client at module level.
 # Handles Codex Responses API adapter transparently.
-_aux_async_client, _DEFAULT_SUMMARIZER_MODEL = get_async_text_auxiliary_client("web_extract")
-
-# Allow per-task override via config.yaml auxiliary.web_extract_model
-DEFAULT_SUMMARIZER_MODEL = (
-    os.getenv("AUXILIARY_WEB_EXTRACT_MODEL", "").strip()
-    or _DEFAULT_SUMMARIZER_MODEL
-)
+_aux_async_client, DEFAULT_SUMMARIZER_MODEL = get_async_text_auxiliary_client()

 _debug = DebugSession("web_tools", env_var="WEB_TOOLS_DEBUG")

@@ -1,150 +0,0 @@
----
-sidebar_position: 3
-title: 'Learning Path'
-description: 'Choose your learning path through the Hermes Agent documentation based on your experience level and goals.'
----
-
-# Learning Path
-
-Hermes Agent can do a lot — CLI assistant, Telegram/Discord bot, task automation, RL training, and more. This page helps you figure out where to start and what to read based on your experience level and what you're trying to accomplish.
-
-:::tip Start Here
-If you haven't installed Hermes Agent yet, begin with the [Installation guide](/docs/getting-started/installation) and then run through the [Quickstart](/docs/getting-started/quickstart). Everything below assumes you have a working installation.
-:::
-
-## How to Use This Page
-
-- **Know your level?** Jump to the [experience-level table](#by-experience-level) and follow the reading order for your tier.
-- **Have a specific goal?** Skip to [By Use Case](#by-use-case) and find the scenario that matches.
-- **Just browsing?** Check the [Key Features](#key-features-at-a-glance) table for a quick overview of everything Hermes Agent can do.
-
-## By Experience Level
-
-| Level | Goal | Recommended Reading | Time Estimate |
-|---|---|---|---|
-| **Beginner** | Get up and running, have basic conversations, use built-in tools | [Installation](/docs/getting-started/installation) → [Quickstart](/docs/getting-started/quickstart) → [CLI Usage](/docs/user-guide/cli) → [Configuration](/docs/user-guide/configuration) | ~1 hour |
-| **Intermediate** | Set up messaging bots, use advanced features like memory, cron jobs, and skills | [Sessions](/docs/user-guide/sessions) → [Messaging](/docs/user-guide/messaging) → [Tools](/docs/user-guide/features/tools) → [Skills](/docs/user-guide/features/skills) → [Memory](/docs/user-guide/features/memory) → [Cron](/docs/user-guide/features/cron) | ~2–3 hours |
-| **Advanced** | Build custom tools, create skills, train models with RL, contribute to the project | [Architecture](/docs/developer-guide/architecture) → [Adding Tools](/docs/developer-guide/adding-tools) → [Creating Skills](/docs/developer-guide/creating-skills) → [RL Training](/docs/user-guide/features/rl-training) → [Contributing](/docs/developer-guide/contributing) | ~4–6 hours |
-
-## By Use Case
-
-Pick the scenario that matches what you want to do. Each one links you to the relevant docs in the order you should read them.
-
-### "I want a CLI coding assistant"
-
-Use Hermes Agent as an interactive terminal assistant for writing, reviewing, and running code.
-
-1. [Installation](/docs/getting-started/installation)
-2. [Quickstart](/docs/getting-started/quickstart)
-3. [CLI Usage](/docs/user-guide/cli)
-4. [Code Execution](/docs/user-guide/features/code-execution)
-5. [Context Files](/docs/user-guide/features/context-files)
-6. [Tips & Tricks](/docs/guides/tips)
-
-:::tip
-Pass files directly into your conversation with context files. Hermes Agent can read, edit, and run code in your projects.
-:::
-
-### "I want a Telegram/Discord bot"
-
-Deploy Hermes Agent as a bot on your favorite messaging platform.
-
-1. [Installation](/docs/getting-started/installation)
-2. [Configuration](/docs/user-guide/configuration)
-3. [Messaging Overview](/docs/user-guide/messaging)
-4. [Telegram Setup](/docs/user-guide/messaging/telegram)
-5. [Discord Setup](/docs/user-guide/messaging/discord)
-6. [Security](/docs/user-guide/security)
-
-For full project examples, see:
-- [Daily Briefing Bot](/docs/guides/daily-briefing-bot)
-- [Team Telegram Assistant](/docs/guides/team-telegram-assistant)
-
-### "I want to automate tasks"
-
-Schedule recurring tasks, run batch jobs, or chain agent actions together.
-
-1. [Quickstart](/docs/getting-started/quickstart)
-2. [Cron Scheduling](/docs/user-guide/features/cron)
-3. [Batch Processing](/docs/user-guide/features/batch-processing)
-4. [Delegation](/docs/user-guide/features/delegation)
-5. [Hooks](/docs/user-guide/features/hooks)
-
-:::tip
-Cron jobs let Hermes Agent run tasks on a schedule — daily summaries, periodic checks, automated reports — without you being present.
-:::
-
-### "I want to build custom tools/skills"
-
-Extend Hermes Agent with your own tools and reusable skill packages.
-
-1. [Tools Overview](/docs/user-guide/features/tools)
-2. [Skills Overview](/docs/user-guide/features/skills)
-3. [MCP (Model Context Protocol)](/docs/user-guide/features/mcp)
-4. [Architecture](/docs/developer-guide/architecture)
-5. [Adding Tools](/docs/developer-guide/adding-tools)
-6. [Creating Skills](/docs/developer-guide/creating-skills)
-
-:::tip
-Tools are individual functions the agent can call. Skills are bundles of tools, prompts, and configuration packaged together. Start with tools, graduate to skills.
-:::
-
-### "I want to train models"
-
-Use reinforcement learning to fine-tune model behavior with Hermes Agent's built-in RL training pipeline.
-
-1. [Quickstart](/docs/getting-started/quickstart)
-2. [Configuration](/docs/user-guide/configuration)
-3. [RL Training](/docs/user-guide/features/rl-training)
-4. [Provider Routing](/docs/user-guide/features/provider-routing)
-5. [Architecture](/docs/developer-guide/architecture)
-
-:::tip
-RL training works best when you already understand the basics of how Hermes Agent handles conversations and tool calls. Run through the Beginner path first if you're new.
-:::
-
-### "I want to use it as a Python library"
-
-Integrate Hermes Agent into your own Python applications programmatically.
-
-1. [Installation](/docs/getting-started/installation)
-2. [Quickstart](/docs/getting-started/quickstart)
-3. [Python Library Guide](/docs/guides/python-library)
-4. [Architecture](/docs/developer-guide/architecture)
-5. [Tools](/docs/user-guide/features/tools)
-6. [Sessions](/docs/user-guide/sessions)
-
-## Key Features at a Glance
-
-Not sure what's available? Here's a quick directory of major features:
-
-| Feature | What It Does | Link |
-|---|---|---|
-| **Tools** | Built-in tools the agent can call (file I/O, search, shell, etc.) | [Tools](/docs/user-guide/features/tools) |
-| **Skills** | Installable plugin packages that add new capabilities | [Skills](/docs/user-guide/features/skills) |
-| **Memory** | Persistent memory across sessions | [Memory](/docs/user-guide/features/memory) |
-| **Context Files** | Feed files and directories into conversations | [Context Files](/docs/user-guide/features/context-files) |
-| **MCP** | Connect to external tool servers via Model Context Protocol | [MCP](/docs/user-guide/features/mcp) |
-| **Cron** | Schedule recurring agent tasks | [Cron](/docs/user-guide/features/cron) |
-| **Delegation** | Spawn sub-agents for parallel work | [Delegation](/docs/user-guide/features/delegation) |
-| **Code Execution** | Run code in sandboxed environments | [Code Execution](/docs/user-guide/features/code-execution) |
-| **Browser** | Web browsing and scraping | [Browser](/docs/user-guide/features/browser) |
-| **Hooks** | Event-driven callbacks and middleware | [Hooks](/docs/user-guide/features/hooks) |
-| **Batch Processing** | Process multiple inputs in bulk | [Batch Processing](/docs/user-guide/features/batch-processing) |
-| **RL Training** | Fine-tune models with reinforcement learning | [RL Training](/docs/user-guide/features/rl-training) |
-| **Provider Routing** | Route requests across multiple LLM providers | [Provider Routing](/docs/user-guide/features/provider-routing) |
-
-## What to Read Next
-
-Based on where you are right now:
-
-- **Just finished installing?** → Head to the [Quickstart](/docs/getting-started/quickstart) to run your first conversation.
-- **Completed the Quickstart?** → Read [CLI Usage](/docs/user-guide/cli) and [Configuration](/docs/user-guide/configuration) to customize your setup.
-- **Comfortable with the basics?** → Explore [Tools](/docs/user-guide/features/tools), [Skills](/docs/user-guide/features/skills), and [Memory](/docs/user-guide/features/memory) to unlock the full power of the agent.
-- **Setting up for a team?** → Read [Security](/docs/user-guide/security) and [Sessions](/docs/user-guide/sessions) to understand access control and conversation management.
-- **Ready to build?** → Jump into the [Developer Guide](/docs/developer-guide/architecture) to understand the internals and start contributing.
-- **Want practical examples?** → Check out the [Guides](/docs/guides/tips) section for real-world projects and tips.
-
-:::tip
-You don't need to read everything. Pick the path that matches your goal, follow the links in order, and you'll be productive quickly. You can always come back to this page to find your next step.
-:::
@@ -1,6 +0,0 @@
-{
-  "label": "Guides & Tutorials",
-  "position": 2,
-  "collapsible": true,
-  "collapsed": false
-}
@@ -1,263 +0,0 @@
----
-sidebar_position: 2
-title: "Tutorial: Daily Briefing Bot"
-description: "Build an automated daily briefing bot that researches topics, summarizes findings, and delivers them to Telegram or Discord every morning"
----
-
-# Tutorial: Build a Daily Briefing Bot
-
-In this tutorial, you'll build a personal briefing bot that wakes up every morning, researches topics you care about, summarizes the findings, and delivers a concise briefing straight to your Telegram or Discord.
-
-By the end, you'll have a fully automated workflow combining **web search**, **cron scheduling**, **delegation**, and **messaging delivery** — no code required.
-
-## What We're Building
-
-Here's the flow:
-
-1. **8:00 AM** — The cron scheduler triggers your job
-2. **Hermes spins up** a fresh agent session with your prompt
-3. **Web search** pulls the latest news on your topics
-4. **Summarization** distills it into a clean briefing format
-5. **Delivery** sends the briefing to your Telegram or Discord
-
-The whole thing runs hands-free. You just read your briefing with your morning coffee.
-
-## Prerequisites
-
-Before starting, make sure you have:
-
-- **Hermes Agent installed** — see the [Installation guide](/docs/getting-started/installation)
-- **Gateway running** — the gateway daemon handles cron execution:
-  ```bash
-  hermes gateway install   # Install as system service (recommended)
-  # or
-  hermes gateway           # Run in foreground
-  ```
-- **Firecrawl API key** — set `FIRECRAWL_API_KEY` in your environment for web search
-- **Messaging configured** (optional but recommended) — [Telegram](/docs/user-guide/messaging/telegram) or Discord set up with a home channel
-
-:::tip No messaging? No problem
-You can still follow this tutorial using `deliver: "local"`. Briefings will be saved to `~/.hermes/cron/output/` and you can read them anytime.
-:::
-
-## Step 1: Test the Workflow Manually
-
-Before automating anything, let's make sure the briefing works. Start a chat session:
-
-```bash
-hermes
-```
-
-Then enter this prompt:
-
-```
-Search for the latest news about AI agents and open source LLMs.
-Summarize the top 3 stories in a concise briefing format with links.
-```
-
-Hermes will search the web, read through results, and produce something like:
-
-```
-☀️ Your AI Briefing — March 8, 2026
-
-1. Qwen 3 Released with 235B Parameters
-   Alibaba's latest open-weight model matches GPT-4.5 on several
-   benchmarks while remaining fully open source.
-   → https://qwenlm.github.io/blog/qwen3/
-
-2. LangChain Launches Agent Protocol Standard
-   A new open standard for agent-to-agent communication gains
-   adoption from 15 major frameworks in its first week.
-   → https://blog.langchain.dev/agent-protocol/
-
-3. EU AI Act Enforcement Begins for General-Purpose Models
-   The first compliance deadlines hit, with open source models
-   receiving exemptions under the 10M parameter threshold.
-   → https://artificialintelligenceact.eu/updates/
-
----
-3 stories • Sources searched: 8 • Generated by Hermes Agent
-```
-
-If this works, you're ready to automate it.
-
-:::tip Iterate on the format
-Try different prompts until you get output you love. Add instructions like "use emoji headers" or "keep each summary under 2 sentences." Whatever you settle on goes into the cron job.
-:::
-
-## Step 2: Create the Cron Job
-
-Now let's schedule this to run automatically every morning. You can do this in two ways.
-
-### Option A: Natural Language (in chat)
-
-Just tell Hermes what you want:
-
-```
-Every morning at 8am, search the web for the latest news about AI agents
-and open source LLMs. Summarize the top 3 stories in a concise briefing
-with links. Use a friendly, professional tone. Deliver to telegram.
-```
-
-Hermes will create the cron job for you using the `schedule_cronjob` tool.
-
-### Option B: CLI Slash Command
-
-Use the `/cron` command for more control:
-
-```
-/cron add "0 8 * * *" "Search the web for the latest news about AI agents and open source LLMs. Find at least 5 recent articles from the past 24 hours. Summarize the top 3 most important stories in a concise daily briefing format. For each story include: a clear headline, a 2-sentence summary, and the source URL. Use a friendly, professional tone. Format with emoji bullet points and end with a total story count."
-```
-
-### The Golden Rule: Self-Contained Prompts
-
-:::warning Critical concept
-Cron jobs run in a **completely fresh session** — no memory of your previous conversations, no context about what you "set up earlier." Your prompt must contain **everything** the agent needs to do the job.
-:::
-
-**Bad prompt:**
-```
-Do my usual morning briefing.
-```
-
-**Good prompt:**
-```
-Search the web for the latest news about AI agents and open source LLMs.
-Find at least 5 recent articles from the past 24 hours. Summarize the
-top 3 most important stories in a concise daily briefing format. For each
-story include: a clear headline, a 2-sentence summary, and the source URL.
-Use a friendly, professional tone. Format with emoji bullet points.
-```
-
-The good prompt is specific about **what to search**, **how many articles**, **what format**, and **what tone**. It's everything the agent needs in one shot.
-
-## Step 3: Customize the Briefing
-
-Once the basic briefing works, you can get creative.
-
-### Multi-Topic Briefings
-
-Cover several areas in one briefing:
-
-```
-/cron add "0 8 * * *" "Create a morning briefing covering three topics. For each topic, search the web for recent news from the past 24 hours and summarize the top 2 stories with links.
-
-Topics:
-1. AI and machine learning — focus on open source models and agent frameworks
-2. Cryptocurrency — focus on Bitcoin, Ethereum, and regulatory news
-3. Space exploration — focus on SpaceX, NASA, and commercial space
-
-Format as a clean briefing with section headers and emoji. End with today's date and a motivational quote."
-```
-
-### Using Delegation for Parallel Research
-
-For faster briefings, tell Hermes to delegate each topic to a sub-agent:
-
-```
-/cron add "0 8 * * *" "Create a morning briefing by delegating research to sub-agents. Delegate three parallel tasks:
-
-1. Delegate: Search for the top 2 AI/ML news stories from the past 24 hours with links
-2. Delegate: Search for the top 2 cryptocurrency news stories from the past 24 hours with links
-3. Delegate: Search for the top 2 space exploration news stories from the past 24 hours with links
-
-Collect all results and combine them into a single clean briefing with section headers, emoji formatting, and source links. Add today's date as a header."
-```
-
-Each sub-agent searches independently and in parallel, then the main agent combines everything into one polished briefing. See the [Delegation docs](/docs/user-guide/features/delegation) for more on how this works.
-
-### Weekday-Only Schedule
-
-Don't need briefings on weekends? Use a cron expression that targets Monday–Friday:
-
-```
-/cron add "0 8 * * 1-5" "Search for the latest AI and tech news..."
-```
-
-### Twice-Daily Briefings
-
-Get a morning overview and an evening recap:
-
-```
-/cron add "0 8 * * *" "Morning briefing: search for AI news from the past 12 hours..."
-/cron add "0 18 * * *" "Evening recap: search for AI news from the past 12 hours..."
-```
-
-### Adding Personal Context with Memory
-
-If you have [memory](/docs/user-guide/features/memory) enabled, you can store preferences that persist across sessions. But remember — cron jobs run in fresh sessions without conversational memory. To add personal context, bake it directly into the prompt:
-
-```
-/cron add "0 8 * * *" "You are creating a briefing for a senior ML engineer who cares about: PyTorch ecosystem, transformer architectures, open-weight models, and AI regulation in the EU. Skip stories about product launches or funding rounds unless they involve open source.
-
-Search for the latest news on these topics. Summarize the top 3 stories with links. Be concise and technical — this reader doesn't need basic explanations."
-```
-
-:::tip Tailor the persona
-Including details about who the briefing is *for* dramatically improves relevance. Tell the agent your role, interests, and what to skip.
-:::
-
-## Step 4: Manage Your Jobs
-
-### List All Scheduled Jobs
-
-In chat:
-```
-/cron list
-```
-
-Or from the terminal:
-```bash
-hermes cron list
-```
-
-You'll see output like:
-
-```
-ID          | Name              | Schedule    | Next Run           | Deliver
-------------|-------------------|-------------|--------------------|--------
-a1b2c3d4    | Morning Briefing  | 0 8 * * *   | 2026-03-09 08:00   | telegram
-e5f6g7h8    | Evening Recap     | 0 18 * * *  | 2026-03-08 18:00   | telegram
-```
-
-### Remove a Job
-
-In chat:
-```
-/cron remove a1b2c3d4
-```
-
-Or ask conversationally:
-```
-Remove my morning briefing cron job.
-```
-
-Hermes will use `list_cronjobs` to find it and `remove_cronjob` to delete it.
-
-### Check Gateway Status
-
-Make sure the scheduler is actually running:
-
-```bash
-hermes cron status
-```
-
-If the gateway isn't running, your jobs won't execute. Install it as a system service for reliability:
-
-```bash
-hermes gateway install
-```
-
-## Going Further
-
-You've built a working daily briefing bot. Here are some directions to explore next:
-
-- **[Scheduled Tasks (Cron)](/docs/user-guide/features/cron)** — Full reference for schedule formats, repeat limits, and delivery options
-- **[Delegation](/docs/user-guide/features/delegation)** — Deep dive into parallel sub-agent workflows
-- **[Messaging Platforms](/docs/user-guide/messaging)** — Set up Telegram, Discord, or other delivery targets
-- **[Memory](/docs/user-guide/features/memory)** — Persistent context across sessions
-- **[Tips & Best Practices](/docs/guides/tips)** — More prompt engineering advice
-
-:::tip What else can you schedule?
-The briefing bot pattern works for anything: competitor monitoring, GitHub repo summaries, weather forecasts, portfolio tracking, server health checks, or even a daily joke. If you can describe it in a prompt, you can schedule it.
-:::
@@ -1,340 +0,0 @@
----
-sidebar_position: 4
-title: "Using Hermes as a Python Library"
-description: "Embed AIAgent in your own Python scripts, web apps, or automation pipelines — no CLI required"
----
-
-# Using Hermes as a Python Library
-
-Hermes isn't just a CLI tool. You can import `AIAgent` directly and use it programmatically in your own Python scripts, web applications, or automation pipelines. This guide shows you how.
-
----
-
-## Installation
-
-Install Hermes directly from the repository:
-
-```bash
-pip install git+https://github.com/NousResearch/hermes-agent.git
-```
-
-Or with [uv](https://docs.astral.sh/uv/):
-
-```bash
-uv pip install git+https://github.com/NousResearch/hermes-agent.git
-```
-
-You can also pin it in your `requirements.txt`:
-
-```text
-hermes-agent @ git+https://github.com/NousResearch/hermes-agent.git
-```
-
-:::tip
-The same environment variables used by the CLI are required when using Hermes as a library. At minimum, set `OPENROUTER_API_KEY` (or `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` if using direct provider access).
-:::
-
----
-
-## Basic Usage
-
-The simplest way to use Hermes is the `chat()` method — pass a message, get a string back:
-
-```python
-from run_agent import AIAgent
-
-agent = AIAgent(
-    model="anthropic/claude-sonnet-4",
-    quiet_mode=True,
-)
-response = agent.chat("What is the capital of France?")
-print(response)
-```
-
-`chat()` handles the full conversation loop internally — tool calls, retries, everything — and returns just the final text response.
-
-:::warning
-Always set `quiet_mode=True` when embedding Hermes in your own code. Without it, the agent prints CLI spinners, progress indicators, and other terminal output that will clutter your application's output.
-:::
-
----
-
-## Full Conversation Control
-
-For more control over the conversation, use `run_conversation()` directly. It returns a dictionary with the full response, message history, and metadata:
-
-```python
-agent = AIAgent(
-    model="anthropic/claude-sonnet-4",
-    quiet_mode=True,
-)
-
-result = agent.run_conversation(
-    user_message="Search for recent Python 3.13 features",
-    task_id="my-task-1",
-)
-
-print(result["final_response"])
-print(f"Messages exchanged: {len(result['messages'])}")
-```
-
-The returned dictionary contains:
-- **`final_response`** — The agent's final text reply
-- **`messages`** — The complete message history (system, user, assistant, tool calls)
-- **`task_id`** — The task identifier used for VM isolation
-
-You can also pass a custom system message that overrides the ephemeral system prompt for that call:
-
-```python
-result = agent.run_conversation(
-    user_message="Explain quicksort",
-    system_message="You are a computer science tutor. Use simple analogies.",
-)
-```
-
---
-
-## Configuring Tools
-
-Control which toolsets the agent has access to using `enabled_toolsets` or `disabled_toolsets`:
-
-```python
-# Only enable web tools (browsing, search)
-agent = AIAgent(
-    model="anthropic/claude-sonnet-4",
-    enabled_toolsets=["web"],
-    quiet_mode=True,
-)
-
-# Enable everything except terminal access
-agent = AIAgent(
-    model="anthropic/claude-sonnet-4",
-    disabled_toolsets=["terminal"],
-    quiet_mode=True,
-)
-```
-
-:::tip
-Use `enabled_toolsets` when you want a minimal, locked-down agent (e.g., only web search for a research bot). Use `disabled_toolsets` when you want most capabilities but need to restrict specific ones (e.g., no terminal access in a shared environment).
-:::
-
---
-
-## Multi-turn Conversations
-
-Maintain conversation state across multiple turns by passing the message history back in:
-
-```python
-agent = AIAgent(
-    model="anthropic/claude-sonnet-4",
-    quiet_mode=True,
-)
-
-# First turn
-result1 = agent.run_conversation("My name is Alice")
-history = result1["messages"]
-
-# Second turn — agent remembers the context
-result2 = agent.run_conversation(
-    "What's my name?",
-    conversation_history=history,
-)
-print(result2["final_response"])  # e.g. "Your name is Alice."
-```
-
-The `conversation_history` parameter accepts the `messages` list from a previous result. The agent copies it internally, so your original list is never mutated.
-
---
-
-## Saving Trajectories
-
-Enable trajectory saving to capture conversations in ShareGPT format — useful for generating training data or debugging:
-
-```python
-agent = AIAgent(
-    model="anthropic/claude-sonnet-4",
-    save_trajectories=True,
-    quiet_mode=True,
-)
-
-agent.chat("Write a Python function to sort a list")
-# Saves to trajectory_samples.jsonl in ShareGPT format
-```
-
-Each conversation is appended as a single JSONL line, making it easy to collect datasets from automated runs.
-
---
-
-## Custom System Prompts
-
-Use `ephemeral_system_prompt` to set a custom system prompt that guides the agent's behavior but is **not** saved to trajectory files (keeping your training data clean):
-
-```python
-agent = AIAgent(
-    model="anthropic/claude-sonnet-4",
-    ephemeral_system_prompt="You are a SQL expert. Only answer database questions.",
-    quiet_mode=True,
-)
-
-response = agent.chat("How do I write a JOIN query?")
-print(response)
-```
-
-This is ideal for building specialized agents — a code reviewer, a documentation writer, a SQL assistant — all using the same underlying tooling.
-
---
-
-## Batch Processing
-
-For running many prompts in parallel, Hermes includes `batch_runner.py`. It manages concurrent `AIAgent` instances with proper resource isolation:
-
-```bash
-python batch_runner.py --input prompts.jsonl --output results.jsonl
-```
-
-Each prompt gets its own `task_id` and isolated environment. If you need custom batch logic, you can build your own using `AIAgent` directly:
-
-```python
-import concurrent.futures
-from run_agent import AIAgent
-
-prompts = [
-    "Explain recursion",
-    "What is a hash table?",
-    "How does garbage collection work?",
-]
-
-def process_prompt(prompt):
-    # Create a fresh agent per task for thread safety
-    agent = AIAgent(
-        model="anthropic/claude-sonnet-4",
-        quiet_mode=True,
-        skip_memory=True,
-    )
-    return agent.chat(prompt)
-
-with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
-    results = list(executor.map(process_prompt, prompts))
-
-for prompt, result in zip(prompts, results):
-    print(f"Q: {prompt}\nA: {result}\n")
-```
-
-:::warning
-Always create a **new `AIAgent` instance per thread or task**. The agent maintains internal state (conversation history, tool sessions, iteration counters) that is not thread-safe to share.
-:::
-
---
-
-## Integration Examples
-
-### FastAPI Endpoint
-
-```python
-from fastapi import FastAPI
-from pydantic import BaseModel
-from run_agent import AIAgent
-
-app = FastAPI()
-
-class ChatRequest(BaseModel):
-    message: str
-    model: str = "anthropic/claude-sonnet-4"
-
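-# Declared with plain `def`: FastAPI runs sync handlers in a threadpool,
-# so the blocking agent.chat() call does not stall the event loop.
-@app.post("/chat")
-def chat(request: ChatRequest):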
-    agent = AIAgent(
-        model=request.model,
-        quiet_mode=True,
-        skip_context_files=True,
-        skip_memory=True,
-    )
-    response = agent.chat(request.message)
-    return {"response": response}
-```
-
-### Discord Bot
-
-```python
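-import asyncio
-
-import discord
-from run_agent import AIAgent
-
-# The message_content intent is required to read message text in discord.py 2.x;
-# it must also be enabled for the bot in the Discord developer portal.
-intents = discord.Intents.default()
-intents.message_content = True
-client = discord.Client(intents=intents)
-
-@client.event
-async def on_message(message):
-    if message.author == client.user:
-        return
-    if message.content.startswith("!hermes "):
-        query = message.content[8:]
-        agent = AIAgent(
-            model="anthropic/claude-sonnet-4",
-            quiet_mode=True,
-            skip_context_files=True,
-            skip_memory=True,
-            platform="discord",
-        )
-        # agent.chat() blocks, so run it in a worker thread to keep the bot responsive
-        response = await asyncio.to_thread(agent.chat, query)
-        # Discord rejects messages longer than 2000 characters
-        await message.channel.send(response[:2000])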
-
-client.run("YOUR_DISCORD_TOKEN")
-```
-
-### CI/CD Pipeline Step
-
-```python
-#!/usr/bin/env python3
-"""CI step: auto-review a PR diff."""
-import subprocess
-from run_agent import AIAgent
-
-diff = subprocess.check_output(["git", "diff", "main...HEAD"]).decode()
-
-agent = AIAgent(
-    model="anthropic/claude-sonnet-4",
-    quiet_mode=True,
-    skip_context_files=True,
-    skip_memory=True,
-    disabled_toolsets=["terminal", "browser"],
-)
-
-review = agent.chat(
-    f"Review this PR diff for bugs, security issues, and style problems:\n\n{diff}"
-)
-print(review)
-```
-
---
-
-## Key Constructor Parameters
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `model` | `str` | `"anthropic/claude-opus-4.6"` | Model in OpenRouter format |
-| `quiet_mode` | `bool` | `False` | Suppress CLI output |
-| `enabled_toolsets` | `List[str]` | `None` | Whitelist specific toolsets |
-| `disabled_toolsets` | `List[str]` | `None` | Blacklist specific toolsets |
-| `save_trajectories` | `bool` | `False` | Save conversations to JSONL |
-| `ephemeral_system_prompt` | `str` | `None` | Custom system prompt (not saved to trajectories) |
-| `max_iterations` | `int` | `90` | Max tool-calling iterations per conversation |
-| `skip_context_files` | `bool` | `False` | Skip loading AGENTS.md files |
-| `skip_memory` | `bool` | `False` | Disable persistent memory read/write |
-| `api_key` | `str` | `None` | API key (falls back to env vars) |
-| `base_url` | `str` | `None` | Custom API endpoint URL |
-| `platform` | `str` | `None` | Platform hint (`"discord"`, `"telegram"`, etc.) |
-
---
-
-## Important Notes
-
-:::tip
- Set **`skip_context_files=True`** if you don't want `AGENTS.md` files from the working directory loaded into the system prompt.
- Set **`skip_memory=True`** to prevent the agent from reading or writing persistent memory — recommended for stateless API endpoints.
- The `platform` parameter (e.g., `"discord"`, `"telegram"`) injects platform-specific formatting hints so the agent adapts its output style.
-:::
-
-:::warning
- **Thread safety**: Create one `AIAgent` per thread or task. Never share an instance across concurrent calls.
- **Resource cleanup**: The agent automatically cleans up resources (terminal sessions, browser instances) when a conversation ends. If you're running in a long-lived process, ensure each conversation completes normally.
- **Iteration limits**: The default `max_iterations=90` is generous. For simple Q&A use cases, consider lowering it (e.g., `max_iterations=10`) to prevent runaway tool-calling loops and control costs.
-:::
@@ -1,429 +0,0 @@
---
-sidebar_position: 3
-title: "Tutorial: Team Telegram Assistant"
-description: "Step-by-step guide to setting up a Telegram bot that your whole team can use for code help, research, system admin, and more"
---
-
-# Set Up a Team Telegram Assistant
-
-This tutorial walks you through setting up a Telegram bot powered by Hermes Agent that multiple team members can use. By the end, your team will have a shared AI assistant they can message for help with code, research, system administration, and anything else — secured with per-user authorization.
-
-## What We're Building
-
-A Telegram bot that:
-
- **Lets any authorized team member** DM it for help: code reviews, research, shell commands, debugging
- **Runs on your server** with full tool access: terminal, file editing, web search, code execution
- **Keeps per-user sessions**, so each person gets their own conversation context
- **Is secure by default**: only approved users can interact, with two authorization methods
- **Runs scheduled tasks**: daily standups, health checks, and reminders delivered to a team channel
-
---
-
-## Prerequisites
-
-Before starting, make sure you have:
-
- **Hermes Agent installed** on a server or VPS (not your laptop, since the bot needs to stay running). Follow the [installation guide](/getting-started/installation) if you haven't yet.
- **A Telegram account** for yourself (the bot owner)
- **An LLM provider configured** — at minimum, an API key for OpenAI, Anthropic, or another supported provider in `~/.hermes/.env`
-
-:::tip
-A $5/month VPS is plenty for running the gateway. Hermes itself is lightweight — the LLM API calls are what cost money, and those happen remotely.
-:::
-
---
-
-## Step 1: Create a Telegram Bot
-
-Every Telegram bot starts with **@BotFather** — Telegram's official bot for creating bots.
-
-1. **Open Telegram** and search for `@BotFather`, or go to [t.me/BotFather](https://t.me/BotFather)
-
-2. **Send `/newbot`** — BotFather will ask you two things:
-   - **Display name** — what users see (e.g., `Team Hermes Assistant`)
-   - **Username** — must end in `bot` (e.g., `myteam_hermes_bot`)
-
-3. **Copy the bot token** — BotFather replies with something like:
-   ```
-   Use this token to access the HTTP API:
-   7123456789:AAH1bGciOiJSUzI1NiIsInR5cCI6Ikp...
-   ```
-   Save this token — you'll need it in the next step.
-
-4. **Set a description** (optional but recommended):
-   ```
-   /setdescription
-   ```
-   Choose your bot, then enter something like:
-   ```
-   Team AI assistant powered by Hermes Agent. DM me for help with code, research, debugging, and more.
-   ```
-
-5. **Set bot commands** (optional — gives users a command menu):
-   ```
-   /setcommands
-   ```
-   Choose your bot, then paste:
-   ```
-   new - Start a fresh conversation
-   model - Show or change the AI model
-   status - Show session info
-   help - Show available commands
-   stop - Stop the current task
-   ```
-
-:::warning
-Keep your bot token secret. Anyone with the token can control the bot. If it leaks, use `/revoke` in BotFather to generate a new one.
-:::
-
---
-
-## Step 2: Configure the Gateway
-
-You have two options: the interactive setup wizard (recommended) or manual configuration.
-
-### Option A: Interactive Setup (Recommended)
-
-```bash
-hermes gateway setup
-```
-
-This walks you through everything with arrow-key selection. Pick **Telegram**, paste your bot token, and enter your user ID when prompted.
-
-### Option B: Manual Configuration
-
-Add these lines to `~/.hermes/.env`:
-
-```bash
-# Telegram bot token from BotFather
-TELEGRAM_BOT_TOKEN=7123456789:AAH1bGciOiJSUzI1NiIsInR5cCI6Ikp...
-
-# Your Telegram user ID (numeric)
-TELEGRAM_ALLOWED_USERS=123456789
-```
-
-### Finding Your User ID
-
-Your Telegram user ID is a numeric value (not your username). To find it:
-
-1. Message [@userinfobot](https://t.me/userinfobot) on Telegram
-2. It instantly replies with your numeric user ID
-3. Copy that number into `TELEGRAM_ALLOWED_USERS`
-
-:::info
-Telegram user IDs are permanent numbers like `123456789`. They're different from your `@username`, which can change. Always use the numeric ID for allowlists.
-:::
-
---
-
-## Step 3: Start the Gateway
-
-### Quick Test
-
-Run the gateway in the foreground first to make sure everything works:
-
-```bash
-hermes gateway
-```
-
-You should see output like:
-
-```
-[Gateway] Starting Hermes Gateway...
-[Gateway] Telegram adapter connected
-[Gateway] Cron scheduler started (tick every 60s)
-```
-
-Open Telegram, find your bot, and send it a message. If it replies, you're in business. Press `Ctrl+C` to stop.
-
-### Production: Install as a Service
-
-For a persistent deployment that survives reboots:
-
-```bash
-hermes gateway install
-```
-
-This creates a **systemd** service (Linux) or **launchd** service (macOS) that runs automatically.
-
-```bash
-# Linux — manage the service
-hermes gateway start
-hermes gateway stop
-hermes gateway status
-
-# View live logs
-journalctl --user -u hermes-gateway -f
-
-# Keep running after SSH logout
-sudo loginctl enable-linger $USER
-```
-
-```bash
-# macOS — manage the service
-launchctl start ai.hermes.gateway
-launchctl stop ai.hermes.gateway
-tail -f ~/.hermes/logs/gateway.log
-```
-
-### Verify It's Running
-
-```bash
-hermes gateway status
-```
-
-Then send a test message to your bot on Telegram. You should get a response within a few seconds.
-
---
-
-## Step 4: Set Up Team Access
-
-Now let's give your teammates access. There are two approaches.
-
-### Approach A: Static Allowlist
-
-Collect each team member's Telegram user ID (have them message [@userinfobot](https://t.me/userinfobot)) and add them as a comma-separated list:
-
-```bash
-# In ~/.hermes/.env
-TELEGRAM_ALLOWED_USERS=123456789,987654321,555555555
-```
-
-Restart the gateway after changes:
-
-```bash
-hermes gateway stop && hermes gateway start
-```
-
-### Approach B: DM Pairing (Recommended for Teams)
-
-DM pairing is more flexible — you don't need to collect user IDs upfront. Here's how it works:
-
-1. **Teammate DMs the bot** — since they're not on the allowlist, the bot replies with a one-time pairing code:
-   ```
-   🔐 Pairing code: XKGH5N7P
-   Send this code to the bot owner for approval.
-   ```
-
-2. **Teammate sends you the code** (via any channel — Slack, email, in person)
-
-3. **You approve it** on the server:
-   ```bash
-   hermes pairing approve telegram XKGH5N7P
-   ```
-
-4. **They're in** — the bot immediately starts responding to their messages
-
-**Managing paired users:**
-
-```bash
-# See all pending and approved users
-hermes pairing list
-
-# Revoke someone's access
-hermes pairing revoke telegram 987654321
-
-# Clear expired pending codes
-hermes pairing clear-pending
-```
-
-:::tip
-DM pairing is ideal for teams because you don't need to restart the gateway when adding new users. Approvals take effect immediately.
-:::
-
-### Security Considerations
-
- **Never set `GATEWAY_ALLOW_ALL_USERS=true`** on a bot with terminal access — anyone who finds your bot could run commands on your server
- Pairing codes expire after **1 hour** and use cryptographic randomness
- Rate limiting prevents brute-force attacks: 1 request per user per 10 minutes, max 3 pending codes per platform
- After 5 failed approval attempts, the platform enters a 1-hour lockout
- All pairing data is stored with `chmod 0600` permissions
-
---
-
-## Step 5: Configure the Bot
-
-### Set a Home Channel
-
-A **home channel** is where the bot delivers cron job results and proactive messages. Without one, scheduled tasks have nowhere to send output.
-
-**Option 1:** Use the `/sethome` command in any Telegram group or chat where the bot is a member.
-
-**Option 2:** Set it manually in `~/.hermes/.env`:
-
-```bash
-TELEGRAM_HOME_CHANNEL=-1001234567890
-TELEGRAM_HOME_CHANNEL_NAME="Team Updates"
-```
-
-To find a channel ID, add [@userinfobot](https://t.me/userinfobot) to the group — it will report the group's chat ID.
-
-### Configure Tool Progress Display
-
-Control how much detail the bot shows when using tools. In `~/.hermes/config.yaml`:
-
-```yaml
-display:
-  tool_progress: new    # off | new | all | verbose
-```
-
-| Mode | What You See |
-|------|-------------|
-| `off` | Clean responses only — no tool activity |
-| `new` | Brief status for each new tool call (recommended for messaging) |
-| `all` | Every tool call with details |
-| `verbose` | Full tool output including command results |
-
-Users can also change this per-session with the `/verbose` command in chat.
-
-### Set Up a Personality with SOUL.md
-
-Customize how the bot communicates by creating `~/.hermes/SOUL.md`:
-
-```markdown
-# Soul
-You are a helpful team assistant. Be concise and technical.
-Use code blocks for any code. Skip pleasantries — the team
-values directness. When debugging, always ask for error logs
-before guessing at solutions.
-```
-
-### Add Project Context
-
-If your team works on specific projects, create context files so the bot knows your stack:
-
-```markdown
-<!-- ~/.hermes/AGENTS.md -->
-# Team Context
- We use Python 3.12 with FastAPI and SQLAlchemy
- Frontend is React with TypeScript
- CI/CD runs on GitHub Actions
- Production deploys to AWS ECS
- Always suggest writing tests for new code
-```
-
-:::info
-Context files are injected into every session's system prompt. Keep them concise — every character counts against your token budget.
-:::
-
---
-
-## Step 6: Set Up Scheduled Tasks
-
-With the gateway running, you can schedule recurring tasks that deliver results to your team channel.
-
-### Daily Standup Summary
-
-Message the bot on Telegram:
-
-```
-Every weekday at 9am, check the GitHub repository at
-github.com/myorg/myproject for:
-1. Pull requests opened/merged in the last 24 hours
-2. Issues created or closed
-3. Any CI/CD failures on the main branch
-Format as a brief standup-style summary.
-```
-
-The agent creates a cron job automatically and delivers results to the chat where you asked (or the home channel).
-
-### Server Health Check
-
-```
-Every 6 hours, check disk usage with 'df -h', memory with 'free -h',
-and Docker container status with 'docker ps'. Report anything unusual —
-partitions above 80%, containers that have restarted, or high memory usage.
-```
-
-### Managing Scheduled Tasks
-
-```bash
-# From the CLI
-hermes cron list          # View all scheduled jobs
-hermes cron status        # Check if scheduler is running
-
-# From Telegram chat
-/cron list                # View jobs
-/cron remove <job_id>     # Remove a job
-```
-
-:::warning
-Cron job prompts run in completely fresh sessions with no memory of prior conversations. Make sure each prompt contains **all** the context the agent needs — file paths, URLs, server addresses, and clear instructions.
-:::
-
---
-
-## Production Tips
-
-### Use Docker for Safety
-
-On a shared team bot, use Docker as the terminal backend so agent commands run in a container instead of on your host:
-
-```bash
-# In ~/.hermes/.env
-TERMINAL_BACKEND=docker
-TERMINAL_DOCKER_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
-```
-
-Or in `~/.hermes/config.yaml`:
-
-```yaml
-terminal:
-  backend: docker
-  container_cpu: 1
-  container_memory: 5120
-  container_persistent: true
-```
-
-This way, even if someone asks the bot to run something destructive, your host system is protected.
-
-### Monitor the Gateway
-
-```bash
-# Check if the gateway is running
-hermes gateway status
-
-# Watch live logs (Linux)
-journalctl --user -u hermes-gateway -f
-
-# Watch live logs (macOS)
-tail -f ~/.hermes/logs/gateway.log
-```
-
-### Keep Hermes Updated
-
-From Telegram, send `/update` to the bot — it will pull the latest version and restart. Or from the server:
-
-```bash
-hermes update
-hermes gateway stop && hermes gateway start
-```
-
-### Log Locations
-
-| What | Location |
-|------|----------|
-| Gateway logs | `journalctl --user -u hermes-gateway` (Linux) or `~/.hermes/logs/gateway.log` (macOS) |
-| Cron job output | `~/.hermes/cron/output/{job_id}/{timestamp}.md` |
-| Cron job definitions | `~/.hermes/cron/jobs.json` |
-| Pairing data | `~/.hermes/pairing/` |
-| Session history | `~/.hermes/sessions/` |
-
---
-
-## Going Further
-
-You've got a working team Telegram assistant. Here are some next steps:
-
- **[Security Guide](/user-guide/security)** — deep dive into authorization, container isolation, and command approval
- **[Messaging Gateway](/user-guide/messaging)** — full reference for gateway architecture, session management, and chat commands
- **[Telegram Setup](/user-guide/messaging/telegram)** — platform-specific details including voice messages and TTS
- **[Scheduled Tasks](/user-guide/features/cron)** — advanced cron scheduling with delivery options and cron expressions
- **[Context Files](/user-guide/features/context-files)** — AGENTS.md, SOUL.md, and .cursorrules for project knowledge
- **[Personality](/user-guide/features/personality)** — built-in personality presets and custom persona definitions
- **Add more platforms** — the same gateway can simultaneously run [Discord](/user-guide/messaging/discord), [Slack](/user-guide/messaging/slack), and [WhatsApp](/user-guide/messaging/whatsapp)
-
---
-
-*Questions or issues? Open an issue on GitHub — contributions are welcome.*
@@ -1,211 +0,0 @@
---
-sidebar_position: 1
-title: "Tips & Best Practices"
-description: "Practical advice to get the most out of Hermes Agent — prompt tips, CLI shortcuts, context files, memory, cost optimization, and security"
---
-
-# Tips & Best Practices
-
-A collection of quick wins: practical tips that make you more effective with Hermes Agent right away. Each section targets a different aspect, so scan the headers and jump to what's relevant.
-
---
-
-## Getting the Best Results
-
-### Be Specific About What You Want
-
-Vague prompts produce vague results. Instead of "fix the code," say "fix the TypeError in `api/handlers.py` on line 47 — the `process_request()` function receives `None` from `parse_body()`." The more context you give, the fewer iterations you need.
-
-### Provide Context Up Front
-
-Front-load your request with the relevant details: file paths, error messages, expected behavior. One well-crafted message beats three rounds of clarification. Paste error tracebacks directly — the agent can parse them.
-
-### Use Context Files for Recurring Instructions
-
-If you find yourself repeating the same instructions ("use tabs not spaces," "we use pytest," "the API is at `/api/v2`"), put them in an `AGENTS.md` file. The agent reads it automatically every session — zero effort after setup.
-
-### Let the Agent Use Its Tools
-
-Don't try to hand-hold every step. Say "find and fix the failing test" rather than "open `tests/test_foo.py`, look at line 42, then..." The agent has file search, terminal access, and code execution — let it explore and iterate.
-
-### Use Skills for Complex Workflows
-
-Before writing a long prompt explaining how to do something, check if there's already a skill for it. Type `/skills` to browse available skills, or just invoke one directly like `/axolotl` or `/github-pr-workflow`.
-
-## CLI Power User Tips
-
-### Multi-Line Input
-
-Press **Alt+Enter** (or **Ctrl+J**) to insert a newline without sending. This lets you compose multi-line prompts, paste code blocks, or structure complex requests before hitting Enter to send.
-
-### Paste Detection
-
-The CLI auto-detects multi-line pastes. Just paste a code block or error traceback directly — it won't send each line as a separate message. The paste is buffered and sent as one message.
-
-### Interrupt and Redirect
-
-Press **Ctrl+C** once to interrupt the agent mid-response. You can then type a new message to redirect it. Double-press Ctrl+C within 2 seconds to force exit. This is invaluable when the agent starts going down the wrong path.
-
-### Resume Sessions with `-c`
-
-Forgot something from your last session? Run `hermes -c` to resume exactly where you left off, with full conversation history restored. You can also resume by title: `hermes -r "my research project"`.
-
-### Clipboard Image Paste
-
-Press **Ctrl+V** to paste an image from your clipboard directly into the chat. The agent uses vision to analyze screenshots, diagrams, error popups, or UI mockups — no need to save to a file first.
-
-### Slash Command Autocomplete
-
-Type `/` and press **Tab** to see all available commands. This includes built-in commands (`/compress`, `/model`, `/title`) and every installed skill. You don't need to memorize anything — Tab completion has you covered.
-
-:::tip
-Use `/verbose` to cycle through tool output display modes: **off → new → all → verbose**. The "all" mode is great for watching what the agent does; "off" is cleanest for simple Q&A.
-:::
-
-## Context Files
-
-### AGENTS.md: Your Project's Brain
-
-Create an `AGENTS.md` in your project root with architecture decisions, coding conventions, and project-specific instructions. This is automatically injected into every session, so the agent always knows your project's rules.
-
-```markdown
-# Project Context
- This is a FastAPI backend with SQLAlchemy ORM
- Always use async/await for database operations
- Tests go in tests/ and use pytest-asyncio
- Never commit .env files
-```
-
-### SOUL.md: Customize Personality
-
-Want the agent to be more concise? More technical? Place a `SOUL.md` in your project root or `~/.hermes/SOUL.md` for global personality customization. This shapes the agent's tone and communication style.
-
-```markdown
-# Soul
-You are a senior backend engineer. Be terse and direct.
-Skip explanations unless asked. Prefer one-liners over verbose solutions.
-Always consider error handling and edge cases.
-```
-
-### .cursorrules Compatibility
-
-Already have a `.cursorrules` or `.cursor/rules/*.mdc` file? Hermes reads those too. No need to duplicate your coding conventions — they're loaded automatically from the working directory.
-
-### Hierarchical Discovery
-
-Hermes walks the directory tree and discovers **all** `AGENTS.md` files at every level. In a monorepo, put project-wide conventions at the root and team-specific ones in subdirectories — they're all concatenated together with path headers.
-
-:::tip
-Keep context files focused and concise. Every character counts against your token budget, since they are injected into the system prompt of every session.
-:::
-
-## Memory & Skills
-
-### Memory vs. Skills: What Goes Where
-
-**Memory** is for facts: your environment, preferences, project locations, and things the agent has learned about you. **Skills** are for procedures: multi-step workflows, tool-specific instructions, and reusable recipes. Use memory for "what," skills for "how."
-
-### When to Create Skills
-
-If you find a task that takes 5+ steps and you'll do it again, ask the agent to create a skill for it. Say "save what you just did as a skill called `deploy-staging`." Next time, just type `/deploy-staging` and the agent loads the full procedure.
-
-### Managing Memory Capacity
-
-Memory is intentionally bounded (~2,200 chars for MEMORY.md, ~1,375 chars for USER.md). When it fills up, the agent consolidates entries. You can help by saying "clean up your memory" or "replace the old Python 3.9 note — we're on 3.12 now."
-
-### Let the Agent Remember
-
-After a productive session, say "remember this for next time" and the agent will save the key takeaways. You can also be specific: "save to memory that our CI uses GitHub Actions with the `deploy.yml` workflow."
-
-:::warning
-Memory is a frozen snapshot — changes made during a session don't appear in the system prompt until the next session starts. The agent writes to disk immediately, but the prompt cache isn't invalidated mid-session.
-:::
-
-## Performance & Cost
-
-### Don't Break the Prompt Cache
-
-Many LLM providers cache the system prompt prefix. If you keep your system prompt stable (same context files, same memory), subsequent messages in a session can get **cache hits** that are billed at a lower rate than uncached input tokens. Avoid changing the model or system prompt mid-session.
-
-### Use /compress Before Hitting Limits
-
-Long sessions accumulate tokens. When you notice responses slowing down or getting truncated, run `/compress`. This summarizes the conversation history, preserving key context while dramatically reducing token count. Use `/usage` to check where you stand.
-
-### Delegate for Parallel Work
-
-Need to research three topics at once? Ask the agent to use `delegate_task` with parallel subtasks. Each subagent runs independently with its own context, and only the final summaries come back — massively reducing your main conversation's token usage.
-
-### Use execute_code for Batch Operations
-
-Instead of running terminal commands one at a time, ask the agent to write a script that does everything at once. "Write a Python script to rename all `.jpeg` files to `.jpg` and run it" is cheaper and faster than renaming files individually.
-
-### Choose the Right Model
-
-Use `/model` to switch models mid-session. Use a frontier model (Claude Sonnet/Opus, GPT-4o) for complex reasoning and architecture decisions. Switch to a faster model for simple tasks like formatting, renaming, or boilerplate generation. Each switch forfeits the cached prompt prefix, so change models at task boundaries rather than message by message.
-
-:::tip
-Run `/usage` periodically to see your token consumption. Run `/insights` for a broader view of usage patterns over the last 30 days.
-:::
-
-## Messaging Tips
-
-### Set a Home Channel
-
-Use `/sethome` in your preferred Telegram or Discord chat to designate it as the home channel. Cron job results and scheduled task outputs are delivered here. Without it, the agent has nowhere to send proactive messages.
-
-### Use /title to Organize Sessions
-
-Name your sessions with `/title auth-refactor` or `/title research-llm-quantization`. Named sessions are easy to find with `hermes sessions list` and resume with `hermes -r "auth-refactor"`. Unnamed sessions pile up and become impossible to distinguish.
-
-### DM Pairing for Team Access
-
-Instead of manually collecting user IDs for allowlists, enable DM pairing. When a teammate DMs the bot, they get a one-time pairing code. You approve it with `hermes pairing approve telegram XKGH5N7P` — simple and secure.
-
-### Tool Progress Display Modes
-
-Use `/verbose` to control how much tool activity you see. In messaging platforms, less is usually more — keep it on "new" to see just new tool calls. In the CLI, "all" gives you a satisfying live view of everything the agent does.
-
-:::tip
-On messaging platforms, sessions auto-reset after idle time (default: 120 min) or daily at 4 AM. Adjust per-platform in `~/.hermes/gateway.json` if you need longer sessions.
-:::
-
-## Security
-
-### Use Docker for Untrusted Code
-
-When working with untrusted repositories or running unfamiliar code, use Docker or Daytona as your terminal backend. Set `TERMINAL_BACKEND=docker` in your `.env`. Destructive commands inside a container can't harm your host system.
-
-```bash
-# In your .env:
-TERMINAL_BACKEND=docker
-TERMINAL_DOCKER_IMAGE=hermes-sandbox:latest
-```
-
-### Review Before Choosing "Always"
-
-When the agent triggers a dangerous command approval (`rm -rf`, `DROP TABLE`, etc.), you get four options: **once**, **session**, **always**, **deny**. Think carefully before choosing "always" — it permanently allowlists that pattern. Start with "session" until you're comfortable.
-
-### Command Approval Is Your Safety Net
-
-Hermes checks every command against a curated list of dangerous patterns before execution. This includes recursive deletes, SQL drops, piping curl to shell, and more. Don't disable this in production — it exists for good reasons.
-
-:::warning
-When running in a container backend (Docker, Singularity, Modal, Daytona), dangerous command checks are **skipped** because the container is the security boundary. Make sure your container images are properly locked down.
-:::
-
-### Use Allowlists for Messaging Bots
-
-Never set `GATEWAY_ALLOW_ALL_USERS=true` on a bot with terminal access. Always use platform-specific allowlists (`TELEGRAM_ALLOWED_USERS`, `DISCORD_ALLOWED_USERS`) or DM pairing to control who can interact with your agent.
-
-```bash
-# Recommended: explicit allowlists per platform
-TELEGRAM_ALLOWED_USERS=123456789,987654321
-DISCORD_ALLOWED_USERS=123456789012345678
-
-# Or use cross-platform allowlist
-GATEWAY_ALLOWED_USERS=123456789,987654321
-```
-
---
-
-*Have a tip that should be on this page? Open an issue or PR — community contributions are welcome.*
@@ -25,7 +25,6 @@ It's not a coding copilot tethered to an IDE or a chatbot wrapper around a singl
 |---|---|
 | 🚀 **[Installation](/docs/getting-started/installation)** | Install in 60 seconds on Linux, macOS, or WSL2 |
 | 📖 **[Quickstart Tutorial](/docs/getting-started/quickstart)** | Your first conversation and key features to try |
-| 🗺️ **[Learning Path](/docs/getting-started/learning-path)** | Find the right docs for your experience level |
 | ⚙️ **[Configuration](/docs/user-guide/configuration)** | Config file, providers, models, and options |
 | 💬 **[Messaging Gateway](/docs/user-guide/messaging)** | Set up Telegram, Discord, Slack, or WhatsApp |
 | 🔧 **[Tools & Toolsets](/docs/user-guide/features/tools)** | 40+ built-in tools and how to configure them |
@@ -34,9 +33,8 @@ It's not a coding copilot tethered to an IDE or a chatbot wrapper around a singl
 | 🔌 **[MCP Integration](/docs/user-guide/features/mcp)** | Connect to any MCP server for extended capabilities |
 | 📄 **[Context Files](/docs/user-guide/features/context-files)** | Project context files that shape every conversation |
 | 🔒 **[Security](/docs/user-guide/security)** | Command approval, authorization, container isolation |
-| 💡 **[Tips & Best Practices](/docs/guides/tips)** | Quick wins to get the most out of Hermes |
 | 🏗️ **[Architecture](/docs/developer-guide/architecture)** | How it works under the hood |
-| ❓ **[FAQ & Troubleshooting](/docs/reference/faq)** | Common questions and solutions |
+| 🤝 **[Contributing](/docs/developer-guide/contributing)** | Development setup and PR process |

 ## Key Features

@@ -160,22 +160,6 @@ Type `/` in the interactive CLI to see an autocomplete dropdown.
 | `/usage` | Show token usage for this session |
 | `/insights [--days N]` | Show usage insights and analytics (last 30 days) |

-#### /compress
-
-Manually triggers context compression on the current conversation. This summarizes middle turns of the conversation while preserving the first 3 and last 4 turns, significantly reducing token count. Useful when:
-
- The conversation is getting long and you want to reduce costs
- You're approaching the model's context limit
- You want to continue the conversation without starting fresh
-
-Requirements: at least 4 messages in the conversation. The configured model (or `compression.summary_model` from config) is used to generate the summary. After compression, the session continues seamlessly with the compressed history.
-
-Reports the result as: `Compressed: X → Y messages, ~N → ~M tokens`.
-
-:::tip
-Compression also happens automatically when approaching context limits (configurable via `compression.threshold` in `config.yaml`). Use `/compress` when you want to trigger it early.
-:::
-
 ### Media & Input

 | Command | Description |
@@ -107,10 +107,6 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
 | `WHATSAPP_ENABLED` | Enable WhatsApp bridge (`true`/`false`) |
 | `WHATSAPP_MODE` | `bot` (separate number) or `self-chat` (message yourself) |
 | `WHATSAPP_ALLOWED_USERS` | Comma-separated phone numbers (with country code) |
-| `SIGNAL_HTTP_URL` | signal-cli daemon HTTP endpoint (e.g., `http://127.0.0.1:8080`) |
-| `SIGNAL_ACCOUNT` | Bot phone number in E.164 format (e.g., `+15551234567`) |
-| `SIGNAL_ALLOWED_USERS` | Comma-separated E.164 phone numbers or UUIDs |
-| `SIGNAL_GROUP_ALLOWED_USERS` | Comma-separated group IDs, or `*` for all groups (omit to disable groups) |
 | `MESSAGING_CWD` | Working directory for terminal in messaging (default: `~`) |
 | `GATEWAY_ALLOWED_USERS` | Comma-separated user IDs allowed across all platforms |
 | `GATEWAY_ALLOW_ALL_USERS` | Allow all users without allowlist (`true`/`false`, default: `false`) |
@@ -1,430 +0,0 @@
---
-sidebar_position: 3
-title: "FAQ & Troubleshooting"
-description: "Frequently asked questions and solutions to common issues with Hermes Agent"
---
-
-# FAQ & Troubleshooting
-
-Quick answers and fixes for the most common questions and issues.
-
---
-
-## Frequently Asked Questions
-
-### What LLM providers work with Hermes?
-
-Hermes Agent works with any OpenAI-compatible API. Supported providers include:
-
- **[OpenRouter](https://openrouter.ai/)** — access hundreds of models through one API key (recommended for flexibility)
- **Nous Portal** — Nous Research's own inference endpoint
- **OpenAI** — GPT-4o, o1, o3, etc.
- **Anthropic** — Claude models (via OpenRouter or compatible proxy)
- **Google** — Gemini models (via OpenRouter or compatible proxy)
- **z.ai / ZhipuAI** — GLM models
- **Kimi / Moonshot AI** — Kimi models
- **MiniMax** — global and China endpoints
- **Local models** — via [Ollama](https://ollama.com/), [vLLM](https://docs.vllm.ai/), [llama.cpp](https://github.com/ggerganov/llama.cpp), [SGLang](https://github.com/sgl-project/sglang), or any OpenAI-compatible server
-
-Set your provider with `hermes setup` or by editing `~/.hermes/.env`. See the [Environment Variables](./environment-variables.md) reference for all provider keys.
-
-### Does it work on Windows?
-
-**Not natively.** Hermes Agent requires a Unix-like environment. On Windows, install [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) and run Hermes from inside it. The standard install command works perfectly in WSL2:
-
-```bash
-curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
-```
-
-### Is my data sent anywhere?
-
-API calls go **only to the LLM provider you configure** (e.g., OpenRouter, your local Ollama instance). Hermes Agent does not collect telemetry, usage data, or analytics. Your conversations, memory, and skills are stored locally in `~/.hermes/`.
-
-### Can I use it offline / with local models?
-
-Yes. Point Hermes at any local OpenAI-compatible server:
-
-```bash
-hermes config set OPENAI_BASE_URL http://localhost:11434/v1  # Ollama
-hermes config set OPENAI_API_KEY ollama                       # Any non-empty value
-hermes config set HERMES_MODEL llama3.1
-```
-
-This works with Ollama, vLLM, llama.cpp server, SGLang, LocalAI, and others. See the [Configuration guide](../user-guide/configuration.md) for details.
-
-### How much does it cost?
-
-Hermes Agent itself is **free and open-source** (MIT license). You pay only for the LLM API usage from your chosen provider. Local models are completely free to run.
-
-### Can multiple people use one instance?
-
-Yes. The [messaging gateway](../user-guide/messaging/index.md) lets multiple users interact with the same Hermes Agent instance via Telegram, Discord, Slack, WhatsApp, or Home Assistant. Access is controlled through allowlists (specific user IDs) and DM pairing (first user to message claims access).
-
-### What's the difference between memory and skills?
-
- **Memory** stores **facts** — things the agent knows about you, your projects, and preferences. Memories are retrieved automatically based on relevance.
- **Skills** store **procedures** — step-by-step instructions for how to do things. Skills are recalled when the agent encounters a similar task.
-
-Both persist across sessions. See [Memory](../user-guide/features/memory.md) and [Skills](../user-guide/features/skills.md) for details.
-
-### Can I use it in my own Python project?
-
-Yes. Import the `AIAgent` class and use Hermes programmatically:
-
-```python
-from hermes.agent import AIAgent
-
-agent = AIAgent(model="openrouter/nous/hermes-3-llama-3.1-70b")
-response = await agent.chat("Explain quantum computing briefly")
-```
-
-See the [Python Library guide](../user-guide/features/code-execution.md) for full API usage.
-
---
-
-## Troubleshooting
-
-### Installation Issues
-
-#### `hermes: command not found` after installation
-
-**Cause:** Your shell hasn't reloaded the updated PATH.
-
-**Solution:**
-```bash
-# Reload your shell profile
-source ~/.bashrc    # bash
-source ~/.zshrc     # zsh
-
-# Or start a new terminal session
-```
-
-If it still doesn't work, verify the install location:
-```bash
-which hermes
-ls ~/.local/bin/hermes
-```
-
-:::tip
-The installer adds `~/.local/bin` to your PATH. If you use a non-standard shell config, add `export PATH="$HOME/.local/bin:$PATH"` manually.
-:::
-
-#### Python version too old
-
-**Cause:** Hermes requires Python 3.11 or newer.
-
-**Solution:**
-```bash
-python3 --version   # Check current version
-
-# Install a newer Python
-sudo apt install python3.12   # Ubuntu/Debian
-brew install python@3.12      # macOS
-```
-
-The installer handles this automatically — if you see this error during manual installation, upgrade Python first.
-
-#### `uv: command not found`
-
-**Cause:** The `uv` package manager isn't installed or not in PATH.
-
-**Solution:**
-```bash
-curl -LsSf https://astral.sh/uv/install.sh | sh
-source ~/.bashrc
-```
-
-#### Permission denied errors during install
-
-**Cause:** Insufficient permissions to write to the install directory.
-
-**Solution:**
-```bash
-# Don't use sudo with the installer — it installs to ~/.local/bin
-# If you previously installed with sudo, clean up:
-sudo rm /usr/local/bin/hermes
-# Then re-run the standard installer
-curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
-```
-
---
-
-### Provider & Model Issues
-
-#### API key not working
-
-**Cause:** Key is missing, expired, incorrectly set, or for the wrong provider.
-
-**Solution:**
-```bash
-# Check which keys are set
-hermes config get OPENROUTER_API_KEY
-
-# Re-run interactive setup
-hermes setup
-
-# Or set directly
-hermes config set OPENROUTER_API_KEY sk-or-v1-xxxxxxxxxxxx
-```
-
-:::warning
-Make sure the key matches the provider. An OpenAI key won't work with OpenRouter and vice versa. Check `~/.hermes/.env` for conflicting entries.
-:::
-
-#### Model not available / model not found
-
-**Cause:** The model identifier is incorrect or not available on your provider.
-
-**Solution:**
-```bash
-# List available models for your provider
-hermes models
-
-# Set a valid model
-hermes config set HERMES_MODEL openrouter/nous/hermes-3-llama-3.1-70b
-
-# Or specify per-session
-hermes chat --model openrouter/meta-llama/llama-3.1-70b-instruct
-```
-
-#### Rate limiting (429 errors)
-
-**Cause:** You've exceeded your provider's rate limits.
-
-**Solution:** Wait a moment and retry. For sustained usage, consider:
- Upgrading your provider plan
- Switching to a different model or provider
- Using `hermes chat --provider <alternative>` to route to a different backend
-
-#### Context length exceeded
-
-**Cause:** The conversation has grown too long for the model's context window.
-
-**Solution:**
-```bash
-# Compress the current session
-/compress
-
-# Or start a fresh session
-hermes chat
-
-# Use a model with a larger context window
-hermes chat --model openrouter/google/gemini-2.0-flash-001
-```
-
---
-
-### Terminal Issues
-
-#### Command blocked as dangerous
-
-**Cause:** Hermes detected a potentially destructive command (e.g., `rm -rf`, `DROP TABLE`). This is a safety feature.
-
-**Solution:** When prompted, review the command and type `y` to approve it. You can also:
- Ask the agent to use a safer alternative
- See the full list of dangerous patterns in the [Security docs](../user-guide/security.md)
-
-:::tip
-This is working as intended — Hermes never silently runs destructive commands. The approval prompt shows you exactly what will execute.
-:::
-
-#### `sudo` not working via messaging gateway
-
-**Cause:** The messaging gateway runs without an interactive terminal, so `sudo` cannot prompt for a password.
-
-**Solution:**
- Avoid `sudo` in messaging — ask the agent to find alternatives
- If you must use `sudo`, configure passwordless sudo for specific commands in `/etc/sudoers`
- Or switch to the terminal interface for administrative tasks: `hermes chat`
-
-#### Docker backend not connecting
-
-**Cause:** Docker daemon isn't running or the user lacks permissions.
-
-**Solution:**
-```bash
-# Check Docker is running
-docker info
-
-# Add your user to the docker group
-sudo usermod -aG docker $USER
-newgrp docker
-
-# Verify
-docker run hello-world
-```
-
---
-
-### Messaging Issues
-
-#### Bot not responding to messages
-
-**Cause:** The bot isn't running, isn't authorized, or your user isn't in the allowlist.
-
-**Solution:**
-```bash
-# Check if the gateway is running
-hermes gateway status
-
-# Start the gateway
-hermes gateway start
-
-# Check logs for errors
-hermes gateway logs
-```
-
-#### Messages not delivering
-
-**Cause:** Network issues, bot token expired, or platform webhook misconfiguration.
-
-**Solution:**
- Verify your bot token is valid with `hermes setup`
- Check gateway logs: `hermes gateway logs`
- For webhook-based platforms (Slack, WhatsApp), ensure your server is publicly accessible
-
-#### Allowlist confusion — who can talk to the bot?
-
-**Cause:** Authorization mode determines who gets access.
-
-**Solution:**
-
-| Mode | How it works |
-|------|-------------|
-| **Allowlist** | Only user IDs listed in config can interact |
-| **DM pairing** | First user to message in DM claims exclusive access |
-| **Open** | Anyone can interact (not recommended for production) |
-
-Configure in `~/.hermes/config.yaml` under your gateway's settings. See the [Messaging docs](../user-guide/messaging/index.md).
-
-#### Gateway won't start
-
-**Cause:** Missing dependencies, port conflicts, or misconfigured tokens.
-
-**Solution:**
-```bash
-# Install messaging dependencies
-pip install hermes-agent[telegram]   # or [discord], [slack], [whatsapp]
-
-# Check for port conflicts
-lsof -i :8080
-
-# Verify configuration
-hermes config show
-```
-
---
-
-### Performance Issues
-
-#### Slow responses
-
-**Cause:** Large model, distant API server, or heavy system prompt with many tools.
-
-**Solution:**
- Try a faster/smaller model: `hermes chat --model openrouter/meta-llama/llama-3.1-8b-instruct`
- Reduce active toolsets: `hermes chat -t "terminal"`
- Check your network latency to the provider
- For local models, ensure you have enough GPU VRAM
-
-#### High token usage
-
-**Cause:** Long conversations, verbose system prompts, or many tool calls accumulating context.
-
-**Solution:**
-```bash
-# Compress the conversation to reduce tokens
-/compress
-
-# Check session token count
-/stats
-```
-
-:::tip
-Use `/compress` regularly during long sessions. It summarizes the conversation history and reduces token usage significantly while preserving context.
-:::
-
-#### Session getting too long
-
-**Cause:** Extended conversations accumulate messages and tool outputs, approaching context limits.
-
-**Solution:**
-```bash
-# Compress current session (preserves key context)
-/compress
-
-# Start a new session with a reference to the old one
-hermes chat
-
-# Resume a specific session later if needed
-hermes chat --continue
-```
-
---
-
-### MCP Issues
-
-#### MCP server not connecting
-
-**Cause:** Server binary not found, wrong command path, or missing runtime.
-
-**Solution:**
-```bash
-# Ensure MCP dependencies are installed
-pip install hermes-agent[mcp]
-
-# For npm-based servers, ensure Node.js is available
-node --version
-npx --version
-
-# Test the server manually
-npx -y @modelcontextprotocol/server-filesystem /tmp
-```
-
-Verify your `~/.hermes/config.yaml` MCP configuration:
-```yaml
-mcp_servers:
-  filesystem:
-    command: "npx"
-    args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/docs"]
-```
-
-#### Tools not showing up from MCP server
-
-**Cause:** Server started but tool discovery failed, or tools are filtered out.
-
-**Solution:**
- Check gateway/agent logs for MCP connection errors
- Ensure the server responds to the `tools/list` RPC method
- Restart the agent — MCP tools are discovered at startup
-
-```bash
-# Verify MCP servers are configured
-hermes config show | grep -A 5 mcp_servers
-
-# Restart hermes to re-discover tools
-hermes chat
-```
-
-#### MCP timeout errors
-
-**Cause:** The MCP server is taking too long to respond, or it crashed during execution.
-
-**Solution:**
- Increase the timeout in your MCP server config if supported
- Check if the MCP server process is still running
- For remote HTTP MCP servers, check network connectivity
-
-:::warning
-If an MCP server crashes mid-request, Hermes will report a timeout. Check the server's own logs (not just Hermes logs) to diagnose the root cause.
-:::
-
---
-
-## Still Stuck?
-
-If your issue isn't covered here:
-
-1. **Search existing issues:** [GitHub Issues](https://github.com/NousResearch/hermes-agent/issues)
-2. **Ask the community:** [Nous Research Discord](https://discord.gg/nousresearch)
-3. **File a bug report:** Include your OS, Python version (`python3 --version`), Hermes version (`hermes --version`), and the full error message
@@ -65,10 +65,6 @@ hermes -w -q "Fix issue #123"     # Single query in worktree

 The welcome banner shows your model, terminal backend, working directory, available tools, and installed skills at a glance.

-### Session Resume Display
-
-When resuming a previous session (`hermes -c` or `hermes --resume <id>`), a "Previous Conversation" panel appears between the banner and the input prompt, showing a compact recap of the conversation history. See [Sessions — Conversation Recap on Resume](sessions.md#conversation-recap-on-resume) for details and configuration.
-
 ## Keybindings

 | Key | Action |
@@ -75,7 +75,7 @@ The OpenAI Codex provider authenticates via device code (open a URL, enter a cod
 :::

 :::warning
-Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use a separate "auxiliary" model — by default Gemini Flash via OpenRouter. An `OPENROUTER_API_KEY` enables these tools automatically. You can also configure which model and provider these tools use — see [Auxiliary Models](#auxiliary-models) below.
+Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use OpenRouter independently. An `OPENROUTER_API_KEY` enables these tools.
 :::

 ### First-Class Chinese AI Providers
@@ -432,121 +432,9 @@ node_modules/
 ```yaml
 compression:
  enabled: true
-  threshold: 0.85              # Compress at 85% of context limit
-  summary_model: "google/gemini-3-flash-preview"   # Model for summarization
-  # summary_provider: "auto"   # "auto", "openrouter", "nous", "main"
+  threshold: 0.85    # Compress at 85% of context limit
 ```

-The `summary_model` must support a context length at least as large as your main model's, since it receives the full middle section of the conversation for compression.
-
-## Auxiliary Models
-
-Hermes uses lightweight "auxiliary" models for side tasks like image analysis, web page summarization, and browser screenshot analysis. By default, these use **Gemini Flash** via OpenRouter or Nous Portal — you don't need to configure anything.
-
-To use a different model, add an `auxiliary` section to `~/.hermes/config.yaml`:
-
-```yaml
-auxiliary:
-  # Image analysis (vision_analyze tool + browser screenshots)
-  vision:
-    provider: "auto"           # "auto", "openrouter", "nous", "main"
-    model: ""                  # e.g. "openai/gpt-4o", "google/gemini-2.5-flash"
-
-  # Web page summarization + browser page text extraction
-  web_extract:
-    provider: "auto"
-    model: ""                  # e.g. "google/gemini-2.5-flash"
-```
-
-### Changing the Vision Model
-
-To use GPT-4o instead of Gemini Flash for image analysis:
-
-```yaml
-auxiliary:
-  vision:
-    model: "openai/gpt-4o"
-```
-
-Or via environment variable (in `~/.hermes/.env`):
-
-```bash
-AUXILIARY_VISION_MODEL=openai/gpt-4o
-```
-
-### Provider Options
-
-| Provider | Description | Requirements |
-|----------|-------------|-------------|
-| `"auto"` | Best available (default). Vision tries OpenRouter → Nous → Codex. | — |
-| `"openrouter"` | Force OpenRouter — routes to any model (Gemini, GPT-4o, Claude, etc.) | `OPENROUTER_API_KEY` |
-| `"nous"` | Force Nous Portal | `hermes login` |
-| `"codex"` | Force Codex OAuth (ChatGPT account). Supports vision (gpt-5.3-codex). | `hermes model` → Codex |
-| `"main"` | Use your custom endpoint (`OPENAI_BASE_URL` + `OPENAI_API_KEY`). Works with OpenAI, local models, or any OpenAI-compatible API. | `OPENAI_BASE_URL` + `OPENAI_API_KEY` |
-
-### Common Setups
-
-**Using OpenAI API key for vision:**
-```yaml
-# In ~/.hermes/.env:
-# OPENAI_BASE_URL=https://api.openai.com/v1
-# OPENAI_API_KEY=sk-...
-
-auxiliary:
-  vision:
-    provider: "main"
-    model: "gpt-4o"       # or "gpt-4o-mini" for cheaper
-```
-
-**Using OpenRouter for vision** (route to any model):
-```yaml
-auxiliary:
-  vision:
-    provider: "openrouter"
-    model: "openai/gpt-4o"      # or "google/gemini-2.5-flash", etc.
-```
-
-**Using Codex OAuth** (ChatGPT Pro/Plus account — no API key needed):
-```yaml
-auxiliary:
-  vision:
-    provider: "codex"     # uses your ChatGPT OAuth token
-    # model defaults to gpt-5.3-codex (supports vision)
-```
-
-**Using a local/self-hosted model:**
-```yaml
-auxiliary:
-  vision:
-    provider: "main"      # uses your OPENAI_BASE_URL endpoint
-    model: "my-local-model"
-```
-
-:::tip
-If you use Codex OAuth as your main model provider, vision works automatically — no extra configuration needed. Codex is included in the auto-detection chain for vision.
-:::
-
-:::warning
-**Vision requires a multimodal model.** If you set `provider: "main"`, make sure your endpoint supports multimodal/vision — otherwise image analysis will fail.
-:::
-
-### Environment Variables
-
-You can also configure auxiliary models via environment variables instead of `config.yaml`:
-
-| Setting | Environment Variable |
-|---------|---------------------|
-| Vision provider | `AUXILIARY_VISION_PROVIDER` |
-| Vision model | `AUXILIARY_VISION_MODEL` |
-| Web extract provider | `AUXILIARY_WEB_EXTRACT_PROVIDER` |
-| Web extract model | `AUXILIARY_WEB_EXTRACT_MODEL` |
-| Compression provider | `CONTEXT_COMPRESSION_PROVIDER` |
-| Compression model | `CONTEXT_COMPRESSION_MODEL` |
-
-:::tip
-Run `hermes config` to see your current auxiliary model settings. Overrides only show up when they differ from the defaults.
-:::
-
 ## Reasoning Effort

 Control how much "thinking" the model does before responding:
@@ -580,8 +468,6 @@ display:
  tool_progress: all    # off | new | all | verbose
  personality: "kawaii"  # Default personality for the CLI
  compact: false         # Compact output mode (less whitespace)
-  resume_display: full   # full (show previous messages on resume) | minimal (one-liner only)
-  bell_on_complete: false  # Play terminal bell when agent finishes (great for long tasks)
 ```

 | Mode | What you see |
@@ -621,16 +507,6 @@ code_execution:
  max_tool_calls: 50           # Max tool calls within code execution
 ```

-## Browser
-
-Configure browser automation behavior:
-
-```yaml
-browser:
-  inactivity_timeout: 120        # Seconds before auto-closing idle sessions
-  record_sessions: false         # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
-```
-
 ## Delegation

 Configure subagent behavior for the delegate tool:
@@ -142,16 +142,6 @@ What does the chart on this page show?

 Screenshots are stored in `~/.hermes/browser_screenshots/` and automatically cleaned up after 24 hours.

-### `browser_console`
-
-Get browser console output (log/warn/error messages) and uncaught JavaScript exceptions from the current page. Essential for detecting silent JS errors that don't appear in the accessibility tree.
-
-```
-Check the browser console for any JavaScript errors
-```
-
-Use `clear=True` to clear the console after reading, so subsequent calls only show new messages.
-
 ### `browser_close`

 Close the browser session and release resources. Call this when done to free up Browserbase session quota.
@@ -185,17 +175,6 @@ Agent workflow:
 4. browser_close()
 ```

-## Session Recording
-
-Automatically record browser sessions as WebM video files:
-
-```yaml
-browser:
-  record_sessions: true  # default: false
-```
-
-When enabled, recording starts automatically on the first `browser_navigate` and saves to `~/.hermes/browser_recordings/` when the session closes. Works in both local and cloud (Browserbase) modes. Recordings older than 72 hours are automatically cleaned up.
-
 ## Stealth Features

 Browserbase provides automatic stealth capabilities:
Author	SHA1	Message	Date
dmahan93	5f9c02bb37	fix: skip tests when atroposlib/minisweagent unavailable in CI - test_agent_loop_tool_calling.py: import atroposlib at module level to trigger skip (environments.agent_loop is now importable without atroposlib due to __init__.py graceful fallback) - test_modal_sandbox_fixes.py: skip TestToolResolution tests when minisweagent not installed	2026-03-09 23:37:32 -05:00
dmahan93	3dbeaea3dc	fix: guard all atroposlib imports for CI without atropos installed - environments/__init__.py: try/except on atroposlib imports so submodules like tool_call_parsers remain importable standalone - test_agent_loop.py, test_tool_call_parsers.py, test_managed_server_tool_support.py: skip at module level when atroposlib is missing	2026-03-09 23:33:24 -05:00
dmahan93	26d9b5af29	test: skip atropos-dependent tests when atroposlib not installed Guard all test files that import from environments/ or atroposlib with try/except + pytest.skip(allow_module_level=True) so they gracefully skip instead of crashing when deps aren't available.	2026-03-09 23:14:53 -05:00
dmahan93	ef8cb9afd2	add a local vllm instance	2026-03-09 23:02:13 -05:00
dmahan93	407a1e24b2	fix: use ManagedServer for vLLM in TBLite eval + local_vllm config TBLite eval was bypassing ManagedServer and calling ServerManager directly, which uses /v1/chat/completions — not available on the atropos vllm_api_server (/generate only). Now uses _use_managed_server() to detect vLLM/SGLang backends and route through ManagedServer (Phase 2) with proper tool_parser and /generate endpoint. Falls back to Phase 1 for OpenAI endpoints. Also adds local_vllm.yaml config for running against a local vLLM server with Docker sandboxes.	2026-03-09 21:32:23 -05:00
dmahan93	e1e69dfd32	fix: handle dict and object tool_calls in agent loop vLLM's ToolCallTranslator returns tool_calls as dicts, while OpenAI API returns them as objects with .id, .function.name etc. Normalize both formats in the agent loop.	2026-03-09 21:21:49 -05:00
dmahan93	003b6e49df	test: 5 vLLM integration tests + fallback tool call parser Tests hit a real vLLM server (Qwen/Qwen3-4B-Thinking-2507) via ManagedServer Phase 2. Auto-skip if server isn't running. Tests verify: - Single tool call through full agent loop - Multi-tool calls across turns - ManagedServer produces SequenceNodes with tokens/logprobs - Direct response without tools - Thinking model produces <think> blocks Also adds fallback parser in agent_loop.py: when ManagedServer's ToolCallTranslator can't parse (vLLM not installed), hermes-agent's standalone parsers extract <tool_call> tags from raw content.	2026-03-09 21:18:42 -05:00
dmahan93	dab2cfe566	add eval output to gitignore	2026-03-09 21:01:36 -05:00
dmahan93	c87bd5dd87	refactor: update to new atropos tool-calling API Migrate from old tool_call_parser (instance) to new ToolCallTranslator pattern from atropos add-openai-endpoint-for-managed-server branch: - Set tool_parser on ServerManager (string name, e.g. 'hermes') - Use managed_server(tokenizer=..., preserve_think_blocks=...) instead of managed_server(tokenizer=..., tool_call_parser=instance) - ManagedServer now handles tool call translation internally via ToolCallTranslator (bidirectional raw text <-> OpenAI tool_calls) - Remove old parser loading code (get_parser/KeyError fallback) The hermes-agent tool_call_parsers/ directory is preserved as a standalone fallback for environments that don't use vLLM's parsers.	2026-03-09 20:49:18 -05:00
dmahan93	2a67e4fa57	test: 9 agent loop tool-calling integration tests Real LLM calls via OpenRouter using stepfun/step-3.5-flash:free (zero cost). Falls back to paid models if free model is unavailable. Tests: single tool call, multi-tool single turn, multi-turn chains, unknown tool rejection, max_turns limit, direct response (no tools), tool error handling, AgentResult structure, conversation history.	2026-03-09 20:37:55 -05:00
dmahan93	136a64942d	feat: add eval_concurrency limit + Docker local config for TBLite - Add eval_concurrency config field with asyncio.Semaphore - Add local.yaml config using Docker backend (sandboxed, no cloud costs) - Register docker_image alongside modal_image for backend flexibility - Default: 8 parallel tasks for local runs	2026-03-09 20:28:28 -05:00
dmahan93	9f74d1f2ec	test: 13 tests for Modal sandbox infra fixes	2026-03-09 20:26:09 -05:00
dmahan93	11ad4173de	fix: Modal sandbox eval infra (9 fixes for TBLite baseline) Fixes discovered while running TBLite baseline evaluation: 1. ephemeral_disk param not supported in modal 1.3.5 - check before passing 2. Modal legacy image builder requires working pip - add ensurepip fix via setup_dockerfile_commands to handle task images with broken pip 3. Host cwd leaked into Modal sandbox - add /home/ to host prefix check 4. Tilde ~ not expanded by subprocess.run(cwd=) in sandboxes - use /root 5. install_pipx must stay True for swerex-remote to be available Dependencies also needed (not in this commit): - git submodule update --init mini-swe-agent - uv pip install swe-rex boto3	2026-03-09 18:36:28 -05:00
dmahan93	92cb77eaa7	Add tests for atropos tool calling integration - test_tool_call_parsers.py: 16 tests for parser registry, hermes parser (single/multiple/truncated/malformed), and ParseResult contract validation - test_agent_loop.py: 21 tests for HermesAgentLoop with mock servers (text responses, tool calls, max turns, unknown tools, API errors, extra_body forwarding, managed state, blocked tools, reasoning extraction) - test_managed_server_tool_support.py: 9 tests validating API compatibility between hermes-agent and atroposlib's ManagedServer tool_call_parser support (gracefully skips on baseline atroposlib, passes on tool_call_support branch)	2026-03-09 15:42:16 -05:00
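
Commit e1e69dfd32 above says the agent loop has to accept tool calls both as dicts (from vLLM's ToolCallTranslator) and as objects with `.id` and `.function.name` (from the OpenAI API). A minimal sketch of that kind of normalization follows. The helper name `normalize_tool_call` and the returned dict layout are assumptions made for illustration and are not taken from the hermes-agent source.

```python
from types import SimpleNamespace
from typing import Any, Dict


def normalize_tool_call(tool_call: Any) -> Dict[str, Any]:
    """Return a plain dict with id, name, and arguments for either input shape.

    Hypothetical helper: the real agent loop may use different names and fields.
    """
    if isinstance(tool_call, dict):
        # Dict shape: {"id": ..., "function": {"name": ..., "arguments": ...}}
        function = tool_call.get("function") or {}
        return {
            "id": tool_call.get("id"),
            "name": function.get("name"),
            "arguments": function.get("arguments"),
        }
    # Object shape: attributes .id, .function.name, .function.arguments
    function = getattr(tool_call, "function", None)
    return {
        "id": getattr(tool_call, "id", None),
        "name": getattr(function, "name", None),
        "arguments": getattr(function, "arguments", None),
    }


# The same call expressed in both shapes; the tool name and arguments are made up.
dict_call = {
    "id": "call_1",
    "function": {"name": "terminal", "arguments": '{"command": "ls"}'},
}
object_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="terminal", arguments='{"command": "ls"}'),
)

normalized_from_dict = normalize_tool_call(dict_call)
normalized_from_object = normalize_tool_call(object_call)
```

Converting at the boundary means the rest of the loop handles one shape only, so it does not need to branch on the backend that produced the response.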