fix(discord): persist thread participation across gateway restarts

_bot_participated_threads was an in-memory set — lost on every restart. After restart, the bot forgot which threads it was active in, requiring fresh @mentions and potentially creating duplicate threads instead of continuing existing conversations. Changes: - Persist thread IDs to ~/.hermes/discord_threads.json - Load on adapter init, save on every new thread participation - _track_thread() replaces direct .add() calls for atomic persist - Cap at 500 tracked threads to prevent unbounded growth - /thread slash command also tracks participation - 7 new tests covering persistence, restart survival, corruption recovery, cap enforcement
fix: add metadata param to base send_image and forward in send_animation
2026-03-17 02:26:34 -07:00 · 2026-03-17 02:02:28 -07:00 · 2026-03-17 01:59:07 -07:00 · 2026-03-17 01:53:58 -07:00 · 2026-03-17 01:52:51 -07:00 · 2026-03-17 01:52:46 -07:00
116 changed files with 12541 additions and 2267 deletions
@@ -129,14 +129,50 @@ Messages follow OpenAI format: `{"role": "system/user/assistant/tool", ...}`. Re
 - **KawaiiSpinner** (`agent/display.py`) — animated faces during API calls, `┊` activity feed for tool results
 - `load_cli_config()` in cli.py merges hardcoded defaults + user config YAML
 - **Skin engine** (`hermes_cli/skin_engine.py`) — data-driven CLI theming; initialized from `display.skin` config key at startup; skins customize banner colors, spinner faces/verbs/wings, tool prefix, response box, branding text
- `process_command()` is a method on `HermesCLI` (not in commands.py)
+- `process_command()` is a method on `HermesCLI` — dispatches on canonical command name resolved via `resolve_command()` from the central registry
 - Skill slash commands: `agent/skill_commands.py` scans `~/.hermes/skills/`, injects as **user message** (not system prompt) to preserve prompt caching

-### Adding CLI Commands
+### Slash Command Registry (`hermes_cli/commands.py`)

-1. Add to `COMMANDS` dict in `hermes_cli/commands.py`
-2. Add handler in `HermesCLI.process_command()` in `cli.py`
-3. For persistent settings, use `save_config_value()` in `cli.py`
+All slash commands are defined in a central `COMMAND_REGISTRY` list of `CommandDef` objects. Every downstream consumer derives from this registry automatically:
+
+- **CLI** — `process_command()` resolves aliases via `resolve_command()`, dispatches on canonical name
+- **Gateway** — `GATEWAY_KNOWN_COMMANDS` frozenset for hook emission, `resolve_command()` for dispatch
+- **Gateway help** — `gateway_help_lines()` generates `/help` output
+- **Telegram** — `telegram_bot_commands()` generates the BotCommand menu
+- **Slack** — `slack_subcommand_map()` generates `/hermes` subcommand routing
+- **Autocomplete** — `COMMANDS` flat dict feeds `SlashCommandCompleter`
+- **CLI help** — `COMMANDS_BY_CATEGORY` dict feeds `show_help()`
+
+### Adding a Slash Command
+
+1. Add a `CommandDef` entry to `COMMAND_REGISTRY` in `hermes_cli/commands.py`:
+```python
+CommandDef("mycommand", "Description of what it does", "Session",
+           aliases=("mc",), args_hint="[arg]"),
+```
+2. Add handler in `HermesCLI.process_command()` in `cli.py`:
+```python
+elif canonical == "mycommand":
+    self._handle_mycommand(cmd_original)
+```
+3. If the command is available in the gateway, add a handler in `gateway/run.py`:
+```python
+if canonical == "mycommand":
+    return await self._handle_mycommand(event)
+```
+4. For persistent settings, use `save_config_value()` in `cli.py`
+
+**CommandDef fields:**
+- `name` — canonical name without slash (e.g. `"background"`)
+- `description` — human-readable description
+- `category` — one of `"Session"`, `"Configuration"`, `"Tools & Skills"`, `"Info"`, `"Exit"`
+- `aliases` — tuple of alternative names (e.g. `("bg",)`)
+- `args_hint` — argument placeholder shown in help (e.g. `"<prompt>"`, `"[name]"`)
+- `cli_only` — only available in the interactive CLI
+- `gateway_only` — only available in messaging platforms
+
+**Adding an alias** requires only adding it to the `aliases` tuple on the existing `CommandDef`. No other file changes needed — dispatch, help text, Telegram menu, Slack mapping, and autocomplete all update automatically.

 ---

@@ -136,7 +136,7 @@ hermes-agent/
 │   ├── auth.py                   # Provider resolution, OAuth, Nous Portal
 │   ├── models.py                 # OpenRouter model selection lists
 │   ├── banner.py                 # Welcome banner, ASCII art
-│   ├── commands.py               # Slash command definitions + autocomplete
+│   ├── commands.py               # Central slash command registry (CommandDef), autocomplete, gateway helpers
 │   ├── callbacks.py              # Interactive callbacks (clarify, sudo, approval)
 │   ├── doctor.py                 # Diagnostics
 │   ├── skills_hub.py             # Skills Hub CLI + /skills slash command
@@ -0,0 +1,377 @@
+# Hermes Agent v0.3.0 (v2026.3.17)
+
+**Release Date:** March 17, 2026
+
+> The streaming, plugins, and provider release — unified real-time token delivery, first-class plugin architecture, rebuilt provider system with Vercel AI Gateway, native Anthropic provider, smart approvals, live Chrome CDP browser connect, ACP IDE integration, Honcho memory, voice mode, persistent shell, and 50+ bug fixes across every platform.
+
+---
+
+## ✨ Highlights
+
+- **Unified Streaming Infrastructure** — Real-time token-by-token delivery in CLI and all gateway platforms. Responses stream as they're generated instead of arriving as a block. ([#1538](https://github.com/NousResearch/hermes-agent/pull/1538))
+
+- **First-Class Plugin Architecture** — Drop Python files into `~/.hermes/plugins/` to extend Hermes with custom tools, commands, and hooks. No forking required. ([#1544](https://github.com/NousResearch/hermes-agent/pull/1544), [#1555](https://github.com/NousResearch/hermes-agent/pull/1555))
+
+- **Native Anthropic Provider** — Direct Anthropic API calls with Claude Code credential auto-discovery, OAuth PKCE flows, and native prompt caching. No OpenRouter middleman needed. ([#1097](https://github.com/NousResearch/hermes-agent/pull/1097))
+
+- **Smart Approvals + /stop Command** — Codex-inspired approval system that learns which commands are safe and remembers your preferences. `/stop` kills the current agent run immediately. ([#1543](https://github.com/NousResearch/hermes-agent/pull/1543))
+
+- **Honcho Memory Integration** — Async memory writes, configurable recall modes, session title integration, and multi-user isolation in gateway mode. By @erosika. ([#736](https://github.com/NousResearch/hermes-agent/pull/736))
+
+- **Voice Mode** — Push-to-talk in CLI, voice notes in Telegram/Discord, Discord voice channel support, and local Whisper transcription via faster-whisper. ([#1299](https://github.com/NousResearch/hermes-agent/pull/1299), [#1185](https://github.com/NousResearch/hermes-agent/pull/1185), [#1429](https://github.com/NousResearch/hermes-agent/pull/1429))
+
+- **Concurrent Tool Execution** — Multiple independent tool calls now run in parallel via ThreadPoolExecutor, significantly reducing latency for multi-tool turns. ([#1152](https://github.com/NousResearch/hermes-agent/pull/1152))
+
+- **PII Redaction** — When `privacy.redact_pii` is enabled, personally identifiable information is automatically scrubbed before sending context to LLM providers. ([#1542](https://github.com/NousResearch/hermes-agent/pull/1542))
+
+- **`/browser connect` via CDP** — Attach browser tools to a live Chrome instance through Chrome DevTools Protocol. Debug, inspect, and interact with pages you already have open. ([#1549](https://github.com/NousResearch/hermes-agent/pull/1549))
+
+- **Vercel AI Gateway Provider** — Route Hermes through Vercel's AI Gateway for access to their model catalog and infrastructure. ([#1628](https://github.com/NousResearch/hermes-agent/pull/1628))
+
+- **Centralized Provider Router** — Rebuilt provider system with `call_llm` API, unified `/model` command, auto-detect provider on model switch, and direct endpoint overrides for auxiliary/delegation clients. ([#1003](https://github.com/NousResearch/hermes-agent/pull/1003), [#1506](https://github.com/NousResearch/hermes-agent/pull/1506), [#1375](https://github.com/NousResearch/hermes-agent/pull/1375))
+
+- **ACP Server (IDE Integration)** — VS Code, Zed, and JetBrains can now connect to Hermes as an agent backend, with full slash command support. ([#1254](https://github.com/NousResearch/hermes-agent/pull/1254), [#1532](https://github.com/NousResearch/hermes-agent/pull/1532))
+
+- **Persistent Shell Mode** — Local and SSH terminal backends can maintain shell state across tool calls — cd, env vars, and aliases persist. By @alt-glitch. ([#1067](https://github.com/NousResearch/hermes-agent/pull/1067), [#1483](https://github.com/NousResearch/hermes-agent/pull/1483))
+
+- **Agentic On-Policy Distillation (OPD)** — New RL training environment for distilling agent policies, expanding the Atropos training ecosystem. ([#1149](https://github.com/NousResearch/hermes-agent/pull/1149))
+
+---
+
+## 🏗️ Core Agent & Architecture
+
+### Provider & Model Support
+- **Centralized provider router** with `call_llm` API and unified `/model` command — switch models and providers seamlessly ([#1003](https://github.com/NousResearch/hermes-agent/pull/1003))
+- **Vercel AI Gateway** provider support ([#1628](https://github.com/NousResearch/hermes-agent/pull/1628))
+- **Auto-detect provider** when switching models via `/model` ([#1506](https://github.com/NousResearch/hermes-agent/pull/1506))
+- **Direct endpoint overrides** for auxiliary and delegation clients — point vision/subagent calls at specific endpoints ([#1375](https://github.com/NousResearch/hermes-agent/pull/1375))
+- **Native Anthropic auxiliary vision** — use Claude's native vision API instead of routing through OpenAI-compatible endpoints ([#1377](https://github.com/NousResearch/hermes-agent/pull/1377))
+- Anthropic OAuth flow improvements — auto-run `claude setup-token`, reauthentication, PKCE state persistence, identity fingerprinting ([#1132](https://github.com/NousResearch/hermes-agent/pull/1132), [#1360](https://github.com/NousResearch/hermes-agent/pull/1360), [#1396](https://github.com/NousResearch/hermes-agent/pull/1396), [#1597](https://github.com/NousResearch/hermes-agent/pull/1597))
+- Fix adaptive thinking without `budget_tokens` for Claude 4.6 models — by @ASRagab ([#1128](https://github.com/NousResearch/hermes-agent/pull/1128))
+- Fix Anthropic cache markers through adapter — by @brandtcormorant ([#1216](https://github.com/NousResearch/hermes-agent/pull/1216))
+- Retry Anthropic 429/529 errors and surface details to users — by @0xbyt4 ([#1585](https://github.com/NousResearch/hermes-agent/pull/1585))
+- Fix Anthropic adapter max_tokens, fallback crash, proxy base_url — by @0xbyt4 ([#1121](https://github.com/NousResearch/hermes-agent/pull/1121))
+- Fix DeepSeek V3 parser dropping multiple parallel tool calls — by @mr-emmett-one ([#1365](https://github.com/NousResearch/hermes-agent/pull/1365), [#1300](https://github.com/NousResearch/hermes-agent/pull/1300))
+- Accept unlisted models with warning instead of rejecting ([#1047](https://github.com/NousResearch/hermes-agent/pull/1047), [#1102](https://github.com/NousResearch/hermes-agent/pull/1102))
+- Skip reasoning params for unsupported OpenRouter models ([#1485](https://github.com/NousResearch/hermes-agent/pull/1485))
+- MiniMax Anthropic API compatibility fix ([#1623](https://github.com/NousResearch/hermes-agent/pull/1623))
+- Custom endpoint `/models` verification and `/v1` base URL suggestion ([#1480](https://github.com/NousResearch/hermes-agent/pull/1480))
+- Resolve delegation providers from `custom_providers` config ([#1328](https://github.com/NousResearch/hermes-agent/pull/1328))
+- Kimi model additions and User-Agent fix ([#1039](https://github.com/NousResearch/hermes-agent/pull/1039))
+- Strip `call_id`/`response_item_id` for Mistral compatibility ([#1058](https://github.com/NousResearch/hermes-agent/pull/1058))
+
+### Agent Loop & Conversation
+- **Anthropic Context Editing API** support ([#1147](https://github.com/NousResearch/hermes-agent/pull/1147))
+- Improved context compaction handoff summaries — compressor now preserves more actionable state ([#1273](https://github.com/NousResearch/hermes-agent/pull/1273))
+- Sync session_id after mid-run context compression ([#1160](https://github.com/NousResearch/hermes-agent/pull/1160))
+- Session hygiene threshold tuned to 50% for more proactive compression ([#1096](https://github.com/NousResearch/hermes-agent/pull/1096), [#1161](https://github.com/NousResearch/hermes-agent/pull/1161))
+- Include session ID in system prompt via `--pass-session-id` flag ([#1040](https://github.com/NousResearch/hermes-agent/pull/1040))
+- Prevent closed OpenAI client reuse across retries ([#1391](https://github.com/NousResearch/hermes-agent/pull/1391))
+- Sanitize chat payloads and provider precedence ([#1253](https://github.com/NousResearch/hermes-agent/pull/1253))
+- Handle dict tool call arguments from Codex and local backends ([#1393](https://github.com/NousResearch/hermes-agent/pull/1393), [#1440](https://github.com/NousResearch/hermes-agent/pull/1440))
+
+### Memory & Sessions
+- **Improve memory prioritization** — user preferences and corrections weighted above procedural knowledge ([#1548](https://github.com/NousResearch/hermes-agent/pull/1548))
+- Tighter memory and session recall guidance in system prompts ([#1329](https://github.com/NousResearch/hermes-agent/pull/1329))
+- Persist CLI token counts to session DB for `/insights` ([#1498](https://github.com/NousResearch/hermes-agent/pull/1498))
+- Keep Honcho recall out of the cached system prefix ([#1201](https://github.com/NousResearch/hermes-agent/pull/1201))
+- Correct `seed_ai_identity` to use `session.add_messages()` ([#1475](https://github.com/NousResearch/hermes-agent/pull/1475))
+- Isolate Honcho session routing for multi-user gateway ([#1500](https://github.com/NousResearch/hermes-agent/pull/1500))
+
+---
+
+## 📱 Messaging Platforms (Gateway)
+
+### Gateway Core
+- **System gateway service mode** — run as a system-level systemd service, not just user-level ([#1371](https://github.com/NousResearch/hermes-agent/pull/1371))
+- **Gateway install scope prompts** — choose user vs system scope during setup ([#1374](https://github.com/NousResearch/hermes-agent/pull/1374))
+- **Reasoning hot reload** — change reasoning settings without restarting the gateway ([#1275](https://github.com/NousResearch/hermes-agent/pull/1275))
+- Default group sessions to per-user isolation — no more shared state across users in group chats ([#1495](https://github.com/NousResearch/hermes-agent/pull/1495), [#1417](https://github.com/NousResearch/hermes-agent/pull/1417))
+- Harden gateway restart recovery ([#1310](https://github.com/NousResearch/hermes-agent/pull/1310))
+- Cancel active runs during shutdown ([#1427](https://github.com/NousResearch/hermes-agent/pull/1427))
+- SSL certificate auto-detection for NixOS and non-standard systems ([#1494](https://github.com/NousResearch/hermes-agent/pull/1494))
+- Auto-detect D-Bus session bus for `systemctl --user` on headless servers ([#1601](https://github.com/NousResearch/hermes-agent/pull/1601))
+- Auto-enable systemd linger during gateway install on headless servers ([#1334](https://github.com/NousResearch/hermes-agent/pull/1334))
+- Fall back to module entrypoint when `hermes` is not on PATH ([#1355](https://github.com/NousResearch/hermes-agent/pull/1355))
+- Fix dual gateways on macOS launchd after `hermes update` ([#1567](https://github.com/NousResearch/hermes-agent/pull/1567))
+- Remove recursive ExecStop from systemd units ([#1530](https://github.com/NousResearch/hermes-agent/pull/1530))
+- Prevent logging handler accumulation in gateway mode ([#1251](https://github.com/NousResearch/hermes-agent/pull/1251))
+- Restart on retryable startup failures — by @jplew ([#1517](https://github.com/NousResearch/hermes-agent/pull/1517))
+- Backfill model on gateway sessions after agent runs ([#1306](https://github.com/NousResearch/hermes-agent/pull/1306))
+- PID-based gateway kill and deferred config write ([#1499](https://github.com/NousResearch/hermes-agent/pull/1499))
+
+### Telegram
+- Buffer media groups to prevent self-interruption from photo bursts ([#1341](https://github.com/NousResearch/hermes-agent/pull/1341), [#1422](https://github.com/NousResearch/hermes-agent/pull/1422))
+- Retry on transient TLS failures during connect and send ([#1535](https://github.com/NousResearch/hermes-agent/pull/1535))
+- Harden polling conflict handling ([#1339](https://github.com/NousResearch/hermes-agent/pull/1339))
+- Escape chunk indicators and inline code in MarkdownV2 ([#1478](https://github.com/NousResearch/hermes-agent/pull/1478), [#1626](https://github.com/NousResearch/hermes-agent/pull/1626))
+- Check updater/app state before disconnect ([#1389](https://github.com/NousResearch/hermes-agent/pull/1389))
+
+### Discord
+- `/thread` command with `auto_thread` config and media metadata fixes ([#1178](https://github.com/NousResearch/hermes-agent/pull/1178))
+- Auto-thread on @mention, skip mention text in bot threads ([#1438](https://github.com/NousResearch/hermes-agent/pull/1438))
+- Retry without reply reference for system messages ([#1385](https://github.com/NousResearch/hermes-agent/pull/1385))
+- Preserve native document and video attachment support ([#1392](https://github.com/NousResearch/hermes-agent/pull/1392))
+- Defer discord adapter annotations to avoid optional import crashes ([#1314](https://github.com/NousResearch/hermes-agent/pull/1314))
+
+### Slack
+- Thread handling overhaul — progress messages, responses, and session isolation all respect threads ([#1103](https://github.com/NousResearch/hermes-agent/pull/1103))
+- Formatting, reactions, user resolution, and command improvements ([#1106](https://github.com/NousResearch/hermes-agent/pull/1106))
+- Fix MAX_MESSAGE_LENGTH 3900 → 39000 ([#1117](https://github.com/NousResearch/hermes-agent/pull/1117))
+- File upload fallback preserves thread context — by @0xbyt4 ([#1122](https://github.com/NousResearch/hermes-agent/pull/1122))
+- Improve setup guidance ([#1387](https://github.com/NousResearch/hermes-agent/pull/1387))
+
+### Email
+- Fix IMAP UID tracking and SMTP TLS verification ([#1305](https://github.com/NousResearch/hermes-agent/pull/1305))
+- Add `skip_attachments` option via config.yaml ([#1536](https://github.com/NousResearch/hermes-agent/pull/1536))
+
+### Home Assistant
+- Event filtering closed by default ([#1169](https://github.com/NousResearch/hermes-agent/pull/1169))
+
+---
+
+## 🖥️ CLI & User Experience
+
+### Interactive CLI
+- **Persistent CLI status bar** — always-visible model, provider, and token counts ([#1522](https://github.com/NousResearch/hermes-agent/pull/1522))
+- **File path autocomplete** in the input prompt ([#1545](https://github.com/NousResearch/hermes-agent/pull/1545))
+- **`/plan` command** — generate implementation plans from specs ([#1372](https://github.com/NousResearch/hermes-agent/pull/1372), [#1381](https://github.com/NousResearch/hermes-agent/pull/1381))
+- **Major `/rollback` improvements** — richer checkpoint history, clearer UX ([#1505](https://github.com/NousResearch/hermes-agent/pull/1505))
+- **Preload CLI skills on launch** — skills are ready before the first prompt ([#1359](https://github.com/NousResearch/hermes-agent/pull/1359))
+- **Centralized slash command registry** — all commands defined once, consumed everywhere ([#1603](https://github.com/NousResearch/hermes-agent/pull/1603))
+- `/bg` alias for `/background` ([#1590](https://github.com/NousResearch/hermes-agent/pull/1590))
+- Prefix matching for slash commands — `/mod` resolves to `/model` ([#1320](https://github.com/NousResearch/hermes-agent/pull/1320))
+- `/new`, `/reset`, `/clear` now start genuinely fresh sessions ([#1237](https://github.com/NousResearch/hermes-agent/pull/1237))
+- Accept session ID prefixes for session actions ([#1425](https://github.com/NousResearch/hermes-agent/pull/1425))
+- TUI prompt and accent output now respect active skin ([#1282](https://github.com/NousResearch/hermes-agent/pull/1282))
+- Centralize tool emoji metadata in registry + skin integration ([#1484](https://github.com/NousResearch/hermes-agent/pull/1484))
+- "View full command" option added to dangerous command approval — by @teknium1 based on design by community ([#887](https://github.com/NousResearch/hermes-agent/pull/887))
+- Non-blocking startup update check and banner deduplication ([#1386](https://github.com/NousResearch/hermes-agent/pull/1386))
+- `/reasoning` command output ordering and inline think extraction fixes ([#1031](https://github.com/NousResearch/hermes-agent/pull/1031))
+- Verbose mode shows full untruncated output ([#1472](https://github.com/NousResearch/hermes-agent/pull/1472))
+- Fix `/status` to report live state and tokens ([#1476](https://github.com/NousResearch/hermes-agent/pull/1476))
+- Seed a default global SOUL.md ([#1311](https://github.com/NousResearch/hermes-agent/pull/1311))
+
+### Setup & Configuration
+- **OpenClaw migration** during first-time setup — by @kshitijk4poor ([#981](https://github.com/NousResearch/hermes-agent/pull/981))
+- `hermes claw migrate` command + migration docs ([#1059](https://github.com/NousResearch/hermes-agent/pull/1059))
+- Smart vision setup that respects the user's chosen provider ([#1323](https://github.com/NousResearch/hermes-agent/pull/1323))
+- Handle headless setup flows end-to-end ([#1274](https://github.com/NousResearch/hermes-agent/pull/1274))
+- Prefer curses over `simple_term_menu` in setup.py ([#1487](https://github.com/NousResearch/hermes-agent/pull/1487))
+- Show effective model and provider in `/status` ([#1284](https://github.com/NousResearch/hermes-agent/pull/1284))
+- Config set examples use placeholder syntax ([#1322](https://github.com/NousResearch/hermes-agent/pull/1322))
+- Reload .env over stale shell overrides ([#1434](https://github.com/NousResearch/hermes-agent/pull/1434))
+- Fix is_coding_plan NameError crash — by @0xbyt4 ([#1123](https://github.com/NousResearch/hermes-agent/pull/1123))
+- Add missing packages to setuptools config — by @alt-glitch ([#912](https://github.com/NousResearch/hermes-agent/pull/912))
+- Installer: clarify why sudo is needed at every prompt ([#1602](https://github.com/NousResearch/hermes-agent/pull/1602))
+
+---
+
+## 🔧 Tool System
+
+### Terminal & Execution
+- **Persistent shell mode** for local and SSH backends — maintain shell state across tool calls — by @alt-glitch ([#1067](https://github.com/NousResearch/hermes-agent/pull/1067), [#1483](https://github.com/NousResearch/hermes-agent/pull/1483))
+- **Tirith pre-exec command scanning** — security layer that analyzes commands before execution ([#1256](https://github.com/NousResearch/hermes-agent/pull/1256))
+- Strip Hermes provider env vars from all subprocess environments ([#1157](https://github.com/NousResearch/hermes-agent/pull/1157), [#1172](https://github.com/NousResearch/hermes-agent/pull/1172), [#1399](https://github.com/NousResearch/hermes-agent/pull/1399), [#1419](https://github.com/NousResearch/hermes-agent/pull/1419)) — initial fix by @eren-karakus0
+- SSH preflight check ([#1486](https://github.com/NousResearch/hermes-agent/pull/1486))
+- Docker backend: make cwd workspace mount explicit opt-in ([#1534](https://github.com/NousResearch/hermes-agent/pull/1534))
+- Add project root to PYTHONPATH in execute_code sandbox ([#1383](https://github.com/NousResearch/hermes-agent/pull/1383))
+- Eliminate execute_code progress spam on gateway platforms ([#1098](https://github.com/NousResearch/hermes-agent/pull/1098))
+- Clearer docker backend preflight errors ([#1276](https://github.com/NousResearch/hermes-agent/pull/1276))
+
+### Browser
+- **`/browser connect`** — attach browser tools to a live Chrome instance via CDP ([#1549](https://github.com/NousResearch/hermes-agent/pull/1549))
+- Improve browser cleanup, local browser PATH setup, and screenshot recovery ([#1333](https://github.com/NousResearch/hermes-agent/pull/1333))
+
+### MCP
+- **Selective tool loading** with utility policies — filter which MCP tools are available ([#1302](https://github.com/NousResearch/hermes-agent/pull/1302))
+- Auto-reload MCP tools when `mcp_servers` config changes without restart ([#1474](https://github.com/NousResearch/hermes-agent/pull/1474))
+- Resolve npx stdio connection failures ([#1291](https://github.com/NousResearch/hermes-agent/pull/1291))
+- Preserve MCP toolsets when saving platform tool config ([#1421](https://github.com/NousResearch/hermes-agent/pull/1421))
+
+### Vision
+- Unify vision backend gating ([#1367](https://github.com/NousResearch/hermes-agent/pull/1367))
+- Surface actual error reason instead of generic message ([#1338](https://github.com/NousResearch/hermes-agent/pull/1338))
+- Make Claude image handling work end-to-end ([#1408](https://github.com/NousResearch/hermes-agent/pull/1408))
+
+### Cron
+- **Compress cron management into one tool** — single `cronjob` tool replaces multiple commands ([#1343](https://github.com/NousResearch/hermes-agent/pull/1343))
+- Suppress duplicate cron sends to auto-delivery targets ([#1357](https://github.com/NousResearch/hermes-agent/pull/1357))
+- Persist cron sessions to SQLite ([#1255](https://github.com/NousResearch/hermes-agent/pull/1255))
+- Per-job runtime overrides (provider, model, base_url) ([#1398](https://github.com/NousResearch/hermes-agent/pull/1398))
+- Atomic write in `save_job_output` to prevent data loss on crash ([#1173](https://github.com/NousResearch/hermes-agent/pull/1173))
+- Preserve thread context for `deliver=origin` ([#1437](https://github.com/NousResearch/hermes-agent/pull/1437))
+
+### Patch Tool
+- Avoid corrupting pipe chars in V4A patch apply ([#1286](https://github.com/NousResearch/hermes-agent/pull/1286))
+- Permissive `block_anchor` thresholds and unicode normalization ([#1539](https://github.com/NousResearch/hermes-agent/pull/1539))
+
+### Delegation
+- Add observability metadata to subagent results (model, tokens, duration, tool trace) ([#1175](https://github.com/NousResearch/hermes-agent/pull/1175))
+
+---
+
+## 🧩 Skills Ecosystem
+
+### Skills System
+- **Integrate skills.sh** as a hub source alongside ClawHub ([#1303](https://github.com/NousResearch/hermes-agent/pull/1303))
+- Secure skill env setup on load ([#1153](https://github.com/NousResearch/hermes-agent/pull/1153))
+- Honor policy table for dangerous verdicts ([#1330](https://github.com/NousResearch/hermes-agent/pull/1330))
+- Harden ClawHub skill search exact matches ([#1400](https://github.com/NousResearch/hermes-agent/pull/1400))
+- Fix ClawHub skill install — use `/download` ZIP endpoint ([#1060](https://github.com/NousResearch/hermes-agent/pull/1060))
+- Avoid mislabeling local skills as builtin — by @arceus77-7 ([#862](https://github.com/NousResearch/hermes-agent/pull/862))
+
+### New Skills
+- **Linear** project management ([#1230](https://github.com/NousResearch/hermes-agent/pull/1230))
+- **X/Twitter** via x-cli ([#1285](https://github.com/NousResearch/hermes-agent/pull/1285))
+- **Telephony** — Twilio, SMS, and AI calls ([#1289](https://github.com/NousResearch/hermes-agent/pull/1289))
+- **1Password** — by @arceus77-7 ([#883](https://github.com/NousResearch/hermes-agent/pull/883), [#1179](https://github.com/NousResearch/hermes-agent/pull/1179))
+- **NeuroSkill BCI** integration ([#1135](https://github.com/NousResearch/hermes-agent/pull/1135))
+- **Blender MCP** for 3D modeling ([#1531](https://github.com/NousResearch/hermes-agent/pull/1531))
+- **OSS Security Forensics** ([#1482](https://github.com/NousResearch/hermes-agent/pull/1482))
+- **Parallel CLI** research skill ([#1301](https://github.com/NousResearch/hermes-agent/pull/1301))
+- **OpenCode** CLI skill ([#1174](https://github.com/NousResearch/hermes-agent/pull/1174))
+- **ASCII Video** skill refactored — by @SHL0MS ([#1213](https://github.com/NousResearch/hermes-agent/pull/1213), [#1598](https://github.com/NousResearch/hermes-agent/pull/1598))
+
+---
+
+## 🎙️ Voice Mode
+
+- Voice mode foundation — push-to-talk CLI, Telegram/Discord voice notes ([#1299](https://github.com/NousResearch/hermes-agent/pull/1299))
+- Free local Whisper transcription via faster-whisper ([#1185](https://github.com/NousResearch/hermes-agent/pull/1185))
+- Discord voice channel reliability fixes ([#1429](https://github.com/NousResearch/hermes-agent/pull/1429))
+- Restore local STT fallback for gateway voice notes ([#1490](https://github.com/NousResearch/hermes-agent/pull/1490))
+- Honor `stt.enabled: false` across gateway transcription ([#1394](https://github.com/NousResearch/hermes-agent/pull/1394))
+- Fix bogus incapability message on Telegram voice notes (Issue [#1033](https://github.com/NousResearch/hermes-agent/issues/1033))
+
+---
+
+## 🔌 ACP (IDE Integration)
+
+- Restore ACP server implementation ([#1254](https://github.com/NousResearch/hermes-agent/pull/1254))
+- Support slash commands in ACP adapter ([#1532](https://github.com/NousResearch/hermes-agent/pull/1532))
+
+---
+
+## 🧪 RL Training
+
+- **Agentic On-Policy Distillation (OPD)** environment — new RL training environment for agent policy distillation ([#1149](https://github.com/NousResearch/hermes-agent/pull/1149))
+- Make tinker-atropos RL training fully optional ([#1062](https://github.com/NousResearch/hermes-agent/pull/1062))
+
+---
+
+## 🔒 Security & Reliability
+
+### Security Hardening
+- **Tirith pre-exec command scanning** — static analysis of terminal commands before execution ([#1256](https://github.com/NousResearch/hermes-agent/pull/1256))
+- **PII redaction** when `privacy.redact_pii` is enabled ([#1542](https://github.com/NousResearch/hermes-agent/pull/1542))
+- Strip Hermes provider/gateway/tool env vars from all subprocess environments ([#1157](https://github.com/NousResearch/hermes-agent/pull/1157), [#1172](https://github.com/NousResearch/hermes-agent/pull/1172), [#1399](https://github.com/NousResearch/hermes-agent/pull/1399), [#1419](https://github.com/NousResearch/hermes-agent/pull/1419))
+- Docker cwd workspace mount now explicit opt-in — never auto-mount host directories ([#1534](https://github.com/NousResearch/hermes-agent/pull/1534))
+- Escape parens and braces in fork bomb regex pattern ([#1397](https://github.com/NousResearch/hermes-agent/pull/1397))
+- Harden `.worktreeinclude` path containment ([#1388](https://github.com/NousResearch/hermes-agent/pull/1388))
+- Use description as `pattern_key` to prevent approval collisions ([#1395](https://github.com/NousResearch/hermes-agent/pull/1395))
+
+### Reliability
+- Guard init-time stdio writes ([#1271](https://github.com/NousResearch/hermes-agent/pull/1271))
+- Session log writes reuse shared atomic JSON helper ([#1280](https://github.com/NousResearch/hermes-agent/pull/1280))
+- Atomic temp cleanup protected on interrupts ([#1401](https://github.com/NousResearch/hermes-agent/pull/1401))
+
+---
+
+## 🐛 Notable Bug Fixes
+
+- **`/status` always showing 0 tokens** — now reports live state (Issue [#1465](https://github.com/NousResearch/hermes-agent/issues/1465), [#1476](https://github.com/NousResearch/hermes-agent/pull/1476))
+- **Custom model endpoints not working** — restored config-saved endpoint resolution (Issue [#1460](https://github.com/NousResearch/hermes-agent/issues/1460), [#1373](https://github.com/NousResearch/hermes-agent/pull/1373))
+- **MCP tools not visible until restart** — auto-reload on config change (Issue [#1036](https://github.com/NousResearch/hermes-agent/issues/1036), [#1474](https://github.com/NousResearch/hermes-agent/pull/1474))
+- **`hermes tools` removing MCP tools** — preserve MCP toolsets when saving (Issue [#1247](https://github.com/NousResearch/hermes-agent/issues/1247), [#1421](https://github.com/NousResearch/hermes-agent/pull/1421))
+- **Terminal subprocesses inheriting `OPENAI_BASE_URL`** breaking external tools (Issue [#1002](https://github.com/NousResearch/hermes-agent/issues/1002), [#1399](https://github.com/NousResearch/hermes-agent/pull/1399))
+- **Background process lost on gateway restart** — improved recovery (Issue [#1144](https://github.com/NousResearch/hermes-agent/issues/1144))
+- **Cron jobs not persisting state** — now stored in SQLite (Issue [#1416](https://github.com/NousResearch/hermes-agent/issues/1416), [#1255](https://github.com/NousResearch/hermes-agent/pull/1255))
+- **Cronjob `deliver: origin` not preserving thread context** (Issue [#1219](https://github.com/NousResearch/hermes-agent/issues/1219), [#1437](https://github.com/NousResearch/hermes-agent/pull/1437))
+- **Gateway systemd service failing to auto-restart** when browser processes orphaned (Issue [#1617](https://github.com/NousResearch/hermes-agent/issues/1617))
+- **`/background` completion report cut off in Telegram** (Issue [#1443](https://github.com/NousResearch/hermes-agent/issues/1443))
+- **Model switching not taking effect** (Issue [#1244](https://github.com/NousResearch/hermes-agent/issues/1244), [#1183](https://github.com/NousResearch/hermes-agent/pull/1183))
+- **`hermes doctor` reporting cronjob as unavailable** (Issue [#878](https://github.com/NousResearch/hermes-agent/issues/878), [#1180](https://github.com/NousResearch/hermes-agent/pull/1180))
+- **WhatsApp bridge messages not received** from mobile (Issue [#1142](https://github.com/NousResearch/hermes-agent/issues/1142))
+- **Setup wizard hanging on headless SSH** (Issue [#905](https://github.com/NousResearch/hermes-agent/issues/905), [#1274](https://github.com/NousResearch/hermes-agent/pull/1274))
+- **Log handler accumulation** degrading gateway performance (Issue [#990](https://github.com/NousResearch/hermes-agent/issues/990), [#1251](https://github.com/NousResearch/hermes-agent/pull/1251))
+- **Gateway NULL model in DB** (Issue [#987](https://github.com/NousResearch/hermes-agent/issues/987), [#1306](https://github.com/NousResearch/hermes-agent/pull/1306))
+- **Strict endpoints rejecting replayed tool_calls** (Issue [#893](https://github.com/NousResearch/hermes-agent/issues/893))
+- **Remaining hardcoded `~/.hermes` paths** — all now respect `HERMES_HOME` (Issue [#892](https://github.com/NousResearch/hermes-agent/issues/892), [#1233](https://github.com/NousResearch/hermes-agent/pull/1233))
+- **Delegate tool not working with custom inference providers** (Issue [#1011](https://github.com/NousResearch/hermes-agent/issues/1011), [#1328](https://github.com/NousResearch/hermes-agent/pull/1328))
+- **Skills Guard blocking official skills** (Issue [#1006](https://github.com/NousResearch/hermes-agent/issues/1006), [#1330](https://github.com/NousResearch/hermes-agent/pull/1330))
+- **Setup writing provider before model selection** (Issue [#1182](https://github.com/NousResearch/hermes-agent/issues/1182))
+- **`GatewayConfig.get()` AttributeError** crashing all message handling (Issue [#1158](https://github.com/NousResearch/hermes-agent/issues/1158), [#1287](https://github.com/NousResearch/hermes-agent/pull/1287))
+- **`/update` hard-failing with "command not found"** (Issue [#1049](https://github.com/NousResearch/hermes-agent/issues/1049))
+- **Image analysis failing silently** (Issue [#1034](https://github.com/NousResearch/hermes-agent/issues/1034), [#1338](https://github.com/NousResearch/hermes-agent/pull/1338))
+- **API `BadRequestError` from `'dict'` object has no attribute `'strip'`** (Issue [#1071](https://github.com/NousResearch/hermes-agent/issues/1071))
+- **Slash commands requiring exact full name** — now uses prefix matching (Issue [#928](https://github.com/NousResearch/hermes-agent/issues/928), [#1320](https://github.com/NousResearch/hermes-agent/pull/1320))
+- **Gateway stops responding when terminal is closed on headless** (Issue [#1005](https://github.com/NousResearch/hermes-agent/issues/1005))
+
+---
+
+## 🧪 Testing
+
+- Cover empty cached Anthropic tool-call turns ([#1222](https://github.com/NousResearch/hermes-agent/pull/1222))
+- Fix stale CI assumptions in parser and quick-command coverage ([#1236](https://github.com/NousResearch/hermes-agent/pull/1236))
+- Fix gateway async tests without implicit event loop ([#1278](https://github.com/NousResearch/hermes-agent/pull/1278))
+- Make gateway async tests xdist-safe ([#1281](https://github.com/NousResearch/hermes-agent/pull/1281))
+- Cross-timezone naive timestamp regression for cron ([#1319](https://github.com/NousResearch/hermes-agent/pull/1319))
+- Isolate codex provider tests from local env ([#1335](https://github.com/NousResearch/hermes-agent/pull/1335))
+- Lock retry replacement semantics ([#1379](https://github.com/NousResearch/hermes-agent/pull/1379))
+- Improve error logging in session search tool — by @aydnOktay ([#1533](https://github.com/NousResearch/hermes-agent/pull/1533))
+
+---
+
+## 📚 Documentation
+
+- Comprehensive SOUL.md guide ([#1315](https://github.com/NousResearch/hermes-agent/pull/1315))
+- Voice mode documentation ([#1316](https://github.com/NousResearch/hermes-agent/pull/1316), [#1362](https://github.com/NousResearch/hermes-agent/pull/1362))
+- Provider contribution guide ([#1361](https://github.com/NousResearch/hermes-agent/pull/1361))
+- ACP and internal systems implementation guides ([#1259](https://github.com/NousResearch/hermes-agent/pull/1259))
+- Expand Docusaurus coverage across CLI, tools, skills, and skins ([#1232](https://github.com/NousResearch/hermes-agent/pull/1232))
+- Terminal backend and Windows troubleshooting ([#1297](https://github.com/NousResearch/hermes-agent/pull/1297))
+- Skills hub reference section ([#1317](https://github.com/NousResearch/hermes-agent/pull/1317))
+- Checkpoint, /rollback, and git worktrees guide ([#1493](https://github.com/NousResearch/hermes-agent/pull/1493), [#1524](https://github.com/NousResearch/hermes-agent/pull/1524))
+- CLI status bar and /usage reference ([#1523](https://github.com/NousResearch/hermes-agent/pull/1523))
+- Fallback providers + /background command docs ([#1430](https://github.com/NousResearch/hermes-agent/pull/1430))
+- Gateway service scopes docs ([#1378](https://github.com/NousResearch/hermes-agent/pull/1378))
+- Slack thread reply behavior docs ([#1407](https://github.com/NousResearch/hermes-agent/pull/1407))
+- Redesigned landing page with Nous blue palette — by @austinpickett ([#974](https://github.com/NousResearch/hermes-agent/pull/974))
+- Fix several documentation typos — by @JackTheGit ([#953](https://github.com/NousResearch/hermes-agent/pull/953))
+- Stabilize website diagrams ([#1405](https://github.com/NousResearch/hermes-agent/pull/1405))
+- CLI vs messaging quick reference in README ([#1491](https://github.com/NousResearch/hermes-agent/pull/1491))
+- Add search to Docusaurus ([#1053](https://github.com/NousResearch/hermes-agent/pull/1053))
+- Home Assistant integration docs ([#1170](https://github.com/NousResearch/hermes-agent/pull/1170))
+
+---
+
+## 👥 Contributors
+
+### Core
+- **@teknium1** — 220+ PRs spanning every area of the codebase
+
+### Top Community Contributors
+
+- **@0xbyt4** (4 PRs) — Anthropic adapter fixes (max_tokens, fallback crash, 429/529 retry), Slack file upload thread context, setup NameError fix
+- **@erosika** (1 PR) — Honcho memory integration: async writes, memory modes, session title integration
+- **@SHL0MS** (2 PRs) — ASCII video skill design patterns and refactoring
+- **@alt-glitch** (2 PRs) — Persistent shell mode for local/SSH backends, setuptools packaging fix
+- **@arceus77-7** (2 PRs) — 1Password skill, fix skills list mislabeling
+- **@kshitijk4poor** (1 PR) — OpenClaw migration during setup wizard
+- **@ASRagab** (1 PR) — Fix adaptive thinking for Claude 4.6 models
+- **@eren-karakus0** (1 PR) — Strip Hermes provider env vars from subprocess environment
+- **@mr-emmett-one** (1 PR) — Fix DeepSeek V3 parser multi-tool call support
+- **@jplew** (1 PR) — Gateway restart on retryable startup failures
+- **@brandtcormorant** (1 PR) — Fix Anthropic cache control for empty text blocks
+- **@aydnOktay** (1 PR) — Improve error logging in session search tool
+- **@austinpickett** (1 PR) — Landing page redesign with Nous blue palette
+- **@JackTheGit** (1 PR) — Documentation typo fixes
+
+### All Contributors
+
+@0xbyt4, @alt-glitch, @arceus77-7, @ASRagab, @austinpickett, @aydnOktay, @brandtcormorant, @eren-karakus0, @erosika, @JackTheGit, @jplew, @kshitijk4poor, @mr-emmett-one, @SHL0MS, @teknium1
+
+---
+
+**Full Changelog**: [v2026.3.12...v2026.3.17](https://github.com/NousResearch/hermes-agent/compare/v2026.3.12...v2026.3.17)
@@ -42,7 +42,7 @@ from acp_adapter.events import (
    make_tool_progress_cb,
 )
 from acp_adapter.permissions import make_approval_callback
-from acp_adapter.session import SessionManager
+from acp_adapter.session import SessionManager, SessionState

 logger = logging.getLogger(__name__)

@@ -226,10 +226,19 @@ class HermesACPAgent(acp.Agent):
            logger.error("prompt: session %s not found", session_id)
            return PromptResponse(stop_reason="refusal")

-        user_text = _extract_text(prompt)
-        if not user_text.strip():
+        user_text = _extract_text(prompt).strip()
+        if not user_text:
            return PromptResponse(stop_reason="end_turn")

+        # Intercept slash commands — handle locally without calling the LLM
+        if user_text.startswith("/"):
+            response_text = self._handle_slash_command(user_text, state)
+            if response_text is not None:
+                if self._conn:
+                    update = acp.update_agent_message_text(response_text)
+                    await self._conn.session_update(session_id, update)
+                return PromptResponse(stop_reason="end_turn")
+
        logger.info("Prompt on session %s: %s", session_id, user_text[:100])

        conn = self._conn
@@ -315,12 +324,149 @@ class HermesACPAgent(acp.Agent):
        stop_reason = "cancelled" if state.cancel_event and state.cancel_event.is_set() else "end_turn"
        return PromptResponse(stop_reason=stop_reason, usage=usage)

-    # ---- Model switching ----------------------------------------------------
+    # ---- Slash commands (headless) -------------------------------------------
+
+    _SLASH_COMMANDS = {
+        "help": "Show available commands",
+        "model": "Show or change current model",
+        "tools": "List available tools",
+        "context": "Show conversation context info",
+        "reset": "Clear conversation history",
+        "compact": "Compress conversation context",
+        "version": "Show Hermes version",
+    }
+
+    def _handle_slash_command(self, text: str, state: SessionState) -> str | None:
+        """Dispatch a slash command and return the response text.
+
+        Returns ``None`` for unrecognized commands so they fall through
+        to the LLM (the user may have typed ``/something`` as prose).
+        """
+        parts = text.split(maxsplit=1)
+        cmd = parts[0].lstrip("/").lower()
+        args = parts[1].strip() if len(parts) > 1 else ""
+
+        handler = {
+            "help": self._cmd_help,
+            "model": self._cmd_model,
+            "tools": self._cmd_tools,
+            "context": self._cmd_context,
+            "reset": self._cmd_reset,
+            "compact": self._cmd_compact,
+            "version": self._cmd_version,
+        }.get(cmd)
+
+        if handler is None:
+            return None  # not a known command — let the LLM handle it
+
+        try:
+            return handler(args, state)
+        except Exception as e:
+            logger.error("Slash command /%s error: %s", cmd, e, exc_info=True)
+            return f"Error executing /{cmd}: {e}"
+
+    def _cmd_help(self, args: str, state: SessionState) -> str:
+        lines = ["Available commands:", ""]
+        for cmd, desc in self._SLASH_COMMANDS.items():
+            lines.append(f"  /{cmd:10s}  {desc}")
+        lines.append("")
+        lines.append("Unrecognized /commands are sent to the model as normal messages.")
+        return "\n".join(lines)
+
+    def _cmd_model(self, args: str, state: SessionState) -> str:
+        if not args:
+            model = state.model or getattr(state.agent, "model", "unknown")
+            provider = getattr(state.agent, "provider", None) or "auto"
+            return f"Current model: {model}\nProvider: {provider}"
+
+        new_model = args.strip()
+        target_provider = None
+
+        # Auto-detect provider for the requested model
+        try:
+            from hermes_cli.models import parse_model_input, detect_provider_for_model
+            current_provider = getattr(state.agent, "provider", None) or "openrouter"
+            target_provider, new_model = parse_model_input(new_model, current_provider)
+            if target_provider == current_provider:
+                detected = detect_provider_for_model(new_model, current_provider)
+                if detected:
+                    target_provider, new_model = detected
+        except Exception:
+            logger.debug("Provider detection failed, using model as-is", exc_info=True)
+
+        state.model = new_model
+        state.agent = self.session_manager._make_agent(
+            session_id=state.session_id,
+            cwd=state.cwd,
+            model=new_model,
+        )
+        provider_label = target_provider or getattr(state.agent, "provider", "auto")
+        logger.info("Session %s: model switched to %s", state.session_id, new_model)
+        return f"Model switched to: {new_model}\nProvider: {provider_label}"
+
+    def _cmd_tools(self, args: str, state: SessionState) -> str:
+        try:
+            from model_tools import get_tool_definitions
+            toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
+            tools = get_tool_definitions(enabled_toolsets=toolsets, quiet_mode=True)
+            if not tools:
+                return "No tools available."
+            lines = [f"Available tools ({len(tools)}):"]
+            for t in tools:
+                name = t.get("function", {}).get("name", "?")
+                desc = t.get("function", {}).get("description", "")
+                # Truncate long descriptions
+                if len(desc) > 80:
+                    desc = desc[:77] + "..."
+                lines.append(f"  {name}: {desc}")
+            return "\n".join(lines)
+        except Exception as e:
+            return f"Could not list tools: {e}"
+
+    def _cmd_context(self, args: str, state: SessionState) -> str:
+        n_messages = len(state.history)
+        if n_messages == 0:
+            return "Conversation is empty (no messages yet)."
+        # Count by role
+        roles: dict[str, int] = {}
+        for msg in state.history:
+            role = msg.get("role", "unknown")
+            roles[role] = roles.get(role, 0) + 1
+        lines = [
+            f"Conversation: {n_messages} messages",
+            f"  user: {roles.get('user', 0)}, assistant: {roles.get('assistant', 0)}, "
+            f"tool: {roles.get('tool', 0)}, system: {roles.get('system', 0)}",
+        ]
+        model = state.model or getattr(state.agent, "model", "")
+        if model:
+            lines.append(f"Model: {model}")
+        return "\n".join(lines)
+
+    def _cmd_reset(self, args: str, state: SessionState) -> str:
+        state.history.clear()
+        return "Conversation history cleared."
+
+    def _cmd_compact(self, args: str, state: SessionState) -> str:
+        if not state.history:
+            return "Nothing to compress — conversation is empty."
+        try:
+            agent = state.agent
+            if hasattr(agent, "compress_context"):
+                agent.compress_context(state.history)
+                return f"Context compressed. Messages: {len(state.history)}"
+            return "Context compression not available for this agent."
+        except Exception as e:
+            return f"Compression failed: {e}"
+
+    def _cmd_version(self, args: str, state: SessionState) -> str:
+        return f"Hermes Agent v{HERMES_VERSION}"
+
+    # ---- Model switching (ACP protocol method) -------------------------------

    async def set_session_model(
        self, model_id: str, session_id: str, **kwargs: Any
    ):
-        """Switch the model for a session."""
+        """Switch the model for a session (called by ACP protocol)."""
        state = self.session_manager.get_session(session_id)
        if state:
            state.model = model_id
@@ -45,14 +45,19 @@ _COMMON_BETAS = [
    "fine-grained-tool-streaming-2025-05-14",
 ]

-# Additional beta headers required for OAuth/subscription auth
-# Both clawdbot and OpenCode include claude-code-20250219 alongside oauth-2025-04-20.
-# Without claude-code-20250219, Anthropic's API rejects OAuth tokens with 401.
+# Additional beta headers required for OAuth/subscription auth.
+# Matches what Claude Code (and pi-ai / OpenCode) send.
 _OAUTH_ONLY_BETAS = [
    "claude-code-20250219",
    "oauth-2025-04-20",
 ]

+# Claude Code identity — required for OAuth requests to be routed correctly.
+# Without these, Anthropic's infrastructure intermittently 500s OAuth traffic.
+_CLAUDE_CODE_VERSION = "2.1.2"
+_CLAUDE_CODE_SYSTEM_PREFIX = "You are Claude Code, Anthropic's official CLI for Claude."
+_MCP_TOOL_PREFIX = "mcp_"
+

 def _is_oauth_token(key: str) -> bool:
    """Check if the key is an OAuth/setup token (not a regular Console API key).
@@ -88,10 +93,16 @@ def build_anthropic_client(api_key: str, base_url: str = None):
        kwargs["base_url"] = base_url

    if _is_oauth_token(api_key):
-        # OAuth access token / setup-token → Bearer auth + beta headers
+        # OAuth access token / setup-token → Bearer auth + Claude Code identity.
+        # Anthropic routes OAuth requests based on user-agent and headers;
+        # without Claude Code's fingerprint, requests get intermittent 500s.
        all_betas = _COMMON_BETAS + _OAUTH_ONLY_BETAS
        kwargs["auth_token"] = api_key
-        kwargs["default_headers"] = {"anthropic-beta": ",".join(all_betas)}
+        kwargs["default_headers"] = {
+            "anthropic-beta": ",".join(all_betas),
+            "user-agent": f"claude-cli/{_CLAUDE_CODE_VERSION} (external, cli)",
+            "x-app": "cli",
+        }
    else:
        # Regular API key → x-api-key header + common betas
        kwargs["api_key"] = api_key
@@ -189,7 +200,10 @@ def _refresh_oauth_token(creds: Dict[str, Any]) -> Optional[str]:
    req = urllib.request.Request(
        "https://console.anthropic.com/v1/oauth/token",
        data=data,
-        headers={"Content-Type": "application/x-www-form-urlencoded"},
+        headers={
+            "Content-Type": "application/x-www-form-urlencoded",
+            "User-Agent": f"claude-cli/{_CLAUDE_CODE_VERSION} (external, cli)",
+        },
        method="POST",
    )

@@ -332,12 +346,24 @@ def resolve_anthropic_token() -> Optional[str]:
            return preferred
        return cc_token

-    # 3. Claude Code credential file
+    # 3. Hermes-managed OAuth credentials (~/.hermes/.anthropic_oauth.json)
+    hermes_creds = read_hermes_oauth_credentials()
+    if hermes_creds:
+        if is_claude_code_token_valid(hermes_creds):
+            logger.debug("Using Hermes-managed OAuth credentials")
+            return hermes_creds["accessToken"]
+        # Expired — try refresh
+        logger.debug("Hermes OAuth token expired — attempting refresh")
+        refreshed = refresh_hermes_oauth_token()
+        if refreshed:
+            return refreshed
+
+    # 4. Claude Code credential file
    resolved_claude_token = _resolve_claude_code_token_from_credentials(creds)
    if resolved_claude_token:
        return resolved_claude_token

-    # 4. Regular API key, or a legacy OAuth token saved in ANTHROPIC_API_KEY.
+    # 5. Regular API key, or a legacy OAuth token saved in ANTHROPIC_API_KEY.
    # This remains as a compatibility fallback for pre-migration Hermes configs.
    api_key = os.getenv("ANTHROPIC_API_KEY", "").strip()
    if api_key:
@@ -386,6 +412,215 @@ def run_oauth_setup_token() -> Optional[str]:
    return None


+# ── Hermes-native PKCE OAuth flow ────────────────────────────────────────
+# Mirrors the flow used by Claude Code, pi-ai, and OpenCode.
+# Stores credentials in ~/.hermes/.anthropic_oauth.json (our own file).
+
+_OAUTH_CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e"
+_OAUTH_TOKEN_URL = "https://console.anthropic.com/v1/oauth/token"
+_OAUTH_REDIRECT_URI = "https://console.anthropic.com/oauth/code/callback"
+_OAUTH_SCOPES = "org:create_api_key user:profile user:inference"
+_HERMES_OAUTH_FILE = Path(os.getenv("HERMES_HOME", str(Path.home() / ".hermes"))) / ".anthropic_oauth.json"
+
+
+def _generate_pkce() -> tuple:
+    """Generate PKCE code_verifier and code_challenge (S256)."""
+    import base64
+    import hashlib
+    import secrets
+
+    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
+    challenge = base64.urlsafe_b64encode(
+        hashlib.sha256(verifier.encode()).digest()
+    ).rstrip(b"=").decode()
+    return verifier, challenge
+
+
+def run_hermes_oauth_login() -> Optional[str]:
+    """Run Hermes-native OAuth PKCE flow for Claude Pro/Max subscription.
+
+    Opens a browser to claude.ai for authorization, prompts for the code,
+    exchanges it for tokens, and stores them in ~/.hermes/.anthropic_oauth.json.
+
+    Returns the access token on success, None on failure.
+    """
+    import time
+    import webbrowser
+
+    verifier, challenge = _generate_pkce()
+
+    # Build authorization URL
+    params = {
+        "code": "true",
+        "client_id": _OAUTH_CLIENT_ID,
+        "response_type": "code",
+        "redirect_uri": _OAUTH_REDIRECT_URI,
+        "scope": _OAUTH_SCOPES,
+        "code_challenge": challenge,
+        "code_challenge_method": "S256",
+        "state": verifier,
+    }
+    from urllib.parse import urlencode
+    auth_url = f"https://claude.ai/oauth/authorize?{urlencode(params)}"
+
+    print()
+    print("Authorize Hermes with your Claude Pro/Max subscription.")
+    print()
+    print("╭─ Claude Pro/Max Authorization ────────────────────╮")
+    print("│                                                   │")
+    print("│  Open this link in your browser:                  │")
+    print("╰───────────────────────────────────────────────────╯")
+    print()
+    print(f"  {auth_url}")
+    print()
+
+    # Try to open browser automatically (works on desktop, silently fails on headless/SSH)
+    try:
+        webbrowser.open(auth_url)
+        print("  (Browser opened automatically)")
+    except Exception:
+        pass
+
+    print()
+    print("After authorizing, you'll see a code. Paste it below.")
+    print()
+    try:
+        auth_code = input("Authorization code: ").strip()
+    except (KeyboardInterrupt, EOFError):
+        return None
+
+    if not auth_code:
+        print("No code entered.")
+        return None
+
+    # Split code#state format
+    splits = auth_code.split("#")
+    code = splits[0]
+    state = splits[1] if len(splits) > 1 else ""
+
+    # Exchange code for tokens
+    try:
+        import urllib.request
+        exchange_data = json.dumps({
+            "grant_type": "authorization_code",
+            "client_id": _OAUTH_CLIENT_ID,
+            "code": code,
+            "state": state,
+            "redirect_uri": _OAUTH_REDIRECT_URI,
+            "code_verifier": verifier,
+        }).encode()
+
+        req = urllib.request.Request(
+            _OAUTH_TOKEN_URL,
+            data=exchange_data,
+            headers={
+                "Content-Type": "application/json",
+                "User-Agent": f"claude-cli/{_CLAUDE_CODE_VERSION} (external, cli)",
+            },
+            method="POST",
+        )
+
+        with urllib.request.urlopen(req, timeout=15) as resp:
+            result = json.loads(resp.read().decode())
+    except Exception as e:
+        print(f"Token exchange failed: {e}")
+        return None
+
+    access_token = result.get("access_token", "")
+    refresh_token = result.get("refresh_token", "")
+    expires_in = result.get("expires_in", 3600)
+
+    if not access_token:
+        print("No access token in response.")
+        return None
+
+    # Store credentials
+    expires_at_ms = int(time.time() * 1000) + (expires_in * 1000)
+    _save_hermes_oauth_credentials(access_token, refresh_token, expires_at_ms)
+
+    # Also write to Claude Code's credential file for backward compat
+    _write_claude_code_credentials(access_token, refresh_token, expires_at_ms)
+
+    print("Authentication successful!")
+    return access_token
+
+
+def _save_hermes_oauth_credentials(access_token: str, refresh_token: str, expires_at_ms: int) -> None:
+    """Save OAuth credentials to ~/.hermes/.anthropic_oauth.json."""
+    data = {
+        "accessToken": access_token,
+        "refreshToken": refresh_token,
+        "expiresAt": expires_at_ms,
+    }
+    try:
+        _HERMES_OAUTH_FILE.parent.mkdir(parents=True, exist_ok=True)
+        _HERMES_OAUTH_FILE.write_text(json.dumps(data, indent=2), encoding="utf-8")
+        _HERMES_OAUTH_FILE.chmod(0o600)
+    except (OSError, IOError) as e:
+        logger.debug("Failed to save Hermes OAuth credentials: %s", e)
+
+
+def read_hermes_oauth_credentials() -> Optional[Dict[str, Any]]:
+    """Read Hermes-managed OAuth credentials from ~/.hermes/.anthropic_oauth.json."""
+    if _HERMES_OAUTH_FILE.exists():
+        try:
+            data = json.loads(_HERMES_OAUTH_FILE.read_text(encoding="utf-8"))
+            if data.get("accessToken"):
+                return data
+        except (json.JSONDecodeError, OSError, IOError) as e:
+            logger.debug("Failed to read Hermes OAuth credentials: %s", e)
+    return None
+
+
+def refresh_hermes_oauth_token() -> Optional[str]:
+    """Refresh the Hermes-managed OAuth token using the stored refresh token.
+
+    Returns the new access token, or None if refresh fails.
+    """
+    import time
+    import urllib.request
+
+    creds = read_hermes_oauth_credentials()
+    if not creds or not creds.get("refreshToken"):
+        return None
+
+    try:
+        data = json.dumps({
+            "grant_type": "refresh_token",
+            "refresh_token": creds["refreshToken"],
+            "client_id": _OAUTH_CLIENT_ID,
+        }).encode()
+
+        req = urllib.request.Request(
+            _OAUTH_TOKEN_URL,
+            data=data,
+            headers={
+                "Content-Type": "application/json",
+                "User-Agent": f"claude-cli/{_CLAUDE_CODE_VERSION} (external, cli)",
+            },
+            method="POST",
+        )
+
+        with urllib.request.urlopen(req, timeout=10) as resp:
+            result = json.loads(resp.read().decode())
+
+        new_access = result.get("access_token", "")
+        new_refresh = result.get("refresh_token", creds["refreshToken"])
+        expires_in = result.get("expires_in", 3600)
+
+        if new_access:
+            new_expires_ms = int(time.time() * 1000) + (expires_in * 1000)
+            _save_hermes_oauth_credentials(new_access, new_refresh, new_expires_ms)
+            # Also update Claude Code's credential file
+            _write_claude_code_credentials(new_access, new_refresh, new_expires_ms)
+            logger.debug("Successfully refreshed Hermes OAuth token")
+            return new_access
+    except Exception as e:
+        logger.debug("Failed to refresh Hermes OAuth token: %s", e)
+
+    return None
+
+
 # ---------------------------------------------------------------------------
 # Message / tool / response format conversion
 # ---------------------------------------------------------------------------
@@ -714,14 +949,59 @@ def build_anthropic_kwargs(
    max_tokens: Optional[int],
    reasoning_config: Optional[Dict[str, Any]],
    tool_choice: Optional[str] = None,
+    is_oauth: bool = False,
 ) -> Dict[str, Any]:
-    """Build kwargs for anthropic.messages.create()."""
+    """Build kwargs for anthropic.messages.create().
+
+    When *is_oauth* is True, applies Claude Code compatibility transforms:
+    system prompt prefix, tool name prefixing, and prompt sanitization.
+    """
    system, anthropic_messages = convert_messages_to_anthropic(messages)
    anthropic_tools = convert_tools_to_anthropic(tools) if tools else []

    model = normalize_model_name(model)
    effective_max_tokens = max_tokens or 16384

+    # ── OAuth: Claude Code identity ──────────────────────────────────
+    if is_oauth:
+        # 1. Prepend Claude Code system prompt identity
+        cc_block = {"type": "text", "text": _CLAUDE_CODE_SYSTEM_PREFIX}
+        if isinstance(system, list):
+            system = [cc_block] + system
+        elif isinstance(system, str) and system:
+            system = [cc_block, {"type": "text", "text": system}]
+        else:
+            system = [cc_block]
+
+        # 2. Sanitize system prompt — replace product name references
+        #    to avoid Anthropic's server-side content filters.
+        for block in system:
+            if isinstance(block, dict) and block.get("type") == "text":
+                text = block.get("text", "")
+                text = text.replace("Hermes Agent", "Claude Code")
+                text = text.replace("Hermes agent", "Claude Code")
+                text = text.replace("hermes-agent", "claude-code")
+                text = text.replace("Nous Research", "Anthropic")
+                block["text"] = text
+
+        # 3. Prefix tool names with mcp_ (Claude Code convention)
+        if anthropic_tools:
+            for tool in anthropic_tools:
+                if "name" in tool:
+                    tool["name"] = _MCP_TOOL_PREFIX + tool["name"]
+
+        # 4. Prefix tool names in message history (tool_use and tool_result blocks)
+        for msg in anthropic_messages:
+            content = msg.get("content")
+            if isinstance(content, list):
+                for block in content:
+                    if isinstance(block, dict):
+                        if block.get("type") == "tool_use" and "name" in block:
+                            if not block["name"].startswith(_MCP_TOOL_PREFIX):
+                                block["name"] = _MCP_TOOL_PREFIX + block["name"]
+                        elif block.get("type") == "tool_result" and "tool_use_id" in block:
+                            pass  # tool_result uses ID, not name
+
    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": anthropic_messages,
@@ -768,11 +1048,15 @@ def build_anthropic_kwargs(

 def normalize_anthropic_response(
    response,
+    strip_tool_prefix: bool = False,
 ) -> Tuple[SimpleNamespace, str]:
    """Normalize Anthropic response to match the shape expected by AIAgent.

    Returns (assistant_message, finish_reason) where assistant_message has
    .content, .tool_calls, and .reasoning attributes.
+
+    When *strip_tool_prefix* is True, removes the ``mcp_`` prefix that was
+    added to tool names for OAuth Claude Code compatibility.
    """
    text_parts = []
    reasoning_parts = []
@@ -784,12 +1068,15 @@ def normalize_anthropic_response(
        elif block.type == "thinking":
            reasoning_parts.append(block.thinking)
        elif block.type == "tool_use":
+            name = block.name
+            if strip_tool_prefix and name.startswith(_MCP_TOOL_PREFIX):
+                name = name[len(_MCP_TOOL_PREFIX):]
            tool_calls.append(
                SimpleNamespace(
                    id=block.id,
                    type="function",
                    function=SimpleNamespace(
-                        name=block.name,
+                        name=name,
                        arguments=json.dumps(block.input),
                    ),
                )
@@ -57,6 +57,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
    "minimax": "MiniMax-M2.5-highspeed",
    "minimax-cn": "MiniMax-M2.5-highspeed",
    "anthropic": "claude-haiku-4-5-20251001",
+    "ai-gateway": "google/gemini-3-flash",
 }

 # OpenRouter app attribution headers
@@ -40,6 +40,8 @@ DEFAULT_CONTEXT_LENGTHS = {
    "anthropic/claude-opus-4.6": 200000,
    "anthropic/claude-sonnet-4": 200000,
    "anthropic/claude-sonnet-4-20250514": 200000,
+    "anthropic/claude-sonnet-4.5": 200000,
+    "anthropic/claude-sonnet-4.6": 200000,
    "anthropic/claude-haiku-4.5": 200000,
    # Bare Anthropic model IDs (for native API provider)
    "claude-opus-4-6": 200000,
@@ -50,11 +52,18 @@ DEFAULT_CONTEXT_LENGTHS = {
    "claude-opus-4-20250514": 200000,
    "claude-sonnet-4-20250514": 200000,
    "claude-haiku-4-5-20251001": 200000,
+    "openai/gpt-5": 128000,
+    "openai/gpt-4.1": 1047576,
+    "openai/gpt-4.1-mini": 1047576,
    "openai/gpt-4o": 128000,
    "openai/gpt-4-turbo": 128000,
    "openai/gpt-4o-mini": 128000,
+    "google/gemini-3-pro-preview": 1048576,
+    "google/gemini-3-flash": 1048576,
+    "google/gemini-2.5-flash": 1048576,
    "google/gemini-2.0-flash": 1048576,
    "google/gemini-2.5-pro": 1048576,
+    "deepseek/deepseek-v3.2": 65536,
    "meta-llama/llama-3.3-70b-instruct": 131072,
    "deepseek/deepseek-chat-v3": 65536,
    "qwen/qwen-2.5-72b-instruct": 32768,
@@ -73,9 +73,15 @@ DEFAULT_AGENT_IDENTITY = (
 MEMORY_GUIDANCE = (
    "You have persistent memory across sessions. Save durable facts using the memory "
    "tool: user preferences, environment details, tool quirks, and stable conventions. "
-    "Memory is injected into every turn, so keep it compact. Do NOT save task progress, "
-    "session outcomes, or completed-work logs to memory; use session_search to recall "
-    "those from past transcripts."
+    "Memory is injected into every turn, so keep it compact and focused on facts that "
+    "will still matter later.\n"
+    "Prioritize what reduces future user steering — the most valuable memory is one "
+    "that prevents the user from having to correct or remind you again. "
+    "User preferences and recurring corrections matter more than procedural task details.\n"
+    "Do NOT save task progress, session outcomes, completed-work logs, or temporary TODO "
+    "state to memory; use session_search to recall those from past transcripts. "
+    "If you've discovered a new way to do something, solved a problem that could be "
+    "necessary later, save it as a skill with the skill tool."
 )

 SESSION_SEARCH_GUIDANCE = (
@@ -86,8 +92,11 @@ SESSION_SEARCH_GUIDANCE = (

 SKILLS_GUIDANCE = (
    "After completing a complex task (5+ tool calls), fixing a tricky error, "
-    "or discovering a non-trivial workflow, consider saving the approach as a "
-    "skill with skill_manage so you can reuse it next time."
+    "or discovering a non-trivial workflow, save the approach as a "
+    "skill with skill_manage so you can reuse it next time.\n"
+    "When using a skill and finding it outdated, incomplete, or wrong, "
+    "patch it immediately with skill_manage(action='patch') — don't wait to be asked. "
+    "Skills that aren't maintained become liabilities."
 )

 PLATFORM_HINTS = {
@@ -326,6 +335,9 @@ def build_skills_system_prompt(
        "Before replying, scan the skills below. If one clearly matches your task, "
        "load it with skill_view(name) and follow its instructions. "
        "If a skill has issues, fix it with skill_manage(action='patch').\n"
+        "After difficult/iterative tasks, offer to save as a skill. "
+        "If a skill you loaded was missing steps, had wrong commands, or needed "
+        "pitfalls you discovered, update it before finishing.\n"
        "\n"
        "<available_skills>\n"
        + "\n".join(index_lines) + "\n"
@@ -355,6 +355,19 @@ session_reset:
 # explicitly want one shared "room brain" per group/channel.
 group_sessions_per_user: true

+# ─────────────────────────────────────────────────────────────────────────────
+# Gateway Streaming
+# ─────────────────────────────────────────────────────────────────────────────
+# Stream tokens to messaging platforms in real-time. The bot sends a message
+# on first token, then progressively edits it as more tokens arrive.
+# Disabled by default — enable to try the streaming UX on Telegram/Discord/Slack.
+streaming:
+  enabled: false
+  # transport: edit           # "edit" = progressive editMessageText
+  # edit_interval: 0.3        # seconds between message edits
+  # buffer_threshold: 40      # chars before forcing an edit flush
+  # cursor: " ▉"              # cursor shown during streaming
+
 # =============================================================================
 # Skills Configuration
 # =============================================================================
@@ -716,6 +729,12 @@ display:
  # Toggle at runtime with /reasoning show or /reasoning hide.
  show_reasoning: false

+  # Stream tokens to the terminal as they arrive instead of waiting for the
+  # full response. The response box opens on first token and text appears
+  # line-by-line. Tool calls are still captured silently.
+  # Disabled by default — enable to try the streaming UX.
+  streaming: false
+
  # ───────────────────────────────────────────────────────────────────────────
  # Skin / Theme
  # ───────────────────────────────────────────────────────────────────────────
@@ -756,3 +775,14 @@ display:
  #   tool_prefix: "╎"                       # Tool output line prefix (default: ┊)
  #
  skin: default
+
+# =============================================================================
+# Privacy
+# =============================================================================
+# privacy:
+#   # Redact PII from the LLM context prompt.
+#   # When true, phone numbers are stripped and user/chat IDs are replaced
+#   # with deterministic hashes before being sent to the model.
+#   # Names and usernames are NOT affected (user-chosen, publicly visible).
+#   # Routing/delivery still uses the original values internally.
+#   redact_pii: false
@@ -210,6 +210,8 @@ def load_cli_config() -> Dict[str, Any]:
            "compact": False,
            "resume_display": "full",
            "show_reasoning": False,
+            "streaming": False,
+            "show_cost": False,
            "skin": "default",
        },
        "clarify": {
@@ -401,7 +403,13 @@ def load_cli_config() -> Dict[str, Any]:
            "provider": "AUXILIARY_WEB_EXTRACT_PROVIDER",
            "model": "AUXILIARY_WEB_EXTRACT_MODEL",
            "base_url": "AUXILIARY_WEB_EXTRACT_BASE_URL",
-            "api_key": "AUXILIARY_WEB_EXTRACT_API_KEY",
+            "api_key": "AUXILI..._KEY",
+        },
+        "approval": {
+            "provider": "AUXILIARY_APPROVAL_PROVIDER",
+            "model": "AUXILIARY_APPROVAL_MODEL",
+            "base_url": "AUXILIARY_APPROVAL_BASE_URL",
+            "api_key": "AUXILIARY_APPROVAL_API_KEY",
        },
    }
    
@@ -460,7 +468,7 @@ from hermes_cli.banner import (
    VERSION, RELEASE_DATE, HERMES_AGENT_LOGO, HERMES_CADUCEUS, COMPACT_BANNER,
    build_welcome_banner,
 )
-from hermes_cli.commands import COMMANDS, SlashCommandCompleter
+from hermes_cli.commands import COMMANDS, SlashCommandCompleter, SlashCommandAutoSuggest
 from hermes_cli import callbacks as _callbacks
 from toolsets import get_all_toolsets, get_toolset_info, resolve_toolset, validate_toolset

@@ -1023,8 +1031,18 @@ class HermesCLI:
        self.bell_on_complete = CLI_CONFIG["display"].get("bell_on_complete", False)
        # show_reasoning: display model thinking/reasoning before the response
        self.show_reasoning = CLI_CONFIG["display"].get("show_reasoning", False)
+        # show_cost: display $ cost in the status bar (off by default)
+        self.show_cost = CLI_CONFIG["display"].get("show_cost", False)
        self.verbose = verbose if verbose is not None else (self.tool_progress_mode == "verbose")
        
+        # streaming: stream tokens to the terminal as they arrive (display.streaming in config.yaml)
+        self.streaming_enabled = CLI_CONFIG["display"].get("streaming", False)
+
+        # Streaming display state
+        self._stream_buf = ""        # Partial line buffer for line-buffered rendering
+        self._stream_started = False  # True once first delta arrives
+        self._stream_box_opened = False  # True once the response box header is printed
+        
        # Configuration - priority: CLI args > env vars > config file
        # Model comes from: CLI arg or config.yaml (single source of truth).
        # LLM_MODEL/OPENAI_MODEL env vars are NOT checked — config.yaml is
@@ -1280,13 +1298,22 @@ class HermesCLI:
            width = width or shutil.get_terminal_size((80, 24)).columns
            percent = snapshot["context_percent"]
            percent_label = f"{percent}%" if percent is not None else "--"
-            cost_label = f"${snapshot['session_cost']:.2f}" if snapshot["pricing_known"] else "cost n/a"
            duration_label = snapshot["duration"]
+            show_cost = getattr(self, "show_cost", False)
+
+            if show_cost:
+                cost_label = f"${snapshot['session_cost']:.2f}" if snapshot["pricing_known"] else "cost n/a"
+            else:
+                cost_label = None

            if width < 52:
                return f"⚕ {snapshot['model_short']} · {duration_label}"
            if width < 76:
-                return f"⚕ {snapshot['model_short']} · {percent_label} · {cost_label} · {duration_label}"
+                parts = [f"⚕ {snapshot['model_short']}", percent_label]
+                if cost_label:
+                    parts.append(cost_label)
+                parts.append(duration_label)
+                return " · ".join(parts)

            if snapshot["context_length"]:
                ctx_total = _format_context_length(snapshot["context_length"])
@@ -1295,7 +1322,11 @@ class HermesCLI:
            else:
                context_label = "ctx --"

-            return f"⚕ {snapshot['model_short']} │ {context_label} │ {percent_label} │ {cost_label} │ {duration_label}"
+            parts = [f"⚕ {snapshot['model_short']}", context_label, percent_label]
+            if cost_label:
+                parts.append(cost_label)
+            parts.append(duration_label)
+            return " │ ".join(parts)
        except Exception:
            return f"⚕ {self.model if getattr(self, 'model', None) else 'Hermes'}"

@@ -1303,8 +1334,13 @@ class HermesCLI:
        try:
            snapshot = self._get_status_bar_snapshot()
            width = shutil.get_terminal_size((80, 24)).columns
-            cost_label = f"${snapshot['session_cost']:.2f}" if snapshot["pricing_known"] else "cost n/a"
            duration_label = snapshot["duration"]
+            show_cost = getattr(self, "show_cost", False)
+
+            if show_cost:
+                cost_label = f"${snapshot['session_cost']:.2f}" if snapshot["pricing_known"] else "cost n/a"
+            else:
+                cost_label = None

            if width < 52:
                return [
@@ -1318,17 +1354,23 @@ class HermesCLI:
            percent = snapshot["context_percent"]
            percent_label = f"{percent}%" if percent is not None else "--"
            if width < 76:
-                return [
+                frags = [
                    ("class:status-bar", " ⚕ "),
                    ("class:status-bar-strong", snapshot["model_short"]),
                    ("class:status-bar-dim", " · "),
                    (self._status_bar_context_style(percent), percent_label),
-                    ("class:status-bar-dim", " · "),
-                    ("class:status-bar-dim", cost_label),
+                ]
+                if cost_label:
+                    frags.extend([
+                        ("class:status-bar-dim", " · "),
+                        ("class:status-bar-dim", cost_label),
+                    ])
+                frags.extend([
                    ("class:status-bar-dim", " · "),
                    ("class:status-bar-dim", duration_label),
                    ("class:status-bar", " "),
-                ]
+                ])
+                return frags

            if snapshot["context_length"]:
                ctx_total = _format_context_length(snapshot["context_length"])
@@ -1338,7 +1380,7 @@ class HermesCLI:
                context_label = "ctx --"

            bar_style = self._status_bar_context_style(percent)
-            return [
+            frags = [
                ("class:status-bar", " ⚕ "),
                ("class:status-bar-strong", snapshot["model_short"]),
                ("class:status-bar-dim", " │ "),
@@ -1347,12 +1389,18 @@ class HermesCLI:
                (bar_style, self._build_context_bar(percent)),
                ("class:status-bar-dim", " "),
                (bar_style, percent_label),
-                ("class:status-bar-dim", " │ "),
-                ("class:status-bar-dim", cost_label),
+            ]
+            if cost_label:
+                frags.extend([
+                    ("class:status-bar-dim", " │ "),
+                    ("class:status-bar-dim", cost_label),
+                ])
+            frags.extend([
                ("class:status-bar-dim", " │ "),
                ("class:status-bar-dim", duration_label),
                ("class:status-bar", " "),
-            ]
+            ])
+            return frags
        except Exception:
            return [("class:status-bar", f" {self._build_status_bar_text()} ")]

@@ -1415,6 +1463,177 @@ class HermesCLI:
        self._spinner_text = text or ""
        self._invalidate()

+    # ── Streaming display ────────────────────────────────────────────────
+
+    def _stream_reasoning_delta(self, text: str) -> None:
+        """Stream reasoning/thinking tokens into a dim box above the response.
+
+        Opens a dim reasoning box on first token, streams line-by-line.
+        The box is closed automatically when content tokens start arriving
+        (via _stream_delta → _emit_stream_text).
+        """
+        if not text:
+            return
+
+        # Open reasoning box on first reasoning token
+        if not getattr(self, "_reasoning_box_opened", False):
+            self._reasoning_box_opened = True
+            w = shutil.get_terminal_size().columns
+            r_label = " Reasoning "
+            r_fill = w - 2 - len(r_label)
+            _cprint(f"\n{_DIM}┌─{r_label}{'─' * max(r_fill - 1, 0)}┐{_RST}")
+
+        self._reasoning_buf = getattr(self, "_reasoning_buf", "") + text
+
+        # Emit complete lines
+        while "\n" in self._reasoning_buf:
+            line, self._reasoning_buf = self._reasoning_buf.split("\n", 1)
+            _cprint(f"{_DIM}{line}{_RST}")
+
+    def _close_reasoning_box(self) -> None:
+        """Close the live reasoning box if it's open."""
+        if getattr(self, "_reasoning_box_opened", False):
+            # Flush remaining reasoning buffer
+            buf = getattr(self, "_reasoning_buf", "")
+            if buf:
+                _cprint(f"{_DIM}{buf}{_RST}")
+                self._reasoning_buf = ""
+            w = shutil.get_terminal_size().columns
+            _cprint(f"{_DIM}└{'─' * (w - 2)}┘{_RST}")
+            self._reasoning_box_opened = False
+
+    def _stream_delta(self, text: str) -> None:
+        """Line-buffered streaming callback for real-time token rendering.
+
+        Receives text deltas from the agent as tokens arrive. Buffers
+        partial lines and emits complete lines via _cprint to work
+        reliably with prompt_toolkit's patch_stdout.
+
+        Reasoning/thinking blocks (<REASONING_SCRATCHPAD>, <think>, etc.)
+        are suppressed during streaming since they'd display raw XML tags.
+        The agent strips them from the final response anyway.
+        """
+        if not text:
+            return
+
+        self._stream_started = True
+
+        # ── Tag-based reasoning suppression ──
+        # Track whether we're inside a reasoning/thinking block.
+        # These tags are model-generated (system prompt tells the model
+        # to use them) and get stripped from final_response. We must
+        # suppress them during streaming too.
+        _OPEN_TAGS = ("<REASONING_SCRATCHPAD>", "<think>", "<reasoning>", "<THINKING>")
+        _CLOSE_TAGS = ("</REASONING_SCRATCHPAD>", "</think>", "</reasoning>", "</THINKING>")
+
+        # Append to a pre-filter buffer first
+        self._stream_prefilt = getattr(self, "_stream_prefilt", "") + text
+
+        # Check if we're entering a reasoning block
+        if not getattr(self, "_in_reasoning_block", False):
+            for tag in _OPEN_TAGS:
+                idx = self._stream_prefilt.find(tag)
+                if idx != -1:
+                    # Emit everything before the tag
+                    before = self._stream_prefilt[:idx]
+                    if before:
+                        self._emit_stream_text(before)
+                    self._in_reasoning_block = True
+                    self._stream_prefilt = self._stream_prefilt[idx + len(tag):]
+                    break
+
+            # Could also be a partial open tag at the end — hold it back
+            if not getattr(self, "_in_reasoning_block", False):
+                # Check for partial tag match at the end
+                safe = self._stream_prefilt
+                for tag in _OPEN_TAGS:
+                    for i in range(1, len(tag)):
+                        if self._stream_prefilt.endswith(tag[:i]):
+                            safe = self._stream_prefilt[:-i]
+                            break
+                if safe:
+                    self._emit_stream_text(safe)
+                    self._stream_prefilt = self._stream_prefilt[len(safe):]
+                return
+
+        # Inside a reasoning block — look for close tag.
+        # Keep accumulating _stream_prefilt because close tags can arrive
+        # split across multiple tokens (e.g. "</REASONING_SCRATCH" + "PAD>...").
+        if getattr(self, "_in_reasoning_block", False):
+            for tag in _CLOSE_TAGS:
+                idx = self._stream_prefilt.find(tag)
+                if idx != -1:
+                    self._in_reasoning_block = False
+                    after = self._stream_prefilt[idx + len(tag):]
+                    self._stream_prefilt = ""
+                    # Process remaining text after close tag through full
+                    # filtering (it could contain another open tag)
+                    if after:
+                        self._stream_delta(after)
+                    return
+            # Still inside reasoning block — keep only the tail that could
+            # be a partial close tag prefix (save memory on long blocks).
+            max_tag_len = max(len(t) for t in _CLOSE_TAGS)
+            if len(self._stream_prefilt) > max_tag_len:
+                self._stream_prefilt = self._stream_prefilt[-max_tag_len:]
+            return
+
+    def _emit_stream_text(self, text: str) -> None:
+        """Emit filtered text to the streaming display."""
+        if not text:
+            return
+
+        # Close the live reasoning box before opening the response box
+        self._close_reasoning_box()
+
+        # Open the response box header on the very first visible text
+        if not self._stream_box_opened:
+            # Strip leading whitespace/newlines before first visible content
+            text = text.lstrip("\n")
+            if not text:
+                return
+            self._stream_box_opened = True
+            try:
+                from hermes_cli.skin_engine import get_active_skin
+                _skin = get_active_skin()
+                label = _skin.get_branding("response_label", "⚕ Hermes")
+            except Exception:
+                label = "⚕ Hermes"
+            w = shutil.get_terminal_size().columns
+            fill = w - 2 - len(label)
+            _cprint(f"\n{_GOLD}╭─{label}{'─' * max(fill - 1, 0)}╮{_RST}")
+
+        self._stream_buf += text
+
+        # Emit complete lines, keep partial remainder in buffer
+        while "\n" in self._stream_buf:
+            line, self._stream_buf = self._stream_buf.split("\n", 1)
+            _cprint(line)
+
+    def _flush_stream(self) -> None:
+        """Emit any remaining partial line from the stream buffer and close the box."""
+        # Close reasoning box if still open (in case no content tokens arrived)
+        self._close_reasoning_box()
+
+        if self._stream_buf:
+            _cprint(self._stream_buf)
+            self._stream_buf = ""
+
+        # Close the response box
+        if self._stream_box_opened:
+            w = shutil.get_terminal_size().columns
+            _cprint(f"{_GOLD}╰{'─' * (w - 2)}╯{_RST}")
+
+    def _reset_stream_state(self) -> None:
+        """Reset streaming state before each agent invocation."""
+        self._stream_buf = ""
+        self._stream_started = False
+        self._stream_box_opened = False
+        self._stream_prefilt = ""
+        self._in_reasoning_block = False
+        self._reasoning_box_opened = False
+        self._reasoning_buf = ""
+
    def _slow_command_status(self, command: str) -> str:
        """Return a user-facing status message for slower slash commands."""
        cmd_lower = command.lower().strip()
@@ -1430,6 +1649,8 @@ class HermesCLI:
            return "Processing skills command..."
        if cmd_lower == "/reload-mcp":
            return "Reloading MCP servers..."
+        if cmd_lower.startswith("/browser"):
+            return "Configuring browser..."
        return "Processing command..."

    def _command_spinner_frame(self) -> str:
@@ -1616,7 +1837,11 @@ class HermesCLI:
                platform="cli",
                session_db=self._session_db,
                clarify_callback=self._clarify_callback,
-                reasoning_callback=self._on_reasoning if (self.show_reasoning or self.verbose) else None,
+                reasoning_callback=(
+                    self._stream_reasoning_delta if (self.streaming_enabled and self.show_reasoning)
+                    else self._on_reasoning if (self.show_reasoning or self.verbose)
+                    else None
+                ),
                honcho_session_key=None,  # resolved by run_agent via config sessions map / title
                fallback_model=self._fallback_model,
                thinking_callback=self._on_thinking,
@@ -1624,6 +1849,7 @@ class HermesCLI:
                checkpoint_max_snapshots=self.checkpoint_max_snapshots,
                pass_session_id=self.pass_session_id,
                tool_progress_callback=self._on_tool_progress,
+                stream_delta_callback=self._stream_delta if self.streaming_enabled else None,
            )
            self._active_agent_route_signature = (
                effective_model,
@@ -2027,6 +2253,26 @@ class HermesCLI:
            # Treat as a git hash
            return ref

+    def _handle_stop_command(self):
+        """Handle /stop — kill all running background processes.
+
+        Inspired by OpenAI Codex's separation of interrupt (stop current turn)
+        from /stop (clean up background processes). See openai/codex#14602.
+        """
+        from tools.process_registry import get_registry
+
+        registry = get_registry()
+        processes = registry.list_processes()
+        running = [p for p in processes if p.get("status") == "running"]
+
+        if not running:
+            print("  No running background processes.")
+            return
+
+        print(f"  Stopping {len(running)} background process(es)...")
+        killed = registry.kill_all()
+        print(f"  ✅ Stopped {killed} process(es).")
+
    def _handle_paste_command(self):
        """Handle /paste — explicitly check clipboard for an image.

@@ -3020,18 +3266,25 @@ class HermesCLI:
        # Lowercase only for dispatch matching; preserve original case for arguments
        cmd_lower = command.lower().strip()
        cmd_original = command.strip()
+
+        # Resolve aliases via central registry so adding an alias is a one-line
+        # change in hermes_cli/commands.py instead of touching every dispatch site.
+        from hermes_cli.commands import resolve_command as _resolve_cmd
+        _base_word = cmd_lower.split()[0].lstrip("/")
+        _cmd_def = _resolve_cmd(_base_word)
+        canonical = _cmd_def.name if _cmd_def else _base_word
        
-        if cmd_lower in ("/quit", "/exit", "/q"):
+        if canonical in ("quit", "exit", "q"):
            return False
-        elif cmd_lower == "/help":
+        elif canonical == "help":
            self.show_help()
-        elif cmd_lower == "/tools":
+        elif canonical == "tools":
            self.show_tools()
-        elif cmd_lower == "/toolsets":
+        elif canonical == "toolsets":
            self.show_toolsets()
-        elif cmd_lower == "/config":
+        elif canonical == "config":
            self.show_config()
-        elif cmd_lower == "/clear":
+        elif canonical == "clear":
            self.new_session(silent=True)
            # Clear terminal screen.  Inside the TUI, Rich's console.clear()
            # goes through patch_stdout's StdoutProxy which swallows the
@@ -3072,9 +3325,9 @@ class HermesCLI:
            else:
                self.show_banner()
                print("  ✨ (◕‿◕)✨ Fresh start! Screen cleared and conversation reset.\n")
-        elif cmd_lower == "/history":
+        elif canonical == "history":
            self.show_history()
-        elif cmd_lower.startswith("/title"):
+        elif canonical == "title":
            parts = cmd_original.split(maxsplit=1)
            if len(parts) > 1:
                raw_title = parts[1].strip()
@@ -3145,9 +3398,9 @@ class HermesCLI:
                        _cprint(f"  No title set. Usage: /title <your session title>")
                else:
                    _cprint("  Session database not available.")
-        elif cmd_lower in ("/reset", "/new"):
+        elif canonical == "new":
            self.new_session()
-        elif cmd_lower.startswith("/model"):
+        elif canonical == "model":
            # Use original case so model names like "Anthropic/Claude-Opus-4" are preserved
            parts = cmd_original.split(maxsplit=1)
            if len(parts) > 1:
@@ -3234,54 +3487,79 @@ class HermesCLI:
                        print("  Note: Model will revert on restart. Use a verified model to save to config.")
            else:
                self._show_model_and_providers()
-        elif cmd_lower == "/provider":
+        elif canonical == "provider":
            self._show_model_and_providers()
-        elif cmd_lower.startswith("/prompt"):
+        elif canonical == "prompt":
            # Use original case so prompt text isn't lowercased
            self._handle_prompt_command(cmd_original)
-        elif cmd_lower.startswith("/personality"):
+        elif canonical == "personality":
            # Use original case (handler lowercases the personality name itself)
            self._handle_personality_command(cmd_original)
-        elif cmd_lower == "/plan" or cmd_lower.startswith("/plan "):
+        elif canonical == "plan":
            self._handle_plan_command(cmd_original)
-        elif cmd_lower == "/retry":
+        elif canonical == "retry":
            retry_msg = self.retry_last()
            if retry_msg and hasattr(self, '_pending_input'):
                # Re-queue the message so process_loop sends it to the agent
                self._pending_input.put(retry_msg)
-        elif cmd_lower == "/undo":
+        elif canonical == "undo":
            self.undo_last()
-        elif cmd_lower == "/save":
+        elif canonical == "save":
            self.save_conversation()
-        elif cmd_lower.startswith("/cron"):
+        elif canonical == "cron":
            self._handle_cron_command(cmd_original)
-        elif cmd_lower.startswith("/skills"):
+        elif canonical == "skills":
            with self._busy_command(self._slow_command_status(cmd_original)):
                self._handle_skills_command(cmd_original)
-        elif cmd_lower == "/platforms" or cmd_lower == "/gateway":
+        elif canonical == "platforms":
            self._show_gateway_status()
-        elif cmd_lower == "/verbose":
+        elif canonical == "verbose":
            self._toggle_verbose()
-        elif cmd_lower.startswith("/reasoning"):
+        elif canonical == "reasoning":
            self._handle_reasoning_command(cmd_original)
-        elif cmd_lower == "/compress":
+        elif canonical == "compress":
            self._manual_compress()
-        elif cmd_lower == "/usage":
+        elif canonical == "usage":
            self._show_usage()
-        elif cmd_lower.startswith("/insights"):
+        elif canonical == "insights":
            self._show_insights(cmd_original)
-        elif cmd_lower == "/paste":
+        elif canonical == "paste":
            self._handle_paste_command()
-        elif cmd_lower == "/reload-mcp":
+        elif canonical == "reload-mcp":
            with self._busy_command(self._slow_command_status(cmd_original)):
                self._reload_mcp()
-        elif cmd_lower.startswith("/rollback"):
+        elif _base_word == "browser":
+            self._handle_browser_command(cmd_original)
+        elif canonical == "plugins":
+            try:
+                from hermes_cli.plugins import get_plugin_manager
+                mgr = get_plugin_manager()
+                plugins = mgr.list_plugins()
+                if not plugins:
+                    print("No plugins installed.")
+                    print(f"Drop plugin directories into ~/.hermes/plugins/ to get started.")
+                else:
+                    print(f"Plugins ({len(plugins)}):")
+                    for p in plugins:
+                        status = "✓" if p["enabled"] else "✗"
+                        version = f" v{p['version']}" if p["version"] else ""
+                        tools = f"{p['tools']} tools" if p["tools"] else ""
+                        hooks = f"{p['hooks']} hooks" if p["hooks"] else ""
+                        parts = [x for x in [tools, hooks] if x]
+                        detail = f" ({', '.join(parts)})" if parts else ""
+                        error = f" — {p['error']}" if p["error"] else ""
+                        print(f"  {status} {p['name']}{version}{detail}{error}")
+            except Exception as e:
+                print(f"Plugin system error: {e}")
+        elif canonical == "rollback":
            self._handle_rollback_command(cmd_original)
-        elif cmd_lower.startswith("/background"):
+        elif canonical == "stop":
+            self._handle_stop_command()
+        elif canonical == "background":
            self._handle_background_command(cmd_original)
-        elif cmd_lower.startswith("/skin"):
+        elif canonical == "skin":
            self._handle_skin_command(cmd_original)
-        elif cmd_lower.startswith("/voice"):
+        elif canonical == "voice":
            self._handle_voice_command(cmd_original)
        else:
            # Check for user-defined quick commands (bypass agent loop, no LLM call)
@@ -3340,18 +3618,18 @@ class HermesCLI:
                    full_name = matches[0]
                    if full_name == typed_base:
                        # Already an exact token — no expansion possible; fall through
-                        self.console.print(f"[bold red]Unknown command: {cmd_lower}[/]")
-                        self.console.print("[dim #B8860B]Type /help for available commands[/]")
+                        _cprint(f"\033[1;31mUnknown command: {cmd_lower}{_RST}")
+                        _cprint(f"{_DIM}{_GOLD}Type /help for available commands{_RST}")
                    else:
                        remainder = cmd_original.strip()[len(typed_base):]
                        full_cmd = full_name + remainder
                        return self.process_command(full_cmd)
                elif len(matches) > 1:
-                    self.console.print(f"[bold yellow]Ambiguous command: {cmd_lower}[/]")
-                    self.console.print(f"[dim]Did you mean: {', '.join(sorted(matches))}?[/]")
+                    _cprint(f"{_GOLD}Ambiguous command: {cmd_lower}{_RST}")
+                    _cprint(f"{_DIM}Did you mean: {', '.join(sorted(matches))}?{_RST}")
                else:
-                    self.console.print(f"[bold red]Unknown command: {cmd_lower}[/]")
-                    self.console.print("[dim #B8860B]Type /help for available commands[/]")
+                    _cprint(f"\033[1;31mUnknown command: {cmd_lower}{_RST}")
+                    _cprint(f"{_DIM}{_GOLD}Type /help for available commands{_RST}")
        
        return True
    
@@ -3493,6 +3771,210 @@ class HermesCLI:
        self._background_tasks[task_id] = thread
        thread.start()

+    @staticmethod
+    def _try_launch_chrome_debug(port: int, system: str) -> bool:
+        """Try to launch Chrome/Chromium with remote debugging enabled.
+
+        Returns True if a launch command was executed (doesn't guarantee success).
+        """
+        import shutil
+        import subprocess as _sp
+
+        candidates = []
+        if system == "Darwin":
+            # macOS: try common app bundle locations
+            for app in (
+                "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
+                "/Applications/Chromium.app/Contents/MacOS/Chromium",
+                "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser",
+                "/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge",
+            ):
+                if os.path.isfile(app):
+                    candidates.append(app)
+        else:
+            # Linux: try common binary names
+            for name in ("google-chrome", "google-chrome-stable", "chromium-browser",
+                         "chromium", "brave-browser", "microsoft-edge"):
+                path = shutil.which(name)
+                if path:
+                    candidates.append(path)
+
+        if not candidates:
+            return False
+
+        chrome = candidates[0]
+        try:
+            _sp.Popen(
+                [chrome, f"--remote-debugging-port={port}"],
+                stdout=_sp.DEVNULL,
+                stderr=_sp.DEVNULL,
+                start_new_session=True,  # detach from terminal
+            )
+            return True
+        except Exception:
+            return False
+
+    def _handle_browser_command(self, cmd: str):
+        """Handle /browser connect|disconnect|status — manage live Chrome CDP connection."""
+        import platform as _plat
+        import subprocess as _sp
+
+        parts = cmd.strip().split(None, 1)
+        sub = parts[1].lower().strip() if len(parts) > 1 else "status"
+
+        _DEFAULT_CDP = "ws://localhost:9222"
+        current = os.environ.get("BROWSER_CDP_URL", "").strip()
+
+        if sub.startswith("connect"):
+            # Optionally accept a custom CDP URL: /browser connect ws://host:port
+            connect_parts = cmd.strip().split(None, 2)  # ["/browser", "connect", "ws://..."]
+            cdp_url = connect_parts[2].strip() if len(connect_parts) > 2 else _DEFAULT_CDP
+
+            # Clear any existing browser sessions so the next tool call uses the new backend
+            try:
+                from tools.browser_tool import cleanup_all_browsers
+                cleanup_all_browsers()
+            except Exception:
+                pass
+
+            print()
+
+            # Extract port for connectivity checks
+            _port = 9222
+            try:
+                _port = int(cdp_url.rsplit(":", 1)[-1].split("/")[0])
+            except (ValueError, IndexError):
+                pass
+
+            # Check if Chrome is already listening on the debug port
+            import socket
+            _already_open = False
+            try:
+                s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                s.settimeout(1)
+                s.connect(("127.0.0.1", _port))
+                s.close()
+                _already_open = True
+            except (OSError, socket.timeout):
+                pass
+
+            if _already_open:
+                print(f"   ✓ Chrome is already listening on port {_port}")
+            elif cdp_url == _DEFAULT_CDP:
+                # Try to auto-launch Chrome with remote debugging
+                print("   Chrome isn't running with remote debugging — attempting to launch...")
+                _launched = self._try_launch_chrome_debug(_port, _plat.system())
+                if _launched:
+                    # Wait for the port to come up
+                    import time as _time
+                    for _wait in range(10):
+                        try:
+                            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                            s.settimeout(1)
+                            s.connect(("127.0.0.1", _port))
+                            s.close()
+                            _already_open = True
+                            break
+                        except (OSError, socket.timeout):
+                            _time.sleep(0.5)
+                    if _already_open:
+                        print(f"   ✓ Chrome launched and listening on port {_port}")
+                    else:
+                        print(f"   ⚠ Chrome launched but port {_port} isn't responding yet")
+                        print("     You may need to close existing Chrome windows first and retry")
+                else:
+                    print(f"   ⚠ Could not auto-launch Chrome")
+                    # Show manual instructions as fallback
+                    sys_name = _plat.system()
+                    if sys_name == "Darwin":
+                        chrome_cmd = 'open -a "Google Chrome" --args --remote-debugging-port=9222'
+                    elif sys_name == "Windows":
+                        chrome_cmd = 'chrome.exe --remote-debugging-port=9222'
+                    else:
+                        chrome_cmd = "google-chrome --remote-debugging-port=9222"
+                    print(f"     Launch Chrome manually: {chrome_cmd}")
+            else:
+                print(f"   ⚠ Port {_port} is not reachable at {cdp_url}")
+
+            os.environ["BROWSER_CDP_URL"] = cdp_url
+            print()
+            print("🌐 Browser connected to live Chrome via CDP")
+            print(f"   Endpoint: {cdp_url}")
+            print()
+
+            # Inject context message so the model knows
+            if hasattr(self, '_pending_input'):
+                self._pending_input.put(
+                    "[System note: The user has connected your browser tools to their live Chrome browser "
+                    "via Chrome DevTools Protocol. Your browser_navigate, browser_snapshot, browser_click, "
+                    "and other browser tools now control their real browser — including any pages they have "
+                    "open, logged-in sessions, and cookies. They likely opened specific sites or logged into "
+                    "services before connecting. Please await their instruction before attempting to operate "
+                    "the browser. When you do act, be mindful that your actions affect their real browser — "
+                    "don't close tabs or navigate away from pages without asking.]"
+                )
+
+        elif sub == "disconnect":
+            if current:
+                os.environ.pop("BROWSER_CDP_URL", None)
+                try:
+                    from tools.browser_tool import cleanup_all_browsers
+                    cleanup_all_browsers()
+                except Exception:
+                    pass
+                print()
+                print("🌐 Browser disconnected from live Chrome")
+                print("   Browser tools reverted to default mode (local headless or Browserbase)")
+                print()
+
+                if hasattr(self, '_pending_input'):
+                    self._pending_input.put(
+                        "[System note: The user has disconnected the browser tools from their live Chrome. "
+                        "Browser tools are back to default mode (headless local browser or Browserbase cloud).]"
+                    )
+            else:
+                print()
+                print("Browser is not connected to live Chrome (already using default mode)")
+                print()
+
+        elif sub == "status":
+            print()
+            if current:
+                print(f"🌐 Browser: connected to live Chrome via CDP")
+                print(f"   Endpoint: {current}")
+
+                _port = 9222
+                try:
+                    _port = int(current.rsplit(":", 1)[-1].split("/")[0])
+                except (ValueError, IndexError):
+                    pass
+                try:
+                    import socket
+                    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    s.settimeout(1)
+                    s.connect(("127.0.0.1", _port))
+                    s.close()
+                    print(f"   Status: ✓ reachable")
+                except (OSError, Exception):
+                    print(f"   Status: ⚠ not reachable (Chrome may not be running)")
+            elif os.environ.get("BROWSERBASE_API_KEY"):
+                print("🌐 Browser: Browserbase (cloud)")
+            else:
+                print("🌐 Browser: local headless Chromium (agent-browser)")
+            print()
+            print("   /browser connect      — connect to your live Chrome")
+            print("   /browser disconnect   — revert to default")
+            print()
+
+        else:
+            print()
+            print("Usage: /browser connect|disconnect|status")
+            print()
+            print("   connect      Connect browser tools to your live Chrome session")
+            print("   disconnect   Revert to default browser backend")
+            print("   status       Show current browser mode")
+            print()
+
    def _handle_skin_command(self, cmd: str):
        """Handle /skin [name] — show or change the display skin."""
        try:
@@ -4668,6 +5150,9 @@ class HermesCLI:
            # Run the conversation with interrupt monitoring
            result = None

+            # Reset streaming display state for this turn
+            self._reset_stream_state()
+
            # --- Streaming TTS setup ---
            # When ElevenLabs is the TTS provider and sounddevice is available,
            # we stream audio sentence-by-sentence as the agent generates tokens
@@ -4794,6 +5279,9 @@ class HermesCLI:

            agent_thread.join()  # Ensure agent thread completes

+            # Flush any remaining streamed text and close the box
+            self._flush_stream()
+
            # Signal end-of-text to TTS consumer and wait for it to finish
            if use_streaming_tts and text_queue is not None:
                text_queue.put(None)  # sentinel
@@ -4836,8 +5324,9 @@ class HermesCLI:

            response_previewed = result.get("response_previewed", False) if result else False

-            # Display reasoning (thinking) box if enabled and available
-            if self.show_reasoning and result:
+            # Display reasoning (thinking) box if enabled and available.
+            # Skip when streaming already showed reasoning live.
+            if self.show_reasoning and result and not self._stream_started:
                reasoning = result.get("last_reasoning")
                if reasoning:
                    w = shutil.get_terminal_size().columns
@@ -4868,10 +5357,15 @@ class HermesCLI:
                    _resp_text = "#FFF8DC"

                is_error_response = result and (result.get("failed") or result.get("partial"))
+                already_streamed = self._stream_started and self._stream_box_opened and not is_error_response
                if use_streaming_tts and _streaming_box_opened and not is_error_response:
                    # Text was already printed sentence-by-sentence; just close the box
                    w = shutil.get_terminal_size().columns
                    _cprint(f"\n{_GOLD}╰{'─' * (w - 2)}╯{_RST}")
+                elif already_streamed:
+                    # Response was already streamed token-by-token with box framing;
+                    # _flush_stream() already closed the box. Skip Rich Panel.
+                    pass
                else:
                    _chat_console = ChatConsole()
                    _chat_console.print(Panel(
@@ -5063,7 +5557,7 @@ class HermesCLI:
            from honcho_integration.client import HonchoClientConfig
            from agent.display import honcho_session_line, write_tty
            hcfg = HonchoClientConfig.from_global_config()
-            if hcfg.enabled:
+            if hcfg.enabled and hcfg.api_key:
                sname = hcfg.resolve_session_name(session_id=self.session_id)
                if sname:
                    write_tty(honcho_session_line(hcfg.workspace_id, sname) + "\n")
@@ -5252,6 +5746,34 @@ class HermesCLI:
            """Ctrl+Enter (c-j) inserts a newline. Most terminals send c-j for Ctrl+Enter."""
            event.current_buffer.insert_text('\n')

+        @kb.add('tab', eager=True)
+        def handle_tab(event):
+            """Tab: accept completion and re-trigger if we just completed a provider.
+
+            After accepting a provider like 'anthropic:', the completion menu
+            closes and complete_while_typing doesn't fire (no keystroke).
+            This binding re-triggers completions so stage-2 models appear
+            immediately.
+            """
+            buf = event.current_buffer
+            if buf.complete_state:
+                completion = buf.complete_state.current_completion
+                if completion is None:
+                    # Menu open but nothing selected — select first then grab it
+                    buf.go_to_completion(0)
+                    completion = buf.complete_state and buf.complete_state.current_completion
+                if completion is None:
+                    return
+                # Accept the selected completion
+                buf.apply_completion(completion)
+                # If text now looks like "/model provider:", re-trigger completions
+                text = buf.document.text_before_cursor
+                if text.startswith("/model ") and text.endswith(":"):
+                    buf.start_completion()
+            else:
+                # No menu open — start completions from scratch
+                buf.start_completion()
+
        # --- Clarify tool: arrow-key navigation for multiple-choice questions ---

        @kb.add('up', filter=Condition(lambda: bool(self._clarify_state) and not self._clarify_freetext))
@@ -5518,6 +6040,39 @@ class HermesCLI:
            return cli_ref._get_tui_prompt_fragments()

        # Create the input area with multiline (shift+enter), autocomplete, and paste handling
+        from prompt_toolkit.auto_suggest import AutoSuggestFromHistory
+
+        def _get_model_completer_info() -> dict:
+            """Return provider/model info for /model autocomplete."""
+            try:
+                from hermes_cli.models import (
+                    _PROVIDER_LABELS, _PROVIDER_MODELS, normalize_provider,
+                    provider_model_ids,
+                )
+                current = getattr(cli_ref, "provider", None) or getattr(cli_ref, "requested_provider", "openrouter")
+                current = normalize_provider(current)
+
+                # Provider map: id -> label (only providers with known models)
+                providers = {}
+                for pid, plabel in _PROVIDER_LABELS.items():
+                    providers[pid] = plabel
+
+                def models_for(provider_name: str) -> list[str]:
+                    norm = normalize_provider(provider_name)
+                    return provider_model_ids(norm)
+
+                return {
+                    "current_provider": current,
+                    "providers": providers,
+                    "models_for": models_for,
+                }
+            except Exception:
+                return {}
+
+        _completer = SlashCommandCompleter(
+            skill_commands_provider=lambda: _skill_commands,
+            model_completer_provider=_get_model_completer_info,
+        )
        input_area = TextArea(
            height=Dimension(min=1, max=8, preferred=1),
            prompt=get_prompt,
@@ -5526,8 +6081,12 @@ class HermesCLI:
            wrap_lines=True,
            read_only=Condition(lambda: bool(cli_ref._command_running)),
            history=FileHistory(str(self._history_file)),
-            completer=SlashCommandCompleter(skill_commands_provider=lambda: _skill_commands),
+            completer=_completer,
            complete_while_typing=True,
+            auto_suggest=SlashCommandAutoSuggest(
+                history_suggest=AutoSuggestFromHistory(),
+                completer=_completer,
+            ),
        )

        # Dynamic height: accounts for both explicit newlines AND visual
@@ -6,6 +6,7 @@ Output is saved to ~/.hermes/cron/output/{job_id}/{timestamp}.md
 """

 import json
+import logging
 import tempfile
 import os
 import re
@@ -14,6 +15,8 @@ from datetime import datetime, timedelta
 from pathlib import Path
 from typing import Optional, Dict, List, Any

+logger = logging.getLogger(__name__)
+
 from hermes_time import now as _hermes_now

 try:
@@ -528,10 +531,18 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):


 def get_due_jobs() -> List[Dict[str, Any]]:
-    """Get all jobs that are due to run now."""
+    """Get all jobs that are due to run now.
+
+    For recurring jobs (cron/interval), if the scheduled time is stale
+    (more than one period in the past, e.g. because the gateway was down),
+    the job is fast-forwarded to the next future run instead of firing
+    immediately.  This prevents a burst of missed jobs on gateway restart.
+    """
    now = _hermes_now()
    jobs = [_apply_skill_fields(j) for j in load_jobs()]
+    raw_jobs = load_jobs()  # For saving updates
    due = []
+    needs_save = False

    for job in jobs:
        if not job.get("enabled", True):
@@ -543,8 +554,37 @@ def get_due_jobs() -> List[Dict[str, Any]]:

        next_run_dt = _ensure_aware(datetime.fromisoformat(next_run))
        if next_run_dt <= now:
+            schedule = job.get("schedule", {})
+            kind = schedule.get("kind")
+
+            # For recurring jobs, check if the scheduled time is stale
+            # (gateway was down and missed the window). Fast-forward to
+            # the next future occurrence instead of firing a stale run.
+            if kind in ("cron", "interval") and (now - next_run_dt).total_seconds() > 120:
+                # More than 2 minutes late — this is a missed run, not a current one.
+                # Recompute next_run_at to the next future occurrence.
+                new_next = compute_next_run(schedule, now.isoformat())
+                if new_next:
+                    logger.info(
+                        "Job '%s' missed its scheduled time (%s). "
+                        "Fast-forwarding to next run: %s",
+                        job.get("name", job["id"]),
+                        next_run,
+                        new_next,
+                    )
+                    # Update the job in storage
+                    for rj in raw_jobs:
+                        if rj["id"] == job["id"]:
+                            rj["next_run_at"] = new_next
+                            needs_save = True
+                            break
+                    continue  # Skip this run
+
            due.append(job)

+    if needs_save:
+        save_jobs(raw_jobs)
+
    return due


@@ -146,6 +146,37 @@ class PlatformConfig:
        )


+@dataclass
+class StreamingConfig:
+    """Configuration for real-time token streaming to messaging platforms."""
+    enabled: bool = False
+    transport: str = "edit"       # "edit" (progressive editMessageText) or "off"
+    edit_interval: float = 0.3    # Seconds between message edits
+    buffer_threshold: int = 40    # Chars before forcing an edit
+    cursor: str = " ▉"           # Cursor shown during streaming
+
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "enabled": self.enabled,
+            "transport": self.transport,
+            "edit_interval": self.edit_interval,
+            "buffer_threshold": self.buffer_threshold,
+            "cursor": self.cursor,
+        }
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "StreamingConfig":
+        if not data:
+            return cls()
+        return cls(
+            enabled=data.get("enabled", False),
+            transport=data.get("transport", "edit"),
+            edit_interval=float(data.get("edit_interval", 0.3)),
+            buffer_threshold=int(data.get("buffer_threshold", 40)),
+            cursor=data.get("cursor", " ▉"),
+        )
+
+
@dataclass
 class GatewayConfig:
    """
@@ -179,6 +210,9 @@ class GatewayConfig:
    # Session isolation in shared chats
    group_sessions_per_user: bool = True  # Isolate group/channel sessions per participant when user IDs are available

+    # Streaming configuration
+    streaming: StreamingConfig = field(default_factory=StreamingConfig)
+
    def get_connected_platforms(self) -> List[Platform]:
        """Return list of platforms that are enabled and configured."""
        connected = []
@@ -244,6 +278,7 @@ class GatewayConfig:
            "always_log_local": self.always_log_local,
            "stt_enabled": self.stt_enabled,
            "group_sessions_per_user": self.group_sessions_per_user,
+            "streaming": self.streaming.to_dict(),
        }
    
    @classmethod
@@ -297,6 +332,7 @@ class GatewayConfig:
            always_log_local=data.get("always_log_local", True),
            stt_enabled=_coerce_bool(stt_enabled, True),
            group_sessions_per_user=_coerce_bool(group_sessions_per_user, True),
+            streaming=StreamingConfig.from_dict(data.get("streaming", {})),
        )


@@ -510,6 +510,7 @@ class BasePlatformAdapter(ABC):
        image_url: str,
        caption: Optional[str] = None,
        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None,
    ) -> SendResult:
        """
        Send an image natively via the platform API.
@@ -528,6 +529,7 @@ class BasePlatformAdapter(ABC):
        animation_url: str,
        caption: Optional[str] = None,
        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None,
    ) -> SendResult:
        """
        Send an animated GIF natively via the platform API.
@@ -536,7 +538,7 @@ class BasePlatformAdapter(ABC):
        (e.g., Telegram send_animation) so they auto-play inline.
        Default falls back to send_image.
        """
-        return await self.send_image(chat_id=chat_id, image_url=animation_url, caption=caption, reply_to=reply_to)
+        return await self.send_image(chat_id=chat_id, image_url=animation_url, caption=caption, reply_to=reply_to, metadata=metadata)
    
    @staticmethod
    def _is_animation_url(url: str) -> bool:
@@ -726,7 +728,75 @@ class BasePlatformAdapter(ABC):
            cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
        
        return media, cleaned
-    
+
+    @staticmethod
+    def extract_local_files(content: str) -> Tuple[List[str], str]:
+        """
+        Detect bare local file paths in response text for native media delivery.
+
+        Matches absolute paths (/...) and tilde paths (~/) ending in common
+        image or video extensions.  Validates each candidate with
+        ``os.path.isfile()`` to avoid false positives from URLs or
+        non-existent paths.
+
+        Paths inside fenced code blocks (``` ... ```) and inline code
+        (`...`) are ignored so that code samples are never mutilated.
+
+        Returns:
+            Tuple of (list of expanded file paths, cleaned text with the
+            raw path strings removed).
+        """
+        _LOCAL_MEDIA_EXTS = (
+            '.png', '.jpg', '.jpeg', '.gif', '.webp',
+            '.mp4', '.mov', '.avi', '.mkv', '.webm',
+        )
+        ext_part = '|'.join(e.lstrip('.') for e in _LOCAL_MEDIA_EXTS)
+
+        # (?<![/:\w.]) prevents matching inside URLs (e.g. https://…/img.png)
+        #             and relative paths (./foo.png)
+        # (?:~/|/)    anchors to absolute or home-relative paths
+        path_re = re.compile(
+            r'(?<![/:\w.])(?:~/|/)(?:[\w.\-]+/)*[\w.\-]+\.(?:' + ext_part + r')\b',
+            re.IGNORECASE,
+        )
+
+        # Build spans covered by fenced code blocks and inline code
+        code_spans: list = []
+        for m in re.finditer(r'```[^\n]*\n.*?```', content, re.DOTALL):
+            code_spans.append((m.start(), m.end()))
+        for m in re.finditer(r'`[^`\n]+`', content):
+            code_spans.append((m.start(), m.end()))
+
+        def _in_code(pos: int) -> bool:
+            return any(s <= pos < e for s, e in code_spans)
+
+        found: list = []  # (raw_match_text, expanded_path)
+        for match in path_re.finditer(content):
+            if _in_code(match.start()):
+                continue
+            raw = match.group(0)
+            expanded = os.path.expanduser(raw)
+            if os.path.isfile(expanded):
+                found.append((raw, expanded))
+
+        # Deduplicate by expanded path, preserving discovery order
+        seen: set = set()
+        unique: list = []
+        for raw, expanded in found:
+            if expanded not in seen:
+                seen.add(expanded)
+                unique.append((raw, expanded))
+
+        paths = [expanded for _, expanded in unique]
+
+        cleaned = content
+        if unique:
+            for raw, _exp in unique:
+                cleaned = cleaned.replace(raw, '')
+            cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
+
+        return paths, cleaned
+
    async def _keep_typing(self, chat_id: str, interval: float = 2.0, metadata=None) -> None:
        """
        Continuously send typing indicator until cancelled.
@@ -839,8 +909,17 @@ class BasePlatformAdapter(ABC):
                
                # Extract image URLs and send them as native platform attachments
                images, text_content = self.extract_images(response)
+                # Strip any remaining internal directives from message body (fixes #1561)
+                text_content = text_content.replace("[[audio_as_voice]]", "").strip()
+                text_content = re.sub(r"MEDIA:\s*\S+", "", text_content).strip()
                if images:
                    logger.info("[%s] extract_images found %d image(s) in response (%d chars)", self.name, len(images), len(response))
+
+                # Auto-detect bare local file paths for native media delivery
+                # (helps small models that don't use MEDIA: syntax)
+                local_files, text_content = self.extract_local_files(text_content)
+                if local_files:
+                    logger.info("[%s] extract_local_files found %d file(s) in response", self.name, len(local_files))
                
                # Auto-TTS: if voice message, generate audio FIRST (before sending text)
                # Skipped when the chat has voice mode disabled (/voice off)
@@ -934,7 +1013,7 @@ class BasePlatformAdapter(ABC):

                # Send extracted media files — route by file type
                _AUDIO_EXTS = {'.ogg', '.opus', '.mp3', '.wav', '.m4a'}
-                _VIDEO_EXTS = {'.mp4', '.mov', '.avi', '.mkv', '.3gp'}
+                _VIDEO_EXTS = {'.mp4', '.mov', '.avi', '.mkv', '.webm', '.3gp'}
                _IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.webp', '.gif'}

                for media_path, is_voice in media_files:
@@ -971,7 +1050,34 @@ class BasePlatformAdapter(ABC):
                            print(f"[{self.name}] Failed to send media ({ext}): {media_result.error}")
                    except Exception as media_err:
                        print(f"[{self.name}] Error sending media: {media_err}")
-            
+
+                # Send auto-detected local files as native attachments
+                for file_path in local_files:
+                    if human_delay > 0:
+                        await asyncio.sleep(human_delay)
+                    try:
+                        ext = Path(file_path).suffix.lower()
+                        if ext in _IMAGE_EXTS:
+                            await self.send_image_file(
+                                chat_id=event.source.chat_id,
+                                image_path=file_path,
+                                metadata=_thread_metadata,
+                            )
+                        elif ext in _VIDEO_EXTS:
+                            await self.send_video(
+                                chat_id=event.source.chat_id,
+                                video_path=file_path,
+                                metadata=_thread_metadata,
+                            )
+                        else:
+                            await self.send_document(
+                                chat_id=event.source.chat_id,
+                                file_path=file_path,
+                                metadata=_thread_metadata,
+                            )
+                    except Exception as file_err:
+                        logger.error("[%s] Error sending local file %s: %s", self.name, file_path, file_err)
+
            # Check if there's a pending message that was queued during our processing
            if session_key in self._pending_messages:
                pending_event = self._pending_messages.pop(session_key)
@@ -1077,7 +1183,8 @@ class BasePlatformAdapter(ABC):
        """
        return content
    
-    def truncate_message(self, content: str, max_length: int = 4096) -> List[str]:
+    @staticmethod
+    def truncate_message(content: str, max_length: int = 4096) -> List[str]:
        """
        Split a long message into chunks, preserving code block boundaries.

@@ -1129,6 +1236,27 @@ class BasePlatformAdapter(ABC):
            if split_at < 1:
                split_at = headroom

+            # Avoid splitting inside an inline code span (`...`).
+            # If the text before split_at has an odd number of unescaped
+            # backticks, the split falls inside inline code — the resulting
+            # chunk would have an unpaired backtick and any special characters
+            # (like parentheses) inside the broken span would be unescaped,
+            # causing MarkdownV2 parse errors on Telegram.
+            candidate = remaining[:split_at]
+            backtick_count = candidate.count("`") - candidate.count("\\`")
+            if backtick_count % 2 == 1:
+                # Find the last unescaped backtick and split before it
+                last_bt = candidate.rfind("`")
+                while last_bt > 0 and candidate[last_bt - 1] == "\\":
+                    last_bt = candidate.rfind("`", 0, last_bt)
+                if last_bt > 0:
+                    # Try to find a space or newline just before the backtick
+                    safe_split = candidate.rfind(" ", 0, last_bt)
+                    nl_split = candidate.rfind("\n", 0, last_bt)
+                    safe_split = max(safe_split, nl_split)
+                    if safe_split > headroom // 4:
+                        split_at = safe_split
+
            chunk_body = remaining[:split_at]
            remaining = remaining[split_at:].lstrip()

@@ -10,6 +10,7 @@ Uses discord.py library for:
 """

 import asyncio
+import json
 import logging
 import os
 import struct
@@ -18,6 +19,7 @@ import tempfile
 import threading
 import time
 from collections import defaultdict
+from pathlib import Path
 from typing import Callable, Dict, List, Optional, Any

 logger = logging.getLogger(__name__)
@@ -434,8 +436,11 @@ class DiscordAdapter(BasePlatformAdapter):
        self._voice_input_callback: Optional[Callable] = None  # set by run.py
        self._on_voice_disconnect: Optional[Callable] = None  # set by run.py
        # Track threads where the bot has participated so follow-up messages
-        # in those threads don't require @mention.
-        self._bot_participated_threads: set = set()
+        # in those threads don't require @mention.  Persisted to disk so the
+        # set survives gateway restarts.
+        self._bot_participated_threads: set = self._load_participated_threads()
+        # Cap to prevent unbounded growth (Discord threads get archived).
+        self._MAX_TRACKED_THREADS = 500
    
    async def connect(self) -> bool:
        """Connect to Discord and start receiving events."""
@@ -1573,6 +1578,10 @@ class DiscordAdapter(BasePlatformAdapter):
        link = f"<#{thread_id}>" if thread_id else f"**{thread_name}**"
        await interaction.followup.send(f"Created thread {link}", ephemeral=True)

+        # Track thread participation so follow-ups don't require @mention
+        if thread_id:
+            self._track_thread(thread_id)
+
        # If a message was provided, kick off a new Hermes session in the thread
        starter = (message or "").strip()
        if starter and thread_id:
@@ -1798,6 +1807,49 @@ class DiscordAdapter(BasePlatformAdapter):
            return f"{parent_name} / {thread_name}"
        return thread_name

+    # ------------------------------------------------------------------
+    # Thread participation persistence
+    # ------------------------------------------------------------------
+
+    @staticmethod
+    def _thread_state_path() -> Path:
+        """Path to the persisted thread participation set."""
+        from hermes_cli.config import get_hermes_home
+        return get_hermes_home() / "discord_threads.json"
+
+    @classmethod
+    def _load_participated_threads(cls) -> set:
+        """Load persisted thread IDs from disk."""
+        path = cls._thread_state_path()
+        try:
+            if path.exists():
+                data = json.loads(path.read_text(encoding="utf-8"))
+                if isinstance(data, list):
+                    return set(data)
+        except Exception as e:
+            logger.debug("Could not load discord thread state: %s", e)
+        return set()
+
+    def _save_participated_threads(self) -> None:
+        """Persist the current thread set to disk (best-effort)."""
+        path = self._thread_state_path()
+        try:
+            # Trim to most recent entries if over cap
+            thread_list = list(self._bot_participated_threads)
+            if len(thread_list) > self._MAX_TRACKED_THREADS:
+                thread_list = thread_list[-self._MAX_TRACKED_THREADS:]
+                self._bot_participated_threads = set(thread_list)
+            path.parent.mkdir(parents=True, exist_ok=True)
+            path.write_text(json.dumps(thread_list), encoding="utf-8")
+        except Exception as e:
+            logger.debug("Could not save discord thread state: %s", e)
+
+    def _track_thread(self, thread_id: str) -> None:
+        """Add a thread to the participation set and persist."""
+        if thread_id not in self._bot_participated_threads:
+            self._bot_participated_threads.add(thread_id)
+            self._save_participated_threads()
+
    async def _handle_message(self, message: DiscordMessage) -> None:
        """Handle incoming Discord messages."""
        # In server channels (not DMs), require the bot to be @mentioned
@@ -1850,7 +1902,7 @@ class DiscordAdapter(BasePlatformAdapter):
                    is_thread = True
                    thread_id = str(thread.id)
                    auto_threaded_channel = thread
-                    self._bot_participated_threads.add(thread_id)
+                    self._track_thread(thread_id)

        # Determine message type
        msg_type = MessageType.TEXT
@@ -1954,7 +2006,7 @@ class DiscordAdapter(BasePlatformAdapter):
        # Track thread participation so the bot won't require @mention for
        # follow-up messages in threads it has already engaged in.
        if thread_id:
-            self._bot_participated_threads.add(thread_id)
+            self._track_thread(thread_id)

        await self.handle_message(event)

@@ -135,14 +135,23 @@ def _extract_email_address(raw: str) -> str:
    return raw.strip().lower()


-def _extract_attachments(msg: email_lib.message.Message) -> List[Dict[str, Any]]:
-    """Extract attachment metadata and cache files locally."""
+def _extract_attachments(
+    msg: email_lib.message.Message,
+    skip_attachments: bool = False,
+) -> List[Dict[str, Any]]:
+    """Extract attachment metadata and cache files locally.
+
+    When *skip_attachments* is True, all attachment/inline parts are ignored
+    (useful for malware protection or bandwidth savings).
+    """
    attachments = []
    if not msg.is_multipart():
        return attachments

    for part in msg.walk():
        disposition = str(part.get("Content-Disposition", ""))
+        if skip_attachments and ("attachment" in disposition or "inline" in disposition):
+            continue
        if "attachment" not in disposition and "inline" not in disposition:
            continue
        # Skip text/plain and text/html body parts
@@ -196,6 +205,13 @@ class EmailAdapter(BasePlatformAdapter):
        self._smtp_port = int(os.getenv("EMAIL_SMTP_PORT", "587"))
        self._poll_interval = int(os.getenv("EMAIL_POLL_INTERVAL", "15"))

+        # Skip attachments — configured via config.yaml:
+        #   platforms:
+        #     email:
+        #       skip_attachments: true
+        extra = config.extra or {}
+        self._skip_attachments = extra.get("skip_attachments", False)
+
        # Track message IDs we've already processed to avoid duplicates
        self._seen_uids: set = set()
        self._poll_task: Optional[asyncio.Task] = None
@@ -306,7 +322,7 @@ class EmailAdapter(BasePlatformAdapter):
                message_id = msg.get("Message-ID", "")
                in_reply_to = msg.get("In-Reply-To", "")
                body = _extract_text_body(msg)
-                attachments = _extract_attachments(msg)
+                attachments = _extract_attachments(msg, skip_attachments=self._skip_attachments)

                results.append({
                    "uid": uid,
@@ -789,23 +789,11 @@ class SlackAdapter(BasePlatformAdapter):
        user_id = command.get("user_id", "")
        channel_id = command.get("channel_id", "")

-        # Map subcommands to gateway commands
-        subcommand_map = {
-            "new": "/reset", "reset": "/reset",
-            "status": "/status", "stop": "/stop",
-            "help": "/help",
-            "model": "/model", "personality": "/personality",
-            "retry": "/retry", "undo": "/undo",
-            "compact": "/compress", "compress": "/compress",
-            "resume": "/resume",
-            "background": "/background",
-            "usage": "/usage",
-            "insights": "/insights",
-            "title": "/title",
-            "reasoning": "/reasoning",
-            "provider": "/provider",
-            "rollback": "/rollback",
-        }
+        # Map subcommands to gateway commands — derived from central registry.
+        # Also keep "compact" as a Slack-specific alias for /compress.
+        from hermes_cli.commands import slack_subcommand_map
+        subcommand_map = slack_subcommand_map()
+        subcommand_map["compact"] = "/compress"
        first_word = text.split()[0] if text else ""
        if first_word in subcommand_map:
            # Preserve arguments after the subcommand
@@ -202,8 +202,26 @@ class TelegramAdapter(BasePlatformAdapter):
                self._handle_media_message
            ))
            
-            # Start polling in background
-            await self._app.initialize()
+            # Start polling — retry initialize() for transient TLS resets
+            try:
+                from telegram.error import NetworkError, TimedOut
+            except ImportError:
+                NetworkError = TimedOut = OSError  # type: ignore[misc,assignment]
+            _max_connect = 3
+            for _attempt in range(_max_connect):
+                try:
+                    await self._app.initialize()
+                    break
+                except (NetworkError, TimedOut, OSError) as init_err:
+                    if _attempt < _max_connect - 1:
+                        wait = 2 ** _attempt
+                        logger.warning(
+                            "[%s] Connect attempt %d/%d failed: %s — retrying in %ds",
+                            self.name, _attempt + 1, _max_connect, init_err, wait,
+                        )
+                        await asyncio.sleep(wait)
+                    else:
+                        raise
            await self._app.start()
            loop = asyncio.get_running_loop()

@@ -222,29 +240,13 @@ class TelegramAdapter(BasePlatformAdapter):
            )
            
            # Register bot commands so Telegram shows a hint menu when users type /
+            # List is derived from the central COMMAND_REGISTRY — adding a new
+            # gateway command there automatically adds it to the Telegram menu.
            try:
                from telegram import BotCommand
+                from hermes_cli.commands import telegram_bot_commands
                await self._bot.set_my_commands([
-                    BotCommand("new", "Start a new conversation"),
-                    BotCommand("reset", "Reset conversation history"),
-                    BotCommand("model", "Show or change the model"),
-                    BotCommand("reasoning", "Show or change reasoning effort"),
-                    BotCommand("personality", "Set a personality"),
-                    BotCommand("retry", "Retry your last message"),
-                    BotCommand("undo", "Remove the last exchange"),
-                    BotCommand("status", "Show session info"),
-                    BotCommand("stop", "Stop the running agent"),
-                    BotCommand("sethome", "Set this chat as the home channel"),
-                    BotCommand("compress", "Compress conversation context"),
-                    BotCommand("title", "Set or show the session title"),
-                    BotCommand("resume", "Resume a previously-named session"),
-                    BotCommand("usage", "Show token usage for this session"),
-                    BotCommand("provider", "Show available providers"),
-                    BotCommand("insights", "Show usage insights and analytics"),
-                    BotCommand("update", "Update Hermes to the latest version"),
-                    BotCommand("reload_mcp", "Reload MCP servers from config"),
-                    BotCommand("voice", "Toggle voice reply mode"),
-                    BotCommand("help", "Show available commands"),
+                    BotCommand(name, desc) for name, desc in telegram_bot_commands()
                ])
            except Exception as e:
                logger.warning(
@@ -265,6 +267,8 @@ class TelegramAdapter(BasePlatformAdapter):
                    release_scoped_lock("telegram-bot-token", self._token_lock_identity)
                except Exception:
                    pass
+            message = f"Telegram startup failed: {e}"
+            self._set_fatal_error("telegram_connect_error", message, retryable=True)
            logger.error("[%s] Failed to connect to Telegram: %s", self.name, e, exc_info=True)
            return False
    
@@ -334,32 +338,47 @@ class TelegramAdapter(BasePlatformAdapter):
            message_ids = []
            thread_id = metadata.get("thread_id") if metadata else None
            
+            try:
+                from telegram.error import NetworkError as _NetErr
+            except ImportError:
+                _NetErr = OSError  # type: ignore[misc,assignment]
+
            for i, chunk in enumerate(chunks):
-                # Try Markdown first, fall back to plain text if it fails
-                try:
-                    msg = await self._bot.send_message(
-                        chat_id=int(chat_id),
-                        text=chunk,
-                        parse_mode=ParseMode.MARKDOWN_V2,
-                        reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
-                        message_thread_id=int(thread_id) if thread_id else None,
-                    )
-                except Exception as md_error:
-                    # Markdown parsing failed, try plain text
-                    if "parse" in str(md_error).lower() or "markdown" in str(md_error).lower():
-                        logger.warning("[%s] MarkdownV2 parse failed, falling back to plain text: %s", self.name, md_error)
-                        # Strip MDV2 escape backslashes so the user doesn't
-                        # see raw backslashes littered through the message.
-                        plain_chunk = _strip_mdv2(chunk)
-                        msg = await self._bot.send_message(
-                            chat_id=int(chat_id),
-                            text=plain_chunk,
-                            parse_mode=None,  # Plain text
-                            reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
-                            message_thread_id=int(thread_id) if thread_id else None,
-                        )
-                    else:
-                        raise  # Re-raise if not a parse error
+                msg = None
+                for _send_attempt in range(3):
+                    try:
+                        # Try Markdown first, fall back to plain text if it fails
+                        try:
+                            msg = await self._bot.send_message(
+                                chat_id=int(chat_id),
+                                text=chunk,
+                                parse_mode=ParseMode.MARKDOWN_V2,
+                                reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
+                                message_thread_id=int(thread_id) if thread_id else None,
+                            )
+                        except Exception as md_error:
+                            # Markdown parsing failed, try plain text
+                            if "parse" in str(md_error).lower() or "markdown" in str(md_error).lower():
+                                logger.warning("[%s] MarkdownV2 parse failed, falling back to plain text: %s", self.name, md_error)
+                                plain_chunk = _strip_mdv2(chunk)
+                                msg = await self._bot.send_message(
+                                    chat_id=int(chat_id),
+                                    text=plain_chunk,
+                                    parse_mode=None,
+                                    reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
+                                    message_thread_id=int(thread_id) if thread_id else None,
+                                )
+                            else:
+                                raise
+                        break  # success
+                    except _NetErr as send_err:
+                        if _send_attempt < 2:
+                            wait = 2 ** _send_attempt
+                            logger.warning("[%s] Network error on send (attempt %d/3), retrying in %ds: %s",
+                                           self.name, _send_attempt + 1, wait, send_err)
+                            await asyncio.sleep(wait)
+                        else:
+                            raise
                message_ids.append(str(msg.message_id))
            
            return SendResult(
@@ -157,6 +157,12 @@ if _config_path.exists():
                    "base_url": "AUXILIARY_WEB_EXTRACT_BASE_URL",
                    "api_key": "AUXILIARY_WEB_EXTRACT_API_KEY",
                },
+                "approval": {
+                    "provider": "AUXILIARY_APPROVAL_PROVIDER",
+                    "model": "AUXILIARY_APPROVAL_MODEL",
+                    "base_url": "AUXILIARY_APPROVAL_BASE_URL",
+                    "api_key": "AUXILIARY_APPROVAL_API_KEY",
+                },
            }
            for _task_key, _env_map in _aux_task_env.items():
                _task_cfg = _auxiliary_cfg.get(_task_key, {})
@@ -858,12 +864,15 @@ class GatewayRunner:
            logger.warning("Process checkpoint recovery: %s", e)
        
        connected_count = 0
+        enabled_platform_count = 0
        startup_nonretryable_errors: list[str] = []
+        startup_retryable_errors: list[str] = []
        
        # Initialize and connect each configured platform
        for platform, platform_config in self.config.platforms.items():
            if not platform_config.enabled:
                continue
+            enabled_platform_count += 1
            
            adapter = self._create_adapter(platform, platform_config)
            if not adapter:
@@ -885,12 +894,22 @@ class GatewayRunner:
                    logger.info("✓ %s connected", platform.value)
                else:
                    logger.warning("✗ %s failed to connect", platform.value)
-                    if adapter.has_fatal_error and not adapter.fatal_error_retryable:
-                        startup_nonretryable_errors.append(
+                    if adapter.has_fatal_error:
+                        target = (
+                            startup_retryable_errors
+                            if adapter.fatal_error_retryable
+                            else startup_nonretryable_errors
+                        )
+                        target.append(
                            f"{platform.value}: {adapter.fatal_error_message}"
                        )
+                    else:
+                        startup_retryable_errors.append(
+                            f"{platform.value}: failed to connect"
+                        )
            except Exception as e:
                logger.error("✗ %s error: %s", platform.value, e)
+                startup_retryable_errors.append(f"{platform.value}: {e}")
        
        if connected_count == 0:
            if startup_nonretryable_errors:
@@ -903,7 +922,16 @@ class GatewayRunner:
                    pass
                self._request_clean_exit(reason)
                return True
-            logger.warning("No messaging platforms connected.")
+            if enabled_platform_count > 0:
+                reason = "; ".join(startup_retryable_errors) or "all configured messaging platforms failed to connect"
+                logger.error("Gateway failed to connect any configured messaging platform: %s", reason)
+                try:
+                    from gateway.status import write_runtime_status
+                    write_runtime_status(gateway_state="startup_failed", exit_reason=reason)
+                except Exception:
+                    pass
+                return False
+            logger.warning("No messaging platforms enabled.")
            logger.info("Gateway will continue running for cron job execution.")
        
        # Update delivery router with adapters
@@ -1257,45 +1285,47 @@ class GatewayRunner:
        # Check for commands
        command = event.get_command()
        
-        # Emit command:* hook for any recognized slash command
-        _known_commands = {"new", "reset", "help", "status", "stop", "model", "reasoning",
-                          "personality", "plan", "retry", "undo", "sethome", "set-home",
-                          "compress", "usage", "insights", "reload-mcp", "reload_mcp",
-                          "update", "title", "resume", "provider", "rollback",
-                          "background", "reasoning", "voice"}
-        if command and command in _known_commands:
+        # Emit command:* hook for any recognized slash command.
+        # GATEWAY_KNOWN_COMMANDS is derived from the central COMMAND_REGISTRY
+        # in hermes_cli/commands.py — no hardcoded set to maintain here.
+        from hermes_cli.commands import GATEWAY_KNOWN_COMMANDS, resolve_command as _resolve_cmd
+        if command and command in GATEWAY_KNOWN_COMMANDS:
            await self.hooks.emit(f"command:{command}", {
                "platform": source.platform.value if source.platform else "",
                "user_id": source.user_id,
                "command": command,
                "args": event.get_command_args().strip(),
            })
-        
-        if command in ["new", "reset"]:
+
+        # Resolve aliases to canonical name so dispatch only checks canonicals.
+        _cmd_def = _resolve_cmd(command) if command else None
+        canonical = _cmd_def.name if _cmd_def else command
+
+        if canonical == "new":
            return await self._handle_reset_command(event)
        
-        if command == "help":
+        if canonical == "help":
            return await self._handle_help_command(event)
        
-        if command == "status":
+        if canonical == "status":
            return await self._handle_status_command(event)
        
-        if command == "stop":
+        if canonical == "stop":
            return await self._handle_stop_command(event)
        
-        if command == "model":
+        if canonical == "model":
            return await self._handle_model_command(event)

-        if command == "reasoning":
+        if canonical == "reasoning":
            return await self._handle_reasoning_command(event)

-        if command == "provider":
+        if canonical == "provider":
            return await self._handle_provider_command(event)
        
-        if command == "personality":
+        if canonical == "personality":
            return await self._handle_personality_command(event)

-        if command == "plan":
+        if canonical == "plan":
            try:
                from agent.skill_commands import build_plan_path, build_skill_invocation_message

@@ -1312,51 +1342,48 @@ class GatewayRunner:
                )
                if not event.text:
                    return "Failed to load the bundled /plan skill."
-                command = None
+                canonical = None
            except Exception as e:
                logger.exception("Failed to prepare /plan command")
                return f"Failed to enter plan mode: {e}"
        
-        if command == "retry":
+        if canonical == "retry":
            return await self._handle_retry_command(event)
        
-        if command == "undo":
+        if canonical == "undo":
            return await self._handle_undo_command(event)
        
-        if command in ["sethome", "set-home"]:
+        if canonical == "sethome":
            return await self._handle_set_home_command(event)

-        if command == "compress":
+        if canonical == "compress":
            return await self._handle_compress_command(event)

-        if command == "usage":
+        if canonical == "usage":
            return await self._handle_usage_command(event)

-        if command == "insights":
+        if canonical == "insights":
            return await self._handle_insights_command(event)

-        if command in ("reload-mcp", "reload_mcp"):
+        if canonical == "reload-mcp":
            return await self._handle_reload_mcp_command(event)

-        if command == "update":
+        if canonical == "update":
            return await self._handle_update_command(event)

-        if command == "title":
+        if canonical == "title":
            return await self._handle_title_command(event)

-        if command == "resume":
+        if canonical == "resume":
            return await self._handle_resume_command(event)

-        if command == "rollback":
+        if canonical == "rollback":
            return await self._handle_rollback_command(event)

-        if command == "background":
+        if canonical == "background":
            return await self._handle_background_command(event)

-        if command == "reasoning":
-            return await self._handle_reasoning_command(event)
-
-        if command == "voice":
+        if canonical == "voice":
            return await self._handle_voice_command(event)

        # User-defined quick commands (bypass agent loop, no LLM call)
@@ -1457,8 +1484,17 @@ class GatewayRunner:
        # Set environment variables for tools
        self._set_session_env(context)
        
+        # Read privacy.redact_pii from config (re-read per message)
+        _redact_pii = False
+        try:
+            with open(_config_path, encoding="utf-8") as _pf:
+                _pcfg = yaml.safe_load(_pf) or {}
+            _redact_pii = bool((_pcfg.get("privacy") or {}).get("redact_pii", False))
+        except Exception:
+            pass
+
        # Build the context prompt to inject
-        context_prompt = build_session_context_prompt(context)
+        context_prompt = build_session_context_prompt(context, redact_pii=_redact_pii)
        
        # If the previous session expired and was auto-reset, prepend a notice
        # so the agent knows this is a fresh conversation (not an intentional /reset).
@@ -1827,9 +1863,37 @@ class GatewayRunner:
                session_key=session_key
            )
            
-            response = agent_result.get("final_response", "")
+            response = agent_result.get("final_response") or ""
            agent_messages = agent_result.get("messages", [])

+            # Surface error details when the agent failed silently (final_response=None)
+            if not response and agent_result.get("failed"):
+                error_detail = agent_result.get("error", "unknown error")
+                error_str = str(error_detail).lower()
+
+                # Detect context-overflow failures and give specific guidance.
+                # Generic 400 "Error" from Anthropic with large sessions is the
+                # most common cause of this (#1630).
+                _is_ctx_fail = any(p in error_str for p in (
+                    "context", "token", "too large", "too long",
+                    "exceed", "payload",
+                )) or (
+                    "400" in error_str
+                    and len(history) > 50
+                )
+
+                if _is_ctx_fail:
+                    response = (
+                        "⚠️ Session too large for the model's context window.\n"
+                        "Use /compact to compress the conversation, or "
+                        "/reset to start fresh."
+                    )
+                else:
+                    response = (
+                        f"The request failed: {str(error_detail)[:300]}\n"
+                        "Try again or use /reset to start a fresh session."
+                    )
+
            # If the agent's session_id changed during compression, update
            # session_entry so transcript writes below go to the right session.
            if agent_result.get("session_id") and agent_result["session_id"] != session_entry.session_id:
@@ -1876,12 +1940,30 @@ class GatewayRunner:
            # This preserves the complete agent loop (tool_calls, tool results,
            # intermediate reasoning) so sessions can be resumed with full context
            # and transcripts are useful for debugging and training data.
+            #
+            # IMPORTANT: When the agent failed before producing any response
+            # (e.g. context-overflow 400), do NOT persist the user's message.
+            # Persisting it would make the session even larger, causing the
+            # same failure on the next attempt — an infinite loop. (#1630)
+            agent_failed_early = (
+                agent_result.get("failed")
+                and not agent_result.get("final_response")
+            )
+            if agent_failed_early:
+                logger.info(
+                    "Skipping transcript persistence for failed request in "
+                    "session %s to prevent session growth loop.",
+                    session_entry.session_id,
+                )
+
            ts = datetime.now().isoformat()
            
            # If this is a fresh session (no history), write the full tool
            # definitions as the first entry so the transcript is self-describing
            # -- the same list of dicts sent as tools=[...] in the API request.
-            if not history:
+            if agent_failed_early:
+                pass  # Skip all transcript writes — don't grow a broken session
+            elif not history:
                tool_defs = agent_result.get("tools", [])
                self.session_store.append_to_transcript(
                    session_entry.session_id,
@@ -1898,36 +1980,37 @@ class GatewayRunner:
            # Use the filtered history length (history_offset) that was actually
            # passed to the agent, not len(history) which includes session_meta
            # entries that were stripped before the agent saw them.
-            history_len = agent_result.get("history_offset", len(history))
-            new_messages = agent_messages[history_len:] if len(agent_messages) > history_len else []
-            
-            # If no new messages found (edge case), fall back to simple user/assistant
-            if not new_messages:
-                self.session_store.append_to_transcript(
-                    session_entry.session_id,
-                    {"role": "user", "content": message_text, "timestamp": ts}
-                )
-                if response:
+            if not agent_failed_early:
+                history_len = agent_result.get("history_offset", len(history))
+                new_messages = agent_messages[history_len:] if len(agent_messages) > history_len else []
+                
+                # If no new messages found (edge case), fall back to simple user/assistant
+                if not new_messages:
                    self.session_store.append_to_transcript(
                        session_entry.session_id,
-                        {"role": "assistant", "content": response, "timestamp": ts}
-                    )
-            else:
-                # The agent already persisted these messages to SQLite via
-                # _flush_messages_to_session_db(), so skip the DB write here
-                # to prevent the duplicate-write bug (#860).  We still write
-                # to JSONL for backward compatibility and as a backup.
-                agent_persisted = self._session_db is not None
-                for msg in new_messages:
-                    # Skip system messages (they're rebuilt each run)
-                    if msg.get("role") == "system":
-                        continue
-                    # Add timestamp to each message for debugging
-                    entry = {**msg, "timestamp": ts}
-                    self.session_store.append_to_transcript(
-                        session_entry.session_id, entry,
-                        skip_db=agent_persisted,
+                        {"role": "user", "content": message_text, "timestamp": ts}
                    )
+                    if response:
+                        self.session_store.append_to_transcript(
+                            session_entry.session_id,
+                            {"role": "assistant", "content": response, "timestamp": ts}
+                        )
+                else:
+                    # The agent already persisted these messages to SQLite via
+                    # _flush_messages_to_session_db(), so skip the DB write here
+                    # to prevent the duplicate-write bug (#860).  We still write
+                    # to JSONL for backward compatibility and as a backup.
+                    agent_persisted = self._session_db is not None
+                    for msg in new_messages:
+                        # Skip system messages (they're rebuilt each run)
+                        if msg.get("role") == "system":
+                            continue
+                        # Add timestamp to each message for debugging
+                        entry = {**msg, "timestamp": ts}
+                        self.session_store.append_to_transcript(
+                            session_entry.session_id, entry,
+                            skip_db=agent_persisted,
+                        )
            
            # Update session with actual prompt token count and model from the agent
            self.session_store.update_session(
@@ -1942,13 +2025,41 @@ class GatewayRunner:
            if self._should_send_voice_reply(event, response, agent_messages):
                await self._send_voice_reply(event, response)

+            # If streaming already delivered the response, return None so
+            # _process_message_background doesn't send it again.
+            if agent_result.get("already_sent"):
+                return None
+
            return response
            
        except Exception as e:
            logger.exception("Agent error in session %s", session_key)
+            error_type = type(e).__name__
+            error_detail = str(e)[:300] if str(e) else "no details available"
+            status_hint = ""
+            status_code = getattr(e, "status_code", None)
+            if status_code == 401:
+                status_hint = " Check your API key or run `claude /login` to refresh OAuth credentials."
+            elif status_code == 429:
+                status_hint = " You are being rate-limited. Please wait a moment and try again."
+            elif status_code == 529:
+                status_hint = " The API is temporarily overloaded. Please try again shortly."
+            elif status_code == 400:
+                # 400 with a large session is almost always a context overflow.
+                # Give specific guidance instead of a generic error. (#1630)
+                _hist_len = len(history) if 'history' in locals() else 0
+                if _hist_len > 50:
+                    return (
+                        "⚠️ Session too large for the model's context window.\n"
+                        "Use /compact to compress the conversation, or "
+                        "/reset to start fresh."
+                    )
+                else:
+                    status_hint = " The request was rejected by the API."
            return (
-                "Sorry, I encountered an unexpected error. "
-                "The details have been logged for debugging. "
+                f"Sorry, I encountered an error ({error_type}).\n"
+                f"{error_detail}\n"
+                f"{status_hint}"
                "Try again or use /reset to start a fresh session."
            )
        finally:
@@ -2032,30 +2143,10 @@ class GatewayRunner:
    
    async def _handle_help_command(self, event: MessageEvent) -> str:
        """Handle /help command - list available commands."""
+        from hermes_cli.commands import gateway_help_lines
        lines = [
            "📖 **Hermes Commands**\n",
-            "`/new` — Start a new conversation",
-            "`/reset` — Reset conversation history",
-            "`/status` — Show session info",
-            "`/stop` — Interrupt the running agent",
-            "`/model [provider:model]` — Show/change model (or switch provider)",
-            "`/provider` — Show available providers and auth status",
-            "`/personality [name]` — Set a personality",
-            "`/retry` — Retry your last message",
-            "`/undo` — Remove the last exchange",
-            "`/sethome` — Set this chat as the home channel",
-            "`/compress` — Compress conversation context",
-            "`/title [name]` — Set or show the session title",
-            "`/resume [name]` — Resume a previously-named session",
-            "`/usage` — Show token usage for this session",
-            "`/insights [days]` — Show usage insights and analytics",
-            "`/reasoning [level|show|hide]` — Set reasoning effort or toggle display",
-            "`/rollback [number]` — List or restore filesystem checkpoints",
-            "`/background <prompt>` — Run a prompt in a separate background session",
-            "`/voice [on|off|tts|status]` — Toggle voice reply mode",
-            "`/reload-mcp` — Reload MCP servers from config",
-            "`/update` — Update Hermes Agent to the latest version",
-            "`/help` — Show this message",
+            *gateway_help_lines(),
        ]
        try:
            from agent.skill_commands import get_skill_commands
@@ -4098,6 +4189,7 @@ class GatewayRunner:
        agent_holder = [None]  # Mutable container for the agent instance
        result_holder = [None]  # Mutable container for the result
        tools_holder = [None]   # Mutable container for the tool definitions
+        stream_consumer_holder = [None]  # Mutable container for stream consumer
        
        # Bridge sync step_callback → async hooks.emit for agent:step events
        _loop_for_step = asyncio.get_event_loop()
@@ -4160,6 +4252,35 @@ class GatewayRunner:
            honcho_manager, honcho_config = self._get_or_create_gateway_honcho(session_key)
            reasoning_config = self._load_reasoning_config()
            self._reasoning_config = reasoning_config
+            # Set up streaming consumer if enabled
+            _stream_consumer = None
+            _stream_delta_cb = None
+            _scfg = getattr(getattr(self, 'config', None), 'streaming', None)
+            if _scfg is None:
+                from gateway.config import StreamingConfig
+                _scfg = StreamingConfig()
+
+            if _scfg.enabled and _scfg.transport != "off":
+                try:
+                    from gateway.stream_consumer import GatewayStreamConsumer, StreamConsumerConfig
+                    _adapter = self.adapters.get(source.platform)
+                    if _adapter:
+                        _consumer_cfg = StreamConsumerConfig(
+                            edit_interval=_scfg.edit_interval,
+                            buffer_threshold=_scfg.buffer_threshold,
+                            cursor=_scfg.cursor,
+                        )
+                        _stream_consumer = GatewayStreamConsumer(
+                            adapter=_adapter,
+                            chat_id=source.chat_id,
+                            config=_consumer_cfg,
+                            metadata={"thread_id": source.thread_id} if source.thread_id else None,
+                        )
+                        _stream_delta_cb = _stream_consumer.on_delta
+                        stream_consumer_holder[0] = _stream_consumer
+                except Exception as _sc_err:
+                    logger.debug("Could not set up stream consumer: %s", _sc_err)
+
            turn_route = self._resolve_turn_agent_config(message, model, runtime_kwargs)
            agent = AIAgent(
                model=turn_route["model"],
@@ -4180,6 +4301,7 @@ class GatewayRunner:
                session_id=session_id,
                tool_progress_callback=progress_callback if tool_progress_enabled else None,
                step_callback=_step_callback_sync if _hooks_ref.loaded_hooks else None,
+                stream_delta_callback=_stream_delta_cb,
                platform=platform_key,
                honcho_session_key=session_key,
                honcho_manager=honcho_manager,
@@ -4250,6 +4372,10 @@ class GatewayRunner:
            
            result = agent.run_conversation(message, conversation_history=agent_history, task_id=session_id)
            result_holder[0] = result
+
+            # Signal the stream consumer that the agent is done
+            if _stream_consumer is not None:
+                _stream_consumer.finish()
            
            # Return final response, or a message if something went wrong
            final_response = result.get("final_response")
@@ -4349,6 +4475,20 @@ class GatewayRunner:
        progress_task = None
        if tool_progress_enabled:
            progress_task = asyncio.create_task(send_progress_messages())
+
+        # Start stream consumer task — polls for consumer creation since it
+        # happens inside run_sync (thread pool) after the agent is constructed.
+        stream_task = None
+
+        async def _start_stream_consumer():
+            """Wait for the stream consumer to be created, then run it."""
+            for _ in range(200):  # Up to 10s wait
+                if stream_consumer_holder[0] is not None:
+                    await stream_consumer_holder[0].run()
+                    return
+                await asyncio.sleep(0.05)
+
+        stream_task = asyncio.create_task(_start_stream_consumer())
        
        # Track this agent as running for this session (for interrupt support)
        # We do this in a callback after the agent is created
@@ -4431,6 +4571,17 @@ class GatewayRunner:
            if progress_task:
                progress_task.cancel()
            interrupt_monitor.cancel()
+
+            # Wait for stream consumer to finish its final edit
+            if stream_task:
+                try:
+                    await asyncio.wait_for(stream_task, timeout=5.0)
+                except (asyncio.TimeoutError, asyncio.CancelledError):
+                    stream_task.cancel()
+                    try:
+                        await stream_task
+                    except asyncio.CancelledError:
+                        pass
            
            # Clean up tracking
            tracking_task.cancel()
@@ -4444,6 +4595,12 @@ class GatewayRunner:
                        await task
                    except asyncio.CancelledError:
                        pass
+
+        # If streaming already delivered the response, mark it so the
+        # caller's send() is skipped (avoiding duplicate messages).
+        _sc = stream_consumer_holder[0]
+        if _sc and _sc.already_sent and isinstance(response, dict):
+            response["already_sent"] = True
        
        return response

@@ -8,9 +8,11 @@ Handles:
 - Dynamic system prompt injection (agent knows its context)
 """

+import hashlib
 import logging
 import os
 import json
+import re
 import uuid
 from pathlib import Path
 from datetime import datetime, timedelta
@@ -19,6 +21,41 @@ from typing import Dict, List, Optional, Any

 logger = logging.getLogger(__name__)

+
+# ---------------------------------------------------------------------------
+# PII redaction helpers
+# ---------------------------------------------------------------------------
+
+_PHONE_RE = re.compile(r"^\+?\d[\d\-\s]{6,}$")
+
+
+def _hash_id(value: str) -> str:
+    """Deterministic 12-char hex hash of an identifier."""
+    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
+
+
+def _hash_sender_id(value: str) -> str:
+    """Hash a sender ID to ``user_<12hex>``."""
+    return f"user_{_hash_id(value)}"
+
+
+def _hash_chat_id(value: str) -> str:
+    """Hash the numeric portion of a chat ID, preserving platform prefix.
+
+    ``telegram:12345`` → ``telegram:<hash>``
+    ``12345``          → ``<hash>``
+    """
+    colon = value.find(":")
+    if colon > 0:
+        prefix = value[:colon]
+        return f"{prefix}:{_hash_id(value[colon + 1:])}"
+    return _hash_id(value)
+
+
+def _looks_like_phone(value: str) -> bool:
+    """Return True if *value* looks like a phone number (E.164 or similar)."""
+    return bool(_PHONE_RE.match(value.strip()))
+
 from .config import (
    Platform,
    GatewayConfig,
@@ -146,7 +183,21 @@ class SessionContext:
        }


-def build_session_context_prompt(context: SessionContext) -> str:
+_PII_SAFE_PLATFORMS = frozenset({
+    Platform.WHATSAPP,
+    Platform.SIGNAL,
+    Platform.TELEGRAM,
+})
+"""Platforms where user IDs can be safely redacted (no in-message mention system
+that requires raw IDs).  Discord is excluded because mentions use ``<@user_id>``
+and the LLM needs the real ID to tag users."""
+
+
+def build_session_context_prompt(
+    context: SessionContext,
+    *,
+    redact_pii: bool = False,
+) -> str:
    """
    Build the dynamic system prompt section that tells the agent about its context.
    
@@ -154,7 +205,15 @@ def build_session_context_prompt(context: SessionContext) -> str:
    - Where messages are coming from
    - What platforms are connected
    - Where it can deliver scheduled task outputs
+
+    When *redact_pii* is True **and** the source platform is in
+    ``_PII_SAFE_PLATFORMS``, phone numbers are stripped and user/chat IDs
+    are replaced with deterministic hashes before being sent to the LLM.
+    Platforms like Discord are excluded because mentions need real IDs.
+    Routing still uses the original values (they stay in SessionSource).
    """
+    # Only apply redaction on platforms where IDs aren't needed for mentions
+    redact_pii = redact_pii and context.source.platform in _PII_SAFE_PLATFORMS
    lines = [
        "## Current Session Context",
        "",
@@ -165,7 +224,25 @@ def build_session_context_prompt(context: SessionContext) -> str:
    if context.source.platform == Platform.LOCAL:
        lines.append(f"**Source:** {platform_name} (the machine running this agent)")
    else:
-        lines.append(f"**Source:** {platform_name} ({context.source.description})")
+        # Build a description that respects PII redaction
+        src = context.source
+        if redact_pii:
+            # Build a safe description without raw IDs
+            _uname = src.user_name or (
+                _hash_sender_id(src.user_id) if src.user_id else "user"
+            )
+            _cname = src.chat_name or _hash_chat_id(src.chat_id)
+            if src.chat_type == "dm":
+                desc = f"DM with {_uname}"
+            elif src.chat_type == "group":
+                desc = f"group: {_cname}"
+            elif src.chat_type == "channel":
+                desc = f"channel: {_cname}"
+            else:
+                desc = _cname
+        else:
+            desc = src.description
+        lines.append(f"**Source:** {platform_name} ({desc})")
    
    # Channel topic (if available - provides context about the channel's purpose)
    if context.source.chat_topic:
@@ -175,7 +252,10 @@ def build_session_context_prompt(context: SessionContext) -> str:
    if context.source.user_name:
        lines.append(f"**User:** {context.source.user_name}")
    elif context.source.user_id:
-        lines.append(f"**User ID:** {context.source.user_id}")
+        uid = context.source.user_id
+        if redact_pii:
+            uid = _hash_sender_id(uid)
+        lines.append(f"**User ID:** {uid}")
    
    # Platform-specific behavioral notes
    if context.source.platform == Platform.SLACK:
@@ -210,7 +290,8 @@ def build_session_context_prompt(context: SessionContext) -> str:
        lines.append("")
        lines.append("**Home Channels (default destinations):**")
        for platform, home in context.home_channels.items():
-            lines.append(f"  - {platform.value}: {home.name} (ID: {home.chat_id})")
+            hc_id = _hash_chat_id(home.chat_id) if redact_pii else home.chat_id
+            lines.append(f"  - {platform.value}: {home.name} (ID: {hc_id})")
    
    # Delivery options for scheduled tasks
    lines.append("")
@@ -220,7 +301,10 @@ def build_session_context_prompt(context: SessionContext) -> str:
    if context.source.platform == Platform.LOCAL:
        lines.append("- `\"origin\"` → Local output (saved to files)")
    else:
-        lines.append(f"- `\"origin\"` → Back to this chat ({context.source.chat_name or context.source.chat_id})")
+        _origin_label = context.source.chat_name or (
+            _hash_chat_id(context.source.chat_id) if redact_pii else context.source.chat_id
+        )
+        lines.append(f"- `\"origin\"` → Back to this chat ({_origin_label})")
    
    # Local always available
    lines.append("- `\"local\"` → Save to local files only (~/.hermes/cron/output/)")
@@ -195,8 +195,8 @@ def write_runtime_status(
    payload = _read_json_file(path) or _build_runtime_status_record()
    payload.setdefault("platforms", {})
    payload.setdefault("kind", _GATEWAY_KIND)
-    payload.setdefault("pid", os.getpid())
-    payload.setdefault("start_time", _get_process_start_time(os.getpid()))
+    payload["pid"] = os.getpid()
+    payload["start_time"] = _get_process_start_time(os.getpid())
    payload["updated_at"] = _utc_now_iso()

    if gateway_state is not None:
@@ -0,0 +1,177 @@
+"""Gateway streaming consumer — bridges sync agent callbacks to async platform delivery.
+
+The agent fires stream_delta_callback(text) synchronously from its worker thread.
+GatewayStreamConsumer:
+  1. Receives deltas via on_delta() (thread-safe, sync)
+  2. Queues them to an asyncio task via queue.Queue
+  3. The async run() task buffers, rate-limits, and progressively edits
+     a single message on the target platform
+
+Design: Uses the edit transport (send initial message, then editMessageText).
+This is universally supported across Telegram, Discord, and Slack.
+
+Credit: jobless0x (#774, #1312), OutThisLife (#798), clicksingh (#697).
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import queue
+import time
+from dataclasses import dataclass
+from typing import Any, Optional
+
+logger = logging.getLogger("gateway.stream_consumer")
+
+# Sentinel to signal the stream is complete
+_DONE = object()
+
+
+@dataclass
+class StreamConsumerConfig:
+    """Runtime config for a single stream consumer instance."""
+    edit_interval: float = 0.3
+    buffer_threshold: int = 40
+    cursor: str = " ▉"
+
+
+class GatewayStreamConsumer:
+    """Async consumer that progressively edits a platform message with streamed tokens.
+
+    Usage::
+
+        consumer = GatewayStreamConsumer(adapter, chat_id, config, metadata=metadata)
+        # Pass consumer.on_delta as stream_delta_callback to AIAgent
+        agent = AIAgent(..., stream_delta_callback=consumer.on_delta)
+        # Start the consumer as an asyncio task
+        task = asyncio.create_task(consumer.run())
+        # ... run agent in thread pool ...
+        consumer.finish()  # signal completion
+        await task         # wait for final edit
+    """
+
+    def __init__(
+        self,
+        adapter: Any,
+        chat_id: str,
+        config: Optional[StreamConsumerConfig] = None,
+        metadata: Optional[dict] = None,
+    ):
+        self.adapter = adapter
+        self.chat_id = chat_id
+        self.cfg = config or StreamConsumerConfig()
+        self.metadata = metadata
+        self._queue: queue.Queue = queue.Queue()
+        self._accumulated = ""
+        self._message_id: Optional[str] = None
+        self._already_sent = False
+        self._edit_supported = True  # Disabled on first edit failure (Signal/Email/HA)
+        self._last_edit_time = 0.0
+
+    @property
+    def already_sent(self) -> bool:
+        """True if at least one message was sent/edited — signals the base
+        adapter to skip re-sending the final response."""
+        return self._already_sent
+
+    def on_delta(self, text: str) -> None:
+        """Thread-safe callback — called from the agent's worker thread."""
+        if text:
+            self._queue.put(text)
+
+    def finish(self) -> None:
+        """Signal that the stream is complete."""
+        self._queue.put(_DONE)
+
+    async def run(self) -> None:
+        """Async task that drains the queue and edits the platform message."""
+        try:
+            while True:
+                # Drain all available items from the queue
+                got_done = False
+                while True:
+                    try:
+                        item = self._queue.get_nowait()
+                        if item is _DONE:
+                            got_done = True
+                            break
+                        self._accumulated += item
+                    except queue.Empty:
+                        break
+
+                # Decide whether to flush an edit
+                now = time.monotonic()
+                elapsed = now - self._last_edit_time
+                should_edit = (
+                    got_done
+                    or (elapsed >= self.cfg.edit_interval
+                        and len(self._accumulated) > 0)
+                    or len(self._accumulated) >= self.cfg.buffer_threshold
+                )
+
+                if should_edit and self._accumulated:
+                    display_text = self._accumulated
+                    if not got_done:
+                        display_text += self.cfg.cursor
+
+                    await self._send_or_edit(display_text)
+                    self._last_edit_time = time.monotonic()
+
+                if got_done:
+                    # Final edit without cursor
+                    if self._accumulated and self._message_id:
+                        await self._send_or_edit(self._accumulated)
+                    return
+
+                await asyncio.sleep(0.05)  # Small yield to not busy-loop
+
+        except asyncio.CancelledError:
+            # Best-effort final edit on cancellation
+            if self._accumulated and self._message_id:
+                try:
+                    await self._send_or_edit(self._accumulated)
+                except Exception:
+                    pass
+        except Exception as e:
+            logger.error("Stream consumer error: %s", e)
+
+    async def _send_or_edit(self, text: str) -> None:
+        """Send or edit the streaming message."""
+        try:
+            if self._message_id is not None:
+                if self._edit_supported:
+                    # Edit existing message
+                    result = await self.adapter.edit_message(
+                        chat_id=self.chat_id,
+                        message_id=self._message_id,
+                        content=text,
+                    )
+                    if result.success:
+                        self._already_sent = True
+                    else:
+                        # Edit not supported by this adapter — stop streaming,
+                        # let the normal send path handle the final response.
+                        # Without this guard, adapters like Signal/Email would
+                        # flood the chat with a new message every edit_interval.
+                        logger.debug("Edit failed, disabling streaming for this adapter")
+                        self._edit_supported = False
+                else:
+                    # Editing not supported — skip intermediate updates.
+                    # The final response will be sent by the normal path.
+                    pass
+            else:
+                # First message — send new
+                result = await self.adapter.send(
+                    chat_id=self.chat_id,
+                    content=text,
+                    metadata=self.metadata,
+                )
+                if result.success and result.message_id:
+                    self._message_id = result.message_id
+                    self._already_sent = True
+                else:
+                    # Initial send failed — disable streaming for this session
+                    self._edit_supported = False
+        except Exception as e:
+            logger.error("Stream send/edit error: %s", e)
@@ -11,5 +11,5 @@ Provides subcommands for:
 - hermes cron          - Manage cron jobs
 """

-__version__ = "0.2.0"
-__release_date__ = "2026.3.12"
+__version__ = "0.3.0"
+__release_date__ = "2026.3.17"
@@ -155,6 +155,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        api_key_env_vars=("DEEPSEEK_API_KEY",),
        base_url_env_var="DEEPSEEK_BASE_URL",
    ),
+    "ai-gateway": ProviderConfig(
+        id="ai-gateway",
+        name="AI Gateway",
+        auth_type="api_key",
+        inference_base_url="https://ai-gateway.vercel.sh/v1",
+        api_key_env_vars=("AI_GATEWAY_API_KEY",),
+        base_url_env_var="AI_GATEWAY_BASE_URL",
+    ),
 }


@@ -532,6 +540,7 @@ def resolve_provider(
        "kimi": "kimi-coding", "moonshot": "kimi-coding",
        "minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
        "claude": "anthropic", "claude-code": "anthropic",
+        "aigateway": "ai-gateway", "vercel": "ai-gateway", "vercel-ai-gateway": "ai-gateway",
    }
    normalized = _PROVIDER_ALIASES.get(normalized, normalized)

@@ -1,77 +1,296 @@
 """Slash command definitions and autocomplete for the Hermes CLI.

-Contains the shared built-in ``COMMANDS`` dict and ``SlashCommandCompleter``.
-The completer can optionally include dynamic skill slash commands supplied by the
-interactive CLI.
+Central registry for all slash commands. Every consumer -- CLI help, gateway
+dispatch, Telegram BotCommands, Slack subcommand mapping, autocomplete --
+derives its data from ``COMMAND_REGISTRY``.
+
+To add a command: add a ``CommandDef`` entry to ``COMMAND_REGISTRY``.
+To add an alias: set ``aliases=("short",)`` on the existing ``CommandDef``.
 """

 from __future__ import annotations

+import os
+import re
 from collections.abc import Callable, Mapping
+from dataclasses import dataclass, field
+from pathlib import Path
 from typing import Any

+from prompt_toolkit.auto_suggest import AutoSuggest, Suggestion
 from prompt_toolkit.completion import Completer, Completion


-# Commands organized by category for better help display
-COMMANDS_BY_CATEGORY = {
-    "Session": {
-        "/new": "Start a new session (fresh session ID + history)",
-        "/reset": "Start a new session (alias for /new)",
-        "/clear": "Clear screen and start a new session",
-        "/history": "Show conversation history",
-        "/save": "Save the current conversation",
-        "/retry": "Retry the last message (resend to agent)",
-        "/undo": "Remove the last user/assistant exchange",
-        "/title": "Set a title for the current session (usage: /title My Session Name)",
-        "/compress": "Manually compress conversation context (flush memories + summarize)",
-        "/rollback": "List or restore filesystem checkpoints (usage: /rollback [number])",
-        "/background": "Run a prompt in the background (usage: /background <prompt>)",
-    },
-    "Configuration": {
-        "/config": "Show current configuration",
-        "/model": "Show or change the current model",
-        "/provider": "Show available providers and current provider",
-        "/prompt": "View/set custom system prompt",
-        "/personality": "Set a predefined personality",
-        "/verbose": "Cycle tool progress display: off → new → all → verbose",
-        "/reasoning": "Manage reasoning effort and display (usage: /reasoning [level|show|hide])",
-        "/skin": "Show or change the display skin/theme",
-        "/voice": "Toggle voice mode (Ctrl+B to record). Usage: /voice [on|off|tts|status]",
-    },
-    "Tools & Skills": {
-        "/tools": "List available tools",
-        "/toolsets": "List available toolsets",
-        "/skills": "Search, install, inspect, or manage skills from online registries",
-        "/cron": "Manage scheduled tasks (list, add/create, edit, pause, resume, run, remove)",
-        "/reload-mcp": "Reload MCP servers from config.yaml",
-    },
-    "Info": {
-        "/help": "Show this help message",
-        "/usage": "Show token usage for the current session",
-        "/insights": "Show usage insights and analytics (last 30 days)",
-        "/platforms": "Show gateway/messaging platform status",
-        "/paste": "Check clipboard for an image and attach it",
-    },
-    "Exit": {
-        "/quit": "Exit the CLI (also: /exit, /q)",
-    },
-}
+# ---------------------------------------------------------------------------
+# CommandDef dataclass
+# ---------------------------------------------------------------------------

-# Flat dict for backwards compatibility and autocomplete
-COMMANDS = {}
-for category_commands in COMMANDS_BY_CATEGORY.values():
-    COMMANDS.update(category_commands)
+@dataclass(frozen=True)
+class CommandDef:
+    """Definition of a single slash command."""

+    name: str                          # canonical name without slash: "background"
+    description: str                   # human-readable description
+    category: str                      # "Session", "Configuration", etc.
+    aliases: tuple[str, ...] = ()      # alternative names: ("bg",)
+    args_hint: str = ""                # argument placeholder: "<prompt>", "[name]"
+    subcommands: tuple[str, ...] = ()  # tab-completable subcommands
+    cli_only: bool = False             # only available in CLI
+    gateway_only: bool = False         # only available in gateway/messaging
+
+
+# ---------------------------------------------------------------------------
+# Central registry -- single source of truth
+# ---------------------------------------------------------------------------
+
+COMMAND_REGISTRY: list[CommandDef] = [
+    # Session
+    CommandDef("new", "Start a new session (fresh session ID + history)", "Session",
+               aliases=("reset",)),
+    CommandDef("clear", "Clear screen and start a new session", "Session",
+               cli_only=True),
+    CommandDef("history", "Show conversation history", "Session",
+               cli_only=True),
+    CommandDef("save", "Save the current conversation", "Session",
+               cli_only=True),
+    CommandDef("retry", "Retry the last message (resend to agent)", "Session"),
+    CommandDef("undo", "Remove the last user/assistant exchange", "Session"),
+    CommandDef("title", "Set a title for the current session", "Session",
+               args_hint="[name]"),
+    CommandDef("compress", "Manually compress conversation context", "Session"),
+    CommandDef("rollback", "List or restore filesystem checkpoints", "Session",
+               args_hint="[number]"),
+    CommandDef("stop", "Kill all running background processes", "Session"),
+    CommandDef("background", "Run a prompt in the background", "Session",
+               aliases=("bg",), args_hint="<prompt>"),
+    CommandDef("status", "Show session info", "Session",
+               gateway_only=True),
+    CommandDef("sethome", "Set this chat as the home channel", "Session",
+               gateway_only=True, aliases=("set-home",)),
+    CommandDef("resume", "Resume a previously-named session", "Session",
+               args_hint="[name]"),
+
+    # Configuration
+    CommandDef("config", "Show current configuration", "Configuration",
+               cli_only=True),
+    CommandDef("model", "Show or change the current model", "Configuration",
+               args_hint="[name]"),
+    CommandDef("provider", "Show available providers and current provider",
+               "Configuration"),
+    CommandDef("prompt", "View/set custom system prompt", "Configuration",
+               cli_only=True, args_hint="[text]", subcommands=("clear",)),
+    CommandDef("personality", "Set a predefined personality", "Configuration",
+               args_hint="[name]"),
+    CommandDef("verbose", "Cycle tool progress display: off -> new -> all -> verbose",
+               "Configuration", cli_only=True),
+    CommandDef("reasoning", "Manage reasoning effort and display", "Configuration",
+               args_hint="[level|show|hide]",
+               subcommands=("none", "low", "minimal", "medium", "high", "xhigh", "show", "hide", "on", "off")),
+    CommandDef("skin", "Show or change the display skin/theme", "Configuration",
+               cli_only=True, args_hint="[name]"),
+    CommandDef("voice", "Toggle voice mode", "Configuration",
+               args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),
+
+    # Tools & Skills
+    CommandDef("tools", "List available tools", "Tools & Skills",
+               cli_only=True),
+    CommandDef("toolsets", "List available toolsets", "Tools & Skills",
+               cli_only=True),
+    CommandDef("skills", "Search, install, inspect, or manage skills",
+               "Tools & Skills", cli_only=True,
+               subcommands=("search", "browse", "inspect", "install")),
+    CommandDef("cron", "Manage scheduled tasks", "Tools & Skills",
+               cli_only=True, args_hint="[subcommand]",
+               subcommands=("list", "add", "create", "edit", "pause", "resume", "run", "remove")),
+    CommandDef("reload-mcp", "Reload MCP servers from config", "Tools & Skills",
+               aliases=("reload_mcp",)),
+    CommandDef("plugins", "List installed plugins and their status",
+               "Tools & Skills", cli_only=True),
+
+    # Info
+    CommandDef("help", "Show available commands", "Info"),
+    CommandDef("usage", "Show token usage for the current session", "Info"),
+    CommandDef("insights", "Show usage insights and analytics", "Info",
+               args_hint="[days]"),
+    CommandDef("platforms", "Show gateway/messaging platform status", "Info",
+               cli_only=True, aliases=("gateway",)),
+    CommandDef("paste", "Check clipboard for an image and attach it", "Info",
+               cli_only=True),
+    CommandDef("update", "Update Hermes Agent to the latest version", "Info",
+               gateway_only=True),
+
+    # Exit
+    CommandDef("quit", "Exit the CLI", "Exit",
+               cli_only=True, aliases=("exit", "q")),
+]
+
+
+# ---------------------------------------------------------------------------
+# Derived lookups -- rebuilt once at import time
+# ---------------------------------------------------------------------------
+
+def _build_command_lookup() -> dict[str, CommandDef]:
+    """Map every name and alias to its CommandDef."""
+    lookup: dict[str, CommandDef] = {}
+    for cmd in COMMAND_REGISTRY:
+        lookup[cmd.name] = cmd
+        for alias in cmd.aliases:
+            lookup[alias] = cmd
+    return lookup
+
+
+_COMMAND_LOOKUP: dict[str, CommandDef] = _build_command_lookup()
+
+
+def resolve_command(name: str) -> CommandDef | None:
+    """Resolve a command name or alias to its CommandDef.
+
+    Accepts names with or without the leading slash.
+    """
+    return _COMMAND_LOOKUP.get(name.lower().lstrip("/"))
+
+
+def _build_description(cmd: CommandDef) -> str:
+    """Build a CLI-facing description string including usage hint."""
+    if cmd.args_hint:
+        return f"{cmd.description} (usage: /{cmd.name} {cmd.args_hint})"
+    return cmd.description
+
+
+# Backwards-compatible flat dict: "/command" -> description
+COMMANDS: dict[str, str] = {}
+for _cmd in COMMAND_REGISTRY:
+    if not _cmd.gateway_only:
+        COMMANDS[f"/{_cmd.name}"] = _build_description(_cmd)
+        for _alias in _cmd.aliases:
+            COMMANDS[f"/{_alias}"] = f"{_cmd.description} (alias for /{_cmd.name})"
+
+# Backwards-compatible categorized dict
+COMMANDS_BY_CATEGORY: dict[str, dict[str, str]] = {}
+for _cmd in COMMAND_REGISTRY:
+    if not _cmd.gateway_only:
+        _cat = COMMANDS_BY_CATEGORY.setdefault(_cmd.category, {})
+        _cat[f"/{_cmd.name}"] = COMMANDS[f"/{_cmd.name}"]
+        for _alias in _cmd.aliases:
+            _cat[f"/{_alias}"] = COMMANDS[f"/{_alias}"]
+
+
+# Subcommands lookup: "/cmd" -> ["sub1", "sub2", ...]
+SUBCOMMANDS: dict[str, list[str]] = {}
+for _cmd in COMMAND_REGISTRY:
+    if _cmd.subcommands:
+        SUBCOMMANDS[f"/{_cmd.name}"] = list(_cmd.subcommands)
+
+# Also extract subcommands hinted in args_hint via pipe-separated patterns
+# e.g. args_hint="[on|off|tts|status]" for commands that don't have explicit subcommands.
+# NOTE: If a command already has explicit subcommands, this fallback is skipped.
+# Use the `subcommands` field on CommandDef for intentional tab-completable args.
+_PIPE_SUBS_RE = re.compile(r"[a-z]+(?:\|[a-z]+)+")
+for _cmd in COMMAND_REGISTRY:
+    key = f"/{_cmd.name}"
+    if key in SUBCOMMANDS or not _cmd.args_hint:
+        continue
+    m = _PIPE_SUBS_RE.search(_cmd.args_hint)
+    if m:
+        SUBCOMMANDS[key] = m.group(0).split("|")
+
+
+# ---------------------------------------------------------------------------
+# Gateway helpers
+# ---------------------------------------------------------------------------
+
+# Set of all command names + aliases recognized by the gateway
+GATEWAY_KNOWN_COMMANDS: frozenset[str] = frozenset(
+    name
+    for cmd in COMMAND_REGISTRY
+    if not cmd.cli_only
+    for name in (cmd.name, *cmd.aliases)
+)
+
+
+def gateway_help_lines() -> list[str]:
+    """Generate gateway help text lines from the registry."""
+    lines: list[str] = []
+    for cmd in COMMAND_REGISTRY:
+        if cmd.cli_only:
+            continue
+        args = f" {cmd.args_hint}" if cmd.args_hint else ""
+        alias_parts: list[str] = []
+        for a in cmd.aliases:
+            # Skip internal aliases like reload_mcp (underscore variant)
+            if a.replace("-", "_") == cmd.name.replace("-", "_") and a != cmd.name:
+                continue
+            alias_parts.append(f"`/{a}`")
+        alias_note = f" (alias: {', '.join(alias_parts)})" if alias_parts else ""
+        lines.append(f"`/{cmd.name}{args}` -- {cmd.description}{alias_note}")
+    return lines
+
+
+def telegram_bot_commands() -> list[tuple[str, str]]:
+    """Return (command_name, description) pairs for Telegram setMyCommands.
+
+    Telegram command names cannot contain hyphens, so they are replaced with
+    underscores.  Aliases are skipped -- Telegram shows one menu entry per
+    canonical command.
+    """
+    result: list[tuple[str, str]] = []
+    for cmd in COMMAND_REGISTRY:
+        if cmd.cli_only:
+            continue
+        tg_name = cmd.name.replace("-", "_")
+        result.append((tg_name, cmd.description))
+    return result
+
+
+def slack_subcommand_map() -> dict[str, str]:
+    """Return subcommand -> /command mapping for Slack /hermes handler.
+
+    Maps both canonical names and aliases so /hermes bg do stuff works
+    the same as /hermes background do stuff.
+    """
+    mapping: dict[str, str] = {}
+    for cmd in COMMAND_REGISTRY:
+        if cmd.cli_only:
+            continue
+        mapping[cmd.name] = f"/{cmd.name}"
+        for alias in cmd.aliases:
+            mapping[alias] = f"/{alias}"
+    return mapping
+
+
+# ---------------------------------------------------------------------------
+# Autocomplete
+# ---------------------------------------------------------------------------

 class SlashCommandCompleter(Completer):
-    """Autocomplete for built-in slash commands and optional skill commands."""
+    """Autocomplete for built-in slash commands, subcommands, and skill commands."""

    def __init__(
        self,
        skill_commands_provider: Callable[[], Mapping[str, dict[str, Any]]] | None = None,
+        model_completer_provider: Callable[[], dict[str, Any]] | None = None,
    ) -> None:
        self._skill_commands_provider = skill_commands_provider
+        # model_completer_provider returns {"current_provider": str,
+        #   "providers": {id: label, ...}, "models_for": callable(provider) -> list[str]}
+        self._model_completer_provider = model_completer_provider
+        self._model_info_cache: dict[str, Any] | None = None
+        self._model_info_cache_time: float = 0
+
+    def _get_model_info(self) -> dict[str, Any]:
+        """Get cached model/provider info for /model autocomplete."""
+        import time
+        now = time.monotonic()
+        if self._model_info_cache is not None and now - self._model_info_cache_time < 60:
+            return self._model_info_cache
+        if self._model_completer_provider is None:
+            return {}
+        try:
+            self._model_info_cache = self._model_completer_provider() or {}
+            self._model_info_cache_time = now
+        except Exception:
+            self._model_info_cache = self._model_info_cache or {}
+        return self._model_info_cache

    def _iter_skill_commands(self) -> Mapping[str, dict[str, Any]]:
        if self._skill_commands_provider is None:
@@ -92,9 +311,152 @@ class SlashCommandCompleter(Completer):
        """
        return f"{cmd_name} " if cmd_name == word else cmd_name

+    @staticmethod
+    def _extract_path_word(text: str) -> str | None:
+        """Extract the current word if it looks like a file path.
+
+        Returns the path-like token under the cursor, or None if the
+        current word doesn't look like a path.  A word is path-like when
+        it starts with ``./``, ``../``, ``~/``, ``/``, or contains a
+        ``/`` separator (e.g. ``src/main.py``).
+        """
+        if not text:
+            return None
+        # Walk backwards to find the start of the current "word".
+        # Words are delimited by spaces, but paths can contain almost anything.
+        i = len(text) - 1
+        while i >= 0 and text[i] != " ":
+            i -= 1
+        word = text[i + 1:]
+        if not word:
+            return None
+        # Only trigger path completion for path-like tokens
+        if word.startswith(("./", "../", "~/", "/")) or "/" in word:
+            return word
+        return None
+
+    @staticmethod
+    def _path_completions(word: str, limit: int = 30):
+        """Yield Completion objects for file paths matching *word*."""
+        expanded = os.path.expanduser(word)
+        # Split into directory part and prefix to match inside it
+        if expanded.endswith("/"):
+            search_dir = expanded
+            prefix = ""
+        else:
+            search_dir = os.path.dirname(expanded) or "."
+            prefix = os.path.basename(expanded)
+
+        try:
+            entries = os.listdir(search_dir)
+        except OSError:
+            return
+
+        count = 0
+        prefix_lower = prefix.lower()
+        for entry in sorted(entries):
+            if prefix and not entry.lower().startswith(prefix_lower):
+                continue
+            if count >= limit:
+                break
+
+            full_path = os.path.join(search_dir, entry)
+            is_dir = os.path.isdir(full_path)
+
+            # Build the completion text (what replaces the typed word)
+            if word.startswith("~"):
+                display_path = "~/" + os.path.relpath(full_path, os.path.expanduser("~"))
+            elif os.path.isabs(word):
+                display_path = full_path
+            else:
+                # Keep relative
+                display_path = os.path.relpath(full_path)
+
+            if is_dir:
+                display_path += "/"
+
+            suffix = "/" if is_dir else ""
+            meta = "dir" if is_dir else _file_size_label(full_path)
+
+            yield Completion(
+                display_path,
+                start_position=-len(word),
+                display=entry + suffix,
+                display_meta=meta,
+            )
+            count += 1
+
    def get_completions(self, document, complete_event):
        text = document.text_before_cursor
        if not text.startswith("/"):
+            # Try file path completion for non-slash input
+            path_word = self._extract_path_word(text)
+            if path_word is not None:
+                yield from self._path_completions(path_word)
+            return
+
+        # Check if we're completing a subcommand (base command already typed)
+        parts = text.split(maxsplit=1)
+        base_cmd = parts[0].lower()
+        if len(parts) > 1 or (len(parts) == 1 and text.endswith(" ")):
+            sub_text = parts[1] if len(parts) > 1 else ""
+            sub_lower = sub_text.lower()
+
+            # /model gets two-stage completion:
+            #   Stage 1: provider names (with : suffix)
+            #   Stage 2: after "provider:", list that provider's models
+            if base_cmd == "/model" and " " not in sub_text:
+                info = self._get_model_info()
+                if info:
+                    current_prov = info.get("current_provider", "")
+                    providers = info.get("providers", {})
+                    models_for = info.get("models_for")
+
+                    if ":" in sub_text:
+                        # Stage 2: "anthropic:cl" → models for anthropic
+                        prov_part, model_part = sub_text.split(":", 1)
+                        model_lower = model_part.lower()
+                        if models_for:
+                            try:
+                                prov_models = models_for(prov_part)
+                            except Exception:
+                                prov_models = []
+                            for mid in prov_models:
+                                if mid.lower().startswith(model_lower) and mid.lower() != model_lower:
+                                    full = f"{prov_part}:{mid}"
+                                    yield Completion(
+                                        full,
+                                        start_position=-len(sub_text),
+                                        display=mid,
+                                    )
+                    else:
+                        # Stage 1: providers sorted: non-current first, current last
+                        for pid, plabel in sorted(
+                            providers.items(),
+                            key=lambda kv: (kv[0] == current_prov, kv[0]),
+                        ):
+                            display_name = f"{pid}:"
+                            if display_name.lower().startswith(sub_lower):
+                                meta = f"({plabel})" if plabel != pid else ""
+                                if pid == current_prov:
+                                    meta = f"(current — {plabel})" if plabel != pid else "(current)"
+                                yield Completion(
+                                    display_name,
+                                    start_position=-len(sub_text),
+                                    display=display_name,
+                                    display_meta=meta,
+                                )
+                return
+
+            # Static subcommand completions
+            if " " not in sub_text and base_cmd in SUBCOMMANDS:
+                for sub in SUBCOMMANDS[base_cmd]:
+                    if sub.startswith(sub_lower) and sub != sub_lower:
+                        yield Completion(
+                            sub,
+                            start_position=-len(sub_text),
+                            display=sub,
+                        )
            return

        word = text[1:]
@@ -120,3 +482,102 @@ class SlashCommandCompleter(Completer):
                    display=cmd,
                    display_meta=f"⚡ {short_desc}",
                )
+
+
+# ---------------------------------------------------------------------------
+# Inline auto-suggest (ghost text) for slash commands
+# ---------------------------------------------------------------------------
+
+class SlashCommandAutoSuggest(AutoSuggest):
+    """Inline ghost-text suggestions for slash commands and their subcommands.
+
+    Shows the rest of a command or subcommand in dim text as you type.
+    Falls back to history-based suggestions for non-slash input.
+    """
+
+    def __init__(
+        self,
+        history_suggest: AutoSuggest | None = None,
+        completer: SlashCommandCompleter | None = None,
+    ) -> None:
+        self._history = history_suggest
+        self._completer = completer  # Reuse its model cache
+
+    def get_suggestion(self, buffer, document):
+        text = document.text_before_cursor
+
+        # Only suggest for slash commands
+        if not text.startswith("/"):
+            # Fall back to history for regular text
+            if self._history:
+                return self._history.get_suggestion(buffer, document)
+            return None
+
+        parts = text.split(maxsplit=1)
+        base_cmd = parts[0].lower()
+
+        if len(parts) == 1 and not text.endswith(" "):
+            # Still typing the command name: /upd → suggest "ate"
+            word = text[1:].lower()
+            for cmd in COMMANDS:
+                cmd_name = cmd[1:]  # strip leading /
+                if cmd_name.startswith(word) and cmd_name != word:
+                    return Suggestion(cmd_name[len(word):])
+            return None
+
+        # Command is complete — suggest subcommands or model names
+        sub_text = parts[1] if len(parts) > 1 else ""
+        sub_lower = sub_text.lower()
+
+        # /model gets two-stage ghost text
+        if base_cmd == "/model" and " " not in sub_text and self._completer:
+            info = self._completer._get_model_info()
+            if info:
+                providers = info.get("providers", {})
+                models_for = info.get("models_for")
+                current_prov = info.get("current_provider", "")
+
+                if ":" in sub_text:
+                    # Stage 2: after provider:, suggest model
+                    prov_part, model_part = sub_text.split(":", 1)
+                    model_lower = model_part.lower()
+                    if models_for:
+                        try:
+                            for mid in models_for(prov_part):
+                                if mid.lower().startswith(model_lower) and mid.lower() != model_lower:
+                                    return Suggestion(mid[len(model_part):])
+                        except Exception:
+                            pass
+                else:
+                    # Stage 1: suggest provider name with :
+                    for pid in sorted(providers, key=lambda p: (p == current_prov, p)):
+                        candidate = f"{pid}:"
+                        if candidate.lower().startswith(sub_lower) and candidate.lower() != sub_lower:
+                            return Suggestion(candidate[len(sub_text):])
+
+        # Static subcommands
+        if base_cmd in SUBCOMMANDS and SUBCOMMANDS[base_cmd]:
+            if " " not in sub_text:
+                for sub in SUBCOMMANDS[base_cmd]:
+                    if sub.startswith(sub_lower) and sub != sub_lower:
+                        return Suggestion(sub[len(sub_text):])
+
+        # Fall back to history
+        if self._history:
+            return self._history.get_suggestion(buffer, document)
+        return None
+
+
+def _file_size_label(path: str) -> str:
+    """Return a compact human-readable file size, or '' on error."""
+    try:
+        size = os.path.getsize(path)
+    except OSError:
+        return ""
+    if size < 1024:
+        return f"{size}B"
+    if size < 1024 * 1024:
+        return f"{size / 1024:.0f}K"
+    if size < 1024 * 1024 * 1024:
+        return f"{size / (1024 * 1024):.1f}M"
+    return f"{size / (1024 * 1024 * 1024):.1f}G"
@@ -25,6 +25,18 @@ from typing import Dict, Any, Optional, List, Tuple

 _IS_WINDOWS = platform.system() == "Windows"
 _ENV_VAR_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
+# Env var names written to .env that aren't in OPTIONAL_ENV_VARS
+# (managed by setup/provider flows directly).
+_EXTRA_ENV_KEYS = frozenset({
+    "OPENAI_API_KEY", "OPENAI_BASE_URL",
+    "ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN",
+    "AUXILIARY_VISION_MODEL",
+    "DISCORD_HOME_CHANNEL", "TELEGRAM_HOME_CHANNEL",
+    "SIGNAL_ACCOUNT", "SIGNAL_HTTP_URL",
+    "SIGNAL_ALLOWED_USERS", "SIGNAL_GROUP_ALLOWED_USERS",
+    "TERMINAL_ENV", "TERMINAL_SSH_KEY", "TERMINAL_SSH_PORT",
+    "WHATSAPP_MODE", "WHATSAPP_ENABLED",
+})

 import yaml

@@ -191,6 +203,12 @@ DEFAULT_CONFIG = {
            "base_url": "",
            "api_key": "",
        },
+        "approval": {
+            "provider": "auto",
+            "model": "",           # fast/cheap model recommended (e.g. gemini-flash, haiku)
+            "base_url": "",
+            "api_key": "",
+        },
        "mcp": {
            "provider": "auto",
            "model": "",
@@ -211,8 +229,15 @@ DEFAULT_CONFIG = {
        "resume_display": "full",
        "bell_on_complete": False,
        "show_reasoning": False,
+        "streaming": False,
+        "show_cost": False,       # Show $ cost in the status bar (off by default)
        "skin": "default",
    },
+
+    # Privacy settings
+    "privacy": {
+        "redact_pii": False,  # When True, hash user IDs and strip phone numbers from LLM context
+    },
    
    # Text-to-speech configuration
    "tts": {
@@ -297,6 +322,14 @@ DEFAULT_CONFIG = {
        "auto_thread": True,           # Auto-create threads on @mention in channels (like Slack)
    },

+    # Approval mode for dangerous commands:
+    #   manual — always prompt the user (default)
+    #   smart  — use auxiliary LLM to auto-approve low-risk commands, prompt for high-risk
+    #   off    — skip all approval prompts (equivalent to --yolo)
+    "approvals": {
+        "mode": "manual",
+    },
+
    # Permanently allowed dangerous command patterns (added via "always" approval)
    "command_allowlist": [],
    # User-defined quick commands that bypass the agent loop (type: exec only)
@@ -316,7 +349,7 @@ DEFAULT_CONFIG = {
    },

    # Config schema version - bump this when adding new required fields
-    "_config_version": 8,
+    "_config_version": 9,
 }

 # =============================================================================
@@ -486,6 +519,14 @@ OPTIONAL_ENV_VARS = {
        "password": False,
        "category": "tool",
    },
+    "BROWSER_USE_API_KEY": {
+        "description": "Browser Use API key for cloud browser (optional — local browser works without this)",
+        "prompt": "Browser Use API key",
+        "url": "https://browser-use.com/",
+        "tools": ["browser_navigate", "browser_click"],
+        "password": True,
+        "category": "tool",
+    },
    "FAL_KEY": {
        "description": "FAL API key for image generation",
        "prompt": "FAL API key",
@@ -744,7 +785,15 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
        Dict with migration results: {"env_added": [...], "config_added": [...], "warnings": [...]}
    """
    results = {"env_added": [], "config_added": [], "warnings": []}
-    
+
+    # ── Always: sanitize .env (split concatenated keys) ──
+    try:
+        fixes = sanitize_env_file()
+        if fixes and not quiet:
+            print(f"  ✓ Repaired .env file ({fixes} corrupted entries fixed)")
+    except Exception:
+        pass  # best-effort; don't block migration on sanitize failure
+
    # Check config version
    current_ver, latest_ver = check_config_version()
    
@@ -787,6 +836,18 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
                tz_display = config["timezone"] or "(server-local)"
                print(f"  ✓ Added timezone to config.yaml: {tz_display}")

+    # ── Version 8 → 9: clear ANTHROPIC_TOKEN from .env ──
+    # The new Anthropic auth flow no longer uses this env var.
+    if current_ver < 9:
+        try:
+            old_token = get_env_value("ANTHROPIC_TOKEN")
+            if old_token:
+                save_env_value("ANTHROPIC_TOKEN", "")
+                if not quiet:
+                    print("  ✓ Cleared ANTHROPIC_TOKEN from .env (no longer used)")
+        except Exception:
+            pass
+
    if current_ver < latest_ver and not quiet:
        print(f"Config version: {current_ver} → {latest_ver}")
    
@@ -1100,6 +1161,102 @@ def load_env() -> Dict[str, str]:
    return env_vars


+def _sanitize_env_lines(lines: list) -> list:
+    """Fix corrupted .env lines before writing.
+
+    Handles two known corruption patterns:
+    1. Concatenated KEY=VALUE pairs on a single line (missing newline between
+       entries, e.g. ``ANTHROPIC_API_KEY=sk-...OPENAI_BASE_URL=https://...``).
+    2. Stale ``KEY=***`` placeholder entries left by incomplete setup runs.
+
+    Uses a known-keys set (OPTIONAL_ENV_VARS + _EXTRA_ENV_KEYS) so we only
+    split on real Hermes env var names, avoiding false positives from values
+    that happen to contain uppercase text with ``=``.
+    """
+    # Build the known keys set lazily from OPTIONAL_ENV_VARS + extras.
+    # Done inside the function so OPTIONAL_ENV_VARS is guaranteed to be defined.
+    known_keys = set(OPTIONAL_ENV_VARS.keys()) | _EXTRA_ENV_KEYS
+
+    sanitized: list[str] = []
+    for line in lines:
+        raw = line.rstrip("\r\n")
+        stripped = raw.strip()
+
+        # Preserve blank lines and comments
+        if not stripped or stripped.startswith("#"):
+            sanitized.append(raw + "\n")
+            continue
+
+        # Detect concatenated KEY=VALUE pairs on one line.
+        # Search for known KEY= patterns at any position in the line.
+        split_positions = []
+        for key_name in known_keys:
+            needle = key_name + "="
+            idx = stripped.find(needle)
+            while idx >= 0:
+                split_positions.append(idx)
+                idx = stripped.find(needle, idx + len(needle))
+
+        if len(split_positions) > 1:
+            split_positions.sort()
+            # Deduplicate (shouldn't happen, but be safe)
+            split_positions = sorted(set(split_positions))
+            for i, pos in enumerate(split_positions):
+                end = split_positions[i + 1] if i + 1 < len(split_positions) else len(stripped)
+                part = stripped[pos:end].strip()
+                if part:
+                    sanitized.append(part + "\n")
+        else:
+            sanitized.append(stripped + "\n")
+
+    return sanitized
+
+
+def sanitize_env_file() -> int:
+    """Read, sanitize, and rewrite ~/.hermes/.env in place.
+
+    Returns the number of lines that were fixed (concatenation splits +
+    placeholder removals).  Returns 0 when no changes are needed.
+    """
+    env_path = get_env_path()
+    if not env_path.exists():
+        return 0
+
+    read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
+    write_kw = {"encoding": "utf-8"} if _IS_WINDOWS else {}
+
+    with open(env_path, **read_kw) as f:
+        original_lines = f.readlines()
+
+    sanitized = _sanitize_env_lines(original_lines)
+
+    if sanitized == original_lines:
+        return 0
+
+    # Count fixes: difference in line count (from splits) + removed lines
+    fixes = abs(len(sanitized) - len(original_lines))
+    if fixes == 0:
+        # Lines changed content (e.g. *** removal) even if count is same
+        fixes = sum(1 for a, b in zip(original_lines, sanitized) if a != b)
+        fixes += abs(len(sanitized) - len(original_lines))
+
+    fd, tmp_path = tempfile.mkstemp(dir=str(env_path.parent), suffix=".tmp", prefix=".env_")
+    try:
+        with os.fdopen(fd, "w", **write_kw) as f:
+            f.writelines(sanitized)
+            f.flush()
+            os.fsync(f.fileno())
+        os.replace(tmp_path, env_path)
+    except BaseException:
+        try:
+            os.unlink(tmp_path)
+        except OSError:
+            pass
+        raise
+    _secure_file(env_path)
+    return fixes
+
+
 def save_env_value(key: str, value: str):
    """Save or update a value in ~/.hermes/.env."""
    if not _ENV_VAR_NAME_RE.match(key):
@@ -1117,6 +1274,8 @@ def save_env_value(key: str, value: str):
    if env_path.exists():
        with open(env_path, **read_kw) as f:
            lines = f.readlines()
+        # Sanitize on every read: split concatenated keys, drop stale placeholders
+        lines = _sanitize_env_lines(lines)
    
    # Find and update or append
    found = False
@@ -1237,6 +1396,7 @@ def show_config():
        ("VOICE_TOOLS_OPENAI_KEY", "OpenAI (STT/TTS)"),
        ("FIRECRAWL_API_KEY", "Firecrawl"),
        ("BROWSERBASE_API_KEY", "Browserbase"),
+        ("BROWSER_USE_API_KEY", "Browser Use"),
        ("FAL_KEY", "FAL"),
    ]
    
@@ -1383,7 +1543,7 @@ def set_config_value(key: str, value: str):
    # Check if it's an API key (goes to .env)
    api_keys = [
        'OPENROUTER_API_KEY', 'OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'VOICE_TOOLS_OPENAI_KEY',
-        'FIRECRAWL_API_KEY', 'FIRECRAWL_API_URL', 'BROWSERBASE_API_KEY', 'BROWSERBASE_PROJECT_ID',
+        'FIRECRAWL_API_KEY', 'FIRECRAWL_API_URL', 'BROWSERBASE_API_KEY', 'BROWSERBASE_PROJECT_ID', 'BROWSER_USE_API_KEY',
        'FAL_KEY', 'TELEGRAM_BOT_TOKEN', 'DISCORD_BOT_TOKEN',
        'TERMINAL_SSH_HOST', 'TERMINAL_SSH_USER', 'TERMINAL_SSH_KEY',
        'SUDO_PASSWORD', 'SLACK_BOT_TOKEN', 'SLACK_APP_TOKEN',
@@ -570,6 +570,7 @@ def run_doctor(args):
        # MiniMax APIs don't support /models endpoint — https://github.com/NousResearch/hermes-agent/issues/811
        ("MiniMax",          ("MINIMAX_API_KEY",),                            None,                                  "MINIMAX_BASE_URL", False),
        ("MiniMax (China)",  ("MINIMAX_CN_API_KEY",),                         None,                                  "MINIMAX_CN_BASE_URL", False),
+        ("AI Gateway",       ("AI_GATEWAY_API_KEY",),                          "https://ai-gateway.vercel.sh/v1/models", "AI_GATEWAY_BASE_URL", True),
    ]
    for _pname, _env_vars, _default_url, _base_env, _supports_health_check in _apikey_providers:
        _key = ""
@@ -150,7 +150,31 @@ def get_systemd_unit_path(system: bool = False) -> Path:
    return Path.home() / ".config" / "systemd" / "user" / f"{name}.service"


+def _ensure_user_systemd_env() -> None:
+    """Ensure DBUS_SESSION_BUS_ADDRESS and XDG_RUNTIME_DIR are set for systemctl --user.
+
+    On headless servers (SSH sessions), these env vars may be missing even when
+    the user's systemd instance is running (via linger).  Without them,
+    ``systemctl --user`` fails with "Failed to connect to bus: No medium found".
+    We detect the standard socket path and set the vars so all subsequent
+    subprocess calls inherit them.
+    """
+    uid = os.getuid()
+    if "XDG_RUNTIME_DIR" not in os.environ:
+        runtime_dir = f"/run/user/{uid}"
+        if Path(runtime_dir).exists():
+            os.environ["XDG_RUNTIME_DIR"] = runtime_dir
+
+    if "DBUS_SESSION_BUS_ADDRESS" not in os.environ:
+        xdg_runtime = os.environ.get("XDG_RUNTIME_DIR", f"/run/user/{uid}")
+        bus_path = Path(xdg_runtime) / "bus"
+        if bus_path.exists():
+            os.environ["DBUS_SESSION_BUS_ADDRESS"] = f"unix:path={bus_path}"
+
+
 def _systemctl_cmd(system: bool = False) -> list[str]:
+    if not system:
+        _ensure_user_systemd_env()
    return ["systemctl"] if system else ["systemctl", "--user"]


@@ -371,8 +395,6 @@ def get_hermes_cli_path() -> str:
 # =============================================================================

 def generate_systemd_unit(system: bool = False, run_as_user: str | None = None) -> str:
-    import shutil
-
    python_path = get_python_path()
    working_dir = str(PROJECT_ROOT)
    venv_dir = str(PROJECT_ROOT / "venv")
@@ -381,7 +403,6 @@ def generate_systemd_unit(system: bool = False, run_as_user: str | None = None)

    # Build a PATH that includes the venv, node_modules, and standard system dirs
    sane_path = f"{venv_bin}:{node_bin}:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
-    hermes_cli = shutil.which("hermes") or f"{python_path} -m hermes_cli.main"

    hermes_home = str(Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")).resolve())

@@ -408,7 +429,7 @@ Restart=on-failure
 RestartSec=10
 KillMode=mixed
 KillSignal=SIGTERM
-TimeoutStopSec=15
+TimeoutStopSec=60
 StandardOutput=journal
 StandardError=journal

@@ -423,7 +444,6 @@ After=network.target
 [Service]
 Type=simple
 ExecStart={python_path} -m hermes_cli.main gateway run --replace
-ExecStop={hermes_cli} gateway stop
 WorkingDirectory={working_dir}
 Environment="PATH={sane_path}"
 Environment="VIRTUAL_ENV={venv_dir}"
@@ -432,7 +452,7 @@ Restart=on-failure
 RestartSec=10
 KillMode=mixed
 KillSignal=SIGTERM
-TimeoutStopSec=15
+TimeoutStopSec=60
 StandardOutput=journal
 StandardError=journal

@@ -542,6 +562,12 @@ def systemd_install(force: bool = False, system: bool = False, run_as_user: str
    scope_flag = " --system" if system else ""

    if unit_path.exists() and not force:
+        if not systemd_unit_is_current(system=system):
+            print(f"↻ Repairing outdated {_service_scope_label(system)} systemd service at: {unit_path}")
+            refresh_systemd_unit_if_needed(system=system)
+            subprocess.run(_systemctl_cmd(system) + ["enable", get_service_name()], check=True)
+            print(f"✓ {_service_scope_label(system).capitalize()} service definition updated")
+            return
        print(f"Service already installed at: {unit_path}")
        print("Use --force to reinstall")
        return
@@ -709,6 +735,7 @@ def generate_launchd_plist() -> str:
        <string>hermes_cli.main</string>
        <string>gateway</string>
        <string>run</string>
+        <string>--replace</string>
    </array>
    
    <key>WorkingDirectory</key>
@@ -732,10 +759,45 @@ def generate_launchd_plist() -> str:
 </plist>
 """

+def launchd_plist_is_current() -> bool:
+    """Check if the installed launchd plist matches the currently generated one."""
+    plist_path = get_launchd_plist_path()
+    if not plist_path.exists():
+        return False
+
+    installed = plist_path.read_text(encoding="utf-8")
+    expected = generate_launchd_plist()
+    return _normalize_service_definition(installed) == _normalize_service_definition(expected)
+
+
+def refresh_launchd_plist_if_needed() -> bool:
+    """Rewrite the installed launchd plist when the generated definition has changed.
+
+    Unlike systemd, launchd picks up plist changes on the next ``launchctl stop``/
+    ``launchctl start`` cycle — no daemon-reload is needed.  We still unload/reload
+    to make launchd re-read the updated plist immediately.
+    """
+    plist_path = get_launchd_plist_path()
+    if not plist_path.exists() or launchd_plist_is_current():
+        return False
+
+    plist_path.write_text(generate_launchd_plist(), encoding="utf-8")
+    # Unload/reload so launchd picks up the new definition
+    subprocess.run(["launchctl", "unload", str(plist_path)], check=False)
+    subprocess.run(["launchctl", "load", str(plist_path)], check=False)
+    print("↻ Updated gateway launchd service definition to match the current Hermes install")
+    return True
+
+
 def launchd_install(force: bool = False):
    plist_path = get_launchd_plist_path()
    
    if plist_path.exists() and not force:
+        if not launchd_plist_is_current():
+            print(f"↻ Repairing outdated launchd service at: {plist_path}")
+            refresh_launchd_plist_if_needed()
+            print("✓ Service definition updated")
+            return
        print(f"Service already installed at: {plist_path}")
        print("Use --force to reinstall")
        return
@@ -764,7 +826,16 @@ def launchd_uninstall():
    print("✓ Service uninstalled")

 def launchd_start():
-    subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
+    refresh_launchd_plist_if_needed()
+    plist_path = get_launchd_plist_path()
+    try:
+        subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
+    except subprocess.CalledProcessError as e:
+        if e.returncode != 3 or not plist_path.exists():
+            raise
+        print("↻ launchd job was unloaded; reloading service definition")
+        subprocess.run(["launchctl", "load", str(plist_path)], check=True)
+        subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
    print("✓ Service started")

 def launchd_stop():
@@ -772,21 +843,36 @@ def launchd_stop():
    print("✓ Service stopped")

 def launchd_restart():
-    launchd_stop()
+    try:
+        launchd_stop()
+    except subprocess.CalledProcessError as e:
+        if e.returncode != 3:
+            raise
+        print("↻ launchd job was unloaded; skipping stop")
    launchd_start()

 def launchd_status(deep: bool = False):
+    plist_path = get_launchd_plist_path()
    result = subprocess.run(
        ["launchctl", "list", "ai.hermes.gateway"],
        capture_output=True,
        text=True
    )
+
+    print(f"Launchd plist: {plist_path}")
+    if launchd_plist_is_current():
+        print("✓ Service definition matches the current Hermes install")
+    else:
+        print("⚠ Service definition is stale relative to the current Hermes install")
+        print("  Run: hermes gateway start")
    
    if result.returncode == 0:
        print("✓ Gateway service is loaded")
        print(result.stdout)
    else:
        print("✗ Gateway service is not loaded")
+        print("  Service definition exists locally but launchd has not loaded it.")
+        print("  Run: hermes gateway start")
    
    if deep:
        log_file = get_hermes_home() / "logs" / "gateway.log"
@@ -1502,14 +1588,17 @@ def gateway_command(args):
        # Try service first, fall back to killing and restarting
        service_available = False
        system = getattr(args, 'system', False)
+        service_configured = False
        
        if is_linux() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
+            service_configured = True
            try:
                systemd_restart(system=system)
                service_available = True
            except subprocess.CalledProcessError:
                pass
        elif is_macos() and get_launchd_plist_path().exists():
+            service_configured = True
            try:
                launchd_restart()
                service_available = True
@@ -1517,6 +1606,29 @@ def gateway_command(args):
                pass
        
        if not service_available:
+            # systemd/launchd restart failed — check if linger is the issue
+            if is_linux():
+                linger_ok, _detail = get_systemd_linger_status()
+                if linger_ok is not True:
+                    import getpass
+                    _username = getpass.getuser()
+                    print()
+                    print("⚠ Cannot restart gateway as a service — linger is not enabled.")
+                    print("  The gateway user service requires linger to function on headless servers.")
+                    print()
+                    print(f"  Run:  sudo loginctl enable-linger {_username}")
+                    print()
+                    print("  Then restart the gateway:")
+                    print("    hermes gateway restart")
+                    return
+
+            if service_configured:
+                print()
+                print("✗ Gateway service restart failed.")
+                print("  The service definition exists, but the service manager did not recover it.")
+                print("  Fix the service, then retry: hermes gateway start")
+                sys.exit(1)
+
            # Manual restart: kill existing processes
            killed = kill_gateway_processes()
            if killed:
@@ -768,6 +768,7 @@ def cmd_model(args):
        "kimi-coding": "Kimi / Moonshot",
        "minimax": "MiniMax",
        "minimax-cn": "MiniMax (China)",
+        "ai-gateway": "AI Gateway",
        "custom": "Custom endpoint",
    }
    active_label = provider_labels.get(active, active)
@@ -787,6 +788,7 @@ def cmd_model(args):
        ("kimi-coding", "Kimi / Moonshot (Moonshot AI direct API)"),
        ("minimax", "MiniMax (global direct API)"),
        ("minimax-cn", "MiniMax China (domestic direct API)"),
+        ("ai-gateway", "AI Gateway (Vercel — 200+ models, pay-per-use)"),
    ]

    # Add user-defined custom providers from config.yaml
@@ -855,7 +857,7 @@ def cmd_model(args):
        _model_flow_anthropic(config, current_model)
    elif selected_provider == "kimi-coding":
        _model_flow_kimi(config, current_model)
-    elif selected_provider in ("zai", "minimax", "minimax-cn"):
+    elif selected_provider in ("zai", "minimax", "minimax-cn", "ai-gateway"):
        _model_flow_api_key_provider(config, selected_provider, current_model)


@@ -2122,7 +2124,17 @@ def _restore_stashed_changes(
    print("  Review `git diff` / `git status` if Hermes behaves unexpectedly.")
    return True

-
+def _invalidate_update_cache():
+    """Delete the update-check cache so ``hermes --version`` doesn't
+    report a stale "commits behind" count after a successful update."""
+    try:
+        cache_file = Path(os.getenv(
+            "HERMES_HOME", Path.home() / ".hermes"
+        )) / ".update_check"
+        if cache_file.exists():
+            cache_file.unlink()
+    except Exception:
+        pass

 def cmd_update(args):
    """Update Hermes Agent to the latest version."""
@@ -2195,6 +2207,7 @@ def cmd_update(args):
        commit_count = int(result.stdout.strip())
        
        if commit_count == 0:
+            _invalidate_update_cache()
            print("✓ Already up to date!")
            return
        
@@ -2215,6 +2228,8 @@ def cmd_update(args):
                    prompt_user=prompt_for_restore,
                )
        
+        _invalidate_update_cache()
+        
        # Reinstall Python dependencies (prefer uv for speed, fall back to pip)
        print("→ Updating Python dependencies...")
        uv_bin = shutil.which("uv")
@@ -2306,14 +2321,20 @@ def cmd_update(args):
        # installation's gateway — safe with multiple installations.
        try:
            from gateway.status import get_running_pid, remove_pid_file
-            from hermes_cli.gateway import get_service_name
+            from hermes_cli.gateway import (
+                get_service_name, get_launchd_plist_path, is_macos, is_linux,
+                refresh_launchd_plist_if_needed,
+                _ensure_user_systemd_env, get_systemd_linger_status,
+            )
            import signal as _signal

            _gw_service_name = get_service_name()
            existing_pid = get_running_pid()
            has_systemd_service = False
+            has_launchd_service = False

            try:
+                _ensure_user_systemd_env()
                check = subprocess.run(
                    ["systemctl", "--user", "is-active", _gw_service_name],
                    capture_output=True, text=True, timeout=5,
@@ -2322,23 +2343,36 @@ def cmd_update(args):
            except (FileNotFoundError, subprocess.TimeoutExpired):
                pass

-            if existing_pid or has_systemd_service:
+            # Check for macOS launchd service
+            if is_macos():
+                try:
+                    plist_path = get_launchd_plist_path()
+                    if plist_path.exists():
+                        check = subprocess.run(
+                            ["launchctl", "list", "ai.hermes.gateway"],
+                            capture_output=True, text=True, timeout=5,
+                        )
+                        has_launchd_service = check.returncode == 0
+                except (FileNotFoundError, subprocess.TimeoutExpired):
+                    pass
+
+            if existing_pid or has_systemd_service or has_launchd_service:
                print()

-                # Kill the PID-file-tracked process (may be manual or systemd)
-                if existing_pid:
-                    try:
-                        os.kill(existing_pid, _signal.SIGTERM)
-                        print(f"→ Stopped gateway process (PID {existing_pid})")
-                    except ProcessLookupError:
-                        pass  # Already gone
-                    except PermissionError:
-                        print(f"⚠ Permission denied killing gateway PID {existing_pid}")
-                    remove_pid_file()
-
-                # Restart the systemd service (starts a fresh process)
+                # When a service manager is handling the gateway, let it
+                # manage the lifecycle — don't manually SIGTERM the PID
+                # (launchd KeepAlive would respawn immediately, causing races).
                if has_systemd_service:
                    import time as _time
+                    if existing_pid:
+                        try:
+                            os.kill(existing_pid, _signal.SIGTERM)
+                            print(f"→ Stopped gateway process (PID {existing_pid})")
+                        except ProcessLookupError:
+                            pass
+                        except PermissionError:
+                            print(f"⚠ Permission denied killing gateway PID {existing_pid}")
+                        remove_pid_file()
                    _time.sleep(1)  # Brief pause for port/socket release
                    print("→ Restarting gateway service...")
                    restart = subprocess.run(
@@ -2349,8 +2383,50 @@ def cmd_update(args):
                        print("✓ Gateway restarted.")
                    else:
                        print(f"⚠ Gateway restart failed: {restart.stderr.strip()}")
+                        # Check if linger is the issue
+                        if is_linux():
+                            linger_ok, _detail = get_systemd_linger_status()
+                            if linger_ok is not True:
+                                import getpass
+                                _username = getpass.getuser()
+                                print()
+                                print("  Linger must be enabled for the gateway user service to function.")
+                                print(f"  Run:  sudo loginctl enable-linger {_username}")
+                                print()
+                                print("  Then restart the gateway:")
+                                print("    hermes gateway restart")
+                            else:
+                                print("  Try manually: hermes gateway restart")
+                elif has_launchd_service:
+                    # Refresh the plist first (picks up --replace and other
+                    # changes from the update we just pulled).
+                    refresh_launchd_plist_if_needed()
+                    # Explicit stop+start — don't rely on KeepAlive respawn
+                    # after a manual SIGTERM, which would race with the
+                    # PID file cleanup.
+                    print("→ Restarting gateway service...")
+                    stop = subprocess.run(
+                        ["launchctl", "stop", "ai.hermes.gateway"],
+                        capture_output=True, text=True, timeout=10,
+                    )
+                    start = subprocess.run(
+                        ["launchctl", "start", "ai.hermes.gateway"],
+                        capture_output=True, text=True, timeout=10,
+                    )
+                    if start.returncode == 0:
+                        print("✓ Gateway restarted via launchd.")
+                    else:
+                        print(f"⚠ Gateway restart failed: {start.stderr.strip()}")
                        print("  Try manually: hermes gateway restart")
                elif existing_pid:
+                    try:
+                        os.kill(existing_pid, _signal.SIGTERM)
+                        print(f"→ Stopped gateway process (PID {existing_pid})")
+                    except ProcessLookupError:
+                        pass  # Already gone
+                    except PermissionError:
+                        print(f"⚠ Permission denied killing gateway PID {existing_pid}")
+                    remove_pid_file()
                    print("  ℹ️  Gateway was running manually (not as a service).")
                    print("  Restart it with: hermes gateway run")
        except Exception as e:
@@ -2917,7 +2993,8 @@ For more help on a command:
    skills_install = skills_subparsers.add_parser("install", help="Install a skill")
    skills_install.add_argument("identifier", help="Skill identifier (e.g. openai/skills/skill-creator)")
    skills_install.add_argument("--category", default="", help="Category folder to install into")
-    skills_install.add_argument("--force", "--yes", "-y", dest="force", action="store_true", help="Install despite blocked scan verdict")
+    skills_install.add_argument("--force", action="store_true", help="Install despite blocked scan verdict")
+    skills_install.add_argument("--yes", "-y", action="store_true", help="Skip confirmation prompt (needed in TUI mode)")

    skills_inspect = skills_subparsers.add_parser("inspect", help="Preview a skill without installing")
    skills_inspect.add_argument("identifier", help="Skill identifier")
@@ -8,6 +8,7 @@ Add, remove, or reorder entries here — both `hermes setup` and
 from __future__ import annotations

 import json
+import os
 import urllib.request
 import urllib.error
 from difflib import get_close_matches
@@ -82,6 +83,20 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "deepseek-chat",
        "deepseek-reasoner",
    ],
+    "ai-gateway": [
+        "anthropic/claude-opus-4.6",
+        "anthropic/claude-sonnet-4.6",
+        "anthropic/claude-sonnet-4.5",
+        "anthropic/claude-haiku-4.5",
+        "openai/gpt-5",
+        "openai/gpt-4.1",
+        "openai/gpt-4.1-mini",
+        "google/gemini-3-pro-preview",
+        "google/gemini-3-flash",
+        "google/gemini-2.5-pro",
+        "google/gemini-2.5-flash",
+        "deepseek/deepseek-v3.2",
+    ],
 }

 _PROVIDER_LABELS = {
@@ -94,6 +109,7 @@ _PROVIDER_LABELS = {
    "minimax-cn": "MiniMax (China)",
    "anthropic": "Anthropic",
    "deepseek": "DeepSeek",
+    "ai-gateway": "AI Gateway",
    "custom": "Custom endpoint",
 }

@@ -109,6 +125,9 @@ _PROVIDER_ALIASES = {
    "claude": "anthropic",
    "claude-code": "anthropic",
    "deep-seek": "deepseek",
+    "aigateway": "ai-gateway",
+    "vercel": "ai-gateway",
+    "vercel-ai-gateway": "ai-gateway",
 }


@@ -142,7 +161,8 @@ def list_available_providers() -> list[dict[str, str]]:
    # Canonical providers in display order
    _PROVIDER_ORDER = [
        "openrouter", "nous", "openai-codex",
-        "zai", "kimi-coding", "minimax", "minimax-cn", "anthropic", "deepseek",
+        "zai", "kimi-coding", "minimax", "minimax-cn", "anthropic",
+        "ai-gateway", "deepseek", "custom",
    ]
    # Build reverse alias map
    aliases_for: dict[str, list[str]] = {}
@@ -156,9 +176,12 @@ def list_available_providers() -> list[dict[str, str]]:
        # Check if this provider has credentials available
        has_creds = False
        try:
-            from hermes_cli.runtime_provider import resolve_runtime_provider
-            runtime = resolve_runtime_provider(requested=pid)
-            has_creds = bool(runtime.get("api_key"))
+            if pid == "custom":
+                has_creds = bool(_get_custom_base_url())
+            else:
+                from hermes_cli.runtime_provider import resolve_runtime_provider
+                runtime = resolve_runtime_provider(requested=pid)
+                has_creds = bool(runtime.get("api_key"))
        except Exception:
            pass
        result.append({
@@ -197,6 +220,19 @@ def parse_model_input(raw: str, current_provider: str) -> tuple[str, str]:
    return (current_provider, stripped)


+def _get_custom_base_url() -> str:
+    """Get the custom endpoint base_url from config.yaml."""
+    try:
+        from hermes_cli.config import load_config
+        config = load_config()
+        model_cfg = config.get("model", {})
+        if isinstance(model_cfg, dict):
+            return str(model_cfg.get("base_url", "")).strip()
+    except Exception:
+        pass
+    return ""
+
+
 def curated_models_for_provider(provider: Optional[str]) -> list[tuple[str, str]]:
    """Return ``(model_id, description)`` tuples for a provider's model list.

@@ -372,6 +408,22 @@ def provider_model_ids(provider: Optional[str]) -> list[str]:
        live = _fetch_anthropic_models()
        if live:
            return live
+    if normalized == "ai-gateway":
+        live = _fetch_ai_gateway_models()
+        if live:
+            return live
+    if normalized == "custom":
+        base_url = _get_custom_base_url()
+        if base_url:
+            # Try common API key env vars for custom endpoints
+            api_key = (
+                os.getenv("CUSTOM_API_KEY", "")
+                or os.getenv("OPENAI_API_KEY", "")
+                or os.getenv("OPENROUTER_API_KEY", "")
+            )
+            live = fetch_api_models(api_key, base_url)
+            if live:
+                return live
    return list(_PROVIDER_MODELS.get(normalized, []))


@@ -475,6 +527,33 @@ def probe_api_models(
    }


+def _fetch_ai_gateway_models(timeout: float = 5.0) -> Optional[list[str]]:
+    """Fetch available language models with tool-use from AI Gateway."""
+    api_key = os.getenv("AI_GATEWAY_API_KEY", "").strip()
+    if not api_key:
+        return None
+    base_url = os.getenv("AI_GATEWAY_BASE_URL", "").strip()
+    if not base_url:
+        from hermes_constants import AI_GATEWAY_BASE_URL
+        base_url = AI_GATEWAY_BASE_URL
+
+    url = base_url.rstrip("/") + "/models"
+    headers: dict[str, str] = {"Authorization": f"Bearer {api_key}"}
+    req = urllib.request.Request(url, headers=headers)
+    try:
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            data = json.loads(resp.read().decode())
+            return [
+                m["id"]
+                for m in data.get("data", [])
+                if m.get("id")
+                and m.get("type") == "language"
+                and "tool-use" in (m.get("tags") or [])
+            ]
+    except Exception:
+        return None
+
+
 def fetch_api_models(
    api_key: Optional[str],
    base_url: Optional[str],
@@ -0,0 +1,449 @@
+"""
+Hermes Plugin System
+====================
+
+Discovers, loads, and manages plugins from three sources:
+
+1. **User plugins**   – ``~/.hermes/plugins/<name>/``
+2. **Project plugins** – ``./.hermes/plugins/<name>/``
+3. **Pip plugins**     – packages that expose the ``hermes_agent.plugins``
+   entry-point group.
+
+Each directory plugin must contain a ``plugin.yaml`` manifest **and** an
+``__init__.py`` with a ``register(ctx)`` function.
+
+Lifecycle hooks
+---------------
+Plugins may register callbacks for any of the hooks in ``VALID_HOOKS``.
+The agent core calls ``invoke_hook(name, **kwargs)`` at the appropriate
+points.
+
+Tool registration
+-----------------
+``PluginContext.register_tool()`` delegates to ``tools.registry.register()``
+so plugin-defined tools appear alongside the built-in tools.
+"""
+
+from __future__ import annotations
+
+import importlib
+import importlib.metadata
+import importlib.util
+import logging
+import os
+import sys
+import types
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any, Callable, Dict, List, Optional, Set
+
+try:
+    import yaml
+except ImportError:  # pragma: no cover – yaml is optional at import time
+    yaml = None  # type: ignore[assignment]
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# Constants
+# ---------------------------------------------------------------------------
+
+VALID_HOOKS: Set[str] = {
+    "pre_tool_call",
+    "post_tool_call",
+    "pre_llm_call",
+    "post_llm_call",
+    "on_session_start",
+    "on_session_end",
+}
+
+ENTRY_POINTS_GROUP = "hermes_agent.plugins"
+
+_NS_PARENT = "hermes_plugins"
+
+
+# ---------------------------------------------------------------------------
+# Data classes
+# ---------------------------------------------------------------------------
+
+@dataclass
+class PluginManifest:
+    """Parsed representation of a plugin.yaml manifest."""
+
+    name: str
+    version: str = ""
+    description: str = ""
+    author: str = ""
+    requires_env: List[str] = field(default_factory=list)
+    provides_tools: List[str] = field(default_factory=list)
+    provides_hooks: List[str] = field(default_factory=list)
+    source: str = ""        # "user", "project", or "entrypoint"
+    path: Optional[str] = None
+
+
+@dataclass
+class LoadedPlugin:
+    """Runtime state for a single loaded plugin."""
+
+    manifest: PluginManifest
+    module: Optional[types.ModuleType] = None
+    tools_registered: List[str] = field(default_factory=list)
+    hooks_registered: List[str] = field(default_factory=list)
+    enabled: bool = False
+    error: Optional[str] = None
+
+
+# ---------------------------------------------------------------------------
+# PluginContext  – handed to each plugin's ``register()`` function
+# ---------------------------------------------------------------------------
+
+class PluginContext:
+    """Facade given to plugins so they can register tools and hooks."""
+
+    def __init__(self, manifest: PluginManifest, manager: "PluginManager"):
+        self.manifest = manifest
+        self._manager = manager
+
+    # -- tool registration --------------------------------------------------
+
+    def register_tool(
+        self,
+        name: str,
+        toolset: str,
+        schema: dict,
+        handler: Callable,
+        check_fn: Callable | None = None,
+        requires_env: list | None = None,
+        is_async: bool = False,
+        description: str = "",
+        emoji: str = "",
+    ) -> None:
+        """Register a tool in the global registry **and** track it as plugin-provided."""
+        from tools.registry import registry
+
+        registry.register(
+            name=name,
+            toolset=toolset,
+            schema=schema,
+            handler=handler,
+            check_fn=check_fn,
+            requires_env=requires_env,
+            is_async=is_async,
+            description=description,
+            emoji=emoji,
+        )
+        self._manager._plugin_tool_names.add(name)
+        logger.debug("Plugin %s registered tool: %s", self.manifest.name, name)
+
+    # -- hook registration --------------------------------------------------
+
+    def register_hook(self, hook_name: str, callback: Callable) -> None:
+        """Register a lifecycle hook callback.
+
+        Unknown hook names produce a warning but are still stored so
+        forward-compatible plugins don't break.
+        """
+        if hook_name not in VALID_HOOKS:
+            logger.warning(
+                "Plugin '%s' registered unknown hook '%s' "
+                "(valid: %s)",
+                self.manifest.name,
+                hook_name,
+                ", ".join(sorted(VALID_HOOKS)),
+            )
+        self._manager._hooks.setdefault(hook_name, []).append(callback)
+        logger.debug("Plugin %s registered hook: %s", self.manifest.name, hook_name)
+
+
+# ---------------------------------------------------------------------------
+# PluginManager
+# ---------------------------------------------------------------------------
+
+class PluginManager:
+    """Central manager that discovers, loads, and invokes plugins."""
+
+    def __init__(self) -> None:
+        self._plugins: Dict[str, LoadedPlugin] = {}
+        self._hooks: Dict[str, List[Callable]] = {}
+        self._plugin_tool_names: Set[str] = set()
+        self._discovered: bool = False
+
+    # -----------------------------------------------------------------------
+    # Public
+    # -----------------------------------------------------------------------
+
+    def discover_and_load(self) -> None:
+        """Scan all plugin sources and load each plugin found."""
+        if self._discovered:
+            return
+        self._discovered = True
+
+        manifests: List[PluginManifest] = []
+
+        # 1. User plugins (~/.hermes/plugins/)
+        hermes_home = os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes"))
+        user_dir = Path(hermes_home) / "plugins"
+        manifests.extend(self._scan_directory(user_dir, source="user"))
+
+        # 2. Project plugins (./.hermes/plugins/)
+        project_dir = Path.cwd() / ".hermes" / "plugins"
+        manifests.extend(self._scan_directory(project_dir, source="project"))
+
+        # 3. Pip / entry-point plugins
+        manifests.extend(self._scan_entry_points())
+
+        # Load each manifest
+        for manifest in manifests:
+            self._load_plugin(manifest)
+
+        if manifests:
+            logger.info(
+                "Plugin discovery complete: %d found, %d enabled",
+                len(self._plugins),
+                sum(1 for p in self._plugins.values() if p.enabled),
+            )
+
+    # -----------------------------------------------------------------------
+    # Directory scanning
+    # -----------------------------------------------------------------------
+
+    def _scan_directory(self, path: Path, source: str) -> List[PluginManifest]:
+        """Read ``plugin.yaml`` manifests from subdirectories of *path*."""
+        manifests: List[PluginManifest] = []
+        if not path.is_dir():
+            return manifests
+
+        for child in sorted(path.iterdir()):
+            if not child.is_dir():
+                continue
+            manifest_file = child / "plugin.yaml"
+            if not manifest_file.exists():
+                manifest_file = child / "plugin.yml"
+            if not manifest_file.exists():
+                logger.debug("Skipping %s (no plugin.yaml)", child)
+                continue
+
+            try:
+                if yaml is None:
+                    logger.warning("PyYAML not installed – cannot load %s", manifest_file)
+                    continue
+                data = yaml.safe_load(manifest_file.read_text()) or {}
+                manifest = PluginManifest(
+                    name=data.get("name", child.name),
+                    version=str(data.get("version", "")),
+                    description=data.get("description", ""),
+                    author=data.get("author", ""),
+                    requires_env=data.get("requires_env", []),
+                    provides_tools=data.get("provides_tools", []),
+                    provides_hooks=data.get("provides_hooks", []),
+                    source=source,
+                    path=str(child),
+                )
+                manifests.append(manifest)
+            except Exception as exc:
+                logger.warning("Failed to parse %s: %s", manifest_file, exc)
+
+        return manifests
+
+    # -----------------------------------------------------------------------
+    # Entry-point scanning
+    # -----------------------------------------------------------------------
+
+    def _scan_entry_points(self) -> List[PluginManifest]:
+        """Check ``importlib.metadata`` for pip-installed plugins."""
+        manifests: List[PluginManifest] = []
+        try:
+            eps = importlib.metadata.entry_points()
+            # Python 3.12+ returns a SelectableGroups; earlier returns dict
+            if hasattr(eps, "select"):
+                group_eps = eps.select(group=ENTRY_POINTS_GROUP)
+            elif isinstance(eps, dict):
+                group_eps = eps.get(ENTRY_POINTS_GROUP, [])
+            else:
+                group_eps = [ep for ep in eps if ep.group == ENTRY_POINTS_GROUP]
+
+            for ep in group_eps:
+                manifest = PluginManifest(
+                    name=ep.name,
+                    source="entrypoint",
+                    path=ep.value,
+                )
+                manifests.append(manifest)
+        except Exception as exc:
+            logger.debug("Entry-point scan failed: %s", exc)
+
+        return manifests
+
+    # -----------------------------------------------------------------------
+    # Loading
+    # -----------------------------------------------------------------------
+
+    def _load_plugin(self, manifest: PluginManifest) -> None:
+        """Import a plugin module and call its ``register(ctx)`` function."""
+        loaded = LoadedPlugin(manifest=manifest)
+
+        try:
+            if manifest.source in ("user", "project"):
+                module = self._load_directory_module(manifest)
+            else:
+                module = self._load_entrypoint_module(manifest)
+
+            loaded.module = module
+
+            # Call register()
+            register_fn = getattr(module, "register", None)
+            if register_fn is None:
+                loaded.error = "no register() function"
+                logger.warning("Plugin '%s' has no register() function", manifest.name)
+            else:
+                ctx = PluginContext(manifest, self)
+                register_fn(ctx)
+                loaded.tools_registered = [
+                    t for t in self._plugin_tool_names
+                    if t not in {
+                        n
+                        for name, p in self._plugins.items()
+                        for n in p.tools_registered
+                    }
+                ]
+                loaded.hooks_registered = list(
+                    {
+                        h
+                        for h, cbs in self._hooks.items()
+                        if cbs  # non-empty
+                    }
+                    - {
+                        h
+                        for name, p in self._plugins.items()
+                        for h in p.hooks_registered
+                    }
+                )
+                loaded.enabled = True
+
+        except Exception as exc:
+            loaded.error = str(exc)
+            logger.warning("Failed to load plugin '%s': %s", manifest.name, exc)
+
+        self._plugins[manifest.name] = loaded
+
+    def _load_directory_module(self, manifest: PluginManifest) -> types.ModuleType:
+        """Import a directory-based plugin as ``hermes_plugins.<name>``."""
+        plugin_dir = Path(manifest.path)  # type: ignore[arg-type]
+        init_file = plugin_dir / "__init__.py"
+        if not init_file.exists():
+            raise FileNotFoundError(f"No __init__.py in {plugin_dir}")
+
+        # Ensure the namespace parent package exists
+        if _NS_PARENT not in sys.modules:
+            ns_pkg = types.ModuleType(_NS_PARENT)
+            ns_pkg.__path__ = []  # type: ignore[attr-defined]
+            ns_pkg.__package__ = _NS_PARENT
+            sys.modules[_NS_PARENT] = ns_pkg
+
+        module_name = f"{_NS_PARENT}.{manifest.name.replace('-', '_')}"
+        spec = importlib.util.spec_from_file_location(
+            module_name,
+            init_file,
+            submodule_search_locations=[str(plugin_dir)],
+        )
+        if spec is None or spec.loader is None:
+            raise ImportError(f"Cannot create module spec for {init_file}")
+
+        module = importlib.util.module_from_spec(spec)
+        module.__package__ = module_name
+        module.__path__ = [str(plugin_dir)]  # type: ignore[attr-defined]
+        sys.modules[module_name] = module
+        spec.loader.exec_module(module)
+        return module
+
+    def _load_entrypoint_module(self, manifest: PluginManifest) -> types.ModuleType:
+        """Load a pip-installed plugin via its entry-point reference."""
+        eps = importlib.metadata.entry_points()
+        if hasattr(eps, "select"):
+            group_eps = eps.select(group=ENTRY_POINTS_GROUP)
+        elif isinstance(eps, dict):
+            group_eps = eps.get(ENTRY_POINTS_GROUP, [])
+        else:
+            group_eps = [ep for ep in eps if ep.group == ENTRY_POINTS_GROUP]
+
+        for ep in group_eps:
+            if ep.name == manifest.name:
+                return ep.load()
+
+        raise ImportError(
+            f"Entry point '{manifest.name}' not found in group '{ENTRY_POINTS_GROUP}'"
+        )
+
+    # -----------------------------------------------------------------------
+    # Hook invocation
+    # -----------------------------------------------------------------------
+
+    def invoke_hook(self, hook_name: str, **kwargs: Any) -> None:
+        """Call all registered callbacks for *hook_name*.
+
+        Each callback is wrapped in its own try/except so a misbehaving
+        plugin cannot break the core agent loop.
+        """
+        callbacks = self._hooks.get(hook_name, [])
+        for cb in callbacks:
+            try:
+                cb(**kwargs)
+            except Exception as exc:
+                logger.warning(
+                    "Hook '%s' callback %s raised: %s",
+                    hook_name,
+                    getattr(cb, "__name__", repr(cb)),
+                    exc,
+                )
+
+    # -----------------------------------------------------------------------
+    # Introspection
+    # -----------------------------------------------------------------------
+
+    def list_plugins(self) -> List[Dict[str, Any]]:
+        """Return a list of info dicts for all discovered plugins."""
+        result: List[Dict[str, Any]] = []
+        for name, loaded in sorted(self._plugins.items()):
+            result.append(
+                {
+                    "name": name,
+                    "version": loaded.manifest.version,
+                    "description": loaded.manifest.description,
+                    "source": loaded.manifest.source,
+                    "enabled": loaded.enabled,
+                    "tools": len(loaded.tools_registered),
+                    "hooks": len(loaded.hooks_registered),
+                    "error": loaded.error,
+                }
+            )
+        return result
+
+
+# ---------------------------------------------------------------------------
+# Module-level singleton & convenience functions
+# ---------------------------------------------------------------------------
+
+_plugin_manager: Optional[PluginManager] = None
+
+
+def get_plugin_manager() -> PluginManager:
+    """Return (and lazily create) the global PluginManager singleton."""
+    global _plugin_manager
+    if _plugin_manager is None:
+        _plugin_manager = PluginManager()
+    return _plugin_manager
+
+
+def discover_plugins() -> None:
+    """Discover and load all plugins (idempotent)."""
+    get_plugin_manager().discover_and_load()
+
+
+def invoke_hook(hook_name: str, **kwargs: Any) -> None:
+    """Invoke a lifecycle hook on all loaded plugins."""
+    get_plugin_manager().invoke_hook(hook_name, **kwargs)
+
+
+def get_plugin_tool_names() -> Set[str]:
+    """Return the set of tool names registered by plugins."""
+    return get_plugin_manager()._plugin_tool_names
@@ -59,6 +59,7 @@ _DEFAULT_PROVIDER_MODELS = {
    "kimi-coding": ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
    "minimax": ["MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
    "minimax-cn": ["MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
+    "ai-gateway": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5", "google/gemini-3-flash"],
 }


@@ -724,6 +725,7 @@ def setup_model_provider(config: dict):
        "MiniMax (global endpoint)",
        "MiniMax China (mainland China endpoint)",
        "Anthropic (Claude models — API key or Claude Code subscription)",
+        "AI Gateway (Vercel — 200+ models, pay-per-use)",
    ]
    if keep_label:
        provider_choices.append(keep_label)
@@ -1232,7 +1234,39 @@ def setup_model_provider(config: dict):
        _set_model_provider(config, "anthropic")
        selected_base_url = ""

-    # else: provider_idx == 9 (Keep current) — only shown when a provider already exists
+    elif provider_idx == 9:  # AI Gateway
+        selected_provider = "ai-gateway"
+        print()
+        print_header("AI Gateway API Key")
+        pconfig = PROVIDER_REGISTRY["ai-gateway"]
+        print_info(f"Provider: {pconfig.name}")
+        print_info("Get your API key at: https://vercel.com/docs/ai-gateway")
+        print()
+
+        existing_key = get_env_value("AI_GATEWAY_API_KEY")
+        if existing_key:
+            print_info(f"Current: {existing_key[:8]}... (configured)")
+            if prompt_yes_no("Update API key?", False):
+                api_key = prompt("  AI Gateway API key", password=True)
+                if api_key:
+                    save_env_value("AI_GATEWAY_API_KEY", api_key)
+                    print_success("AI Gateway API key updated")
+        else:
+            api_key = prompt("  AI Gateway API key", password=True)
+            if api_key:
+                save_env_value("AI_GATEWAY_API_KEY", api_key)
+                print_success("AI Gateway API key saved")
+            else:
+                print_warning("Skipped - agent won't work without an API key")
+
+        # Clear custom endpoint vars if switching
+        if existing_custom:
+            save_env_value("OPENAI_BASE_URL", "")
+            save_env_value("OPENAI_API_KEY", "")
+        _update_config_for_provider("ai-gateway", pconfig.inference_base_url, default_model="anthropic/claude-opus-4.6")
+        _set_model_provider(config, "ai-gateway", pconfig.inference_base_url)
+
+    # else: provider_idx == 10 (Keep current) — only shown when a provider already exists
    # Normalize "keep current" to an explicit provider so downstream logic
    # doesn't fall back to the generic OpenRouter/static-model path.
    if selected_provider is None:
@@ -1269,6 +1303,7 @@ def setup_model_provider(config: dict):
            "minimax": "MiniMax",
            "minimax-cn": "MiniMax CN",
            "anthropic": "Anthropic",
+            "ai-gateway": "AI Gateway",
            "custom": "your custom endpoint",
        }
        _prov_display = _prov_names.get(selected_provider, selected_provider or "your provider")
@@ -1402,7 +1437,7 @@ def setup_model_provider(config: dict):
                    _set_default_model(config, custom)
            _update_config_for_provider("openai-codex", DEFAULT_CODEX_BASE_URL)
            _set_model_provider(config, "openai-codex", DEFAULT_CODEX_BASE_URL)
-        elif selected_provider in ("zai", "kimi-coding", "minimax", "minimax-cn"):
+        elif selected_provider in ("zai", "kimi-coding", "minimax", "minimax-cn", "ai-gateway"):
            _setup_provider_model_selection(
                config, selected_provider, current_model,
                prompt_choice, prompt,
@@ -304,7 +304,7 @@ def do_browse(page: int = 1, page_size: int = 20, source: str = "all",


 def do_install(identifier: str, category: str = "", force: bool = False,
-               console: Optional[Console] = None) -> None:
+               console: Optional[Console] = None, skip_confirm: bool = False) -> None:
    """Fetch, quarantine, scan, confirm, and install a skill."""
    from tools.skills_hub import (
        GitHubAuth, create_source_router, ensure_hub_dirs,
@@ -378,7 +378,8 @@ def do_install(identifier: str, category: str = "", force: bool = False,
            c.print(Panel("\n".join(metadata_lines), title="Upstream Metadata", border_style="blue"))

    # Confirm with user — show appropriate warning based on source
-    if not force:
+    # skip_confirm bypasses the prompt (needed in TUI mode where input() hangs)
+    if not force and not skip_confirm:
        c.print()
        if bundle.source == "official":
            c.print(Panel(
@@ -598,20 +599,23 @@ def do_audit(name: Optional[str] = None, console: Optional[Console] = None) -> N
        c.print()


-def do_uninstall(name: str, console: Optional[Console] = None) -> None:
+def do_uninstall(name: str, console: Optional[Console] = None,
+                 skip_confirm: bool = False) -> None:
    """Remove a hub-installed skill with confirmation."""
    from tools.skills_hub import uninstall_skill

    c = console or _console

-    c.print(f"\n[bold]Uninstall '{name}'?[/]")
-    try:
-        answer = input("Confirm [y/N]: ").strip().lower()
-    except (EOFError, KeyboardInterrupt):
-        answer = "n"
-    if answer not in ("y", "yes"):
-        c.print("[dim]Cancelled.[/]\n")
-        return
+    # skip_confirm bypasses the prompt (needed in TUI mode where input() hangs)
+    if not skip_confirm:
+        c.print(f"\n[bold]Uninstall '{name}'?[/]")
+        try:
+            answer = input("Confirm [y/N]: ").strip().lower()
+        except (EOFError, KeyboardInterrupt):
+            answer = "n"
+        if answer not in ("y", "yes"):
+            c.print("[dim]Cancelled.[/]\n")
+            return

    success, msg = uninstall_skill(name)
    if success:
@@ -923,7 +927,8 @@ def skills_command(args) -> None:
    elif action == "search":
        do_search(args.query, source=args.source, limit=args.limit)
    elif action == "install":
-        do_install(args.identifier, category=args.category, force=args.force)
+        do_install(args.identifier, category=args.category, force=args.force,
+                   skip_confirm=getattr(args, "yes", False))
    elif action == "inspect":
        do_inspect(args.identifier)
    elif action == "list":
@@ -1054,11 +1059,15 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
            return
        identifier = args[0]
        category = ""
-        force = any(flag in args for flag in ("--force", "--yes", "-y"))
+        # --yes / -y bypasses confirmation prompt (needed in TUI mode)
+        # --force handles reinstall override
+        skip_confirm = any(flag in args for flag in ("--yes", "-y"))
+        force = "--force" in args
        for i, a in enumerate(args):
            if a == "--category" and i + 1 < len(args):
                category = args[i + 1]
-        do_install(identifier, category=category, force=force, console=c)
+        do_install(identifier, category=category, force=force,
+                   skip_confirm=skip_confirm, console=c)

    elif action == "inspect":
        if not args:
@@ -1088,9 +1097,10 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:

    elif action == "uninstall":
        if not args:
-            c.print("[bold red]Usage:[/] /skills uninstall <name>\n")
+            c.print("[bold red]Usage:[/] /skills uninstall <name> [--yes]\n")
            return
-        do_uninstall(args[0], console=c)
+        skip_confirm = any(flag in args for flag in ("--yes", "-y"))
+        do_uninstall(args[0], console=c, skip_confirm=skip_confirm)

    elif action == "publish":
        if not args:
@@ -190,6 +190,7 @@ TOOL_CATEGORIES = {
                "name": "Local Browser",
                "tag": "Free headless Chromium (no API key needed)",
                "env_vars": [],
+                "browser_provider": None,
                "post_setup": "browserbase",  # Same npm install for agent-browser
            },
            {
@@ -199,6 +200,16 @@ TOOL_CATEGORIES = {
                    {"key": "BROWSERBASE_API_KEY", "prompt": "Browserbase API key", "url": "https://browserbase.com"},
                    {"key": "BROWSERBASE_PROJECT_ID", "prompt": "Browserbase project ID"},
                ],
+                "browser_provider": "browserbase",
+                "post_setup": "browserbase",
+            },
+            {
+                "name": "Browser Use",
+                "tag": "Cloud browser with remote execution",
+                "env_vars": [
+                    {"key": "BROWSER_USE_API_KEY", "prompt": "Browser Use API key", "url": "https://browser-use.com"},
+                ],
+                "browser_provider": "browser-use",
                "post_setup": "browserbase",
            },
        ],
@@ -575,10 +586,10 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
            configured = ""
            env_vars = p.get("env_vars", [])
            if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
-                if p.get("tts_provider") and config.get("tts", {}).get("provider") == p["tts_provider"]:
+                if _is_provider_active(p, config):
                    configured = " [active]"
                elif not env_vars:
-                    configured = " [active]" if config.get("tts", {}).get("provider", "edge") == p.get("tts_provider", "") else ""
+                    configured = ""
                else:
                    configured = " [configured]"
            provider_choices.append(f"{p['name']}{tag}{configured}")
@@ -587,15 +598,7 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
        provider_choices.append("Skip — keep defaults / configure later")

        # Detect current provider as default
-        default_idx = 0
-        for i, p in enumerate(providers):
-            if p.get("tts_provider") and config.get("tts", {}).get("provider") == p["tts_provider"]:
-                default_idx = i
-                break
-            env_vars = p.get("env_vars", [])
-            if env_vars and all(get_env_value(v["key"]) for v in env_vars):
-                default_idx = i
-                break
+        default_idx = _detect_active_provider_index(providers, config)

        provider_idx = _prompt_choice(f"  {title}:", provider_choices, default_idx)

@@ -607,6 +610,28 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
        _configure_provider(providers[provider_idx], config)


+def _is_provider_active(provider: dict, config: dict) -> bool:
+    """Check if a provider entry matches the currently active config."""
+    if provider.get("tts_provider"):
+        return config.get("tts", {}).get("provider") == provider["tts_provider"]
+    if "browser_provider" in provider:
+        current = config.get("browser", {}).get("cloud_provider")
+        return provider["browser_provider"] == current
+    return False
+
+
+def _detect_active_provider_index(providers: list, config: dict) -> int:
+    """Return the index of the currently active provider, or 0."""
+    for i, p in enumerate(providers):
+        if _is_provider_active(p, config):
+            return i
+        # Fallback: env vars present → likely configured
+        env_vars = p.get("env_vars", [])
+        if env_vars and all(get_env_value(v["key"]) for v in env_vars):
+            return i
+    return 0
+
+
 def _configure_provider(provider: dict, config: dict):
    """Configure a single provider - prompt for API keys and set config."""
    env_vars = provider.get("env_vars", [])
@@ -615,6 +640,15 @@ def _configure_provider(provider: dict, config: dict):
    if provider.get("tts_provider"):
        config.setdefault("tts", {})["provider"] = provider["tts_provider"]

+    # Set browser cloud provider in config if applicable
+    if "browser_provider" in provider:
+        bp = provider["browser_provider"]
+        if bp:
+            config.setdefault("browser", {})["cloud_provider"] = bp
+            _print_success(f"  Browser cloud provider set to: {bp}")
+        else:
+            config.get("browser", {}).pop("cloud_provider", None)
+
    if not env_vars:
        _print_success(f"  {provider['name']} - no configuration needed!")
        return
@@ -767,7 +801,7 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
            configured = ""
            env_vars = p.get("env_vars", [])
            if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
-                if p.get("tts_provider") and config.get("tts", {}).get("provider") == p["tts_provider"]:
+                if _is_provider_active(p, config):
                    configured = " [active]"
                elif not env_vars:
                    configured = ""
@@ -775,15 +809,7 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
                    configured = " [configured]"
            provider_choices.append(f"{p['name']}{tag}{configured}")

-        default_idx = 0
-        for i, p in enumerate(providers):
-            if p.get("tts_provider") and config.get("tts", {}).get("provider") == p["tts_provider"]:
-                default_idx = i
-                break
-            env_vars = p.get("env_vars", [])
-            if env_vars and all(get_env_value(v["key"]) for v in env_vars):
-                default_idx = i
-                break
+        default_idx = _detect_active_provider_index(providers, config)

        provider_idx = _prompt_choice("  Select provider:", provider_choices, default_idx)
        _reconfigure_provider(providers[provider_idx], config)
@@ -797,6 +823,15 @@ def _reconfigure_provider(provider: dict, config: dict):
        config.setdefault("tts", {})["provider"] = provider["tts_provider"]
        _print_success(f"  TTS provider set to: {provider['tts_provider']}")

+    if "browser_provider" in provider:
+        bp = provider["browser_provider"]
+        if bp:
+            config.setdefault("browser", {})["cloud_provider"] = bp
+            _print_success(f"  Browser cloud provider set to: {bp}")
+        else:
+            config.get("browser", {}).pop("cloud_provider", None)
+            _print_success(f"  Browser set to local mode")
+
    if not env_vars:
        _print_success(f"  {provider['name']} - no configuration needed!")
        return
@@ -8,5 +8,9 @@ OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
 OPENROUTER_MODELS_URL = f"{OPENROUTER_BASE_URL}/models"
 OPENROUTER_CHAT_URL = f"{OPENROUTER_BASE_URL}/chat/completions"

+AI_GATEWAY_BASE_URL = "https://ai-gateway.vercel.sh/v1"
+AI_GATEWAY_MODELS_URL = f"{AI_GATEWAY_BASE_URL}/models"
+AI_GATEWAY_CHAT_URL = f"{AI_GATEWAY_BASE_URL}/chat/completions"
+
 NOUS_API_BASE_URL = "https://inference-api.nousresearch.com/v1"
 NOUS_API_CHAT_URL = f"{NOUS_API_BASE_URL}/chat/completions"
@@ -114,11 +114,12 @@ class HonchoClientConfig:
    @classmethod
    def from_env(cls, workspace_id: str = "hermes") -> HonchoClientConfig:
        """Create config from environment variables (fallback)."""
+        api_key = os.environ.get("HONCHO_API_KEY")
        return cls(
            workspace_id=workspace_id,
-            api_key=os.environ.get("HONCHO_API_KEY"),
+            api_key=api_key,
            environment=os.environ.get("HONCHO_ENVIRONMENT", "production"),
-            enabled=True,
+            enabled=bool(api_key),
        )

    @classmethod
@@ -113,6 +113,13 @@ try:
 except Exception as e:
    logger.debug("MCP tool discovery failed: %s", e)

+# Plugin tool discovery (user/project/pip plugins)
+try:
+    from hermes_cli.plugins import discover_plugins
+    discover_plugins()
+except Exception as e:
+    logger.debug("Plugin discovery failed: %s", e)
+

 # =============================================================================
 # Backward-compat constants  (built once after discovery)
@@ -222,6 +229,16 @@ def get_tool_definitions(
        for ts_name in get_all_toolsets():
            tools_to_include.update(resolve_toolset(ts_name))

+    # Always include plugin-registered tools — they bypass the toolset filter
+    # because their toolsets are dynamic (created at plugin load time).
+    try:
+        from hermes_cli.plugins import get_plugin_tool_names
+        plugin_tools = get_plugin_tool_names()
+        if plugin_tools:
+            tools_to_include.update(plugin_tools)
+    except Exception:
+        pass
+
    # Ask the registry for schemas (only returns tools whose check_fn passes)
    filtered_tools = registry.get_definitions(tools_to_include, quiet=quiet_mode)

@@ -300,25 +317,39 @@ def handle_function_call(
        if function_name in _AGENT_LOOP_TOOLS:
            return json.dumps({"error": f"{function_name} must be handled by the agent loop"})

+        try:
+            from hermes_cli.plugins import invoke_hook
+            invoke_hook("pre_tool_call", tool_name=function_name, args=function_args, task_id=task_id or "")
+        except Exception:
+            pass
+
        if function_name == "execute_code":
            # Prefer the caller-provided list so subagents can't overwrite
            # the parent's tool set via the process-global.
            sandbox_enabled = enabled_tools if enabled_tools is not None else _last_resolved_tool_names
-            return registry.dispatch(
+            result = registry.dispatch(
                function_name, function_args,
                task_id=task_id,
                enabled_tools=sandbox_enabled,
                honcho_manager=honcho_manager,
                honcho_session_key=honcho_session_key,
            )
+        else:
+            result = registry.dispatch(
+                function_name, function_args,
+                task_id=task_id,
+                user_task=user_task,
+                honcho_manager=honcho_manager,
+                honcho_session_key=honcho_session_key,
+            )

-        return registry.dispatch(
-            function_name, function_args,
-            task_id=task_id,
-            user_task=user_task,
-            honcho_manager=honcho_manager,
-            honcho_session_key=honcho_session_key,
-        )
+        try:
+            from hermes_cli.plugins import invoke_hook
+            invoke_hook("post_tool_call", tool_name=function_name, args=function_args, result=result, task_id=task_id or "")
+        except Exception:
+            pass
+
+        return result

    except Exception as e:
        error_msg = f"Error executing {function_name}: {str(e)}"
@@ -0,0 +1,231 @@
+---
+name: base
+description: Query Base (Ethereum L2) blockchain data with USD pricing — wallet balances, token info, transaction details, gas analysis, contract inspection, whale detection, and live network stats. Uses Base RPC + CoinGecko. No API key required.
+version: 0.1.0
+author: youssefea
+license: MIT
+metadata:
+  hermes:
+    tags: [Base, Blockchain, Crypto, Web3, RPC, DeFi, EVM, L2, Ethereum]
+    related_skills: []
+---
+
+# Base Blockchain Skill
+
+Query Base (Ethereum L2) on-chain data enriched with USD pricing via CoinGecko.
+8 commands: wallet portfolio, token info, transactions, gas analysis,
+contract inspection, whale detection, network stats, and price lookup.
+
+No API key needed. Uses only Python standard library (urllib, json, argparse).
+
+---
+
+## When to Use
+
+- User asks for a Base wallet balance, token holdings, or portfolio value
+- User wants to inspect a specific transaction by hash
+- User wants ERC-20 token metadata, price, supply, or market cap
+- User wants to understand Base gas costs and L1 data fees
+- User wants to inspect a contract (ERC type detection, proxy resolution)
+- User wants to find large ETH transfers (whale detection)
+- User wants Base network health, gas price, or ETH price
+- User asks "what's the price of USDC/AERO/DEGEN/ETH?"
+
+---
+
+## Prerequisites
+
+The helper script uses only Python standard library (urllib, json, argparse).
+No external packages required.
+
+Pricing data comes from CoinGecko's free API (no key needed, rate-limited
+to ~10-30 requests/minute). For faster lookups, use `--no-prices` flag.
+
+---
+
+## Quick Reference
+
+RPC endpoint (default): https://mainnet.base.org
+Override: export BASE_RPC_URL=https://your-private-rpc.com
+
+Helper script path: ~/.hermes/skills/blockchain/base/scripts/base_client.py
+
+```
+python3 base_client.py wallet   <address> [--limit N] [--all] [--no-prices]
+python3 base_client.py tx       <hash>
+python3 base_client.py token    <contract_address>
+python3 base_client.py gas
+python3 base_client.py contract <address>
+python3 base_client.py whales   [--min-eth N]
+python3 base_client.py stats
+python3 base_client.py price    <contract_address_or_symbol>
+```
+
+---
+
+## Procedure
+
+### 0. Setup Check
+
+```bash
+python3 --version
+
+# Optional: set a private RPC for better rate limits
+export BASE_RPC_URL="https://mainnet.base.org"
+
+# Confirm connectivity
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
+```
+
+### 1. Wallet Portfolio
+
+Get ETH balance and ERC-20 token holdings with USD values.
+Checks ~15 well-known Base tokens (USDC, WETH, AERO, DEGEN, etc.)
+via on-chain `balanceOf` calls. Tokens sorted by value, dust filtered.
+
+```bash
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
+  wallet 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045
+```
+
+Flags:
+- `--limit N` — show top N tokens (default: 20)
+- `--all` — show all tokens, no dust filter, no limit
+- `--no-prices` — skip CoinGecko price lookups (faster, RPC-only)
+
+Output includes: ETH balance + USD value, token list with prices sorted
+by value, dust count, total portfolio value in USD.
+
+Note: Only checks known tokens. Unknown ERC-20s are not discovered.
+Use the `token` command with a specific contract address for any token.
+
+### 2. Transaction Details
+
+Inspect a full transaction by its hash. Shows ETH value transferred,
+gas used, fee in ETH/USD, status, and decoded ERC-20/ERC-721 transfers.
+
+```bash
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
+  tx 0xabc123...your_tx_hash_here
+```
+
+Output: hash, block, from, to, value (ETH + USD), gas price, gas used,
+fee, status, contract creation address (if any), token transfers.
+
+### 3. Token Info
+
+Get ERC-20 token metadata: name, symbol, decimals, total supply, price,
+market cap, and contract code size.
+
+```bash
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
+  token 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
+```
+
+Output: name, symbol, decimals, total supply, price, market cap.
+Reads name/symbol/decimals directly from the contract via eth_call.
+
+### 4. Gas Analysis
+
+Detailed gas analysis with cost estimates for common operations.
+Shows current gas price, base fee trends over 10 blocks, block
+utilization, and estimated costs for ETH transfers, ERC-20 transfers,
+and swaps.
+
+```bash
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py gas
+```
+
+Output: current gas price, base fee, block utilization, 10-block trend,
+cost estimates in ETH and USD.
+
+Note: Base is an L2 — actual transaction costs include an L1 data
+posting fee that depends on calldata size and L1 gas prices. The
+estimates shown are for L2 execution only.
+
+### 5. Contract Inspection
+
+Inspect an address: determine if it's an EOA or contract, detect
+ERC-20/ERC-721/ERC-1155 interfaces, resolve EIP-1967 proxy
+implementation addresses.
+
+```bash
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
+  contract 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
+```
+
+Output: is_contract, code size, ETH balance, detected interfaces
+(ERC-20, ERC-721, ERC-1155), ERC-20 metadata, proxy implementation
+address.
+
+### 6. Whale Detector
+
+Scan the most recent block for large ETH transfers with USD values.
+
+```bash
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
+  whales --min-eth 1.0
+```
+
+Note: scans the latest block only — point-in-time snapshot, not historical.
+Default threshold is 1.0 ETH (lower than Solana's default since ETH
+values are higher).
+
+### 7. Network Stats
+
+Live Base network health: latest block, chain ID, gas price, base fee,
+block utilization, transaction count, and ETH price.
+
+```bash
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
+```
+
+### 8. Price Lookup
+
+Quick price check for any token by contract address or known symbol.
+
+```bash
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price ETH
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price USDC
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price AERO
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price DEGEN
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
+```
+
+Known symbols: ETH, WETH, USDC, cbETH, AERO, DEGEN, TOSHI, BRETT,
+WELL, wstETH, rETH, cbBTC.
+
+---
+
+## Pitfalls
+
+- **CoinGecko rate-limits** — free tier allows ~10-30 requests/minute.
+  Price lookups use 1 request per token. Use `--no-prices` for speed.
+- **Public RPC rate-limits** — Base's public RPC limits requests.
+  For production use, set BASE_RPC_URL to a private endpoint
+  (Alchemy, QuickNode, Infura).
+- **Wallet shows known tokens only** — unlike Solana, EVM chains have no
+  built-in "get all tokens" RPC. The wallet command checks ~15 popular
+  Base tokens via `balanceOf`. Unknown ERC-20s won't appear. Use the
+  `token` command for any specific contract.
+- **Token names read from contract** — if a contract doesn't implement
+  `name()` or `symbol()`, these fields may be empty. Known tokens have
+  hardcoded labels as fallback.
+- **Gas estimates are L2 only** — Base transaction costs include an L1
+  data posting fee (depends on calldata size and L1 gas prices). The gas
+  command estimates L2 execution cost only.
+- **Whale detector scans latest block only** — not historical. Results
+  vary by the moment you query. Default threshold is 1.0 ETH.
+- **Proxy detection** — only EIP-1967 proxies are detected. Other proxy
+  patterns (EIP-1167 minimal proxy, custom storage slots) are not checked.
+- **Retry on 429** — both RPC and CoinGecko calls retry up to 2 times
+  with exponential backoff on rate-limit errors.
+
+---
+
+## Verification
+
+```bash
+# Should print Base chain ID (8453), latest block, gas price, and ETH price
+python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
+```
@@ -0,0 +1,116 @@
+---
+name: blender-mcp
+description: Control Blender directly from Hermes via socket connection to the blender-mcp addon. Create 3D objects, materials, animations, and run arbitrary Blender Python (bpy) code. Use when user wants to create or modify anything in Blender.
+version: 1.0.0
+requires: Blender 4.3+ (desktop instance required, headless not supported)
+author: alireza78a
+tags: [blender, 3d, animation, modeling, bpy, mcp]
+---
+
+# Blender MCP
+
+Control a running Blender instance from Hermes via socket on TCP port 9876.
+
+## Setup (one-time)
+
+### 1. Install the Blender addon
+
+    curl -sL https://raw.githubusercontent.com/ahujasid/blender-mcp/main/addon.py -o ~/Desktop/blender_mcp_addon.py
+
+In Blender:
+    Edit > Preferences > Add-ons > Install > select blender_mcp_addon.py
+    Enable "Interface: Blender MCP"
+
+### 2. Start the socket server in Blender
+
+Press N in Blender viewport to open sidebar.
+Find "BlenderMCP" tab and click "Start Server".
+
+### 3. Verify connection
+
+    nc -z -w2 localhost 9876 && echo "OPEN" || echo "CLOSED"
+
+## Protocol
+
+Plain UTF-8 JSON over TCP -- no length prefix.
+
+Send:     {"type": "<command>", "params": {<kwargs>}}
+Receive:  {"status": "success", "result": <value>}
+          {"status": "error",   "message": "<reason>"}
+
+## Available Commands
+
+| type                    | params            | description                     |
+|-------------------------|-------------------|---------------------------------|
+| execute_code            | code (str)        | Run arbitrary bpy Python code   |
+| get_scene_info          | (none)            | List all objects in scene       |
+| get_object_info         | object_name (str) | Details on a specific object    |
+| get_viewport_screenshot | (none)            | Screenshot of current viewport  |
+
+## Python Helper
+
+Use this inside execute_code tool calls:
+
+    import socket, json
+
+    def blender_exec(code: str, host="localhost", port=9876, timeout=15):
+        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+        s.connect((host, port))
+        s.settimeout(timeout)
+        payload = json.dumps({"type": "execute_code", "params": {"code": code}})
+        s.sendall(payload.encode("utf-8"))
+        buf = b""
+        while True:
+            try:
+                chunk = s.recv(4096)
+                if not chunk:
+                    break
+                buf += chunk
+                try:
+                    json.loads(buf.decode("utf-8"))
+                    break
+                except json.JSONDecodeError:
+                    continue
+            except socket.timeout:
+                break
+        s.close()
+        return json.loads(buf.decode("utf-8"))
+
+## Common bpy Patterns
+
+### Clear scene
+    bpy.ops.object.select_all(action='SELECT')
+    bpy.ops.object.delete()
+
+### Add mesh objects
+    bpy.ops.mesh.primitive_uv_sphere_add(radius=1, location=(0, 0, 0))
+    bpy.ops.mesh.primitive_cube_add(size=2, location=(3, 0, 0))
+    bpy.ops.mesh.primitive_cylinder_add(radius=0.5, depth=2, location=(-3, 0, 0))
+
+### Create and assign material
+    mat = bpy.data.materials.new(name="MyMat")
+    mat.use_nodes = True
+    bsdf = mat.node_tree.nodes.get("Principled BSDF")
+    bsdf.inputs["Base Color"].default_value = (R, G, B, 1.0)
+    bsdf.inputs["Roughness"].default_value = 0.3
+    bsdf.inputs["Metallic"].default_value = 0.0
+    obj.data.materials.append(mat)
+
+### Keyframe animation
+    obj.location = (0, 0, 0)
+    obj.keyframe_insert(data_path="location", frame=1)
+    obj.location = (0, 0, 3)
+    obj.keyframe_insert(data_path="location", frame=60)
+
+### Render to file
+    bpy.context.scene.render.filepath = "/tmp/render.png"
+    bpy.context.scene.render.engine = 'CYCLES'
+    bpy.ops.render.render(write_still=True)
+
+## Pitfalls
+
+- Must check socket is open before running (nc -z localhost 9876)
+- Addon server must be started inside Blender each session (N-panel > BlenderMCP > Connect)
+- Break complex scenes into multiple smaller execute_code calls to avoid timeouts
+- Render output path must be absolute (/tmp/...) not relative
+- shade_smooth() requires object to be selected and in object mode
@@ -1,218 +0,0 @@
-# Checkpoint & Rollback — Implementation Plan
-
-## Goal
-
-Automatic filesystem snapshots before destructive file operations, with user-facing rollback. The agent never sees or interacts with this — it's transparent infrastructure.
-
-## Design Principles
-
-1. **Not a tool** — the LLM never knows about it. Zero prompt tokens, zero tool schema overhead.
-2. **Once per turn** — checkpoint at most once per conversation turn (user message → agent response cycle), triggered lazily on the first file-mutating operation. Not on every write.
-3. **Opt-in via config** — disabled by default, enabled with `checkpoints: true` in config.yaml.
-4. **Works on any directory** — uses a shadow git repo completely separate from the user's project git. Works on git repos, non-git directories, anything.
-5. **User-facing rollback** — `/rollback` slash command (CLI + gateway) to list and restore checkpoints. Also `hermes rollback` CLI subcommand.
-
-## Architecture
-
-```
-~/.hermes/checkpoints/
-  {sha256(abs_dir)[:16]}/       # Shadow git repo per working directory
-    HEAD, refs/, objects/...    # Standard git internals
-    HERMES_WORKDIR              # Original dir path (for display)
-    info/exclude                # Default excludes (node_modules, .env, etc.)
-```
-
-### Core: CheckpointManager (new file: tools/checkpoint_manager.py)
-
-Adapted from PR #559's CheckpointStore. Key changes from the PR:
-
- **Not a tool** — no schema, no registry entry, no handler
- **Turn-scoped deduplication** — tracks `_checkpointed_dirs: Set[str]` per turn
- **Configurable** — reads `checkpoints` config key
- **Pruning** — keeps last N snapshots per directory (default 50), prunes on take
-
-```python
-class CheckpointManager:
-    def __init__(self, enabled: bool = False, max_snapshots: int = 50):
-        self.enabled = enabled
-        self.max_snapshots = max_snapshots
-        self._checkpointed_dirs: Set[str] = set()  # reset each turn
-
-    def new_turn(self):
-        """Call at start of each conversation turn to reset dedup."""
-        self._checkpointed_dirs.clear()
-
-    def ensure_checkpoint(self, working_dir: str, reason: str = "auto") -> None:
-        """Take a checkpoint if enabled and not already done this turn."""
-        if not self.enabled:
-            return
-        abs_dir = str(Path(working_dir).resolve())
-        if abs_dir in self._checkpointed_dirs:
-            return
-        self._checkpointed_dirs.add(abs_dir)
-        try:
-            self._take(abs_dir, reason)
-        except Exception as e:
-            logger.debug("Checkpoint failed (non-fatal): %s", e)
-
-    def list_checkpoints(self, working_dir: str) -> List[dict]:
-        """List available checkpoints for a directory."""
-        ...
-
-    def restore(self, working_dir: str, commit_hash: str) -> dict:
-        """Restore files to a checkpoint state."""
-        ...
-
-    def _take(self, working_dir: str, reason: str):
-        """Shadow git: add -A + commit. Prune if over max_snapshots."""
-        ...
-
-    def _prune(self, shadow_repo: Path):
-        """Keep only last max_snapshots commits."""
-        ...
-```
-
-### Integration Point: run_agent.py
-
-The AIAgent already owns the conversation loop. Add CheckpointManager as an instance attribute:
-
-```python
-class AIAgent:
-    def __init__(self, ...):
-        ...
-        # Checkpoint manager — reads config to determine if enabled
-        self._checkpoint_mgr = CheckpointManager(
-            enabled=config.get("checkpoints", False),
-            max_snapshots=config.get("checkpoint_max_snapshots", 50),
-        )
-```
-
-**Turn boundary** — in `run_conversation()`, call `new_turn()` at the start of each agent iteration (before processing tool calls):
-
-```python
-# Inside the main loop, before _execute_tool_calls():
-self._checkpoint_mgr.new_turn()
-```
-
-**Trigger point** — in `_execute_tool_calls()`, before dispatching file-mutating tools:
-
-```python
-# Before the handle_function_call dispatch:
-if function_name in ("write_file", "patch"):
-    # Determine working dir from the file path in the args
-    file_path = function_args.get("path", "") or function_args.get("old_string", "")
-    if file_path:
-        work_dir = str(Path(file_path).parent.resolve())
-        self._checkpoint_mgr.ensure_checkpoint(work_dir, f"before {function_name}")
-```
-
-This means:
- First `write_file` in a turn → checkpoint (fast, one `git add -A && git commit`)
- Subsequent writes in the same turn → no-op (already checkpointed)
- Next turn (new user message) → fresh checkpoint eligibility
-
-### Config
-
-Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`:
-
-```python
-"checkpoints": False,          # Enable filesystem checkpoints before destructive ops
-"checkpoint_max_snapshots": 50, # Max snapshots to keep per directory
-```
-
-User enables with:
-```yaml
-# ~/.hermes/config.yaml
-checkpoints: true
-```
-
-### User-Facing Rollback
-
-**CLI slash command** — add `/rollback` to `process_command()` in `cli.py`:
-
-```
-/rollback         — List recent checkpoints for the current directory
-/rollback <hash>  — Restore files to that checkpoint
-```
-
-Shows a numbered list:
-```
-📸 Checkpoints for /home/user/project:
-  1. abc1234  2026-03-09 21:15  before write_file (3 files changed)
-  2. def5678  2026-03-09 20:42  before patch (1 file changed)
-  3. ghi9012  2026-03-09 20:30  before write_file (2 files changed)
-
-Use /rollback <number> to restore, e.g. /rollback 1
-```
-
-**Gateway slash command** — add `/rollback` to gateway/run.py with the same behavior.
-
-**CLI subcommand** — `hermes rollback` (optional, lower priority).
-
-### What Gets Excluded (not checkpointed)
-
-Same as the PR's defaults — written to the shadow repo's `info/exclude`:
-
-```
-node_modules/
-dist/
-build/
-.env
-.env.*
-__pycache__/
-*.pyc
-.DS_Store
-*.log
-.cache/
-.venv/
-.git/
-```
-
-Also respects the project's `.gitignore` if present (shadow repo can read it via `core.excludesFile`).
-
-### Safety
-
- `ensure_checkpoint()` wraps everything in try/except — a checkpoint failure never blocks the actual file operation
- Shadow repo is completely isolated — GIT_DIR + GIT_WORK_TREE env vars, never touches user's .git
- If git isn't installed, checkpoints silently disable
- Large directories: add a file count check — skip checkpoint if >50K files to avoid slowdowns
-
-## Files to Create/Modify
-
-| File | Change |
-|------|--------|
-| `tools/checkpoint_manager.py` | **NEW** — CheckpointManager class (adapted from PR #559) |
-| `run_agent.py` | Add CheckpointManager init + trigger in `_execute_tool_calls()` |
-| `hermes_cli/config.py` | Add `checkpoints` + `checkpoint_max_snapshots` to DEFAULT_CONFIG |
-| `cli.py` | Add `/rollback` slash command handler |
-| `gateway/run.py` | Add `/rollback` slash command handler |
-| `tests/tools/test_checkpoint_manager.py` | **NEW** — tests (adapted from PR #559's tests) |
-
-## What We Take From PR #559
-
- `_shadow_repo_path()` — deterministic path hashing ✅
- `_git_env()` — GIT_DIR/GIT_WORK_TREE isolation ✅
- `_run_git()` — subprocess wrapper with timeout ✅
- `_init_shadow_repo()` — shadow repo initialization ✅
- `DEFAULT_EXCLUDES` list ✅
- Test structure and patterns ✅
-
-## What We Change From PR #559
-
- **Remove tool schema/registry** — not a tool
- **Remove injection into file_operations.py and patch_parser.py** — trigger from run_agent.py instead
- **Add turn-scoped deduplication** — one checkpoint per turn, not per operation
- **Add pruning** — keep last N snapshots
- **Add config flag** — opt-in, not mandatory
- **Add /rollback command** — user-facing restore UI
- **Add file count guard** — skip huge directories
-
-## Implementation Order
-
-1. `tools/checkpoint_manager.py` — core class with take/list/restore/prune
-2. `tests/tools/test_checkpoint_manager.py` — tests
-3. `hermes_cli/config.py` — config keys
-4. `run_agent.py` — integration (init + trigger)
-5. `cli.py` — `/rollback` slash command
-6. `gateway/run.py` — `/rollback` slash command
-7. Full test suite run + manual smoke test
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "hermes-agent"
-version = "0.2.0"
+version = "0.3.0"
 description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -296,6 +296,7 @@ class AIAgent:
        reasoning_callback: callable = None,
        clarify_callback: callable = None,
        step_callback: callable = None,
+        stream_delta_callback: callable = None,
        max_tokens: int = None,
        reasoning_config: Dict[str, Any] = None,
        prefill_messages: List[Dict[str, Any]] = None,
@@ -395,6 +396,7 @@ class AIAgent:
        self.reasoning_callback = reasoning_callback
        self.clarify_callback = clarify_callback
        self.step_callback = step_callback
+        self.stream_delta_callback = stream_delta_callback
        self._last_reported_tool = None  # Track for "new tool" mode
        
        # Interrupt mechanism for breaking out of tool loops
@@ -544,6 +546,8 @@ class AIAgent:
            effective_key = api_key or resolve_anthropic_token() or ""
            self._anthropic_api_key = effective_key
            self._anthropic_base_url = base_url
+            from agent.anthropic_adapter import _is_oauth_token as _is_oat
+            self._is_anthropic_oauth = _is_oat(effective_key)
            self._anthropic_client = build_anthropic_client(effective_key, base_url)
            # No OpenAI client needed for Anthropic mode
            self.client = None
@@ -812,7 +816,7 @@ class AIAgent:
                logger.debug("peer %s memory_mode=honcho: local USER.md writes disabled", _hcfg.peer_name or "user")

        # Skills config: nudge interval for skill creation reminders
-        self._skill_nudge_interval = 15
+        self._skill_nudge_interval = 10
        try:
            from hermes_cli.config import load_config as _load_skills_config
            skills_config = _load_skills_config().get("skills", {})
@@ -856,9 +860,9 @@ class AIAgent:
        """Verbose print — suppressed when streaming TTS is active.

        Pass ``force=True`` for error/warning messages that should always be
-        shown even during streaming TTS playback.
+        shown even during streaming playback (TTS or display).
        """
-        if not force and getattr(self, "_stream_callback", None) is not None:
+        if not force and self._has_stream_consumers():
            return
        print(*args, **kwargs)

@@ -2602,15 +2606,39 @@ class AIAgent:
    def _close_request_openai_client(self, client: Any, *, reason: str) -> None:
        self._close_openai_client(client, reason=reason, shared=False)

-    def _run_codex_stream(self, api_kwargs: dict, client: Any = None):
+    def _run_codex_stream(self, api_kwargs: dict, client: Any = None, on_first_delta: callable = None):
        """Execute one streaming Responses API request and return the final response."""
        active_client = client or self._ensure_primary_openai_client(reason="codex_stream_direct")
        max_stream_retries = 1
+        has_tool_calls = False
+        first_delta_fired = False
        for attempt in range(max_stream_retries + 1):
            try:
                with active_client.responses.stream(**api_kwargs) as stream:
-                    for _ in stream:
-                        pass
+                    for event in stream:
+                        if self._interrupt_requested:
+                            break
+                        event_type = getattr(event, "type", "")
+                        # Fire callbacks on text content deltas (suppress during tool calls)
+                        if "output_text.delta" in event_type or event_type == "response.output_text.delta":
+                            delta_text = getattr(event, "delta", "")
+                            if delta_text and not has_tool_calls:
+                                if not first_delta_fired:
+                                    first_delta_fired = True
+                                    if on_first_delta:
+                                        try:
+                                            on_first_delta()
+                                        except Exception:
+                                            pass
+                                self._fire_stream_delta(delta_text)
+                        # Track tool calls to suppress text streaming
+                        elif "function_call" in event_type:
+                            has_tool_calls = True
+                        # Fire reasoning callbacks
+                        elif "reasoning" in event_type and "delta" in event_type:
+                            reasoning_text = getattr(event, "delta", "")
+                            if reasoning_text:
+                                self._fire_reasoning_delta(reasoning_text)
                    return stream.get_final_response()
            except RuntimeError as exc:
                err_text = str(exc)
@@ -2791,6 +2819,7 @@ class AIAgent:
                    result["response"] = self._run_codex_stream(
                        api_kwargs,
                        client=request_client_holder["client"],
+                        on_first_delta=getattr(self, "_codex_on_first_delta", None),
                    )
                elif self.api_mode == "anthropic_messages":
                    result["response"] = self._anthropic_messages_create(api_kwargs)
@@ -2832,116 +2861,246 @@ class AIAgent:
            raise result["error"]
        return result["response"]

-    def _streaming_api_call(self, api_kwargs: dict, stream_callback):
-        """Streaming variant of _interruptible_api_call for voice TTS pipeline.
+    # ── Unified streaming API call ─────────────────────────────────────────

-        Uses ``stream=True`` and forwards content deltas to *stream_callback*
-        in real-time.  Returns a ``SimpleNamespace`` that mimics a normal
-        ``ChatCompletion`` so the rest of the agent loop works unchanged.
+    def _fire_stream_delta(self, text: str) -> None:
+        """Fire all registered stream delta callbacks (display + TTS)."""
+        for cb in (self.stream_delta_callback, self._stream_callback):
+            if cb is not None:
+                try:
+                    cb(text)
+                except Exception:
+                    pass

-        This method is separate from ``_interruptible_api_call`` to keep the
-        core agent loop untouched for non-voice users.
+    def _fire_reasoning_delta(self, text: str) -> None:
+        """Fire reasoning callback if registered."""
+        cb = self.reasoning_callback
+        if cb is not None:
+            try:
+                cb(text)
+            except Exception:
+                pass
+
+    def _has_stream_consumers(self) -> bool:
+        """Return True if any streaming consumer is registered."""
+        return (
+            self.stream_delta_callback is not None
+            or getattr(self, "_stream_callback", None) is not None
+        )
+
+    def _interruptible_streaming_api_call(
+        self, api_kwargs: dict, *, on_first_delta: callable = None
+    ):
+        """Streaming variant of _interruptible_api_call for real-time token delivery.
+
+        Handles all three api_modes:
+        - chat_completions: stream=True on OpenAI-compatible endpoints
+        - anthropic_messages: client.messages.stream() via Anthropic SDK
+        - codex_responses: delegates to _run_codex_stream (already streaming)
+
+        Fires stream_delta_callback and _stream_callback for each text token.
+        Tool-call turns suppress the callback — only text-only final responses
+        stream to the consumer.  Returns a SimpleNamespace that mimics the
+        non-streaming response shape so the rest of the agent loop is unchanged.
+
+        Falls back to _interruptible_api_call on provider errors indicating
+        streaming is not supported.
        """
+        if self.api_mode == "codex_responses":
+            # Codex streams internally via _run_codex_stream. The main dispatch
+            # in _interruptible_api_call already calls it; we just need to
+            # ensure on_first_delta reaches it. Store it on the instance
+            # temporarily so _run_codex_stream can pick it up.
+            self._codex_on_first_delta = on_first_delta
+            try:
+                return self._interruptible_api_call(api_kwargs)
+            finally:
+                self._codex_on_first_delta = None
+
        result = {"response": None, "error": None}
        request_client_holder = {"client": None}
+        first_delta_fired = {"done": False}
+        deltas_were_sent = {"yes": False}  # Track if any deltas were fired (for fallback)
+
+        def _fire_first_delta():
+            if not first_delta_fired["done"] and on_first_delta:
+                first_delta_fired["done"] = True
+                try:
+                    on_first_delta()
+                except Exception:
+                    pass
+
+        def _call_chat_completions():
+            """Stream a chat completions response."""
+            stream_kwargs = {**api_kwargs, "stream": True, "stream_options": {"include_usage": True}}
+            request_client_holder["client"] = self._create_request_openai_client(
+                reason="chat_completion_stream_request"
+            )
+            stream = request_client_holder["client"].chat.completions.create(**stream_kwargs)
+
+            content_parts: list = []
+            tool_calls_acc: dict = {}
+            finish_reason = None
+            model_name = None
+            role = "assistant"
+            reasoning_parts: list = []
+            usage_obj = None
+
+            for chunk in stream:
+                if self._interrupt_requested:
+                    break
+
+                if not chunk.choices:
+                    if hasattr(chunk, "model") and chunk.model:
+                        model_name = chunk.model
+                    # Usage comes in the final chunk with empty choices
+                    if hasattr(chunk, "usage") and chunk.usage:
+                        usage_obj = chunk.usage
+                    continue
+
+                delta = chunk.choices[0].delta
+                if hasattr(chunk, "model") and chunk.model:
+                    model_name = chunk.model
+
+                # Accumulate reasoning content
+                reasoning_text = getattr(delta, "reasoning_content", None) or getattr(delta, "reasoning", None)
+                if reasoning_text:
+                    reasoning_parts.append(reasoning_text)
+                    self._fire_reasoning_delta(reasoning_text)
+
+                # Accumulate text content — fire callback only when no tool calls
+                if delta and delta.content:
+                    content_parts.append(delta.content)
+                    if not tool_calls_acc:
+                        _fire_first_delta()
+                        self._fire_stream_delta(delta.content)
+                        deltas_were_sent["yes"] = True
+
+                # Accumulate tool call deltas (silently, no callback)
+                if delta and delta.tool_calls:
+                    for tc_delta in delta.tool_calls:
+                        idx = tc_delta.index if tc_delta.index is not None else 0
+                        if idx not in tool_calls_acc:
+                            tool_calls_acc[idx] = {
+                                "id": tc_delta.id or "",
+                                "type": "function",
+                                "function": {"name": "", "arguments": ""},
+                            }
+                        entry = tool_calls_acc[idx]
+                        if tc_delta.id:
+                            entry["id"] = tc_delta.id
+                        if tc_delta.function:
+                            if tc_delta.function.name:
+                                entry["function"]["name"] += tc_delta.function.name
+                            if tc_delta.function.arguments:
+                                entry["function"]["arguments"] += tc_delta.function.arguments
+
+                if chunk.choices[0].finish_reason:
+                    finish_reason = chunk.choices[0].finish_reason
+
+                # Usage in the final chunk
+                if hasattr(chunk, "usage") and chunk.usage:
+                    usage_obj = chunk.usage
+
+            # Build mock response matching non-streaming shape
+            full_content = "".join(content_parts) or None
+            mock_tool_calls = None
+            if tool_calls_acc:
+                mock_tool_calls = []
+                for idx in sorted(tool_calls_acc):
+                    tc = tool_calls_acc[idx]
+                    mock_tool_calls.append(SimpleNamespace(
+                        id=tc["id"],
+                        type=tc["type"],
+                        function=SimpleNamespace(
+                            name=tc["function"]["name"],
+                            arguments=tc["function"]["arguments"],
+                        ),
+                    ))
+
+            full_reasoning = "".join(reasoning_parts) or None
+            mock_message = SimpleNamespace(
+                role=role,
+                content=full_content,
+                tool_calls=mock_tool_calls,
+                reasoning_content=full_reasoning,
+            )
+            mock_choice = SimpleNamespace(
+                index=0,
+                message=mock_message,
+                finish_reason=finish_reason or "stop",
+            )
+            return SimpleNamespace(
+                id="stream-" + str(uuid.uuid4()),
+                model=model_name,
+                choices=[mock_choice],
+                usage=usage_obj,
+            )
+
+        def _call_anthropic():
+            """Stream an Anthropic Messages API response.
+
+            Fires delta callbacks for real-time token delivery, but returns
+            the native Anthropic Message object from get_final_message() so
+            the rest of the agent loop (validation, tool extraction, etc.)
+            works unchanged.
+            """
+            has_tool_use = False
+
+            # Use the Anthropic SDK's streaming context manager
+            with self._anthropic_client.messages.stream(**api_kwargs) as stream:
+                for event in stream:
+                    if self._interrupt_requested:
+                        break
+
+                    event_type = getattr(event, "type", None)
+
+                    if event_type == "content_block_start":
+                        block = getattr(event, "content_block", None)
+                        if block and getattr(block, "type", None) == "tool_use":
+                            has_tool_use = True
+
+                    elif event_type == "content_block_delta":
+                        delta = getattr(event, "delta", None)
+                        if delta:
+                            delta_type = getattr(delta, "type", None)
+                            if delta_type == "text_delta":
+                                text = getattr(delta, "text", "")
+                                if text and not has_tool_use:
+                                    _fire_first_delta()
+                                    self._fire_stream_delta(text)
+                            elif delta_type == "thinking_delta":
+                                thinking_text = getattr(delta, "thinking", "")
+                                if thinking_text:
+                                    self._fire_reasoning_delta(thinking_text)
+
+                # Return the native Anthropic Message for downstream processing
+                return stream.get_final_message()

        def _call():
            try:
-                stream_kwargs = {**api_kwargs, "stream": True}
-                request_client_holder["client"] = self._create_request_openai_client(
-                    reason="chat_completion_stream_request"
-                )
-                stream = request_client_holder["client"].chat.completions.create(**stream_kwargs)
-
-                content_parts: list[str] = []
-                tool_calls_acc: dict[int, dict] = {}
-                finish_reason = None
-                model_name = None
-                role = "assistant"
-
-                for chunk in stream:
-                    if not chunk.choices:
-                        if hasattr(chunk, "model") and chunk.model:
-                            model_name = chunk.model
-                        continue
-
-                    delta = chunk.choices[0].delta
-                    if hasattr(chunk, "model") and chunk.model:
-                        model_name = chunk.model
-
-                    if delta and delta.content:
-                        content_parts.append(delta.content)
-                        try:
-                            stream_callback(delta.content)
-                        except Exception:
-                            pass
-
-                    if delta and delta.tool_calls:
-                        for tc_delta in delta.tool_calls:
-                            idx = tc_delta.index if tc_delta.index is not None else 0
-                            if idx in tool_calls_acc and tc_delta.id and tc_delta.id != tool_calls_acc[idx]["id"]:
-                                matched = False
-                                for eidx, eentry in tool_calls_acc.items():
-                                    if eentry["id"] == tc_delta.id:
-                                        idx = eidx
-                                        matched = True
-                                        break
-                                if not matched:
-                                    idx = (max(k for k in tool_calls_acc if isinstance(k, int)) + 1) if tool_calls_acc else 0
-                            if idx not in tool_calls_acc:
-                                tool_calls_acc[idx] = {
-                                    "id": tc_delta.id or "",
-                                    "type": "function",
-                                    "function": {"name": "", "arguments": ""},
-                                }
-                            entry = tool_calls_acc[idx]
-                            if tc_delta.id:
-                                entry["id"] = tc_delta.id
-                            if tc_delta.function:
-                                if tc_delta.function.name:
-                                    entry["function"]["name"] += tc_delta.function.name
-                                if tc_delta.function.arguments:
-                                    entry["function"]["arguments"] += tc_delta.function.arguments
-
-                    if chunk.choices[0].finish_reason:
-                        finish_reason = chunk.choices[0].finish_reason
-
-                full_content = "".join(content_parts) or None
-                mock_tool_calls = None
-                if tool_calls_acc:
-                    mock_tool_calls = []
-                    for idx in sorted(tool_calls_acc):
-                        tc = tool_calls_acc[idx]
-                        mock_tool_calls.append(SimpleNamespace(
-                            id=tc["id"],
-                            type=tc["type"],
-                            function=SimpleNamespace(
-                                name=tc["function"]["name"],
-                                arguments=tc["function"]["arguments"],
-                            ),
-                        ))
-
-                mock_message = SimpleNamespace(
-                    role=role,
-                    content=full_content,
-                    tool_calls=mock_tool_calls,
-                    reasoning_content=None,
-                )
-                mock_choice = SimpleNamespace(
-                    index=0,
-                    message=mock_message,
-                    finish_reason=finish_reason or "stop",
-                )
-                mock_response = SimpleNamespace(
-                    id="stream-" + str(uuid.uuid4()),
-                    model=model_name,
-                    choices=[mock_choice],
-                    usage=None,
-                )
-                result["response"] = mock_response
-
+                if self.api_mode == "anthropic_messages":
+                    self._try_refresh_anthropic_client_credentials()
+                    result["response"] = _call_anthropic()
+                else:
+                    result["response"] = _call_chat_completions()
            except Exception as e:
-                result["error"] = e
+                if deltas_were_sent["yes"]:
+                    # Streaming failed AFTER some tokens were already delivered
+                    # to consumers. Don't fall back — that would cause
+                    # double-delivery (partial streamed + full non-streamed).
+                    # Let the error propagate; the partial content already
+                    # reached the user via the stream.
+                    logger.warning("Streaming failed after partial delivery, not falling back: %s", e)
+                    result["error"] = e
+                else:
+                    # Streaming failed before any tokens reached consumers.
+                    # Safe to fall back to the standard non-streaming path.
+                    logger.info("Streaming failed before delivery, falling back to non-streaming: %s", e)
+                    try:
+                        result["response"] = self._interruptible_api_call(api_kwargs)
+                    except Exception as fallback_err:
+                        result["error"] = fallback_err
            finally:
                request_client = request_client_holder.get("client")
                if request_client is not None:
@@ -2967,7 +3126,7 @@ class AIAgent:
                            self._close_request_openai_client(request_client, reason="stream_interrupt_abort")
                except Exception:
                    pass
-                raise InterruptedError("Agent interrupted during API call")
+                raise InterruptedError("Agent interrupted during streaming API call")
        if result["error"] is not None:
            raise result["error"]
        return result["response"]
@@ -3215,6 +3374,7 @@ class AIAgent:
                tools=self.tools,
                max_tokens=self.max_tokens,
                reasoning_config=self.reasoning_config,
+                is_oauth=getattr(self, "_is_anthropic_oauth", False),
            )

        if self.api_mode == "codex_responses":
@@ -3363,6 +3523,8 @@ class AIAgent:
        base_url = (self.base_url or "").lower()
        if "nousresearch" in base_url:
            return True
+        if "ai-gateway.vercel.sh" in base_url:
+            return True
        if "openrouter" not in base_url:
            return False
        if "api.mistral.ai" in base_url:
@@ -3542,7 +3704,8 @@ class AIAgent:

        flush_content = (
            "[System: The session is being compressed. "
-            "Please save anything worth remembering to your memories.]"
+            "Save anything worth remembering — prioritize user preferences, "
+            "corrections, and recurring patterns over task-specific details.]"
        )
        _sentinel = f"__flush_{id(self)}_{time.monotonic()}"
        flush_msg = {"role": "user", "content": flush_content, "_flush_sentinel": _sentinel}
@@ -3631,7 +3794,7 @@ class AIAgent:
                    tool_calls = assistant_msg.tool_calls
            elif self.api_mode == "anthropic_messages" and not _aux_available:
                from agent.anthropic_adapter import normalize_anthropic_response as _nar_flush
-                _flush_msg, _ = _nar_flush(response)
+                _flush_msg, _ = _nar_flush(response, strip_tool_prefix=getattr(self, '_is_anthropic_oauth', False))
                if _flush_msg and _flush_msg.tool_calls:
                    tool_calls = _flush_msg.tool_calls
            elif hasattr(response, "choices") and response.choices:
@@ -4172,7 +4335,7 @@ class AIAgent:
                        spinner.stop(cute_msg)
                    elif self.quiet_mode:
                        self._vprint(f"  {cute_msg}")
-            elif self.quiet_mode and self._stream_callback is None:
+            elif self.quiet_mode and not self._has_stream_consumers():
                face = random.choice(KawaiiSpinner.KAWAII_WAITING)
                emoji = _get_tool_emoji(function_name)
                preview = _build_tool_preview(function_name, function_args) or function_name
@@ -4392,9 +4555,10 @@ class AIAgent:
                if self.api_mode == "anthropic_messages":
                    from agent.anthropic_adapter import build_anthropic_kwargs as _bak, normalize_anthropic_response as _nar
                    _ant_kw = _bak(model=self.model, messages=api_messages, tools=None,
-                                   max_tokens=self.max_tokens, reasoning_config=self.reasoning_config)
+                                   max_tokens=self.max_tokens, reasoning_config=self.reasoning_config,
+                                   is_oauth=getattr(self, '_is_anthropic_oauth', False))
                    summary_response = self._anthropic_messages_create(_ant_kw)
-                    _msg, _ = _nar(summary_response)
+                    _msg, _ = _nar(summary_response, strip_tool_prefix=getattr(self, '_is_anthropic_oauth', False))
                    final_response = (_msg.content or "").strip()
                else:
                    summary_response = self._ensure_primary_openai_client(reason="iteration_limit_summary").chat.completions.create(**summary_kwargs)
@@ -4422,9 +4586,10 @@ class AIAgent:
                elif self.api_mode == "anthropic_messages":
                    from agent.anthropic_adapter import build_anthropic_kwargs as _bak2, normalize_anthropic_response as _nar2
                    _ant_kw2 = _bak2(model=self.model, messages=api_messages, tools=None,
+                                    is_oauth=getattr(self, '_is_anthropic_oauth', False),
                                     max_tokens=self.max_tokens, reasoning_config=self.reasoning_config)
                    retry_response = self._anthropic_messages_create(_ant_kw2)
-                    _retry_msg, _ = _nar2(retry_response)
+                    _retry_msg, _ = _nar2(retry_response, strip_tool_prefix=getattr(self, '_is_anthropic_oauth', False))
                    final_response = (_retry_msg.content or "").strip()
                else:
                    summary_kwargs = {
@@ -4541,8 +4706,9 @@ class AIAgent:
            self._turns_since_memory += 1
            if self._turns_since_memory >= self._memory_nudge_interval:
                user_message += (
-                    "\n\n[System: You've had several exchanges in this session. "
-                    "Consider whether there's anything worth saving to your memories.]"
+                    "\n\n[System: You've had several exchanges. Consider: "
+                    "has the user shared preferences, corrected you, or revealed "
+                    "something about their workflow worth remembering for future sessions?]"
                )
                self._turns_since_memory = 0

@@ -4552,8 +4718,9 @@ class AIAgent:
                and self._iters_since_skill >= self._skill_nudge_interval
                and "skill_manage" in self.valid_tool_names):
            user_message += (
-                "\n\n[System: The previous task involved many steps. "
-                "If you discovered a reusable workflow, consider saving it as a skill.]"
+                "\n\n[System: The previous task involved many tool calls. "
+                "Save the approach as a skill if it's reusable, or update "
+                "any existing skill you used if it was wrong or incomplete.]"
            )
            self._iters_since_skill = 0

@@ -4807,8 +4974,8 @@ class AIAgent:
                self._vprint(f"\n{self.log_prefix}🔄 Making API call #{api_call_count}/{self.max_iterations}...")
                self._vprint(f"{self.log_prefix}   📊 Request size: {len(api_messages)} messages, ~{approx_tokens:,} tokens (~{total_chars:,} chars)")
                self._vprint(f"{self.log_prefix}   🔧 Available tools: {len(self.tools) if self.tools else 0}")
-            elif self._stream_callback is None:
-                # Animated thinking spinner in quiet mode (skip during streaming TTS)
+            elif not self._has_stream_consumers():
+                # Animated thinking spinner in quiet mode (skip during streaming)
                face = random.choice(KawaiiSpinner.KAWAII_THINKING)
                verb = random.choice(KawaiiSpinner.THINKING_VERBS)
                if self.thinking_callback:
@@ -4848,33 +5015,22 @@ class AIAgent:
                    if os.getenv("HERMES_DUMP_REQUESTS", "").strip().lower() in {"1", "true", "yes", "on"}:
                        self._dump_api_request_debug(api_kwargs, reason="preflight")

-                    cb = getattr(self, "_stream_callback", None)
-                    if cb is not None and self.api_mode == "chat_completions":
-                        response = self._streaming_api_call(api_kwargs, cb)
+                    if self._has_stream_consumers():
+                        # Streaming path: fire delta callbacks for real-time
+                        # token delivery to CLI display, gateway, or TTS.
+                        def _stop_spinner():
+                            nonlocal thinking_spinner
+                            if thinking_spinner:
+                                thinking_spinner.stop("")
+                                thinking_spinner = None
+                            if self.thinking_callback:
+                                self.thinking_callback("")
+
+                        response = self._interruptible_streaming_api_call(
+                            api_kwargs, on_first_delta=_stop_spinner
+                        )
                    else:
                        response = self._interruptible_api_call(api_kwargs)
-                        # Forward full response to TTS callback for non-streaming providers
-                        # (e.g. Anthropic) so voice TTS still works via batch delivery.
-                        if cb is not None and response:
-                            try:
-                                content = None
-                                # Try choices first — _interruptible_api_call converts all
-                                # providers (including Anthropic) to this format.
-                                try:
-                                    content = response.choices[0].message.content
-                                except (AttributeError, IndexError):
-                                    pass
-                                # Fallback: Anthropic native content blocks
-                                if not content and self.api_mode == "anthropic_messages":
-                                    text_parts = [
-                                        block.text for block in getattr(response, "content", [])
-                                        if getattr(block, "type", None) == "text" and getattr(block, "text", None)
-                                    ]
-                                    content = " ".join(text_parts) if text_parts else None
-                                if content:
-                                    cb(content)
-                            except Exception:
-                                pass
                    
                    api_duration = time.time() - api_start_time
                    
@@ -5102,6 +5258,15 @@ class AIAgent:
                    if hasattr(response, 'usage') and response.usage:
                        if self.api_mode in ("codex_responses", "anthropic_messages"):
                            prompt_tokens = getattr(response.usage, 'input_tokens', 0) or 0
+                            if self.api_mode == "anthropic_messages":
+                                # Anthropic splits input into cache_read + cache_creation
+                                # + non-cached input_tokens. Without adding the cached
+                                # portions, the context bar shows only the tiny non-cached
+                                # portion (e.g. 3 tokens) instead of the real total (~18K).
+                                # Other providers (OpenAI/Codex) already include cached
+                                # tokens in their input_tokens/prompt_tokens field.
+                                prompt_tokens += getattr(response.usage, 'cache_read_input_tokens', 0) or 0
+                                prompt_tokens += getattr(response.usage, 'cache_creation_input_tokens', 0) or 0
                            completion_tokens = getattr(response.usage, 'output_tokens', 0) or 0
                            total_tokens = (
                                getattr(response.usage, 'total_tokens', None)
@@ -5327,6 +5492,27 @@ class AIAgent:
                        'request entity too large',  # OpenRouter/Nous 413 safety net
                        'prompt is too long',  # Anthropic: "prompt is too long: N tokens > M maximum"
                    ])
+
+                    # Fallback heuristic: Anthropic sometimes returns a generic
+                    # 400 invalid_request_error with just "Error" as the message
+                    # when the context is too large.  If the error message is very
+                    # short/generic AND the session is large, treat it as a
+                    # probable context-length error and attempt compression rather
+                    # than aborting.  This prevents an infinite failure loop where
+                    # each failed message gets persisted, making the session even
+                    # larger. (#1630)
+                    if not is_context_length_error and status_code == 400:
+                        ctx_len = getattr(getattr(self, 'context_compressor', None), 'context_length', 200000)
+                        is_large_session = approx_tokens > ctx_len * 0.4 or len(api_messages) > 80
+                        is_generic_error = len(error_msg.strip()) < 30  # e.g. just "error"
+                        if is_large_session and is_generic_error:
+                            is_context_length_error = True
+                            self._vprint(
+                                f"{self.log_prefix}⚠️  Generic 400 with large session "
+                                f"(~{approx_tokens:,} tokens, {len(api_messages)} msgs) — "
+                                f"treating as probable context overflow.",
+                                force=True,
+                            )
                    
                    if is_context_length_error:
                        compressor = self.context_compressor
@@ -5393,10 +5579,19 @@ class AIAgent:
                    # These indicate a problem with the request itself (bad model ID,
                    # invalid API key, forbidden, etc.) and will never succeed on retry.
                    # Note: 413 and context-length errors are excluded — handled above.
+                    # 429 (rate limit) is transient and MUST be retried with backoff.
+                    # 529 (Anthropic overloaded) is also transient.
                    # Also catch local validation errors (ValueError, TypeError) — these
                    # are programming bugs, not transient failures.
+                    _RETRYABLE_STATUS_CODES = {413, 429, 529}
                    is_local_validation_error = isinstance(api_error, (ValueError, TypeError))
-                    is_client_status_error = isinstance(status_code, int) and 400 <= status_code < 500 and status_code != 413
+                    # Detect generic 400s from Anthropic OAuth (transient server-side failures).
+                    # Real invalid_request_error responses include a descriptive message;
+                    # transient ones contain only "Error" or are empty. (ref: issue #1608)
+                    _err_body = getattr(api_error, "body", None) or {}
+                    _err_message = (_err_body.get("error", {}).get("message", "") if isinstance(_err_body, dict) else "")
+                    _is_generic_400 = (status_code == 400 and _err_message.strip().lower() in ("error", ""))
+                    is_client_status_error = isinstance(status_code, int) and 400 <= status_code < 500 and status_code not in _RETRYABLE_STATUS_CODES and not _is_generic_400
                    is_client_error = (is_local_validation_error or is_client_status_error or any(phrase in error_msg for phrase in [
                        'error code: 401', 'error code: 403',
                        'error code: 404', 'error code: 422',
@@ -5417,7 +5612,19 @@ class AIAgent:
                        self._vprint(f"{self.log_prefix}❌ Non-retryable client error detected. Aborting immediately.", force=True)
                        self._vprint(f"{self.log_prefix}   💡 This type of error won't be fixed by retrying.", force=True)
                        logging.error(f"{self.log_prefix}Non-retryable client error: {api_error}")
-                        self._persist_session(messages, conversation_history)
+                        # Skip session persistence when the error is likely
+                        # context-overflow related (status 400 + large session).
+                        # Persisting the failed user message would make the
+                        # session even larger, causing the same failure on the
+                        # next attempt. (#1630)
+                        if status_code == 400 and (approx_tokens > 50000 or len(api_messages) > 80):
+                            self._vprint(
+                                f"{self.log_prefix}⚠️  Skipping session persistence "
+                                f"for large failed session to prevent growth loop.",
+                                force=True,
+                            )
+                        else:
+                            self._persist_session(messages, conversation_history)
                        return {
                            "final_response": None,
                            "messages": messages,
@@ -5492,7 +5699,9 @@ class AIAgent:
                    assistant_message, finish_reason = self._normalize_codex_response(response)
                elif self.api_mode == "anthropic_messages":
                    from agent.anthropic_adapter import normalize_anthropic_response
-                    assistant_message, finish_reason = normalize_anthropic_response(response)
+                    assistant_message, finish_reason = normalize_anthropic_response(
+                        response, strip_tool_prefix=getattr(self, "_is_anthropic_oauth", False)
+                    )
                else:
                    assistant_message = response.choices[0].message
                
@@ -483,6 +483,8 @@ install_system_packages() {
        elif command -v sudo &> /dev/null; then
            if [ "$IS_INTERACTIVE" = true ]; then
                echo ""
+                log_info "sudo is needed ONLY to install optional system packages (${pkgs[*]}) via your package manager."
+                log_info "Hermes Agent itself does not require or retain root access."
                read -p "Install ${description}? (requires sudo) [y/N] " -n 1 -r
                echo
                if [[ $REPLY =~ ^[Yy]$ ]]; then
@@ -496,8 +498,9 @@ install_system_packages() {
                # Non-interactive (e.g. curl | bash) but a terminal is available.
                # Read the prompt from /dev/tty (same approach the setup wizard uses).
                echo ""
-                log_info "Installing ${description} requires sudo."
-                read -p "Install? [Y/n] " -n 1 -r < /dev/tty
+                log_info "sudo is needed ONLY to install optional system packages (${pkgs[*]}) via your package manager."
+                log_info "Hermes Agent itself does not require or retain root access."
+                read -p "Install ${description}? [Y/n] " -n 1 -r < /dev/tty
                echo
                if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
                    if sudo DEBIAN_FRONTEND=noninteractive NEEDRESTART_MODE=a $install_cmd < /dev/tty; then
@@ -688,7 +691,9 @@ install_deps() {
                    sudo DEBIAN_FRONTEND=noninteractive NEEDRESTART_MODE=a apt-get update -qq && sudo DEBIAN_FRONTEND=noninteractive NEEDRESTART_MODE=a apt-get install -y -qq build-essential python3-dev libffi-dev >/dev/null 2>&1 || true
                    log_success "Build tools installed"
                else
-                    read -p "Install build tools (build-essential, python3-dev)? (requires sudo) [Y/n] " -n 1 -r < /dev/tty
+                    log_info "sudo is needed ONLY to install build tools (build-essential, python3-dev, libffi-dev) via apt."
+                    log_info "Hermes Agent itself does not require or retain root access."
+                    read -p "Install build tools? [Y/n] " -n 1 -r < /dev/tty
                    echo
                    if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
                        sudo DEBIAN_FRONTEND=noninteractive NEEDRESTART_MODE=a apt-get update -qq && sudo DEBIAN_FRONTEND=noninteractive NEEDRESTART_MODE=a apt-get install -y -qq build-essential python3-dev libffi-dev >/dev/null 2>&1 || true
@@ -908,6 +913,8 @@ install_node_deps() {
                cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || true
                ;;
            *)
+                log_info "Playwright may request sudo to install browser system dependencies (shared libraries)."
+                log_info "This is standard Playwright setup — Hermes itself does not require root access."
                cd "$INSTALL_DIR" && npx playwright install --with-deps chromium 2>/dev/null || true
                ;;
        esac
@@ -5,12 +5,26 @@ description: "Production pipeline for ASCII art video — any format. Converts v

 # ASCII Video Production Pipeline

-Full production pipeline for rendering any content as colored ASCII character video.
+## Creative Standard
+
+This is visual art. ASCII characters are the medium; cinema is the standard.
+
+**Before writing a single line of code**, articulate the creative concept. What is the mood? What visual story does this tell? What makes THIS project different from every other ASCII video? The user's prompt is a starting point — interpret it with creative ambition, not literal transcription.
+
+**First-render excellence is non-negotiable.** The output must be visually striking without requiring revision rounds. If something looks generic, flat, or like "AI-generated ASCII art," it is wrong — rethink the creative concept before shipping.
+
+**Go beyond the reference vocabulary.** The effect catalogs, shader presets, and palette libraries in the references are a starting vocabulary. For every project, combine, modify, and invent new patterns. The catalog is a palette of paints — you write the painting.
+
+**Be proactively creative.** Extend the skill's vocabulary when the project calls for it. If the references don't have what the vision demands, build it. Include at least one visual moment the user didn't ask for but will appreciate — a transition, an effect, a color choice that elevates the whole piece.
+
+**Cohesive aesthetic over technical correctness.** All scenes in a video must feel connected by a unifying visual language — shared color temperature, related character palettes, consistent motion vocabulary. A technically correct video where every scene uses a random different effect is an aesthetic failure.
+
+**Dense, layered, considered.** Every frame should reward viewing. Never flat black backgrounds. Always multi-grid composition. Always per-scene variation. Always intentional color.

 ## Modes

-| Mode | Input | Output | Read |
-|------|-------|--------|------|
+| Mode | Input | Output | Reference |
+|------|-------|--------|-----------|
 | **Video-to-ASCII** | Video file | ASCII recreation of source footage | `references/inputs.md` § Video Sampling |
 | **Audio-reactive** | Audio file | Generative visuals driven by audio features | `references/inputs.md` § Audio Analysis |
 | **Generative** | None (or seed params) | Procedural ASCII animation | `references/effects.md` |
@@ -20,210 +34,154 @@ Full production pipeline for rendering any content as colored ASCII character vi

 ## Stack

-Single self-contained Python script per project. No GPU.
+Single self-contained Python script per project. No GPU required.

 | Layer | Tool | Purpose |
 |-------|------|---------|
 | Core | Python 3.10+, NumPy | Math, array ops, vectorized effects |
-| Signal | SciPy | FFT, peak detection (audio modes only) |
-| Imaging | Pillow (PIL) | Font rasterization, video frame decoding, image I/O |
-| Video I/O | ffmpeg (CLI) | Decode input, encode output segments, mux audio, mix tracks |
-| Parallel | concurrent.futures / multiprocessing | N workers for batch/clip rendering |
-| TTS | ElevenLabs API (or similar) | Generate narration clips for quote/testimonial videos |
-| Optional | OpenCV | Video frame sampling, edge detection, optical flow |
+| Signal | SciPy | FFT, peak detection (audio modes) |
+| Imaging | Pillow (PIL) | Font rasterization, frame decoding, image I/O |
+| Video I/O | ffmpeg (CLI) | Decode input, encode output, mux audio |
+| Parallel | concurrent.futures | N workers for batch/clip rendering |
+| TTS | ElevenLabs API (optional) | Generate narration clips |
+| Optional | OpenCV | Video frame sampling, edge detection |

-## Pipeline Architecture (v2)
+## Pipeline Architecture

-Every mode follows the same 6-stage pipeline. See `references/architecture.md` for implementation details, `references/scenes.md` for scene protocol, and `references/composition.md` for multi-grid composition and tonemap.
+Every mode follows the same 6-stage pipeline:

 ```
-┌─────────┐   ┌──────────┐   ┌───────────┐   ┌──────────┐   ┌─────────┐   ┌────────┐
-│ 1.INPUT  │→│ 2.ANALYZE │→│ 3.SCENE_FN │→│ 4.TONEMAP │→│ 5.SHADE  │→│ 6.ENCODE│
-│ load src │  │ features  │  │ → canvas   │  │ normalize │  │ post-fx  │  │ → video │
-└─────────┘   └──────────┘   └───────────┘   └──────────┘   └─────────┘   └────────┘
+INPUT → ANALYZE → SCENE_FN → TONEMAP → SHADE → ENCODE
 ```

 1. **INPUT** — Load/decode source material (video frames, audio samples, images, or nothing)
 2. **ANALYZE** — Extract per-frame features (audio bands, video luminance/edges, motion vectors)
-3. **SCENE_FN** — Scene function renders directly to pixel canvas (`uint8 H,W,3`). May internally compose multiple character grids via `_render_vf()` + pixel blend modes. See `references/composition.md`
-4. **TONEMAP** — Percentile-based adaptive brightness normalization with per-scene gamma. Replaces linear brightness multipliers. See `references/composition.md` § Adaptive Tonemap
-5. **SHADE** — Apply post-processing `ShaderChain` + `FeedbackBuffer`. See `references/shaders.md`
+3. **SCENE_FN** — Scene function renders to pixel canvas (`uint8 H,W,3`). Composes multiple character grids via `_render_vf()` + pixel blend modes. See `references/composition.md`
+4. **TONEMAP** — Percentile-based adaptive brightness normalization. See `references/composition.md` § Adaptive Tonemap
+5. **SHADE** — Post-processing via `ShaderChain` + `FeedbackBuffer`. See `references/shaders.md`
 6. **ENCODE** — Pipe raw RGB frames to ffmpeg for H.264/GIF encoding

 ## Creative Direction

-**Every project should look and feel different.** The references provide a vocabulary of building blocks — don't copy them verbatim. Combine, modify, and invent.
-
-### Aesthetic Dimensions to Vary
+### Aesthetic Dimensions

 | Dimension | Options | Reference |
 |-----------|---------|-----------|
-| **Character palette** | Density ramps, block elements, symbols, scripts (katakana, Greek, runes, braille), dots, project-specific | `architecture.md` § Character Palettes |
-| **Color strategy** | HSV (angle/distance/time/value mapped), OKLAB/OKLCH (perceptually uniform), discrete RGB palettes, auto-generated harmony (complementary/triadic/analogous/tetradic), monochrome, temperature | `architecture.md` § Color System |
-| **Color tint** | Warm, cool, amber, matrix green, neon pink, sepia, ice, blood, void, sunset | `shaders.md` § Color Grade |
-| **Background texture** | Sine fields, fBM noise, domain warp, voronoi cells, reaction-diffusion, cellular automata, video source | `effects.md` § Background Fills, Noise-Based Fields, Simulation-Based Fields |
-| **Primary effects** | Rings, spirals, tunnel, vortex, waves, interference, aurora, ripple, fire, strange attractors, SDFs (geometric shapes with smooth booleans) | `effects.md` § Radial / Wave / Fire / SDF-Based Fields |
-| **Particles** | Energy sparks, snow, rain, bubbles, runes, binary data, orbits, gravity wells, flocking boids, flow-field followers, trail-drawing particles | `effects.md` § Particle Systems |
-| **Shader mood** | Retro CRT, clean modern, glitch art, cinematic, dreamy, harsh industrial, psychedelic | `shaders.md` § Design Philosophy |
+| **Character palette** | Density ramps, block elements, symbols, scripts (katakana, Greek, runes, braille), project-specific | `architecture.md` § Palettes |
+| **Color strategy** | HSV, OKLAB/OKLCH, discrete RGB palettes, auto-generated harmony, monochrome, temperature | `architecture.md` § Color System |
+| **Background texture** | Sine fields, fBM noise, domain warp, voronoi, reaction-diffusion, cellular automata, video | `effects.md` |
+| **Primary effects** | Rings, spirals, tunnel, vortex, waves, interference, aurora, fire, SDFs, strange attractors | `effects.md` |
+| **Particles** | Sparks, snow, rain, bubbles, runes, orbits, flocking boids, flow-field followers, trails | `effects.md` § Particles |
+| **Shader mood** | Retro CRT, clean modern, glitch art, cinematic, dreamy, industrial, psychedelic | `shaders.md` |
 | **Grid density** | xs(8px) through xxl(40px), mixed per layer | `architecture.md` § Grid System |
-| **Font** | Menlo, Monaco, Courier, SF Mono, JetBrains Mono, Fira Code, IBM Plex | `architecture.md` § Font Selection |
-| **Coordinate space** | Cartesian, polar, tiled, rotated, skewed, fisheye, twisted, Möbius, domain-warped | `effects.md` § Coordinate Transforms |
-| **Mirror mode** | None, horizontal, vertical, quad, diagonal, kaleidoscope | `shaders.md` § Mirror Effects |
-| **Masking** | Circle, rect, ring, gradient, text stencil, value-field-as-mask, animated iris/wipe/dissolve | `composition.md` § Masking |
-| **Temporal motion** | Static, audio-reactive, eased keyframes, morphing between fields, temporal noise (smooth in-place evolution) | `effects.md` § Temporal Coherence |
-| **Transition style** | Crossfade, wipe (directional/radial), dissolve, glitch cut, iris open/close, mask-based reveal | `shaders.md` § Transitions, `composition.md` § Animated Masks |
-| **Aspect ratio** | Landscape (16:9), portrait (9:16), square (1:1), ultrawide (21:9) | `architecture.md` § Resolution Presets |
+| **Coordinate space** | Cartesian, polar, tiled, rotated, fisheye, Möbius, domain-warped | `effects.md` § Transforms |
+| **Feedback** | Zoom tunnel, rainbow trails, ghostly echo, rotating mandala, color evolution | `composition.md` § Feedback |
+| **Masking** | Circle, ring, gradient, text stencil, animated iris/wipe/dissolve | `composition.md` § Masking |
+| **Transitions** | Crossfade, wipe, dissolve, glitch cut, iris, mask-based reveal | `shaders.md` § Transitions |

 ### Per-Section Variation

-Never use the same config for the entire video. For each section/scene/quote:
- Choose a **different background effect** (or compose 2-3)
- Choose a **different character palette** (match the mood)
- Choose a **different color strategy** (or at minimum a different hue)
- Vary **shader intensity** (more bloom during peaks, more grain during quiet)
- Use **different particle types** if particles are active
+Never use the same config for the entire video. For each section/scene:
+- **Different background effect** (or compose 2-3)
+- **Different character palette** (match the mood)
+- **Different color strategy** (or at minimum a different hue)
+- **Vary shader intensity** (more bloom during peaks, more grain during quiet)
+- **Different particle types** if particles are active

 ### Project-Specific Invention

 For every project, invent at least one of:
 - A custom character palette matching the theme
- A custom background effect (combine/modify existing ones)
+- A custom background effect (combine/modify existing building blocks)
 - A custom color palette (discrete RGB set matching the brand/mood)
 - A custom particle character set
+- A novel scene transition or visual moment
+
+Don't just pick from the catalog. The catalog is vocabulary — you write the poem.

 ## Workflow

-### Step 1: Determine Mode and Gather Requirements
+### Step 1: Creative Vision
+
+Before any code, articulate the creative concept:
+
+- **Mood/atmosphere**: What should the viewer feel? Energetic, meditative, chaotic, elegant, ominous?
+- **Visual story**: What happens over the duration? Build tension? Transform? Dissolve?
+- **Color world**: Warm/cool? Monochrome? Neon? Earth tones? What's the dominant hue?
+- **Character texture**: Dense data? Sparse stars? Organic dots? Geometric blocks?
+- **What makes THIS different**: What's the one thing that makes this project unique?
+- **Emotional arc**: How do scenes progress? Open with energy, build to climax, resolve?
+
+Map the user's prompt to aesthetic choices. A "chill lo-fi visualizer" demands different everything from a "glitch cyberpunk data stream."
+
+### Step 2: Technical Design

-Establish with user:
- **Input source** — file path, format, duration
 - **Mode** — which of the 6 modes above
- **Sections** — time-mapped style changes (timestamps → effect names)
- **Resolution** — landscape 1920x1080 (default), portrait 1080x1920, square 1080x1080 @ 24fps; GIFs typically 640x360 @ 15fps
- **Style direction** — dense/sparse, bright/dark, chaotic/minimal, color palette
- **Text/branding** — easter eggs, overlays, credits, themed character sets
- **Output format** — MP4 (default), GIF, PNG sequence
- **Aspect ratio** — landscape (16:9), portrait (9:16 for TikTok/Reels/Stories), square (1:1 for IG feed)
-
-### Step 2: Detect Hardware and Set Quality
-
-Before building the script, detect the user's hardware and set appropriate defaults. See `references/optimization.md` § Hardware Detection.
-
-```python
-hw = detect_hardware()
-profile = quality_profile(hw, target_duration, user_quality_pref)
-log(f"Hardware: {hw['cpu_count']} cores, {hw['mem_gb']:.1f}GB RAM")
-log(f"Render: {profile['vw']}x{profile['vh']} @{profile['fps']}fps, {profile['workers']} workers")
-```
-
-Never hardcode worker counts, resolution, or CRF. Always detect and adapt.
+- **Resolution** — landscape 1920x1080 (default), portrait 1080x1920, square 1080x1080 @ 24fps
+- **Hardware detection** — auto-detect cores/RAM, set quality profile. See `references/optimization.md`
+- **Sections** — map timestamps to scene functions, each with its own effect/palette/color/shader config
+- **Output format** — MP4 (default), GIF (640x360 @ 15fps), PNG sequence

 ### Step 3: Build the Script

-Write as a single Python file. Major components:
+Single Python file. Components (with references):

-1. **Hardware detection + quality profile** — see `references/optimization.md`
-2. **Input loader** — mode-dependent; see `references/inputs.md`
-3. **Feature analyzer** — audio FFT, video luminance, or pass-through
-4. **Grid + renderer** — multi-density character grids with bitmap cache; `_render_vf()` helper for value/hue field → canvas
-5. **Character palettes** — multiple palettes chosen per project theme; see `references/architecture.md`
-6. **Color system** — HSV + discrete RGB palettes as needed; see `references/architecture.md`
-7. **Scene functions** — each returns `canvas (uint8 H,W,3)` directly. May compose multiple grids internally via pixel blend modes. See `references/scenes.md` + `references/composition.md`
-8. **Tonemap** — adaptive brightness normalization with per-scene gamma; see `references/composition.md`
-9. **Shader pipeline** — `ShaderChain` + `FeedbackBuffer` per-section config; see `references/shaders.md`
-10. **Scene table + dispatcher** — maps time ranges to scene functions + shader/feedback configs; see `references/scenes.md`
-11. **Parallel encoder** — N-worker batch clip rendering with ffmpeg pipes
+1. **Hardware detection + quality profile** — `references/optimization.md`
+2. **Input loader** — mode-dependent; `references/inputs.md`
+3. **Feature analyzer** — audio FFT, video luminance, or synthetic
+4. **Grid + renderer** — multi-density grids with bitmap cache; `references/architecture.md`
+5. **Character palettes** — multiple per project; `references/architecture.md` § Palettes
+6. **Color system** — HSV + discrete RGB + harmony generation; `references/architecture.md` § Color
+7. **Scene functions** — each returns `canvas (uint8 H,W,3)`; `references/scenes.md`
+8. **Tonemap** — adaptive brightness normalization; `references/composition.md`
+9. **Shader pipeline** — `ShaderChain` + `FeedbackBuffer`; `references/shaders.md`
+10. **Scene table + dispatcher** — time → scene function + config; `references/scenes.md`
+11. **Parallel encoder** — N-worker clip rendering with ffmpeg pipes
 12. **Main** — orchestrate full pipeline

-### Step 4: Handle Critical Bugs
+### Step 4: Quality Verification

-#### Font Cell Height (macOS Pillow)
+- **Test frames first**: render single frames at key timestamps before full render
+- **Brightness check**: `canvas.mean() > 8` for all ASCII content. If dark, lower gamma
+- **Visual coherence**: do all scenes feel like they belong to the same video?
+- **Creative vision check**: does the output match the concept from Step 1? If it looks generic, go back

-`textbbox()` returns wrong height. Use `font.getmetrics()`:
+## Critical Implementation Notes

-```python
-ascent, descent = font.getmetrics()
-cell_height = ascent + descent  # correct
-```
+### Brightness — Use `tonemap()`, Not Linear Multipliers

-#### ffmpeg Pipe Deadlock
-
-Never use `stderr=subprocess.PIPE` with long-running ffmpeg. Redirect to file:
-
-```python
-stderr_fh = open(err_path, "w")
-pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.DEVNULL, stderr=stderr_fh)
-```
-
-#### Brightness — Use `tonemap()`, Not Linear Multipliers
-
-ASCII on black is inherently dark. This is the #1 visual issue. **Do NOT use linear `* N` brightness multipliers** — they clip highlights and wash out the image. Instead, use the **adaptive tonemap** function from `references/composition.md`:
+This is the #1 visual issue. ASCII on black is inherently dark. **Never use `canvas * N` multipliers** — they clip highlights. Use adaptive tonemap:

 ```python
 def tonemap(canvas, gamma=0.75):
-    """Percentile-based adaptive normalization + gamma. Replaces all brightness multipliers."""
    f = canvas.astype(np.float32)
-    lo = np.percentile(f, 1)          # black point (1st percentile)
-    hi = np.percentile(f, 99.5)       # white point (99.5th percentile)
-    if hi - lo < 1: hi = lo + 1
-    f = (f - lo) / (hi - lo)
-    f = np.clip(f, 0, 1) ** gamma     # gamma < 1 = brighter mids
+    lo, hi = np.percentile(f[::4, ::4], [1, 99.5])
+    if hi - lo < 10: hi = lo + 10
+    f = np.clip((f - lo) / (hi - lo), 0, 1) ** gamma
    return (f * 255).astype(np.uint8)
 ```

-Pipeline ordering: `scene_fn() → tonemap() → FeedbackBuffer → ShaderChain → ffmpeg`
+Pipeline: `scene_fn() → tonemap() → FeedbackBuffer → ShaderChain → ffmpeg`

-Per-scene gamma overrides for destructive effects:
- Default: `gamma=0.75`
- Solarize scenes: `gamma=0.55` (solarize darkens above-threshold pixels)
- Posterize scenes: `gamma=0.50` (quantization loses brightness range)
- Already-bright scenes: `gamma=0.85`
+Per-scene gamma: default 0.75, solarize 0.55, posterize 0.50, bright scenes 0.85. Use `screen` blend (not `overlay`) for dark layers.

-Additional brightness best practices:
- Dense animated backgrounds — never flat black, always fill the grid
- Vignette minimum clamped to 0.15 (not 0.12)
- Bloom threshold lowered to 130 (not 170) so more pixels contribute to glow
- Use `screen` blend mode (not `overlay`) when compositing dark ASCII layers — overlay squares dark values: `2 * 0.12 * 0.12 = 0.03`
+### Font Cell Height

-#### Font Compatibility
+macOS Pillow: `textbbox()` returns wrong height. Use `font.getmetrics()`: `cell_height = ascent + descent`. See `references/troubleshooting.md`.

-Not all Unicode characters render in all fonts. Validate palettes at init:
-```python
-for c in palette:
-    img = Image.new("L", (20, 20), 0)
-    ImageDraw.Draw(img).text((0, 0), c, fill=255, font=font)
-    if np.array(img).max() == 0:
-        log(f"WARNING: char '{c}' (U+{ord(c):04X}) not in font, removing from palette")
-```
+### ffmpeg Pipe Deadlock

-### Step 4b: Per-Clip Architecture (for segmented videos)
+Never `stderr=subprocess.PIPE` with long-running ffmpeg — buffer fills at 64KB and deadlocks. Redirect to file. See `references/troubleshooting.md`.

-When the video has discrete segments (quotes, scenes, chapters), render each as a separate clip file. This enables:
- Re-rendering individual clips without touching the rest (`--clip q05`)
- Faster iteration on specific sections
- Easy reordering or trimming in post
+### Font Compatibility

-```python
-segments = [
-    {"id": "intro", "start": 0.0, "end": 5.0, "type": "intro"},
-    {"id": "q00", "start": 5.0, "end": 12.0, "type": "quote", "qi": 0, ...},
-    {"id": "t00", "start": 12.0, "end": 13.5, "type": "transition", ...},
-    {"id": "outro", "start": 208.0, "end": 211.6, "type": "outro"},
-]
+Not all Unicode chars render in all fonts. Validate palettes at init — render each char, check for blank output. See `references/troubleshooting.md`.

-from concurrent.futures import ProcessPoolExecutor, as_completed
-with ProcessPoolExecutor(max_workers=hw["workers"]) as pool:
-    futures = {pool.submit(render_clip, seg, features, path): seg["id"]
-               for seg, path in clip_args}
-    for fut in as_completed(futures):
-        fut.result()
-```
+### Per-Clip Architecture

-CLI: `--clip q00 t00 q01` to re-render specific clips, `--list` to show segments, `--skip-render` to re-stitch only.
+For segmented videos (quotes, scenes, chapters), render each as a separate clip file for parallel rendering and selective re-rendering. See `references/scenes.md`.

-### Step 5: Render and Iterate
-
-Performance targets per frame:
+## Performance Targets

 | Component | Budget |
 |-----------|--------|
@@ -233,24 +191,15 @@ Performance targets per frame:
 | Shader pipeline | 5-25ms |
 | **Total** | ~100-200ms/frame |

-**Fast iteration**: render single test frames to check brightness/layout before full render:
-```python
-canvas = render_single_frame(frame_index, features, renderer)
-Image.fromarray(canvas).save("test.png")
-```
-
-**Brightness verification**: sample 5-10 frames across video, check `mean > 8` for ASCII content.
-
 ## References

 | File | Contents |
 |------|----------|
-| `references/architecture.md` | Grid system (landscape/portrait/square resolution presets), font selection, character palettes (library of 20+), color system (HSV + OKLAB/OKLCH + discrete RGB + color harmony generation + perceptual gradient interpolation), `_render_vf()` helper, compositing, v2 effect function contract |
-| `references/inputs.md` | All input sources: audio analysis, video sampling, image conversion, text/lyrics, TTS integration (ElevenLabs, voice assignment, audio mixing) |
-| `references/effects.md` | Effect building blocks: 20+ value field generators (trig, noise/fBM, domain warp, voronoi, reaction-diffusion, cellular automata, strange attractors, SDFs), 8 hue field generators, coordinate transforms (rotate/tile/polar/Möbius), temporal coherence (easing, keyframes, morphing), radial/wave/fire effects, advanced particles (flocking, flow fields, trails), composing guide |
-| `references/shaders.md` | 38 shader implementations (geometry, channel, color, glow, noise, pattern, tone, glitch, mirror), `ShaderChain` class, full `_apply_shader_step()` dispatch, audio-reactive scaling, transitions, tint presets |
-| `references/composition.md` | **v2 core**: pixel blend modes (20 modes with implementations), multi-grid composition, `_render_vf()` helper, adaptive `tonemap()`, per-scene gamma, `FeedbackBuffer` with spatial transforms, `PixelBlendStack`, masking/stencil system (shape masks, text stencils, animated masks, boolean ops) |
-| `references/scenes.md` | **v2 scene protocol**: scene function contract (local time convention), `Renderer` class, `SCENES` table structure, `render_clip()` loop, beat-synced cutting, parallel rendering + pickling constraints, 4 complete scene examples, scene design checklist |
-| `references/design-patterns.md` | **Scene composition patterns**: layer hierarchy (bg/content/accent), directional parameter arcs vs oscillation, scene concepts and visual metaphors, counter-rotating dual systems, wave collision, progressive fragmentation, entropy/consumption, staggered layer entry (crescendo), scene ordering |
-| `references/troubleshooting.md` | NumPy broadcasting traps, blend mode pitfalls, multiprocessing/pickling issues, brightness diagnostics, ffmpeg deadlocks, font issues, performance bottlenecks, common mistakes |
-| `references/optimization.md` | Hardware detection, adaptive quality profiles (draft/preview/production/max), CLI integration, vectorized effect patterns, parallel rendering, memory management |
+| `references/architecture.md` | Grid system, resolution presets, font selection, character palettes (20+), color system (HSV + OKLAB + discrete RGB + harmony generation), `_render_vf()` helper, GridLayer class |
+| `references/composition.md` | Pixel blend modes (20 modes), `blend_canvas()`, multi-grid composition, adaptive `tonemap()`, `FeedbackBuffer`, `PixelBlendStack`, masking/stencil system |
+| `references/effects.md` | Effect building blocks: value field generators, hue fields, noise/fBM/domain warp, voronoi, reaction-diffusion, cellular automata, SDFs, strange attractors, particle systems, coordinate transforms, temporal coherence |
+| `references/shaders.md` | `ShaderChain`, `_apply_shader_step()` dispatch, 38 shader catalog, audio-reactive scaling, transitions, tint presets, output format encoding, terminal rendering |
+| `references/scenes.md` | Scene protocol, `Renderer` class, `SCENES` table, `render_clip()`, beat-synced cutting, parallel rendering, design patterns (layer hierarchy, directional arcs, visual metaphors, compositional techniques), complete scene examples at every complexity level, scene design checklist |
+| `references/inputs.md` | Audio analysis (FFT, bands, beats), video sampling, image conversion, text/lyrics, TTS integration (ElevenLabs, voice assignment, audio mixing) |
+| `references/optimization.md` | Hardware detection, quality profiles, vectorized patterns, parallel rendering, memory management, performance budgets |
+| `references/troubleshooting.md` | NumPy broadcasting traps, blend mode pitfalls, multiprocessing/pickling, brightness diagnostics, ffmpeg issues, font problems, common mistakes |
@@ -1,14 +1,6 @@
 # Architecture Reference

-**Cross-references:**
- Effect building blocks (value fields, noise, SDFs, particles): `effects.md`
- `_render_vf()`, blend modes, tonemap, masking: `composition.md`
- Scene protocol, render_clip, SCENES table: `scenes.md`
- Shader pipeline, feedback buffer, output encoding: `shaders.md`
- Complete scene examples: `examples.md`
- Input sources (audio analysis, video, TTS): `inputs.md`
- Performance tuning, hardware detection: `optimization.md`
- Common bugs (broadcasting, font, encoding): `troubleshooting.md`
+> **See also:** composition.md · effects.md · scenes.md · shaders.md · inputs.md · optimization.md · troubleshooting.md

 ## Grid System

@@ -2,13 +2,7 @@

 The composable system is the core of visual complexity. It operates at three levels: pixel-level blend modes, multi-grid composition, and adaptive brightness management. This document covers all three, plus the masking/stencil system for spatial control.

-**Cross-references:**
- Grid system, palettes, color (HSV + OKLAB): `architecture.md`
- Effect building blocks (value fields, hue fields, particles): `effects.md`
- Scene protocol, render_clip, SCENES table: `scenes.md`
- Shader pipeline, feedback buffer: `shaders.md`
- Complete scene examples with blend/mask usage: `examples.md`
- Blend mode pitfalls (overlay crush, division by zero): `troubleshooting.md`
+> **See also:** architecture.md · effects.md · scenes.md · shaders.md · troubleshooting.md

 ## Pixel-Level Blend Modes

@@ -1,193 +0,0 @@
-# Scene Design Patterns
-
-**Cross-references:**
- Scene protocol, SCENES table: `scenes.md`
- Blend modes, multi-grid composition, tonemap: `composition.md`
- Effect building blocks (value fields, noise, SDFs): `effects.md`
- Shader pipeline, feedback buffer: `shaders.md`
- Complete scene examples: `examples.md`
-
-Higher-order patterns for composing scenes that feel intentional rather than random. These patterns use the existing building blocks (value fields, blend modes, shaders, feedback) but organize them with compositional intent.
-
-## Layer Hierarchy
-
-Every scene should have clear visual layers with distinct roles:
-
-| Layer | Grid | Brightness | Purpose |
-|-------|------|-----------|---------|
-| **Background** | xs or sm (dense) | 0.1–0.25 | Atmosphere, texture. Never competes with content. |
-| **Content** | md (balanced) | 0.4–0.8 | The main visual idea. Carries the scene's concept. |
-| **Accent** | lg or sm (sparse) | 0.5–1.0 (sparse coverage) | Highlights, punctuation, sparse bright points. |
-
-The background sets mood. The content layer is what the scene *is about*. The accent adds visual interest without overwhelming.
-
-```python
-def fx_example(r, f, t, S):
-    local = t
-    progress = min(local / 5.0, 1.0)
-
-    g_bg = r.get_grid("sm")
-    g_main = r.get_grid("md")
-    g_accent = r.get_grid("lg")
-
-    # --- Background: dim atmosphere ---
-    bg_val = vf_smooth_noise(g_bg, f, t * 0.3, S, octaves=2, bri=0.15)
-    # ... render bg to canvas
-
-    # --- Content: the main visual idea ---
-    content_val = vf_spiral(g_main, f, t, S, n_arms=n_arms, tightness=tightness)
-    # ... render content on top of canvas
-
-    # --- Accent: sparse highlights ---
-    accent_val = vf_noise_static(g_accent, f, t, S, density=0.05)
-    # ... render accent on top
-
-    return canvas
-```
-
-## Directional Parameter Arcs
-
-Parameters should *go somewhere* over the scene's duration — not oscillate aimlessly with `sin(t * N)`.
-
-**Bad:** `twist = 3.0 + 2.0 * math.sin(t * 0.6)` — wobbles back and forth, feels aimless.
-
-**Good:** `twist = 2.0 + progress * 5.0` — starts gentle, ends intense. The scene *builds*.
-
-Use `progress = min(local / duration, 1.0)` (0→1 over the scene) to drive directional change:
-
-| Pattern | Formula | Feel |
-|---------|---------|------|
-| Linear ramp | `progress * range` | Steady buildup |
-| Ease-out | `1 - (1 - progress) ** 2` | Fast start, gentle finish |
-| Ease-in | `progress ** 2` | Slow start, accelerating |
-| Step reveal | `np.clip((progress - 0.5) / 0.25, 0, 1)` | Nothing until 50%, then fades in |
-| Build + plateau | `min(1.0, progress * 1.5)` | Reaches full at 67%, holds |
-
-Oscillation is fine for *secondary* parameters (saturation shimmer, hue drift). But the *defining* parameter of the scene should have a direction.
-
-### Examples of Directional Arcs
-
-| Scene concept | Parameter | Arc |
-|--------------|-----------|-----|
-| Emergence | Ring radius | 0 → max (ease-out) |
-| Shatter | Voronoi cell count | 8 → 38 (linear) |
-| Descent | Tunnel speed | 2.0 → 10.0 (linear) |
-| Mandala | Shape complexity | ring → +polygon → +star → +rosette (step reveals) |
-| Crescendo | Layer count | 1 → 7 (staggered entry) |
-| Entropy | Geometry visibility | 1.0 → 0.0 (consumed) |
-
-## Scene Concepts
-
-Each scene should be built around a *visual idea*, not an effect name.
-
-**Bad:** "fx_plasma_cascade" — named after the effect. No concept.
-**Good:** "fx_emergence" — a point of light expands into a field. The name tells you *what happens*.
-
-Good scene concepts have:
-1. A **visual metaphor** (emergence, descent, collision, entropy)
-2. A **directional arc** (things change from A to B, not oscillate)
-3. **Motivated layer choices** (each layer serves the concept)
-4. **Motivated feedback** (transform direction matches the metaphor)
-
-| Concept | Metaphor | Feedback transform | Why |
-|---------|----------|-------------------|-----|
-| Emergence | Birth, expansion | zoom-out | Past frames expand outward |
-| Descent | Falling, acceleration | zoom-in | Past frames rush toward center |
-| Inferno | Rising fire | shift-up | Past frames rise with the flames |
-| Entropy | Decay, dissolution | none | Clean, no persistence — things disappear |
-| Crescendo | Accumulation | zoom + hue_shift | Everything compounds and shifts |
-
-## Compositional Techniques
-
-### Counter-Rotating Dual Systems
-
-Two instances of the same effect rotating in opposite directions create visual interference:
-
-```python
-# Primary spiral (clockwise)
-s1_val = vf_spiral(g_main, f, t * 1.5, S, n_arms=n_arms_1, tightness=tightness_1)
-
-# Counter-rotating spiral (counter-clockwise via negative time)
-s2_val = vf_spiral(g_accent, f, -t * 1.2, S, n_arms=n_arms_2, tightness=tightness_2)
-
-# Screen blend creates bright interference at crossing points
-canvas = blend_canvas(canvas_with_s1, c2, "screen", 0.7)
-```
-
-Works with spirals, vortexes, rings. The counter-rotation creates constantly shifting interference patterns.
-
-### Wave Collision
-
-Two wave fronts converging from opposite sides, meeting at a collision point:
-
-```python
-collision_phase = abs(progress - 0.5) * 2  # 1→0→1 (0 at collision)
-
-# Wave A approaches from left
-offset_a = (1 - progress) * g.cols * 0.4
-wave_a = np.sin((g.cc + offset_a) * 0.08 + t * 2) * 0.5 + 0.5
-
-# Wave B approaches from right
-offset_b = -(1 - progress) * g.cols * 0.4
-wave_b = np.sin((g.cc + offset_b) * 0.08 - t * 2) * 0.5 + 0.5
-
-# Interference peaks at collision
-combined = wave_a * 0.5 + wave_b * 0.5 + np.abs(wave_a - wave_b) * (1 - collision_phase) * 0.5
-```
-
-### Progressive Fragmentation
-
-Voronoi with cell count increasing over time — visual shattering:
-
-```python
-n_pts = int(8 + progress * 30)  # 8 cells → 38 cells
-# Pre-generate enough points, slice to n_pts
-px = base_x[:n_pts] + np.sin(t * 0.3 + np.arange(n_pts) * 0.7) * (3 + progress * 3)
-```
-
-The edge glow width can also increase with progress to emphasize the cracks.
-
-### Entropy / Consumption
-
-A clean geometric pattern being overtaken by an organic process:
-
-```python
-# Geometry fades out
-geo_val = clean_pattern * max(0.05, 1.0 - progress * 0.9)
-
-# Organic process grows in
-rd_val = vf_reaction_diffusion(g, f, t, S) * min(1.0, progress * 1.5)
-
-# Render geometry first, organic on top — organic consumes geometry
-```
-
-### Staggered Layer Entry (Crescendo)
-
-Layers enter one at a time, building to overwhelming density:
-
-```python
-def layer_strength(enter_t, ramp=1.5):
-    """0.0 until enter_t, ramps to 1.0 over ramp seconds."""
-    return max(0.0, min(1.0, (local - enter_t) / ramp))
-
-# Layer 1: always present
-s1 = layer_strength(0.0)
-# Layer 2: enters at 2s
-s2 = layer_strength(2.0)
-# Layer 3: enters at 4s
-s3 = layer_strength(4.0)
-# ... etc
-
-# Each layer uses a different effect, grid, palette, and blend mode
-# Screen blend between layers so they accumulate light
-```
-
-For a 15-second crescendo, 7 layers entering every 2 seconds works well. Use different blend modes (screen for most, add for energy, colordodge for the final wash).
-
-## Scene Ordering
-
-For a multi-scene reel or video:
- **Vary mood between adjacent scenes** — don't put two calm scenes next to each other
- **Randomize order** rather than grouping by type — prevents "effect demo" feel
- **End on the strongest scene** — crescendo or something with a clear payoff
- **Open with energy** — grab attention in the first 2 seconds
@@ -2,13 +2,7 @@

 Effect building blocks that produce visual patterns. In v2, these are used **inside scene functions** that return a pixel canvas directly. The building blocks below operate on grid coordinate arrays and produce `(chars, colors)` or value/hue fields that the scene function renders to canvas via `_render_vf()`.

-**Cross-references:**
- Grid system, palettes, color: `architecture.md`
- `_render_vf()`, blend modes, tonemap, masking: `composition.md`
- Scene protocol, render_clip, SCENES table: `scenes.md`
- Shader pipeline, feedback buffer: `shaders.md`
- Complete scene examples using these effects: `examples.md`
- Common bugs (broadcasting, clipping): `troubleshooting.md`
+> **See also:** architecture.md · composition.md · scenes.md · shaders.md · troubleshooting.md

 ## Design Philosophy

@@ -109,142 +103,7 @@ def bg_cellular(g, f, t, n_centers=12, hue=0.5, bri=0.6, pal=PAL_BLOCKS):

 ---

-## Radial Effects
-
-### Concentric Rings
-Bass/sub-driven pulsing rings from center. Scale ring count and thickness with bass energy.
-```python
-def eff_rings(g, f, t, hue=0.5, n_base=6, pal=PAL_DEFAULT):
-    n_rings = int(n_base + f["sub_r"] * 25 + f["bass"] * 10)
-    spacing = 2 + f["bass_r"] * 7 + f["rms"] * 3
-    ring_cv = np.zeros((g.rows, g.cols), dtype=np.float32)
-    for ri in range(n_rings):
-        rad = (ri+1) * spacing + f["bdecay"] * 15
-        wobble = f["mid_r"]*5*np.sin(g.angle*3 + t*4) + f["hi_r"]*3*np.sin(g.angle*7 - t*6)
-        rd = np.abs(g.dist - rad - wobble)
-        th = 1 + f["sub"] * 3
-        ring_cv = np.maximum(ring_cv, np.clip((1 - rd/th) * (0.4 + f["bass"]*0.8), 0, 1))
-    # Color by angle + distance for rainbow rings
-    h = g.angle/(2*np.pi) + g.dist*0.005 + f["sub_r"]*0.2
-    return ring_cv, h
-```
-
-### Radial Rays
-```python
-def eff_rays(g, f, t, n_base=8, hue=0.5):
-    n_rays = int(n_base + f["hi_r"] * 25)
-    ray = np.clip(np.cos(g.angle*n_rays + t*3) * f["bdecay"]*0.6 * (1-g.dist_n), 0, 0.7)
-    return ray
-```
-
-### Spiral Arms (Logarithmic)
-```python
-def eff_spiral(g, f, t, n_arms=3, tightness=2.5, hue=0.5):
-    arm_cv = np.zeros((g.rows, g.cols), dtype=np.float32)
-    for ai in range(n_arms):
-        offset = ai * 2*np.pi / n_arms
-        log_r = np.log(g.dist + 1) * tightness
-        arm_phase = g.angle + offset - log_r + t * 0.8
-        arm_val = np.clip(np.cos(arm_phase * n_arms) * 0.6 + 0.2, 0, 1)
-        arm_val *= (0.4 + f["rms"]*0.6) * np.clip(1 - g.dist_n*0.5, 0.2, 1)
-        arm_cv = np.maximum(arm_cv, arm_val)
-    return arm_cv
-```
-
-### Center Glow / Pulse
-```python
-def eff_glow(g, f, t, intensity=0.6, spread=2.0):
-    return np.clip(intensity * np.exp(-g.dist_n * spread) * (0.5 + f["rms"]*2 + np.sin(t*1.2)*0.2), 0, 0.9)
-```
-
-### Tunnel / Depth
-```python
-def eff_tunnel(g, f, t, speed=3.0, complexity=6):
-    tunnel_d = 1.0 / (g.dist_n + 0.1)
-    v1 = np.sin(tunnel_d*2 - t*speed) * 0.45 + 0.55
-    v2 = np.sin(g.angle*complexity + tunnel_d*1.5 - t*2) * 0.35 + 0.55
-    return v1 * 0.5 + v2 * 0.5
-```
-
-### Vortex (Rotating Distortion)
-```python
-def eff_vortex(g, f, t, twist=3.0, pulse=True):
-    """Twisting radial pattern -- distance modulates angle."""
-    twisted = g.angle + g.dist_n * twist * np.sin(t * 0.5)
-    val = np.sin(twisted * 4 - t * 2) * 0.5 + 0.5
-    if pulse:
-        val *= 0.5 + f.get("bass", 0.3) * 0.8
-    return np.clip(val, 0, 1)
-```
-
---
-
-## Wave Effects
-
-### Multi-Band Frequency Waves
-Each frequency band draws its own wave at different spatial/temporal frequencies:
-```python
-def eff_freq_waves(g, f, t, bands=None):
-    if bands is None:
-        bands = [("sub",0.06,1.2,0.0), ("bass",0.10,2.0,0.08), ("lomid",0.15,3.0,0.16),
-                 ("mid",0.22,4.5,0.25), ("himid",0.32,6.5,0.4), ("hi",0.45,8.5,0.55)]
-    mid = g.rows / 2.0
-    composite = np.zeros((g.rows, g.cols), dtype=np.float32)
-    for band_key, sf, tf, hue_base in bands:
-        amp = f.get(band_key, 0.3) * g.rows * 0.4
-        y_wave = mid - np.sin(g.cc*sf + t*tf) * amp
-        y_wave += np.sin(g.cc*sf*2.3 + t*tf*1.7) * amp * 0.2  # harmonic
-        dist = np.abs(g.rr - y_wave)
-        thickness = 2 + f.get(band_key, 0.3) * 5
-        intensity = np.clip((1 - dist/thickness) * f.get(band_key, 0.3) * 1.5, 0, 1)
-        composite = np.maximum(composite, intensity)
-    return composite
-```
-
-### Interference Pattern
-6-8 overlapping sine waves creating moire-like patterns:
-```python
-def eff_interference(g, f, t, n_waves=5):
-    """Parametric interference -- vary n_waves for complexity."""
-    # Each wave has different orientation, frequency, and feature driver
-    drivers = ["mid_r", "himid_r", "bass_r", "lomid_r", "hi_r"]
-    vals = np.zeros((g.rows, g.cols), dtype=np.float32)
-    for i in range(min(n_waves, len(drivers))):
-        angle = i * np.pi / n_waves  # spread orientations
-        freq = 0.06 + i * 0.03
-        sp = 0.5 + i * 0.3
-        proj = g.cc * np.cos(angle) + g.rr * np.sin(angle)
-        vals += np.sin(proj * freq + t * sp) * f.get(drivers[i], 0.3) * 2.5
-    return np.clip(vals * 0.12 + 0.45, 0.1, 1)
-```
-
-### Aurora / Horizontal Bands
-```python
-def eff_aurora(g, f, t, hue=0.4, n_bands=3):
-    val = np.zeros((g.rows, g.cols), dtype=np.float32)
-    for i in range(n_bands):
-        freq_r = 0.08 + i * 0.04
-        freq_c = 0.012 + i * 0.008
-        sp_r = 0.7 + i * 0.3
-        sp_c = 0.18 + i * 0.12
-        val += np.sin(g.rr*freq_r + t*sp_r) * np.sin(g.cc*freq_c + t*sp_c) * (0.6 / n_bands)
-    return np.clip(val * (f.get("lomid_r", 0.3)*3 + 0.2), 0, 0.7)
-```
-
-### Ripple (Point-Source Waves)
-```python
-def eff_ripple(g, f, t, sources=None, freq=0.3, damping=0.02):
-    """Concentric ripples from point sources. Sources = [(row_frac, col_frac), ...]"""
-    if sources is None:
-        sources = [(0.5, 0.5)]  # center
-    val = np.zeros((g.rows, g.cols), dtype=np.float32)
-    for ry, rx in sources:
-        dy = g.rr - g.rows * ry
-        dx = g.cc - g.cols * rx
-        d = np.sqrt(dy**2 + dx**2)
-        val += np.sin(d * freq - t * 4) * np.exp(-d * damping) * 0.5
-    return np.clip(val + 0.5, 0, 1)
-```
+> **Note:** The v1 `eff_rings`, `eff_rays`, `eff_spiral`, `eff_glow`, `eff_tunnel`, `eff_vortex`, `eff_freq_waves`, `eff_interference`, `eff_aurora`, and `eff_ripple` functions are superseded by the `vf_*` value field generators below (used via `_render_vf()`). The `vf_*` versions integrate with the multi-grid composition pipeline and are preferred for all new scenes.

 ---

@@ -1967,3 +1826,40 @@ def scene_complex(r, f, t, S):
 ```

 Vary the **value field combo**, **hue field**, **palette**, **blend modes**, **feedback config**, and **shader chain** per section for maximum visual variety. With 12 value fields × 8 hue fields × 14 palettes × 20 blend modes × 7 feedback transforms × 38 shaders, the combinations are effectively infinite.
+
+---
+
+## Combining Effects — Creative Guide
+
+The catalog above is vocabulary. Here's how to compose it into something that looks intentional.
+
+### Layering for Depth
+Every scene should have at least two layers at different grid densities:
+- **Background** (sm or xs): dense, dim texture that prevents flat black. fBM, smooth noise, or domain warp at low brightness (bri=0.15-0.25).
+- **Content** (md): the main visual — rings, voronoi, spirals, tunnel. Full brightness.
+- **Accent** (lg or xl): sparse highlights — particles, text stencil, glow pulse. Screen-blended on top.
+
+### Interesting Effect Pairs
+| Pair | Blend | Why it works |
+|------|-------|-------------|
+| fBM + voronoi edges | `screen` | Organic fills the cells, edges add structure |
+| Domain warp + plasma | `difference` | Psychedelic organic interference |
+| Tunnel + vortex | `screen` | Depth perspective + rotational energy |
+| Spiral + interference | `exclusion` | Moire patterns from different spatial frequencies |
+| Reaction-diffusion + fire | `add` | Living organic base + dynamic foreground |
+| SDF geometry + domain warp | `screen` | Clean shapes floating in organic texture |
+
+### Effects as Masks
+Any value field can be used as a mask for another effect via `mask_from_vf()`:
+- Voronoi cells masking fire (fire visible only inside cells)
+- fBM masking a solid color layer (organic color clouds)
+- SDF shapes masking a reaction-diffusion field
+- Animated iris/wipe revealing one effect over another
+
+### Inventing New Effects
+For every project, create at least one effect that isn't in the catalog:
+- **Combine two vf_* functions** with math: `np.clip(vf_fbm(...) * vf_rings(...), 0, 1)`
+- **Apply coordinate transforms** before evaluation: `vf_plasma(twisted_grid, ...)`
+- **Use one field to modulate another's parameters**: `vf_spiral(..., tightness=2 + vf_fbm(...) * 5)`
+- **Stack time offsets**: render the same field at `t` and `t - 0.5`, difference-blend for motion trails
+- **Mirror a value field** through an SDF boundary for kaleidoscopic geometry
@@ -1,416 +0,0 @@
-# Scene Examples
-
-**Cross-references:**
- Grid system, palettes, color (HSV + OKLAB): `architecture.md`
- Effect building blocks (value fields, noise, SDFs, particles): `effects.md`
- `_render_vf()`, blend modes, tonemap, masking: `composition.md`
- Scene protocol, render_clip, SCENES table: `scenes.md`
- Shader pipeline, feedback buffer, ShaderChain: `shaders.md`
- Input sources (audio features, video features): `inputs.md`
- Performance tuning: `optimization.md`
- Common bugs: `troubleshooting.md`
-
-Copy-paste-ready scene functions at increasing complexity. Each is a complete, working v2 scene function that returns a pixel canvas. See `scenes.md` for the scene protocol and `composition.md` for blend modes and tonemap.
-
---
-
-## Minimal — Single Grid, Single Effect
-
-### Breathing Plasma
-
-One grid, one value field, one hue field. The simplest possible scene.
-
-```python
-def fx_breathing_plasma(r, f, t, S):
-    """Plasma field with time-cycling hue. Audio modulates brightness."""
-    canvas = _render_vf(r, "md",
-        lambda g, f, t, S: vf_plasma(g, f, t, S) * 1.3,
-        hf_time_cycle(0.08), PAL_DENSE, f, t, S, sat=0.8)
-    return canvas
-```
-
-### Reaction-Diffusion Coral
-
-Single grid, simulation-based field. Evolves organically over time.
-
-```python
-def fx_coral(r, f, t, S):
-    """Gray-Scott reaction-diffusion — coral branching pattern.
-    Slow-evolving, organic. Best for ambient/chill sections."""
-    canvas = _render_vf(r, "sm",
-        lambda g, f, t, S: vf_reaction_diffusion(g, f, t, S,
-            feed=0.037, kill=0.060, steps_per_frame=6, init_mode="center"),
-        hf_distance(0.55, 0.015), PAL_DOTS, f, t, S, sat=0.7)
-    return canvas
-```
-
-### SDF Geometry
-
-Geometric shapes from SDFs. Clean, precise, graphic.
-
-```python
-def fx_sdf_rings(r, f, t, S):
-    """Concentric SDF rings with smooth pulsing."""
-    def val_fn(g, f, t, S):
-        d1 = sdf_ring(g, radius=0.15 + f.get("bass", 0.3) * 0.05, thickness=0.015)
-        d2 = sdf_ring(g, radius=0.25 + f.get("mid", 0.3) * 0.05, thickness=0.012)
-        d3 = sdf_ring(g, radius=0.35 + f.get("hi", 0.3) * 0.04, thickness=0.010)
-        combined = sdf_smooth_union(sdf_smooth_union(d1, d2, 0.05), d3, 0.05)
-        return sdf_glow(combined, falloff=0.08) * (0.5 + f.get("rms", 0.3) * 0.8)
-    canvas = _render_vf(r, "md", val_fn, hf_angle(0.0), PAL_STARS, f, t, S, sat=0.85)
-    return canvas
-```
-
---
-
-## Standard — Two Grids + Blend
-
-### Tunnel Through Noise
-
-Two grids at different densities, screen blended. The fine noise texture shows through the coarser tunnel characters.
-
-```python
-def fx_tunnel_noise(r, f, t, S):
-    """Tunnel depth on md grid + fBM noise on sm grid, screen blended."""
-    canvas_a = _render_vf(r, "md",
-        lambda g, f, t, S: vf_tunnel(g, f, t, S, speed=4.0, complexity=8) * 1.2,
-        hf_distance(0.5, 0.02), PAL_BLOCKS, f, t, S, sat=0.7)
-
-    canvas_b = _render_vf(r, "sm",
-        lambda g, f, t, S: vf_fbm(g, f, t, S, octaves=4, freq=0.05, speed=0.15) * 1.3,
-        hf_time_cycle(0.06), PAL_RUNE, f, t, S, sat=0.6)
-
-    return blend_canvas(canvas_a, canvas_b, "screen", 0.7)
-```
-
-### Voronoi Cells + Spiral Overlay
-
-Voronoi cell edges with a spiral arm pattern overlaid.
-
-```python
-def fx_voronoi_spiral(r, f, t, S):
-    """Voronoi edge detection on md + logarithmic spiral on lg."""
-    canvas_a = _render_vf(r, "md",
-        lambda g, f, t, S: vf_voronoi(g, f, t, S,
-            n_cells=15, mode="edge", edge_width=2.0, speed=0.4),
-        hf_angle(0.2), PAL_CIRCUIT, f, t, S, sat=0.75)
-
-    canvas_b = _render_vf(r, "lg",
-        lambda g, f, t, S: vf_spiral(g, f, t, S, n_arms=4, tightness=3.0) * 1.2,
-        hf_distance(0.1, 0.03), PAL_BLOCKS, f, t, S, sat=0.9)
-
-    return blend_canvas(canvas_a, canvas_b, "exclusion", 0.6)
-```
-
-### Domain-Warped fBM
-
-Two layers of the same fBM, one domain-warped, difference-blended for psychedelic organic texture.
-
-```python
-def fx_organic_warp(r, f, t, S):
-    """Clean fBM vs domain-warped fBM, difference blended."""
-    canvas_a = _render_vf(r, "sm",
-        lambda g, f, t, S: vf_fbm(g, f, t, S, octaves=5, freq=0.04, speed=0.1),
-        hf_plasma(0.2), PAL_DENSE, f, t, S, sat=0.6)
-
-    canvas_b = _render_vf(r, "md",
-        lambda g, f, t, S: vf_domain_warp(g, f, t, S,
-            warp_strength=20.0, freq=0.05, speed=0.15),
-        hf_time_cycle(0.05), PAL_BRAILLE, f, t, S, sat=0.7)
-
-    return blend_canvas(canvas_a, canvas_b, "difference", 0.7)
-```
-
---
-
-## Complex — Three Grids + Conditional + Feedback
-
-### Psychedelic Cathedral
-
-Three-grid composition with beat-triggered kaleidoscope and feedback zoom tunnel. The most visually complex pattern.
-
-```python
-def fx_cathedral(r, f, t, S):
-    """Three-layer cathedral: interference + rings + noise, kaleidoscope on beat,
-    feedback zoom tunnel."""
-    # Layer 1: interference pattern on sm grid
-    canvas_a = _render_vf(r, "sm",
-        lambda g, f, t, S: vf_interference(g, f, t, S, n_waves=7) * 1.3,
-        hf_angle(0.0), PAL_MATH, f, t, S, sat=0.8)
-
-    # Layer 2: pulsing rings on md grid
-    canvas_b = _render_vf(r, "md",
-        lambda g, f, t, S: vf_rings(g, f, t, S, n_base=10, spacing_base=3) * 1.4,
-        hf_distance(0.3, 0.02), PAL_STARS, f, t, S, sat=0.9)
-
-    # Layer 3: temporal noise on lg grid (slow morph)
-    canvas_c = _render_vf(r, "lg",
-        lambda g, f, t, S: vf_temporal_noise(g, f, t, S,
-            freq=0.04, t_freq=0.2, octaves=3),
-        hf_time_cycle(0.12), PAL_BLOCKS, f, t, S, sat=0.7)
-
-    # Blend: A screen B, then difference with C
-    result = blend_canvas(canvas_a, canvas_b, "screen", 0.8)
-    result = blend_canvas(result, canvas_c, "difference", 0.5)
-
-    # Beat-triggered kaleidoscope
-    if f.get("bdecay", 0) > 0.3:
-        folds = 6 if f.get("sub_r", 0.3) > 0.4 else 8
-        result = sh_kaleidoscope(result.copy(), folds=folds)
-
-    return result
-
-# Scene table entry with feedback:
-# {"start": 30.0, "end": 50.0, "name": "cathedral", "fx": fx_cathedral,
-#  "gamma": 0.65, "shaders": [("bloom", {"thr": 110}), ("chromatic", {"amt": 4}),
-#                              ("vignette", {"s": 0.2}), ("grain", {"amt": 8})],
-#  "feedback": {"decay": 0.75, "blend": "screen", "opacity": 0.35,
-#               "transform": "zoom", "transform_amt": 0.012, "hue_shift": 0.015}}
-```
-
-### Masked Reaction-Diffusion with Attractor Overlay
-
-Reaction-diffusion visible only through an animated iris mask, with a strange attractor density field underneath.
-
-```python
-def fx_masked_life(r, f, t, S):
-    """Attractor base + reaction-diffusion visible through iris mask + particles."""
-    g_sm = r.get_grid("sm")
-    g_md = r.get_grid("md")
-
-    # Layer 1: strange attractor density field (background)
-    canvas_bg = _render_vf(r, "sm",
-        lambda g, f, t, S: vf_strange_attractor(g, f, t, S,
-            attractor="clifford", n_points=30000),
-        hf_time_cycle(0.04), PAL_DOTS, f, t, S, sat=0.5)
-
-    # Layer 2: reaction-diffusion (foreground, will be masked)
-    canvas_rd = _render_vf(r, "md",
-        lambda g, f, t, S: vf_reaction_diffusion(g, f, t, S,
-            feed=0.046, kill=0.063, steps_per_frame=4, init_mode="ring"),
-        hf_angle(0.15), PAL_HALFFILL, f, t, S, sat=0.85)
-
-    # Animated iris mask — opens over first 5 seconds of scene
-    scene_start = S.get("_scene_start", t)
-    if "_scene_start" not in S:
-        S["_scene_start"] = t
-    mask = mask_iris(g_md, t, scene_start, scene_start + 5.0,
-                     max_radius=0.6)
-    canvas_rd = apply_mask_canvas(canvas_rd, mask, bg_canvas=canvas_bg)
-
-    # Layer 3: flow-field particles following the R-D gradient
-    rd_field = vf_reaction_diffusion(g_sm, f, t, S,
-        feed=0.046, kill=0.063, steps_per_frame=0)  # read without stepping
-    ch_p, co_p = update_flow_particles(S, g_sm, f, rd_field,
-        n=300, speed=0.8, char_set=list("·•◦∘°"))
-    canvas_p = g_sm.render(ch_p, co_p)
-
-    result = blend_canvas(canvas_rd, canvas_p, "add", 0.7)
-    return result
-```
-
-### Morphing Field Sequence with Eased Keyframes
-
-Demonstrates temporal coherence: smooth morphing between effects with keyframed parameters.
-
-```python
-def fx_morphing_journey(r, f, t, S):
-    """Morphs through 4 value fields over 20 seconds with eased transitions.
-    Parameters (twist, arm count) also keyframed."""
-    # Keyframed twist parameter
-    twist = keyframe(t, [(0, 1.0), (5, 5.0), (10, 2.0), (15, 8.0), (20, 1.0)],
-                     ease_fn=ease_in_out_cubic, loop=True)
-
-    # Sequence of value fields with 2s crossfade
-    fields = [
-        lambda g, f, t, S: vf_plasma(g, f, t, S),
-        lambda g, f, t, S: vf_vortex(g, f, t, S, twist=twist),
-        lambda g, f, t, S: vf_fbm(g, f, t, S, octaves=5, freq=0.04),
-        lambda g, f, t, S: vf_domain_warp(g, f, t, S, warp_strength=15),
-    ]
-    durations = [5.0, 5.0, 5.0, 5.0]
-
-    val_fn = lambda g, f, t, S: vf_sequence(g, f, t, S, fields, durations,
-                                             crossfade=2.0)
-
-    # Render with slowly rotating hue
-    canvas = _render_vf(r, "md", val_fn, hf_time_cycle(0.06),
-                        PAL_DENSE, f, t, S, sat=0.8)
-
-    # Second layer: tiled version of same sequence at smaller grid
-    tiled_fn = lambda g, f, t, S: vf_sequence(
-        make_tgrid(g, *uv_tile(g, 3, 3, mirror=True)),
-        f, t, S, fields, durations, crossfade=2.0)
-    canvas_b = _render_vf(r, "sm", tiled_fn, hf_angle(0.1),
-                          PAL_RUNE, f, t, S, sat=0.6)
-
-    return blend_canvas(canvas, canvas_b, "screen", 0.5)
-```
-
---
-
-## Specialized — Unique State Patterns
-
-### Game of Life with Ghost Trails
-
-Cellular automaton with analog fade trails. Beat injects random cells.
-
-```python
-def fx_life(r, f, t, S):
-    """Conway's Game of Life with fading ghost trails.
-    Beat events inject random live cells for disruption."""
-    canvas = _render_vf(r, "sm",
-        lambda g, f, t, S: vf_game_of_life(g, f, t, S,
-            rule="life", steps_per_frame=1, fade=0.92, density=0.25),
-        hf_fixed(0.33), PAL_BLOCKS, f, t, S, sat=0.8)
-
-    # Overlay: coral automaton on lg grid for chunky texture
-    canvas_b = _render_vf(r, "lg",
-        lambda g, f, t, S: vf_game_of_life(g, f, t, S,
-            rule="coral", steps_per_frame=1, fade=0.85, density=0.15, seed=99),
-        hf_time_cycle(0.1), PAL_HATCH, f, t, S, sat=0.6)
-
-    return blend_canvas(canvas, canvas_b, "screen", 0.5)
-```
-
-### Boids Flock Over Voronoi
-
-Emergent swarm movement over a cellular background.
-
-```python
-def fx_boid_swarm(r, f, t, S):
-    """Flocking boids over animated voronoi cells."""
-    # Background: voronoi cells
-    canvas_bg = _render_vf(r, "md",
-        lambda g, f, t, S: vf_voronoi(g, f, t, S,
-            n_cells=20, mode="distance", speed=0.2),
-        hf_distance(0.4, 0.02), PAL_CIRCUIT, f, t, S, sat=0.5)
-
-    # Foreground: boids
-    g = r.get_grid("md")
-    ch_b, co_b = update_boids(S, g, f, n_boids=150, perception=6.0,
-                              max_speed=1.5, char_set=list("▸▹►▻→⟶"))
-    canvas_boids = g.render(ch_b, co_b)
-
-    # Trails for the boids
-    # (boid positions are stored in S["boid_x"], S["boid_y"])
-    S["px"] = list(S.get("boid_x", []))
-    S["py"] = list(S.get("boid_y", []))
-    ch_t, co_t = draw_particle_trails(S, g, max_trail=6, fade=0.6)
-    canvas_trails = g.render(ch_t, co_t)
-
-    result = blend_canvas(canvas_bg, canvas_trails, "add", 0.3)
-    result = blend_canvas(result, canvas_boids, "add", 0.9)
-    return result
-```
-
-### Fire Rising Through SDF Text Stencil
-
-Fire effect visible only through text letterforms.
-
-```python
-def fx_fire_text(r, f, t, S):
-    """Fire columns visible through text stencil. Text acts as window."""
-    g = r.get_grid("lg")
-
-    # Full-screen fire (will be masked)
-    canvas_fire = _render_vf(r, "sm",
-        lambda g, f, t, S: np.clip(
-            vf_fbm(g, f, t, S, octaves=4, freq=0.08, speed=0.8) *
-            (1.0 - g.rr / g.rows) *  # fade toward top
-            (0.6 + f.get("bass", 0.3) * 0.8), 0, 1),
-        hf_fixed(0.05), PAL_BLOCKS, f, t, S, sat=0.9)  # fire hue
-
-    # Background: dark domain warp
-    canvas_bg = _render_vf(r, "md",
-        lambda g, f, t, S: vf_domain_warp(g, f, t, S,
-            warp_strength=8, freq=0.03, speed=0.05) * 0.3,
-        hf_fixed(0.6), PAL_DENSE, f, t, S, sat=0.4)
-
-    # Text stencil mask
-    mask = mask_text(g, "FIRE", row_frac=0.45)
-    # Expand vertically for multi-row coverage
-    for offset in range(-2, 3):
-        shifted = mask_text(g, "FIRE", row_frac=0.45 + offset / g.rows)
-        mask = mask_union(mask, shifted)
-
-    canvas_masked = apply_mask_canvas(canvas_fire, mask, bg_canvas=canvas_bg)
-    return canvas_masked
-```
-
-### Portrait Mode: Vertical Rain + Quote
-
-Optimized for 9:16. Uses vertical space for long rain trails and stacked text.
-
-```python
-def fx_portrait_rain_quote(r, f, t, S):
-    """Portrait-optimized: matrix rain (long vertical trails) with stacked quote.
-    Designed for 1080x1920 (9:16)."""
-    g = r.get_grid("md")  # ~112x100 in portrait
-
-    # Matrix rain — long trails benefit from portrait's extra rows
-    ch, co, S = eff_matrix_rain(g, f, t, S,
-        hue=0.33, bri=0.6, pal=PAL_KATA, speed_base=0.4, speed_beat=2.5)
-    canvas_rain = g.render(ch, co)
-
-    # Tunnel depth underneath for texture
-    canvas_tunnel = _render_vf(r, "sm",
-        lambda g, f, t, S: vf_tunnel(g, f, t, S, speed=3.0, complexity=6) * 0.8,
-        hf_fixed(0.33), PAL_BLOCKS, f, t, S, sat=0.5)
-
-    result = blend_canvas(canvas_tunnel, canvas_rain, "screen", 0.8)
-
-    # Quote text — portrait layout: short lines, many of them
-    g_text = r.get_grid("lg")  # ~90x80 in portrait
-    quote_lines = layout_text_portrait(
-        "The code is the art and the art is the code",
-        max_chars_per_line=20)
-    # Center vertically
-    block_start = (g_text.rows - len(quote_lines)) // 2
-    ch_t = np.full((g_text.rows, g_text.cols), " ", dtype="U1")
-    co_t = np.zeros((g_text.rows, g_text.cols, 3), dtype=np.uint8)
-    total_chars = sum(len(l) for l in quote_lines)
-    progress = min(1.0, (t - S.get("_scene_start", t)) / 3.0)
-    if "_scene_start" not in S: S["_scene_start"] = t
-    render_typewriter(ch_t, co_t, quote_lines, block_start, g_text.cols,
-                      progress, total_chars, (200, 255, 220), t)
-    canvas_text = g_text.render(ch_t, co_t)
-
-    result = blend_canvas(result, canvas_text, "add", 0.9)
-    return result
-```
-
---
-
-## Scene Table Template
-
-Wire scenes into a complete video:
-
-```python
-SCENES = [
-    {"start": 0.0,  "end": 5.0,  "name": "coral",
-     "fx": fx_coral, "grid": "sm", "gamma": 0.70,
-     "shaders": [("bloom", {"thr": 110}), ("vignette", {"s": 0.2})],
-     "feedback": {"decay": 0.8, "blend": "screen", "opacity": 0.3,
-                  "transform": "zoom", "transform_amt": 0.01}},
-
-    {"start": 5.0,  "end": 15.0, "name": "tunnel_noise",
-     "fx": fx_tunnel_noise, "grid": "md", "gamma": 0.75,
-     "shaders": [("chromatic", {"amt": 3}), ("bloom", {"thr": 120}),
-                 ("scanlines", {"intensity": 0.06}), ("grain", {"amt": 8})],
-     "feedback": None},
-
-    {"start": 15.0, "end": 35.0, "name": "cathedral",
-     "fx": fx_cathedral, "grid": "sm", "gamma": 0.65,
-     "shaders": [("bloom", {"thr": 100}), ("chromatic", {"amt": 5}),
-                 ("color_wobble", {"amt": 0.2}), ("vignette", {"s": 0.18})],
-     "feedback": {"decay": 0.75, "blend": "screen", "opacity": 0.35,
-                  "transform": "zoom", "transform_amt": 0.012, "hue_shift": 0.015}},
-
-    {"start": 35.0, "end": 50.0, "name": "morphing",
-     "fx": fx_morphing_journey, "grid": "md", "gamma": 0.70,
-     "shaders": [("bloom", {"thr": 110}), ("grain", {"amt": 6})],
-     "feedback": {"decay": 0.7, "blend": "screen", "opacity": 0.25,
-                  "transform": "rotate_cw", "transform_amt": 0.003}},
-]
-```
@@ -1,13 +1,6 @@
 # Input Sources

-**Cross-references:**
- Grid system, resolution presets: `architecture.md`
- Effect building blocks (audio-reactive modulation): `effects.md`
- Scene protocol, SCENES table (feature routing): `scenes.md`
- Shader pipeline, output encoding: `shaders.md`
- Performance tuning (audio chunking, WAV caching): `optimization.md`
- Common bugs (sample rate, dtype, silence handling): `troubleshooting.md`
- Complete scene examples with feature usage: `examples.md`
+> **See also:** architecture.md · effects.md · scenes.md · shaders.md · optimization.md · troubleshooting.md

 ## Audio Analysis

@@ -1,14 +1,6 @@
 # Optimization Reference

-**Cross-references:**
- Grid system, resolution presets, portrait GridLayer: `architecture.md`
- Effect building blocks (pre-computation strategies): `effects.md`
- `_render_vf()`, tonemap (subsampled percentile): `composition.md`
- Scene protocol, render_clip: `scenes.md`
- Shader pipeline, encoding (ffmpeg flags): `shaders.md`
- Input sources (audio chunking, WAV extraction): `inputs.md`
- Common bugs (memory, OOM, frame drops): `troubleshooting.md`
- Complete scene examples: `examples.md`
+> **See also:** architecture.md · composition.md · scenes.md · shaders.md · inputs.md · troubleshooting.md

 ## Hardware Detection

@@ -1,18 +1,214 @@
-# Scene System Reference
+# Scene System & Creative Composition

-**Cross-references:**
- Grid system, palettes, color (HSV + OKLAB): `architecture.md`
- Effect building blocks (value fields, noise, SDFs, particles): `effects.md`
- `_render_vf()`, blend modes, tonemap, masking: `composition.md`
- Shader pipeline, feedback buffer, ShaderChain: `shaders.md`
- Complete scene examples at every complexity level: `examples.md`
- Input sources (audio features, video features): `inputs.md`
- Performance tuning, portrait CLI: `optimization.md`
- Common bugs (state leaks, frame drops): `troubleshooting.md`
+> **See also:** architecture.md · composition.md · effects.md · shaders.md
+
+## Scene Design Philosophy
+
+Scenes are storytelling units, not effect demos. Every scene needs:
+- A **concept** — what is happening visually? Not "plasma + rings" but "emergence from void" or "crystallization"
+- An **arc** — how does it change over its duration? Build, decay, transform, reveal?
+- A **role** — how does it serve the larger video narrative? Opening tension, peak energy, resolution?
+
+The design patterns below provide compositional techniques. The scene examples show them in practice at increasing complexity. The protocol section covers the technical contract.
+
+Good scene design starts with the concept, then selects effects and parameters that serve it. The design patterns section shows *how* to compose layers intentionally. The examples section shows complete working scenes at every complexity level. The protocol section covers the technical contract that all scenes must follow.
+
+---
+
+## Scene Design Patterns
+
+Higher-order patterns for composing scenes that feel intentional rather than random. These patterns use the existing building blocks (value fields, blend modes, shaders, feedback) but organize them with compositional intent.
+
+## Layer Hierarchy
+
+Every scene should have clear visual layers with distinct roles:
+
+| Layer | Grid | Brightness | Purpose |
+|-------|------|-----------|---------|
+| **Background** | xs or sm (dense) | 0.1–0.25 | Atmosphere, texture. Never competes with content. |
+| **Content** | md (balanced) | 0.4–0.8 | The main visual idea. Carries the scene's concept. |
+| **Accent** | lg or sm (sparse) | 0.5–1.0 (sparse coverage) | Highlights, punctuation, sparse bright points. |
+
+The background sets mood. The content layer is what the scene *is about*. The accent adds visual interest without overwhelming.
+
+```python
+def fx_example(r, f, t, S):
+    local = t
+    progress = min(local / 5.0, 1.0)
+
+    g_bg = r.get_grid("sm")
+    g_main = r.get_grid("md")
+    g_accent = r.get_grid("lg")
+
+    # --- Background: dim atmosphere ---
+    bg_val = vf_smooth_noise(g_bg, f, t * 0.3, S, octaves=2, bri=0.15)
+    # ... render bg to canvas
+
+    # --- Content: the main visual idea ---
+    content_val = vf_spiral(g_main, f, t, S, n_arms=n_arms, tightness=tightness)
+    # ... render content on top of canvas
+
+    # --- Accent: sparse highlights ---
+    accent_val = vf_noise_static(g_accent, f, t, S, density=0.05)
+    # ... render accent on top
+
+    return canvas
+```
+
+## Directional Parameter Arcs
+
+Parameters should *go somewhere* over the scene's duration — not oscillate aimlessly with `sin(t * N)`.
+
+**Bad:** `twist = 3.0 + 2.0 * math.sin(t * 0.6)` — wobbles back and forth, feels aimless.
+
+**Good:** `twist = 2.0 + progress * 5.0` — starts gentle, ends intense. The scene *builds*.
+
+Use `progress = min(local / duration, 1.0)` (0→1 over the scene) to drive directional change:
+
+| Pattern | Formula | Feel |
+|---------|---------|------|
+| Linear ramp | `progress * range` | Steady buildup |
+| Ease-out | `1 - (1 - progress) ** 2` | Fast start, gentle finish |
+| Ease-in | `progress ** 2` | Slow start, accelerating |
+| Step reveal | `np.clip((progress - 0.5) / 0.25, 0, 1)` | Nothing until 50%, then fades in |
+| Build + plateau | `min(1.0, progress * 1.5)` | Reaches full at 67%, holds |
+
+Oscillation is fine for *secondary* parameters (saturation shimmer, hue drift). But the *defining* parameter of the scene should have a direction.
+
+### Examples of Directional Arcs
+
+| Scene concept | Parameter | Arc |
+|--------------|-----------|-----|
+| Emergence | Ring radius | 0 → max (ease-out) |
+| Shatter | Voronoi cell count | 8 → 38 (linear) |
+| Descent | Tunnel speed | 2.0 → 10.0 (linear) |
+| Mandala | Shape complexity | ring → +polygon → +star → +rosette (step reveals) |
+| Crescendo | Layer count | 1 → 7 (staggered entry) |
+| Entropy | Geometry visibility | 1.0 → 0.0 (consumed) |
+
+## Scene Concepts
+
+Each scene should be built around a *visual idea*, not an effect name.
+
+**Bad:** "fx_plasma_cascade" — named after the effect. No concept.
+**Good:** "fx_emergence" — a point of light expands into a field. The name tells you *what happens*.
+
+Good scene concepts have:
+1. A **visual metaphor** (emergence, descent, collision, entropy)
+2. A **directional arc** (things change from A to B, not oscillate)
+3. **Motivated layer choices** (each layer serves the concept)
+4. **Motivated feedback** (transform direction matches the metaphor)
+
+| Concept | Metaphor | Feedback transform | Why |
+|---------|----------|-------------------|-----|
+| Emergence | Birth, expansion | zoom-out | Past frames expand outward |
+| Descent | Falling, acceleration | zoom-in | Past frames rush toward center |
+| Inferno | Rising fire | shift-up | Past frames rise with the flames |
+| Entropy | Decay, dissolution | none | Clean, no persistence — things disappear |
+| Crescendo | Accumulation | zoom + hue_shift | Everything compounds and shifts |
+
+## Compositional Techniques
+
+### Counter-Rotating Dual Systems
+
+Two instances of the same effect rotating in opposite directions create visual interference:
+
+```python
+# Primary spiral (clockwise)
+s1_val = vf_spiral(g_main, f, t * 1.5, S, n_arms=n_arms_1, tightness=tightness_1)
+
+# Counter-rotating spiral (counter-clockwise via negative time)
+s2_val = vf_spiral(g_accent, f, -t * 1.2, S, n_arms=n_arms_2, tightness=tightness_2)
+
+# Screen blend creates bright interference at crossing points
+canvas = blend_canvas(canvas_with_s1, c2, "screen", 0.7)
+```
+
+Works with spirals, vortexes, rings. The counter-rotation creates constantly shifting interference patterns.
+
+### Wave Collision
+
+Two wave fronts converging from opposite sides, meeting at a collision point:
+
+```python
+collision_phase = abs(progress - 0.5) * 2  # 1→0→1 (0 at collision)
+
+# Wave A approaches from left
+offset_a = (1 - progress) * g.cols * 0.4
+wave_a = np.sin((g.cc + offset_a) * 0.08 + t * 2) * 0.5 + 0.5
+
+# Wave B approaches from right
+offset_b = -(1 - progress) * g.cols * 0.4
+wave_b = np.sin((g.cc + offset_b) * 0.08 - t * 2) * 0.5 + 0.5
+
+# Interference peaks at collision
+combined = wave_a * 0.5 + wave_b * 0.5 + np.abs(wave_a - wave_b) * (1 - collision_phase) * 0.5
+```
+
+### Progressive Fragmentation
+
+Voronoi with cell count increasing over time — visual shattering:
+
+```python
+n_pts = int(8 + progress * 30)  # 8 cells → 38 cells
+# Pre-generate enough points, slice to n_pts
+px = base_x[:n_pts] + np.sin(t * 0.3 + np.arange(n_pts) * 0.7) * (3 + progress * 3)
+```
+
+The edge glow width can also increase with progress to emphasize the cracks.
+
+### Entropy / Consumption
+
+A clean geometric pattern being overtaken by an organic process:
+
+```python
+# Geometry fades out
+geo_val = clean_pattern * max(0.05, 1.0 - progress * 0.9)
+
+# Organic process grows in
+rd_val = vf_reaction_diffusion(g, f, t, S) * min(1.0, progress * 1.5)
+
+# Render geometry first, organic on top — organic consumes geometry
+```
+
+### Staggered Layer Entry (Crescendo)
+
+Layers enter one at a time, building to overwhelming density:
+
+```python
+def layer_strength(enter_t, ramp=1.5):
+    """0.0 until enter_t, ramps to 1.0 over ramp seconds."""
+    return max(0.0, min(1.0, (local - enter_t) / ramp))
+
+# Layer 1: always present
+s1 = layer_strength(0.0)
+# Layer 2: enters at 2s
+s2 = layer_strength(2.0)
+# Layer 3: enters at 4s
+s3 = layer_strength(4.0)
+# ... etc
+
+# Each layer uses a different effect, grid, palette, and blend mode
+# Screen blend between layers so they accumulate light
+```
+
+For a 15-second crescendo, 7 layers entering every 2 seconds works well. Use different blend modes (screen for most, add for energy, colordodge for the final wash).
+
+## Scene Ordering
+
+For a multi-scene reel or video:
+- **Vary mood between adjacent scenes** — don't put two calm scenes next to each other
+- **Randomize order** rather than grouping by type — prevents "effect demo" feel
+- **End on the strongest scene** — crescendo or something with a clear payoff
+- **Open with energy** — grab attention in the first 2 seconds
+
+---
+
+## Scene Protocol

 Scenes are the top-level creative unit. Each scene is a time-bounded segment with its own effect function, shader chain, feedback configuration, and tone-mapping gamma.

-## Scene Protocol (v2)
+### Scene Protocol (v2)

 ### Function Signature

@@ -404,3 +600,412 @@ For each scene:
 7. **Configure feedback** for trailing/recursive looks — or None for clean cuts
 8. **Set gamma** if using destructive shaders (solarize, posterize)
 9. **Test with --test-frame** at the scene's midpoint before full render
+
+---
+
+## Scene Examples
+
+Copy-paste-ready scene functions at increasing complexity. Each is a complete, working v2 scene function that returns a pixel canvas. See the Scene Protocol section above for the scene protocol and `composition.md` for blend modes and tonemap.
+
+---
+
+### Minimal — Single Grid, Single Effect
+
+### Breathing Plasma
+
+One grid, one value field, one hue field. The simplest possible scene.
+
+```python
+def fx_breathing_plasma(r, f, t, S):
+    """Plasma field with time-cycling hue. Audio modulates brightness."""
+    canvas = _render_vf(r, "md",
+        lambda g, f, t, S: vf_plasma(g, f, t, S) * 1.3,
+        hf_time_cycle(0.08), PAL_DENSE, f, t, S, sat=0.8)
+    return canvas
+```
+
+### Reaction-Diffusion Coral
+
+Single grid, simulation-based field. Evolves organically over time.
+
+```python
+def fx_coral(r, f, t, S):
+    """Gray-Scott reaction-diffusion — coral branching pattern.
+    Slow-evolving, organic. Best for ambient/chill sections."""
+    canvas = _render_vf(r, "sm",
+        lambda g, f, t, S: vf_reaction_diffusion(g, f, t, S,
+            feed=0.037, kill=0.060, steps_per_frame=6, init_mode="center"),
+        hf_distance(0.55, 0.015), PAL_DOTS, f, t, S, sat=0.7)
+    return canvas
+```
+
+### SDF Geometry
+
+Geometric shapes from SDFs. Clean, precise, graphic.
+
+```python
+def fx_sdf_rings(r, f, t, S):
+    """Concentric SDF rings with smooth pulsing."""
+    def val_fn(g, f, t, S):
+        d1 = sdf_ring(g, radius=0.15 + f.get("bass", 0.3) * 0.05, thickness=0.015)
+        d2 = sdf_ring(g, radius=0.25 + f.get("mid", 0.3) * 0.05, thickness=0.012)
+        d3 = sdf_ring(g, radius=0.35 + f.get("hi", 0.3) * 0.04, thickness=0.010)
+        combined = sdf_smooth_union(sdf_smooth_union(d1, d2, 0.05), d3, 0.05)
+        return sdf_glow(combined, falloff=0.08) * (0.5 + f.get("rms", 0.3) * 0.8)
+    canvas = _render_vf(r, "md", val_fn, hf_angle(0.0), PAL_STARS, f, t, S, sat=0.85)
+    return canvas
+```
+
+---
+
+### Standard — Two Grids + Blend
+
+### Tunnel Through Noise
+
+Two grids at different densities, screen blended. The fine noise texture shows through the coarser tunnel characters.
+
+```python
+def fx_tunnel_noise(r, f, t, S):
+    """Tunnel depth on md grid + fBM noise on sm grid, screen blended."""
+    canvas_a = _render_vf(r, "md",
+        lambda g, f, t, S: vf_tunnel(g, f, t, S, speed=4.0, complexity=8) * 1.2,
+        hf_distance(0.5, 0.02), PAL_BLOCKS, f, t, S, sat=0.7)
+
+    canvas_b = _render_vf(r, "sm",
+        lambda g, f, t, S: vf_fbm(g, f, t, S, octaves=4, freq=0.05, speed=0.15) * 1.3,
+        hf_time_cycle(0.06), PAL_RUNE, f, t, S, sat=0.6)
+
+    return blend_canvas(canvas_a, canvas_b, "screen", 0.7)
+```
+
+### Voronoi Cells + Spiral Overlay
+
+Voronoi cell edges with a spiral arm pattern overlaid.
+
+```python
+def fx_voronoi_spiral(r, f, t, S):
+    """Voronoi edge detection on md + logarithmic spiral on lg."""
+    canvas_a = _render_vf(r, "md",
+        lambda g, f, t, S: vf_voronoi(g, f, t, S,
+            n_cells=15, mode="edge", edge_width=2.0, speed=0.4),
+        hf_angle(0.2), PAL_CIRCUIT, f, t, S, sat=0.75)
+
+    canvas_b = _render_vf(r, "lg",
+        lambda g, f, t, S: vf_spiral(g, f, t, S, n_arms=4, tightness=3.0) * 1.2,
+        hf_distance(0.1, 0.03), PAL_BLOCKS, f, t, S, sat=0.9)
+
+    return blend_canvas(canvas_a, canvas_b, "exclusion", 0.6)
+```
+
+### Domain-Warped fBM
+
+Two layers of the same fBM, one domain-warped, difference-blended for psychedelic organic texture.
+
+```python
+def fx_organic_warp(r, f, t, S):
+    """Clean fBM vs domain-warped fBM, difference blended."""
+    canvas_a = _render_vf(r, "sm",
+        lambda g, f, t, S: vf_fbm(g, f, t, S, octaves=5, freq=0.04, speed=0.1),
+        hf_plasma(0.2), PAL_DENSE, f, t, S, sat=0.6)
+
+    canvas_b = _render_vf(r, "md",
+        lambda g, f, t, S: vf_domain_warp(g, f, t, S,
+            warp_strength=20.0, freq=0.05, speed=0.15),
+        hf_time_cycle(0.05), PAL_BRAILLE, f, t, S, sat=0.7)
+
+    return blend_canvas(canvas_a, canvas_b, "difference", 0.7)
+```
+
+---
+
+### Complex — Three Grids + Conditional + Feedback
+
+### Psychedelic Cathedral
+
+Three-grid composition with beat-triggered kaleidoscope and feedback zoom tunnel. The most visually complex pattern.
+
+```python
+def fx_cathedral(r, f, t, S):
+    """Three-layer cathedral: interference + rings + noise, kaleidoscope on beat,
+    feedback zoom tunnel."""
+    # Layer 1: interference pattern on sm grid
+    canvas_a = _render_vf(r, "sm",
+        lambda g, f, t, S: vf_interference(g, f, t, S, n_waves=7) * 1.3,
+        hf_angle(0.0), PAL_MATH, f, t, S, sat=0.8)
+
+    # Layer 2: pulsing rings on md grid
+    canvas_b = _render_vf(r, "md",
+        lambda g, f, t, S: vf_rings(g, f, t, S, n_base=10, spacing_base=3) * 1.4,
+        hf_distance(0.3, 0.02), PAL_STARS, f, t, S, sat=0.9)
+
+    # Layer 3: temporal noise on lg grid (slow morph)
+    canvas_c = _render_vf(r, "lg",
+        lambda g, f, t, S: vf_temporal_noise(g, f, t, S,
+            freq=0.04, t_freq=0.2, octaves=3),
+        hf_time_cycle(0.12), PAL_BLOCKS, f, t, S, sat=0.7)
+
+    # Blend: A screen B, then difference with C
+    result = blend_canvas(canvas_a, canvas_b, "screen", 0.8)
+    result = blend_canvas(result, canvas_c, "difference", 0.5)
+
+    # Beat-triggered kaleidoscope
+    if f.get("bdecay", 0) > 0.3:
+        folds = 6 if f.get("sub_r", 0.3) > 0.4 else 8
+        result = sh_kaleidoscope(result.copy(), folds=folds)
+
+    return result
+
+# Scene table entry with feedback:
+# {"start": 30.0, "end": 50.0, "name": "cathedral", "fx": fx_cathedral,
+#  "gamma": 0.65, "shaders": [("bloom", {"thr": 110}), ("chromatic", {"amt": 4}),
+#                              ("vignette", {"s": 0.2}), ("grain", {"amt": 8})],
+#  "feedback": {"decay": 0.75, "blend": "screen", "opacity": 0.35,
+#               "transform": "zoom", "transform_amt": 0.012, "hue_shift": 0.015}}
+```
+
+### Masked Reaction-Diffusion with Attractor Overlay
+
+Reaction-diffusion visible only through an animated iris mask, with a strange attractor density field underneath.
+
+```python
+def fx_masked_life(r, f, t, S):
+    """Attractor base + reaction-diffusion visible through iris mask + particles."""
+    g_sm = r.get_grid("sm")
+    g_md = r.get_grid("md")
+
+    # Layer 1: strange attractor density field (background)
+    canvas_bg = _render_vf(r, "sm",
+        lambda g, f, t, S: vf_strange_attractor(g, f, t, S,
+            attractor="clifford", n_points=30000),
+        hf_time_cycle(0.04), PAL_DOTS, f, t, S, sat=0.5)
+
+    # Layer 2: reaction-diffusion (foreground, will be masked)
+    canvas_rd = _render_vf(r, "md",
+        lambda g, f, t, S: vf_reaction_diffusion(g, f, t, S,
+            feed=0.046, kill=0.063, steps_per_frame=4, init_mode="ring"),
+        hf_angle(0.15), PAL_HALFFILL, f, t, S, sat=0.85)
+
+    # Animated iris mask — opens over first 5 seconds of scene
+    scene_start = S.get("_scene_start", t)
+    if "_scene_start" not in S:
+        S["_scene_start"] = t
+    mask = mask_iris(g_md, t, scene_start, scene_start + 5.0,
+                     max_radius=0.6)
+    canvas_rd = apply_mask_canvas(canvas_rd, mask, bg_canvas=canvas_bg)
+
+    # Layer 3: flow-field particles following the R-D gradient
+    rd_field = vf_reaction_diffusion(g_sm, f, t, S,
+        feed=0.046, kill=0.063, steps_per_frame=0)  # read without stepping
+    ch_p, co_p = update_flow_particles(S, g_sm, f, rd_field,
+        n=300, speed=0.8, char_set=list("·•◦∘°"))
+    canvas_p = g_sm.render(ch_p, co_p)
+
+    result = blend_canvas(canvas_rd, canvas_p, "add", 0.7)
+    return result
+```
+
+### Morphing Field Sequence with Eased Keyframes
+
+Demonstrates temporal coherence: smooth morphing between effects with keyframed parameters.
+
+```python
+def fx_morphing_journey(r, f, t, S):
+    """Morphs through 4 value fields over 20 seconds with eased transitions.
+    Parameters (twist, arm count) also keyframed."""
+    # Keyframed twist parameter
+    twist = keyframe(t, [(0, 1.0), (5, 5.0), (10, 2.0), (15, 8.0), (20, 1.0)],
+                     ease_fn=ease_in_out_cubic, loop=True)
+
+    # Sequence of value fields with 2s crossfade
+    fields = [
+        lambda g, f, t, S: vf_plasma(g, f, t, S),
+        lambda g, f, t, S: vf_vortex(g, f, t, S, twist=twist),
+        lambda g, f, t, S: vf_fbm(g, f, t, S, octaves=5, freq=0.04),
+        lambda g, f, t, S: vf_domain_warp(g, f, t, S, warp_strength=15),
+    ]
+    durations = [5.0, 5.0, 5.0, 5.0]
+
+    val_fn = lambda g, f, t, S: vf_sequence(g, f, t, S, fields, durations,
+                                             crossfade=2.0)
+
+    # Render with slowly rotating hue
+    canvas = _render_vf(r, "md", val_fn, hf_time_cycle(0.06),
+                        PAL_DENSE, f, t, S, sat=0.8)
+
+    # Second layer: tiled version of same sequence at smaller grid
+    tiled_fn = lambda g, f, t, S: vf_sequence(
+        make_tgrid(g, *uv_tile(g, 3, 3, mirror=True)),
+        f, t, S, fields, durations, crossfade=2.0)
+    canvas_b = _render_vf(r, "sm", tiled_fn, hf_angle(0.1),
+                          PAL_RUNE, f, t, S, sat=0.6)
+
+    return blend_canvas(canvas, canvas_b, "screen", 0.5)
+```
+
+---
+
+### Specialized — Unique State Patterns
+
+### Game of Life with Ghost Trails
+
+Cellular automaton with analog fade trails. Beat injects random cells.
+
+```python
+def fx_life(r, f, t, S):
+    """Conway's Game of Life with fading ghost trails.
+    Beat events inject random live cells for disruption."""
+    canvas = _render_vf(r, "sm",
+        lambda g, f, t, S: vf_game_of_life(g, f, t, S,
+            rule="life", steps_per_frame=1, fade=0.92, density=0.25),
+        hf_fixed(0.33), PAL_BLOCKS, f, t, S, sat=0.8)
+
+    # Overlay: coral automaton on lg grid for chunky texture
+    canvas_b = _render_vf(r, "lg",
+        lambda g, f, t, S: vf_game_of_life(g, f, t, S,
+            rule="coral", steps_per_frame=1, fade=0.85, density=0.15, seed=99),
+        hf_time_cycle(0.1), PAL_HATCH, f, t, S, sat=0.6)
+
+    return blend_canvas(canvas, canvas_b, "screen", 0.5)
+```
+
+### Boids Flock Over Voronoi
+
+Emergent swarm movement over a cellular background.
+
+```python
+def fx_boid_swarm(r, f, t, S):
+    """Flocking boids over animated voronoi cells."""
+    # Background: voronoi cells
+    canvas_bg = _render_vf(r, "md",
+        lambda g, f, t, S: vf_voronoi(g, f, t, S,
+            n_cells=20, mode="distance", speed=0.2),
+        hf_distance(0.4, 0.02), PAL_CIRCUIT, f, t, S, sat=0.5)
+
+    # Foreground: boids
+    g = r.get_grid("md")
+    ch_b, co_b = update_boids(S, g, f, n_boids=150, perception=6.0,
+                              max_speed=1.5, char_set=list("▸▹►▻→⟶"))
+    canvas_boids = g.render(ch_b, co_b)
+
+    # Trails for the boids
+    # (boid positions are stored in S["boid_x"], S["boid_y"])
+    S["px"] = list(S.get("boid_x", []))
+    S["py"] = list(S.get("boid_y", []))
+    ch_t, co_t = draw_particle_trails(S, g, max_trail=6, fade=0.6)
+    canvas_trails = g.render(ch_t, co_t)
+
+    result = blend_canvas(canvas_bg, canvas_trails, "add", 0.3)
+    result = blend_canvas(result, canvas_boids, "add", 0.9)
+    return result
+```
+
+### Fire Rising Through SDF Text Stencil
+
+Fire effect visible only through text letterforms.
+
+```python
+def fx_fire_text(r, f, t, S):
+    """Fire columns visible through text stencil. Text acts as window."""
+    g = r.get_grid("lg")
+
+    # Full-screen fire (will be masked)
+    canvas_fire = _render_vf(r, "sm",
+        lambda g, f, t, S: np.clip(
+            vf_fbm(g, f, t, S, octaves=4, freq=0.08, speed=0.8) *
+            (1.0 - g.rr / g.rows) *  # fade toward top
+            (0.6 + f.get("bass", 0.3) * 0.8), 0, 1),
+        hf_fixed(0.05), PAL_BLOCKS, f, t, S, sat=0.9)  # fire hue
+
+    # Background: dark domain warp
+    canvas_bg = _render_vf(r, "md",
+        lambda g, f, t, S: vf_domain_warp(g, f, t, S,
+            warp_strength=8, freq=0.03, speed=0.05) * 0.3,
+        hf_fixed(0.6), PAL_DENSE, f, t, S, sat=0.4)
+
+    # Text stencil mask
+    mask = mask_text(g, "FIRE", row_frac=0.45)
+    # Expand vertically for multi-row coverage
+    for offset in range(-2, 3):
+        shifted = mask_text(g, "FIRE", row_frac=0.45 + offset / g.rows)
+        mask = mask_union(mask, shifted)
+
+    canvas_masked = apply_mask_canvas(canvas_fire, mask, bg_canvas=canvas_bg)
+    return canvas_masked
+```
+
+### Portrait Mode: Vertical Rain + Quote
+
+Optimized for 9:16. Uses vertical space for long rain trails and stacked text.
+
+```python
+def fx_portrait_rain_quote(r, f, t, S):
+    """Portrait-optimized: matrix rain (long vertical trails) with stacked quote.
+    Designed for 1080x1920 (9:16)."""
+    g = r.get_grid("md")  # ~112x100 in portrait
+
+    # Matrix rain — long trails benefit from portrait's extra rows
+    ch, co, S = eff_matrix_rain(g, f, t, S,
+        hue=0.33, bri=0.6, pal=PAL_KATA, speed_base=0.4, speed_beat=2.5)
+    canvas_rain = g.render(ch, co)
+
+    # Tunnel depth underneath for texture
+    canvas_tunnel = _render_vf(r, "sm",
+        lambda g, f, t, S: vf_tunnel(g, f, t, S, speed=3.0, complexity=6) * 0.8,
+        hf_fixed(0.33), PAL_BLOCKS, f, t, S, sat=0.5)
+
+    result = blend_canvas(canvas_tunnel, canvas_rain, "screen", 0.8)
+
+    # Quote text — portrait layout: short lines, many of them
+    g_text = r.get_grid("lg")  # ~90x80 in portrait
+    quote_lines = layout_text_portrait(
+        "The code is the art and the art is the code",
+        max_chars_per_line=20)
+    # Center vertically
+    block_start = (g_text.rows - len(quote_lines)) // 2
+    ch_t = np.full((g_text.rows, g_text.cols), " ", dtype="U1")
+    co_t = np.zeros((g_text.rows, g_text.cols, 3), dtype=np.uint8)
+    total_chars = sum(len(l) for l in quote_lines)
+    progress = min(1.0, (t - S.get("_scene_start", t)) / 3.0)
+    if "_scene_start" not in S: S["_scene_start"] = t
+    render_typewriter(ch_t, co_t, quote_lines, block_start, g_text.cols,
+                      progress, total_chars, (200, 255, 220), t)
+    canvas_text = g_text.render(ch_t, co_t)
+
+    result = blend_canvas(result, canvas_text, "add", 0.9)
+    return result
+```
+
+---
+
+### Scene Table Template
+
+Wire scenes into a complete video:
+
+```python
+SCENES = [
+    {"start": 0.0,  "end": 5.0,  "name": "coral",
+     "fx": fx_coral, "grid": "sm", "gamma": 0.70,
+     "shaders": [("bloom", {"thr": 110}), ("vignette", {"s": 0.2})],
+     "feedback": {"decay": 0.8, "blend": "screen", "opacity": 0.3,
+                  "transform": "zoom", "transform_amt": 0.01}},
+
+    {"start": 5.0,  "end": 15.0, "name": "tunnel_noise",
+     "fx": fx_tunnel_noise, "grid": "md", "gamma": 0.75,
+     "shaders": [("chromatic", {"amt": 3}), ("bloom", {"thr": 120}),
+                 ("scanlines", {"intensity": 0.06}), ("grain", {"amt": 8})],
+     "feedback": None},
+
+    {"start": 15.0, "end": 35.0, "name": "cathedral",
+     "fx": fx_cathedral, "grid": "sm", "gamma": 0.65,
+     "shaders": [("bloom", {"thr": 100}), ("chromatic", {"amt": 5}),
+                 ("color_wobble", {"amt": 0.2}), ("vignette", {"s": 0.18})],
+     "feedback": {"decay": 0.75, "blend": "screen", "opacity": 0.35,
+                  "transform": "zoom", "transform_amt": 0.012, "hue_shift": 0.015}},
+
+    {"start": 35.0, "end": 50.0, "name": "morphing",
+     "fx": fx_morphing_journey, "grid": "md", "gamma": 0.70,
+     "shaders": [("bloom", {"thr": 110}), ("grain", {"amt": 6})],
+     "feedback": {"decay": 0.7, "blend": "screen", "opacity": 0.25,
+                  "transform": "rotate_cw", "transform_amt": 0.003}},
+]
+```
@@ -2,14 +2,9 @@

 Post-processing effects applied to the pixel canvas (`numpy uint8 array, shape (H,W,3)`) after character rendering and before encoding. Also covers **pixel-level blend modes**, **feedback buffers**, and the **ShaderChain** compositor.

-**Cross-references:**
- Grid system, palettes, color (HSV + OKLAB): `architecture.md`
- Effect building blocks (value fields, noise, SDFs): `effects.md`
- `_render_vf()`, blend modes, tonemap, masking: `composition.md`
- Scene protocol, render_clip, SCENES table: `scenes.md`
- Complete scene examples with shader usage: `examples.md`
- Performance tuning (frame budget, worker count): `optimization.md`
- Encoding pitfalls (ffmpeg flags, color space): `troubleshooting.md`
+> **See also:** composition.md (blend modes, tonemap) · effects.md · scenes.md · architecture.md · optimization.md · troubleshooting.md
+>
+> **Blend modes:** For the 20 pixel blend modes and `blend_canvas()`, see `composition.md`. All blending uses `blend_canvas(base, top, mode, opacity)`.

 ## Design Philosophy

@@ -1,14 +1,19 @@
 # Troubleshooting Reference

-**Cross-references:**
- Grid system, palettes, font selection: `architecture.md`
- Effect building blocks (value fields, noise, SDFs): `effects.md`
- `_render_vf()`, blend modes, tonemap: `composition.md`
- Scene protocol, render_clip, SCENES table: `scenes.md`
- Shader pipeline, feedback buffer, encoding: `shaders.md`
- Input sources (audio, video, TTS): `inputs.md`
- Performance tuning, hardware detection: `optimization.md`
- Complete scene examples: `examples.md`
+> **See also:** composition.md · architecture.md · shaders.md · scenes.md · optimization.md
+
+## Quick Diagnostic
+
+| Symptom | Likely Cause | Fix |
+|---------|-------------|-----|
+| All black output | tonemap gamma too high or no effects rendering | Lower gamma to 0.5, check scene_fn returns non-zero canvas |
+| Washed out / too bright | Linear brightness multiplier instead of tonemap | Replace `canvas * N` with `tonemap(canvas, gamma=0.75)` |
+| ffmpeg hangs mid-render | stderr=subprocess.PIPE deadlock | Redirect stderr to file |
+| "read-only" array error | broadcast_to view without .copy() | Add `.copy()` after broadcast_to |
+| PicklingError | Lambda or closure in SCENES table | Define all fx_* at module level |
+| Random dark holes in output | Font missing Unicode glyphs | Validate palettes at init |
+| Audio-visual desync | Frame timing accumulation | Use integer frame counter, compute t fresh each frame |
+| Single-color flat output | Hue field shape mismatch | Ensure h,s,v arrays all (rows,cols) before hsv2rgb |

 Common bugs, gotchas, and platform-specific issues encountered during ASCII video development.

@@ -339,3 +344,22 @@ val = np.clip(vf_plasma(g, f, t, S) * 1.5, 0, 1)
 ```

 The `_render_vf()` helper clips automatically, but if you're building custom scenes, clip explicitly.
+
+## Brightness Best Practices
+
+- Dense animated backgrounds — never flat black, always fill the grid
+- Vignette minimum clamped to 0.15 (not 0.12)
+- Bloom threshold 130 (not 170) so more pixels contribute to glow
+- Use `screen` blend mode (not `overlay`) for dark ASCII layers — overlay squares dark values: `2 * 0.12 * 0.12 = 0.03`
+- FeedbackBuffer decay minimum 0.5 — below that, feedback disappears too fast to see
+- Value field floor: `vf * 0.8 + 0.05` ensures no cell is truly zero
+- Per-scene gamma overrides: default 0.75, solarize 0.55, posterize 0.50, bright scenes 0.85
+- Test frames early: render single frames at key timestamps before committing to full render
+
+**Quick checklist before full render:**
+1. Render 3 test frames (start, middle, end)
+2. Check `canvas.mean() > 8` after tonemap
+3. Check no scene is visually flat black
+4. Verify per-section variation (different bg/palette/color per scene)
+5. Confirm shader chain includes bloom (threshold 130)
+6. Confirm vignette strength ≤ 0.25
@@ -295,3 +295,97 @@ class TestOnConnect:
        mock_conn = MagicMock(spec=acp.Client)
        agent.on_connect(mock_conn)
        assert agent._conn is mock_conn
+
+
+# ---------------------------------------------------------------------------
+# Slash commands
+# ---------------------------------------------------------------------------
+
+
+class TestSlashCommands:
+    """Test slash command dispatch in the ACP adapter."""
+
+    def _make_state(self, mock_manager):
+        state = mock_manager.create_session(cwd="/tmp")
+        state.agent.model = "test-model"
+        state.agent.provider = "openrouter"
+        state.model = "test-model"
+        return state
+
+    def test_help_lists_commands(self, agent, mock_manager):
+        state = self._make_state(mock_manager)
+        result = agent._handle_slash_command("/help", state)
+        assert result is not None
+        assert "/help" in result
+        assert "/model" in result
+        assert "/tools" in result
+        assert "/reset" in result
+
+    def test_model_shows_current(self, agent, mock_manager):
+        state = self._make_state(mock_manager)
+        result = agent._handle_slash_command("/model", state)
+        assert "test-model" in result
+
+    def test_context_empty(self, agent, mock_manager):
+        state = self._make_state(mock_manager)
+        state.history = []
+        result = agent._handle_slash_command("/context", state)
+        assert "empty" in result.lower()
+
+    def test_context_with_messages(self, agent, mock_manager):
+        state = self._make_state(mock_manager)
+        state.history = [
+            {"role": "user", "content": "hello"},
+            {"role": "assistant", "content": "hi"},
+        ]
+        result = agent._handle_slash_command("/context", state)
+        assert "2 messages" in result
+        assert "user: 1" in result
+
+    def test_reset_clears_history(self, agent, mock_manager):
+        state = self._make_state(mock_manager)
+        state.history = [{"role": "user", "content": "hello"}]
+        result = agent._handle_slash_command("/reset", state)
+        assert "cleared" in result.lower()
+        assert len(state.history) == 0
+
+    def test_version(self, agent, mock_manager):
+        state = self._make_state(mock_manager)
+        result = agent._handle_slash_command("/version", state)
+        assert HERMES_VERSION in result
+
+    def test_unknown_command_returns_none(self, agent, mock_manager):
+        state = self._make_state(mock_manager)
+        result = agent._handle_slash_command("/nonexistent", state)
+        assert result is None
+
+    @pytest.mark.asyncio
+    async def test_slash_command_intercepted_in_prompt(self, agent, mock_manager):
+        """Slash commands should be handled without calling the LLM."""
+        new_resp = await agent.new_session(cwd="/tmp")
+        mock_conn = AsyncMock(spec=acp.Client)
+        agent._conn = mock_conn
+
+        prompt = [TextContentBlock(type="text", text="/help")]
+        resp = await agent.prompt(prompt=prompt, session_id=new_resp.session_id)
+
+        assert resp.stop_reason == "end_turn"
+        mock_conn.session_update.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_unknown_slash_falls_through_to_llm(self, agent, mock_manager):
+        """Unknown /commands should be sent to the LLM, not intercepted."""
+        new_resp = await agent.new_session(cwd="/tmp")
+        mock_conn = AsyncMock(spec=acp.Client)
+        agent._conn = mock_conn
+
+        # Mock run_in_executor to avoid actually running the agent
+        with patch("asyncio.get_running_loop") as mock_loop:
+            mock_loop.return_value.run_in_executor = AsyncMock(return_value={
+                "final_response": "I processed /foo",
+                "messages": [],
+            })
+            prompt = [TextContentBlock(type="text", text="/foo bar")]
+            resp = await agent.prompt(prompt=prompt, session_id=new_resp.session_id)
+
+        assert resp.stop_reason == "end_turn"
@@ -26,6 +26,12 @@ def _isolate_hermes_home(tmp_path, monkeypatch):
    (fake_home / "memories").mkdir()
    (fake_home / "skills").mkdir()
    monkeypatch.setenv("HERMES_HOME", str(fake_home))
+    # Reset plugin singleton so tests don't leak plugins from ~/.hermes/plugins/
+    try:
+        import hermes_cli.plugins as _plugins_mod
+        monkeypatch.setattr(_plugins_mod, "_plugin_manager", None)
+    except Exception:
+        pass
    # Tests should not inherit the agent's current gateway/messaging surface.
    # Individual tests that need gateway behavior set these explicitly.
    monkeypatch.delenv("HERMES_SESSION_PLATFORM", raising=False)
@@ -304,17 +304,34 @@ class TestMarkJobRun:


 class TestGetDueJobs:
-    def test_past_due_returned(self, tmp_cron_dir):
+    def test_past_due_within_window_returned(self, tmp_cron_dir):
+        """Jobs less than 2 minutes late are still considered due (not stale)."""
        job = create_job(prompt="Due now", schedule="every 1h")
-        # Force next_run_at to the past
+        # Force next_run_at to just 1 minute ago (within the 2-min window)
        jobs = load_jobs()
-        jobs[0]["next_run_at"] = (datetime.now() - timedelta(minutes=5)).isoformat()
+        jobs[0]["next_run_at"] = (datetime.now() - timedelta(seconds=60)).isoformat()
        save_jobs(jobs)

        due = get_due_jobs()
        assert len(due) == 1
        assert due[0]["id"] == job["id"]

+    def test_stale_past_due_skipped(self, tmp_cron_dir):
+        """Recurring jobs more than 2 minutes late are fast-forwarded, not fired."""
+        job = create_job(prompt="Stale", schedule="every 1h")
+        # Force next_run_at to 5 minutes ago (beyond the 2-min window)
+        jobs = load_jobs()
+        jobs[0]["next_run_at"] = (datetime.now() - timedelta(minutes=5)).isoformat()
+        save_jobs(jobs)
+
+        due = get_due_jobs()
+        assert len(due) == 0
+        # next_run_at should be fast-forwarded to the future
+        updated = get_job(job["id"])
+        from cron.jobs import _ensure_aware, _hermes_now
+        next_dt = _ensure_aware(datetime.fromisoformat(updated["next_run_at"]))
+        assert next_dt > _hermes_now()
+
    def test_future_not_returned(self, tmp_cron_dir):
        create_job(prompt="Not yet", schedule="every 1h")
        due = get_due_jobs()
@@ -65,6 +65,14 @@ class TestHandleBackgroundCommand:
        assert "Usage:" in result
        assert "/background" in result

+    @pytest.mark.asyncio
+    async def test_bg_alias_no_prompt_shows_usage(self):
+        """Running /bg with no prompt shows usage."""
+        runner = _make_runner()
+        event = _make_event(text="/bg")
+        result = await runner._handle_background_command(event)
+        assert "Usage:" in result
+
    @pytest.mark.asyncio
    async def test_empty_prompt_shows_usage(self):
        """Running /background with only whitespace shows usage."""
@@ -264,11 +272,14 @@ class TestBackgroundInHelp:
        assert "/background" in result

    def test_background_is_known_command(self):
-        """The /background command is in the _known_commands set."""
-        from gateway.run import GatewayRunner
-        import inspect
-        source = inspect.getsource(GatewayRunner._handle_message)
-        assert '"background"' in source
+        """The /background command is in GATEWAY_KNOWN_COMMANDS."""
+        from hermes_cli.commands import GATEWAY_KNOWN_COMMANDS
+        assert "background" in GATEWAY_KNOWN_COMMANDS
+
+    def test_bg_alias_is_known_command(self):
+        """The /bg alias is in GATEWAY_KNOWN_COMMANDS."""
+        from hermes_cli.commands import GATEWAY_KNOWN_COMMANDS
+        assert "bg" in GATEWAY_KNOWN_COMMANDS


 # ---------------------------------------------------------------------------
@@ -284,6 +295,11 @@ class TestBackgroundInCLICommands:
        from hermes_cli.commands import COMMANDS
        assert "/background" in COMMANDS

+    def test_bg_alias_in_commands_dict(self):
+        """The /bg alias is in the COMMANDS dict."""
+        from hermes_cli.commands import COMMANDS
+        assert "/bg" in COMMANDS
+
    def test_background_in_session_category(self):
        """The /background command is in the Session category."""
        from hermes_cli.commands import COMMANDS_BY_CATEGORY
@@ -0,0 +1,83 @@
+"""Tests for Discord thread participation persistence.
+
+Verifies that _bot_participated_threads survives adapter restarts by
+being persisted to ~/.hermes/discord_threads.json.
+"""
+
+import json
+import os
+from unittest.mock import patch
+
+import pytest
+
+
+class TestDiscordThreadPersistence:
+    """Thread IDs are saved to disk and reloaded on init."""
+
+    def _make_adapter(self, tmp_path):
+        """Build a minimal DiscordAdapter with HERMES_HOME pointed at tmp_path."""
+        from gateway.config import PlatformConfig
+        from gateway.platforms.discord import DiscordAdapter
+
+        config = PlatformConfig(enabled=True, token="test-token")
+        with patch.dict(os.environ, {"HERMES_HOME": str(tmp_path)}):
+            return DiscordAdapter(config=config)
+
+    def test_starts_empty_when_no_state_file(self, tmp_path):
+        adapter = self._make_adapter(tmp_path)
+        assert adapter._bot_participated_threads == set()
+
+    def test_track_thread_persists_to_disk(self, tmp_path):
+        adapter = self._make_adapter(tmp_path)
+        with patch.dict(os.environ, {"HERMES_HOME": str(tmp_path)}):
+            adapter._track_thread("111")
+            adapter._track_thread("222")
+
+        state_file = tmp_path / "discord_threads.json"
+        assert state_file.exists()
+        saved = json.loads(state_file.read_text())
+        assert set(saved) == {"111", "222"}
+
+    def test_threads_survive_restart(self, tmp_path):
+        """Threads tracked by one adapter instance are visible to the next."""
+        adapter1 = self._make_adapter(tmp_path)
+        with patch.dict(os.environ, {"HERMES_HOME": str(tmp_path)}):
+            adapter1._track_thread("aaa")
+            adapter1._track_thread("bbb")
+
+        adapter2 = self._make_adapter(tmp_path)
+        assert "aaa" in adapter2._bot_participated_threads
+        assert "bbb" in adapter2._bot_participated_threads
+
+    def test_duplicate_track_does_not_double_save(self, tmp_path):
+        adapter = self._make_adapter(tmp_path)
+        with patch.dict(os.environ, {"HERMES_HOME": str(tmp_path)}):
+            adapter._track_thread("111")
+            adapter._track_thread("111")  # no-op
+
+        saved = json.loads((tmp_path / "discord_threads.json").read_text())
+        assert saved.count("111") == 1
+
+    def test_caps_at_max_tracked_threads(self, tmp_path):
+        adapter = self._make_adapter(tmp_path)
+        adapter._MAX_TRACKED_THREADS = 5
+        with patch.dict(os.environ, {"HERMES_HOME": str(tmp_path)}):
+            for i in range(10):
+                adapter._track_thread(str(i))
+
+        assert len(adapter._bot_participated_threads) == 5
+
+    def test_corrupted_state_file_falls_back_to_empty(self, tmp_path):
+        state_file = tmp_path / "discord_threads.json"
+        state_file.write_text("not valid json{{{")
+        adapter = self._make_adapter(tmp_path)
+        assert adapter._bot_participated_threads == set()
+
+    def test_missing_hermes_home_does_not_crash(self, tmp_path):
+        """Load/save tolerate missing directories."""
+        fake_home = tmp_path / "nonexistent" / "deep"
+        with patch.dict(os.environ, {"HERMES_HOME": str(fake_home)}):
+            from gateway.platforms.discord import DiscordAdapter
+            # _load should return empty set, not crash
+            threads = DiscordAdapter._load_participated_threads()
+            assert threads == set()
@@ -0,0 +1,317 @@
+"""
+Tests for extract_local_files() — auto-detection of bare local file paths
+in model response text for native media delivery.
+
+Covers: path matching, code-block exclusion, URL rejection, tilde expansion,
+deduplication, text cleanup, and extension routing.
+
+Based on PR #1636 by sudoingX (salvaged + hardened).
+"""
+
+import os
+from unittest.mock import patch
+
+import pytest
+
+from gateway.platforms.base import BasePlatformAdapter
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _extract(content: str, existing_files: set[str] | None = None):
+    """
+    Run extract_local_files with os.path.isfile mocked to return True
+    for any path in *existing_files* (expanded form).  If *existing_files*
+    is None every path passes.
+    """
+    existing = existing_files
+
+    def fake_isfile(p):
+        if existing is None:
+            return True
+        return p in existing
+
+    def fake_expanduser(p):
+        if p.startswith("~/"):
+            return "/home/user" + p[1:]
+        return p
+
+    with patch("os.path.isfile", side_effect=fake_isfile), \
+         patch("os.path.expanduser", side_effect=fake_expanduser):
+        return BasePlatformAdapter.extract_local_files(content)
+
+
+# ---------------------------------------------------------------------------
+# Basic detection
+# ---------------------------------------------------------------------------
+
+class TestBasicDetection:
+
+    def test_absolute_path_image(self):
+        paths, cleaned = _extract("Here is the screenshot /root/screenshots/game.png enjoy")
+        assert paths == ["/root/screenshots/game.png"]
+        assert "/root/screenshots/game.png" not in cleaned
+        assert "Here is the screenshot" in cleaned
+
+    def test_tilde_path_image(self):
+        paths, cleaned = _extract("Check out ~/photos/cat.jpg for the cat")
+        assert paths == ["/home/user/photos/cat.jpg"]
+        assert "~/photos/cat.jpg" not in cleaned
+
+    def test_video_extensions(self):
+        for ext in (".mp4", ".mov", ".avi", ".mkv", ".webm"):
+            text = f"Video at /tmp/clip{ext} here"
+            paths, _ = _extract(text)
+            assert len(paths) == 1, f"Failed for {ext}"
+            assert paths[0] == f"/tmp/clip{ext}"
+
+    def test_image_extensions(self):
+        for ext in (".png", ".jpg", ".jpeg", ".gif", ".webp"):
+            text = f"Image at /tmp/pic{ext} here"
+            paths, _ = _extract(text)
+            assert len(paths) == 1, f"Failed for {ext}"
+            assert paths[0] == f"/tmp/pic{ext}"
+
+    def test_case_insensitive_extension(self):
+        paths, _ = _extract("See /tmp/PHOTO.PNG and /tmp/vid.MP4 now")
+        assert len(paths) == 2
+
+    def test_multiple_paths(self):
+        text = "First /tmp/a.png then /tmp/b.jpg and /tmp/c.mp4 done"
+        paths, cleaned = _extract(text)
+        assert len(paths) == 3
+        assert "/tmp/a.png" in paths
+        assert "/tmp/b.jpg" in paths
+        assert "/tmp/c.mp4" in paths
+        for p in paths:
+            assert p not in cleaned
+
+    def test_path_at_line_start(self):
+        paths, _ = _extract("/var/data/image.png")
+        assert paths == ["/var/data/image.png"]
+
+    def test_path_at_end_of_line(self):
+        paths, _ = _extract("saved to /var/data/image.png")
+        assert paths == ["/var/data/image.png"]
+
+    def test_path_with_dots_in_directory(self):
+        paths, _ = _extract("See /opt/my.app/assets/logo.png here")
+        assert paths == ["/opt/my.app/assets/logo.png"]
+
+    def test_path_with_hyphens(self):
+        paths, _ = _extract("File at /tmp/my-screenshot-2024.png done")
+        assert paths == ["/tmp/my-screenshot-2024.png"]
+
+
+# ---------------------------------------------------------------------------
+# Non-existent files are skipped
+# ---------------------------------------------------------------------------
+
+class TestIsfileGuard:
+
+    def test_nonexistent_path_skipped(self):
+        """Paths that don't exist on disk are not extracted."""
+        paths, cleaned = _extract(
+            "See /tmp/nope.png here",
+            existing_files=set(),  # nothing exists
+        )
+        assert paths == []
+        assert "/tmp/nope.png" in cleaned  # not stripped
+
+    def test_only_existing_paths_extracted(self):
+        """Mix of existing and non-existing — only existing are returned."""
+        paths, cleaned = _extract(
+            "A /tmp/real.png and /tmp/fake.jpg end",
+            existing_files={"/tmp/real.png"},
+        )
+        assert paths == ["/tmp/real.png"]
+        assert "/tmp/real.png" not in cleaned
+        assert "/tmp/fake.jpg" in cleaned
+
+
+# ---------------------------------------------------------------------------
+# URL false-positive prevention
+# ---------------------------------------------------------------------------
+
+class TestURLRejection:
+
+    def test_https_url_not_matched(self):
+        """Paths embedded in HTTP URLs must not be extracted."""
+        paths, cleaned = _extract("Visit https://example.com/images/photo.png for details")
+        # The regex lookbehind should prevent matching the URL's path segment
+        # Even if it did match, isfile would be False for /images/photo.png
+        # (we mock isfile to True-for-all here, so the lookbehind is the guard)
+        assert paths == []
+        assert "https://example.com/images/photo.png" in cleaned
+
+    def test_http_url_not_matched(self):
+        paths, _ = _extract("See http://cdn.example.com/assets/banner.jpg here")
+        assert paths == []
+
+    def test_file_url_not_matched(self):
+        paths, _ = _extract("Open file:///home/user/doc.png in browser")
+        # file:// has :// before /home so lookbehind blocks it
+        assert paths == []
+
+
+# ---------------------------------------------------------------------------
+# Code block exclusion
+# ---------------------------------------------------------------------------
+
+class TestCodeBlockExclusion:
+
+    def test_fenced_code_block_skipped(self):
+        text = "Here's how:\n```python\nimg = open('/tmp/image.png')\n```\nDone."
+        paths, cleaned = _extract(text)
+        assert paths == []
+        assert "/tmp/image.png" in cleaned  # not stripped
+
+    def test_inline_code_skipped(self):
+        text = "Use the path `/tmp/image.png` in your config"
+        paths, cleaned = _extract(text)
+        assert paths == []
+        assert "`/tmp/image.png`" in cleaned
+
+    def test_path_outside_code_block_still_matched(self):
+        text = (
+            "```\ncode: /tmp/inside.png\n```\n"
+            "But this one is real: /tmp/outside.png"
+        )
+        paths, _ = _extract(text, existing_files={"/tmp/outside.png"})
+        assert paths == ["/tmp/outside.png"]
+
+    def test_mixed_inline_code_and_bare_path(self):
+        text = "Config uses `/etc/app/bg.png` but output is /tmp/result.jpg"
+        paths, cleaned = _extract(text, existing_files={"/tmp/result.jpg"})
+        assert paths == ["/tmp/result.jpg"]
+        assert "`/etc/app/bg.png`" in cleaned
+        assert "/tmp/result.jpg" not in cleaned
+
+    def test_multiline_fenced_block(self):
+        text = (
+            "```bash\n"
+            "cp /source/a.png /dest/b.png\n"
+            "mv /source/c.mp4 /dest/d.mp4\n"
+            "```\n"
+            "Files are ready."
+        )
+        paths, _ = _extract(text)
+        assert paths == []
+
+
+# ---------------------------------------------------------------------------
+# Deduplication
+# ---------------------------------------------------------------------------
+
+class TestDeduplication:
+
+    def test_duplicate_paths_deduplicated(self):
+        text = "See /tmp/img.png and also /tmp/img.png again"
+        paths, _ = _extract(text)
+        assert paths == ["/tmp/img.png"]
+
+    def test_tilde_and_expanded_same_file(self):
+        """~/photos/a.png and /home/user/photos/a.png are the same file."""
+        text = "See ~/photos/a.png and /home/user/photos/a.png here"
+        paths, _ = _extract(text, existing_files={"/home/user/photos/a.png"})
+        assert len(paths) == 1
+        assert paths[0] == "/home/user/photos/a.png"
+
+
+# ---------------------------------------------------------------------------
+# Text cleanup
+# ---------------------------------------------------------------------------
+
+class TestTextCleanup:
+
+    def test_path_removed_from_text(self):
+        paths, cleaned = _extract("Before /tmp/x.png after")
+        assert "Before" in cleaned
+        assert "after" in cleaned
+        assert "/tmp/x.png" not in cleaned
+
+    def test_excessive_blank_lines_collapsed(self):
+        text = "Before\n\n\n/tmp/x.png\n\n\nAfter"
+        _, cleaned = _extract(text)
+        assert "\n\n\n" not in cleaned
+
+    def test_no_paths_text_unchanged(self):
+        text = "This is a normal response with no file paths."
+        paths, cleaned = _extract(text)
+        assert paths == []
+        assert cleaned == text
+
+    def test_tilde_form_cleaned_from_text(self):
+        """The raw ~/... form should be removed, not the expanded /home/user/... form."""
+        text = "Output saved to ~/result.png for review"
+        paths, cleaned = _extract(text)
+        assert paths == ["/home/user/result.png"]
+        assert "~/result.png" not in cleaned
+
+    def test_only_path_in_text(self):
+        """If the response is just a path, cleaned text is empty."""
+        paths, cleaned = _extract("/tmp/screenshot.png")
+        assert paths == ["/tmp/screenshot.png"]
+        assert cleaned == ""
+
+
+# ---------------------------------------------------------------------------
+# Edge cases
+# ---------------------------------------------------------------------------
+
+class TestEdgeCases:
+
+    def test_empty_string(self):
+        paths, cleaned = _extract("")
+        assert paths == []
+        assert cleaned == ""
+
+    def test_no_media_extensions(self):
+        """Non-media extensions should not be matched."""
+        paths, _ = _extract("See /tmp/data.csv and /tmp/script.py and /tmp/notes.txt")
+        assert paths == []
+
+    def test_path_with_spaces_not_matched(self):
+        """Paths with spaces are intentionally not matched (avoids false positives)."""
+        paths, _ = _extract("File at /tmp/my file.png here")
+        assert paths == []
+
+    def test_windows_path_not_matched(self):
+        """Windows-style paths should not match."""
+        paths, _ = _extract("See C:\\Users\\test\\image.png")
+        assert paths == []
+
+    def test_relative_path_not_matched(self):
+        """Relative paths like ./image.png should not match."""
+        paths, _ = _extract("File at ./screenshots/image.png here")
+        assert paths == []
+
+    def test_bare_filename_not_matched(self):
+        """Just 'image.png' without a path should not match."""
+        paths, _ = _extract("Open image.png to see")
+        assert paths == []
+
+    def test_path_followed_by_punctuation(self):
+        """Path followed by comma, period, paren should still match."""
+        for suffix in [",", ".", ")", ":", ";"]:
+            text = f"See /tmp/img.png{suffix} details"
+            paths, _ = _extract(text)
+            assert len(paths) == 1, f"Failed with suffix '{suffix}'"
+
+    def test_path_in_parentheses(self):
+        paths, _ = _extract("(see /tmp/img.png)")
+        assert paths == ["/tmp/img.png"]
+
+    def test_path_in_quotes(self):
+        paths, _ = _extract('The file is "/tmp/img.png" right here')
+        assert paths == ["/tmp/img.png"]
+
+    def test_deep_nested_path(self):
+        paths, _ = _extract("At /a/b/c/d/e/f/g/h/image.png end")
+        assert paths == ["/a/b/c/d/e/f/g/h/image.png"]
+
+
+if __name__ == "__main__":
+    pytest.main([__file__, "-v"])
@@ -0,0 +1,156 @@
+"""Tests for PII redaction in gateway session context prompts."""
+
+from gateway.session import (
+    SessionContext,
+    SessionSource,
+    build_session_context_prompt,
+    _hash_id,
+    _hash_sender_id,
+    _hash_chat_id,
+    _looks_like_phone,
+)
+from gateway.config import Platform, HomeChannel
+
+
+# ---------------------------------------------------------------------------
+# Low-level helpers
+# ---------------------------------------------------------------------------
+
+class TestHashHelpers:
+    def test_hash_id_deterministic(self):
+        assert _hash_id("12345") == _hash_id("12345")
+
+    def test_hash_id_12_hex_chars(self):
+        h = _hash_id("user-abc")
+        assert len(h) == 12
+        assert all(c in "0123456789abcdef" for c in h)
+
+    def test_hash_sender_id_prefix(self):
+        assert _hash_sender_id("12345").startswith("user_")
+        assert len(_hash_sender_id("12345")) == 17  # "user_" + 12
+
+    def test_hash_chat_id_preserves_prefix(self):
+        result = _hash_chat_id("telegram:12345")
+        assert result.startswith("telegram:")
+        assert "12345" not in result
+
+    def test_hash_chat_id_no_prefix(self):
+        result = _hash_chat_id("12345")
+        assert len(result) == 12
+        assert "12345" not in result
+
+    def test_looks_like_phone(self):
+        assert _looks_like_phone("+15551234567")
+        assert _looks_like_phone("15551234567")
+        assert _looks_like_phone("+1-555-123-4567")
+        assert not _looks_like_phone("alice")
+        assert not _looks_like_phone("user-123")
+        assert not _looks_like_phone("")
+
+
+# ---------------------------------------------------------------------------
+# Integration: build_session_context_prompt
+# ---------------------------------------------------------------------------
+
+def _make_context(
+    user_id="user-123",
+    user_name=None,
+    chat_id="telegram:99999",
+    platform=Platform.TELEGRAM,
+    home_channels=None,
+):
+    source = SessionSource(
+        platform=platform,
+        chat_id=chat_id,
+        chat_type="dm",
+        user_id=user_id,
+        user_name=user_name,
+    )
+    return SessionContext(
+        source=source,
+        connected_platforms=[platform],
+        home_channels=home_channels or {},
+    )
+
+
+class TestBuildSessionContextPromptRedaction:
+    def test_no_redaction_by_default(self):
+        ctx = _make_context(user_id="user-123")
+        prompt = build_session_context_prompt(ctx)
+        assert "user-123" in prompt
+
+    def test_user_id_hashed_when_redact_pii(self):
+        ctx = _make_context(user_id="user-123")
+        prompt = build_session_context_prompt(ctx, redact_pii=True)
+        assert "user-123" not in prompt
+        assert "user_" in prompt  # hashed ID present
+
+    def test_user_name_not_redacted(self):
+        ctx = _make_context(user_id="user-123", user_name="Alice")
+        prompt = build_session_context_prompt(ctx, redact_pii=True)
+        assert "Alice" in prompt
+        # user_id should not appear when user_name is present (name takes priority)
+        assert "user-123" not in prompt
+
+    def test_home_channel_id_hashed(self):
+        hc = {
+            Platform.TELEGRAM: HomeChannel(
+                platform=Platform.TELEGRAM,
+                chat_id="telegram:99999",
+                name="Home Chat",
+            )
+        }
+        ctx = _make_context(home_channels=hc)
+        prompt = build_session_context_prompt(ctx, redact_pii=True)
+        assert "99999" not in prompt
+        assert "telegram:" in prompt  # prefix preserved
+        assert "Home Chat" in prompt  # name not redacted
+
+    def test_home_channel_id_preserved_without_redaction(self):
+        hc = {
+            Platform.TELEGRAM: HomeChannel(
+                platform=Platform.TELEGRAM,
+                chat_id="telegram:99999",
+                name="Home Chat",
+            )
+        }
+        ctx = _make_context(home_channels=hc)
+        prompt = build_session_context_prompt(ctx, redact_pii=False)
+        assert "99999" in prompt
+
+    def test_redaction_is_deterministic(self):
+        ctx = _make_context(user_id="+15551234567")
+        prompt1 = build_session_context_prompt(ctx, redact_pii=True)
+        prompt2 = build_session_context_prompt(ctx, redact_pii=True)
+        assert prompt1 == prompt2
+
+    def test_different_ids_produce_different_hashes(self):
+        ctx1 = _make_context(user_id="user-A")
+        ctx2 = _make_context(user_id="user-B")
+        p1 = build_session_context_prompt(ctx1, redact_pii=True)
+        p2 = build_session_context_prompt(ctx2, redact_pii=True)
+        assert p1 != p2
+
+    def test_discord_ids_not_redacted_even_with_flag(self):
+        """Discord needs real IDs for <@user_id> mentions."""
+        ctx = _make_context(user_id="123456789", platform=Platform.DISCORD)
+        prompt = build_session_context_prompt(ctx, redact_pii=True)
+        assert "123456789" in prompt
+
+    def test_whatsapp_ids_redacted(self):
+        ctx = _make_context(user_id="+15551234567", platform=Platform.WHATSAPP)
+        prompt = build_session_context_prompt(ctx, redact_pii=True)
+        assert "+15551234567" not in prompt
+        assert "user_" in prompt
+
+    def test_signal_ids_redacted(self):
+        ctx = _make_context(user_id="+15551234567", platform=Platform.SIGNAL)
+        prompt = build_session_context_prompt(ctx, redact_pii=True)
+        assert "+15551234567" not in prompt
+        assert "user_" in prompt
+
+    def test_slack_ids_not_redacted(self):
+        """Slack may need IDs for mentions too."""
+        ctx = _make_context(user_id="U12345ABC", platform=Platform.SLACK)
+        prompt = build_session_context_prompt(ctx, redact_pii=True)
+        assert "U12345ABC" in prompt
@@ -0,0 +1,89 @@
+import pytest
+
+from gateway.config import GatewayConfig, Platform, PlatformConfig
+from gateway.platforms.base import BasePlatformAdapter
+from gateway.run import GatewayRunner
+from gateway.status import read_runtime_status
+
+
+class _RetryableFailureAdapter(BasePlatformAdapter):
+    def __init__(self):
+        super().__init__(PlatformConfig(enabled=True, token="***"), Platform.TELEGRAM)
+
+    async def connect(self) -> bool:
+        self._set_fatal_error(
+            "telegram_connect_error",
+            "Telegram startup failed: temporary DNS resolution failure.",
+            retryable=True,
+        )
+        return False
+
+    async def disconnect(self) -> None:
+        self._mark_disconnected()
+
+    async def send(self, chat_id, content, reply_to=None, metadata=None):
+        raise NotImplementedError
+
+    async def get_chat_info(self, chat_id):
+        return {"id": chat_id}
+
+
+class _DisabledAdapter(BasePlatformAdapter):
+    def __init__(self):
+        super().__init__(PlatformConfig(enabled=False, token="***"), Platform.TELEGRAM)
+
+    async def connect(self) -> bool:
+        raise AssertionError("connect should not be called for disabled platforms")
+
+    async def disconnect(self) -> None:
+        self._mark_disconnected()
+
+    async def send(self, chat_id, content, reply_to=None, metadata=None):
+        raise NotImplementedError
+
+    async def get_chat_info(self, chat_id):
+        return {"id": chat_id}
+
+
+@pytest.mark.asyncio
+async def test_runner_returns_failure_for_retryable_startup_errors(monkeypatch, tmp_path):
+    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+    config = GatewayConfig(
+        platforms={
+            Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")
+        },
+        sessions_dir=tmp_path / "sessions",
+    )
+    runner = GatewayRunner(config)
+
+    monkeypatch.setattr(runner, "_create_adapter", lambda platform, platform_config: _RetryableFailureAdapter())
+
+    ok = await runner.start()
+
+    assert ok is False
+    assert runner.should_exit_cleanly is False
+    state = read_runtime_status()
+    assert state["gateway_state"] == "startup_failed"
+    assert "temporary DNS resolution failure" in state["exit_reason"]
+    assert state["platforms"]["telegram"]["state"] == "fatal"
+    assert state["platforms"]["telegram"]["error_code"] == "telegram_connect_error"
+
+
+@pytest.mark.asyncio
+async def test_runner_allows_cron_only_mode_when_no_platforms_are_enabled(monkeypatch, tmp_path):
+    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+    config = GatewayConfig(
+        platforms={
+            Platform.TELEGRAM: PlatformConfig(enabled=False, token="***")
+        },
+        sessions_dir=tmp_path / "sessions",
+    )
+    runner = GatewayRunner(config)
+
+    ok = await runner.start()
+
+    assert ok is True
+    assert runner.should_exit_cleanly is False
+    assert runner.adapters == {}
+    state = read_runtime_status()
+    assert state["gateway_state"] == "running"
@@ -44,6 +44,26 @@ class TestGatewayPidState:


 class TestGatewayRuntimeStatus:
+    def test_write_runtime_status_overwrites_stale_pid_on_restart(self, tmp_path, monkeypatch):
+        """Regression: setdefault() preserved stale PID from previous process (#1631)."""
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+
+        # Simulate a previous gateway run that left a state file with a stale PID
+        state_path = tmp_path / "gateway_state.json"
+        state_path.write_text(json.dumps({
+            "pid": 99999,
+            "start_time": 1000.0,
+            "kind": "hermes-gateway",
+            "platforms": {},
+            "updated_at": "2025-01-01T00:00:00Z",
+        }))
+
+        status.write_runtime_status(gateway_state="running")
+
+        payload = status.read_runtime_status()
+        assert payload["pid"] == os.getpid(), "PID should be overwritten, not preserved via setdefault"
+        assert payload["start_time"] != 1000.0, "start_time should be overwritten on restart"
+
    def test_write_runtime_status_records_platform_failure(self, tmp_path, monkeypatch):
        monkeypatch.setenv("HERMES_HOME", str(tmp_path))

@@ -100,6 +100,39 @@ async def test_polling_conflict_stops_polling_and_notifies_handler(monkeypatch):
    fatal_handler.assert_awaited_once()


+@pytest.mark.asyncio
+async def test_connect_marks_retryable_fatal_error_for_startup_network_failure(monkeypatch):
+    adapter = TelegramAdapter(PlatformConfig(enabled=True, token="***"))
+
+    monkeypatch.setattr(
+        "gateway.status.acquire_scoped_lock",
+        lambda scope, identity, metadata=None: (True, None),
+    )
+    monkeypatch.setattr(
+        "gateway.status.release_scoped_lock",
+        lambda scope, identity: None,
+    )
+
+    builder = MagicMock()
+    builder.token.return_value = builder
+    app = SimpleNamespace(
+        bot=SimpleNamespace(),
+        updater=SimpleNamespace(),
+        add_handler=MagicMock(),
+        initialize=AsyncMock(side_effect=RuntimeError("Temporary failure in name resolution")),
+        start=AsyncMock(),
+    )
+    builder.build.return_value = app
+    monkeypatch.setattr("gateway.platforms.telegram.Application", SimpleNamespace(builder=MagicMock(return_value=builder)))
+
+    ok = await adapter.connect()
+
+    assert ok is False
+    assert adapter.fatal_error_code == "telegram_connect_error"
+    assert adapter.fatal_error_retryable is True
+    assert "Temporary failure in name resolution" in adapter.fatal_error_message
+
+
@pytest.mark.asyncio
 async def test_disconnect_skips_inactive_updater_and_app(monkeypatch):
    adapter = TelegramAdapter(PlatformConfig(enabled=True, token="***"))
@@ -475,16 +475,15 @@ class TestDiscordPlayTtsSkip:
 class TestVoiceInHelp:

    def test_voice_in_help_output(self):
-        from gateway.run import GatewayRunner
-        import inspect
-        source = inspect.getsource(GatewayRunner._handle_help_command)
-        assert "/voice" in source
+        """The gateway help text includes /voice (generated from registry)."""
+        from hermes_cli.commands import gateway_help_lines
+        help_text = "\n".join(gateway_help_lines())
+        assert "/voice" in help_text

    def test_voice_is_known_command(self):
-        from gateway.run import GatewayRunner
-        import inspect
-        source = inspect.getsource(GatewayRunner._handle_message)
-        assert '"voice"' in source
+        """The /voice command is in GATEWAY_KNOWN_COMMANDS."""
+        from hermes_cli.commands import GATEWAY_KNOWN_COMMANDS
+        assert "voice" in GATEWAY_KNOWN_COMMANDS


 # =====================================================================
@@ -1,19 +1,22 @@
-"""Tests for shared slash command definitions and autocomplete."""
+"""Tests for the central command registry and autocomplete."""

 from prompt_toolkit.completion import CompleteEvent
 from prompt_toolkit.document import Document

-from hermes_cli.commands import COMMANDS, SlashCommandCompleter
-
-
-# All commands that must be present in the shared COMMANDS dict.
-EXPECTED_COMMANDS = {
-    "/help", "/tools", "/toolsets", "/model", "/provider", "/prompt",
-    "/personality", "/clear", "/history", "/new", "/reset", "/retry",
-    "/undo", "/save", "/config", "/cron", "/skills", "/platforms",
-    "/verbose", "/reasoning", "/compress", "/title", "/usage", "/insights", "/paste",
-    "/reload-mcp", "/rollback", "/background", "/skin", "/voice", "/quit",
-}
+from hermes_cli.commands import (
+    COMMAND_REGISTRY,
+    COMMANDS,
+    COMMANDS_BY_CATEGORY,
+    CommandDef,
+    GATEWAY_KNOWN_COMMANDS,
+    SUBCOMMANDS,
+    SlashCommandAutoSuggest,
+    SlashCommandCompleter,
+    gateway_help_lines,
+    resolve_command,
+    slack_subcommand_map,
+    telegram_bot_commands,
+)


 def _completions(completer: SlashCommandCompleter, text: str):
@@ -25,21 +28,200 @@ def _completions(completer: SlashCommandCompleter, text: str):
    )


-class TestCommands:
-    def test_shared_commands_include_cli_specific_entries(self):
-        """Entries that previously only existed in cli.py are now in the shared dict."""
-        assert COMMANDS["/paste"] == "Check clipboard for an image and attach it"
-        assert COMMANDS["/reload-mcp"] == "Reload MCP servers from config.yaml"
+# ---------------------------------------------------------------------------
+# CommandDef registry tests
+# ---------------------------------------------------------------------------

-    def test_all_expected_commands_present(self):
-        """Regression guard — every known command must appear in the shared dict."""
-        assert set(COMMANDS.keys()) == EXPECTED_COMMANDS
+class TestCommandRegistry:
+    def test_registry_is_nonempty(self):
+        assert len(COMMAND_REGISTRY) > 30
+
+    def test_every_entry_is_commanddef(self):
+        for entry in COMMAND_REGISTRY:
+            assert isinstance(entry, CommandDef), f"Unexpected type: {type(entry)}"
+
+    def test_no_duplicate_canonical_names(self):
+        names = [cmd.name for cmd in COMMAND_REGISTRY]
+        assert len(names) == len(set(names)), f"Duplicate names: {[n for n in names if names.count(n) > 1]}"
+
+    def test_no_alias_collides_with_canonical_name(self):
+        """An alias must not shadow another command's canonical name."""
+        canonical_names = {cmd.name for cmd in COMMAND_REGISTRY}
+        for cmd in COMMAND_REGISTRY:
+            for alias in cmd.aliases:
+                if alias in canonical_names:
+                    # reset -> new is intentional (reset IS an alias for new)
+                    target = next(c for c in COMMAND_REGISTRY if c.name == alias)
+                    # This should only happen if the alias points to the same entry
+                    assert resolve_command(alias).name == cmd.name or alias == cmd.name, \
+                        f"Alias '{alias}' of '{cmd.name}' shadows canonical '{target.name}'"
+
+    def test_every_entry_has_valid_category(self):
+        valid_categories = {"Session", "Configuration", "Tools & Skills", "Info", "Exit"}
+        for cmd in COMMAND_REGISTRY:
+            assert cmd.category in valid_categories, f"{cmd.name} has invalid category '{cmd.category}'"
+
+    def test_cli_only_and_gateway_only_are_mutually_exclusive(self):
+        for cmd in COMMAND_REGISTRY:
+            assert not (cmd.cli_only and cmd.gateway_only), \
+                f"{cmd.name} cannot be both cli_only and gateway_only"
+
+
+# ---------------------------------------------------------------------------
+# resolve_command tests
+# ---------------------------------------------------------------------------
+
+class TestResolveCommand:
+    def test_canonical_name_resolves(self):
+        assert resolve_command("help").name == "help"
+        assert resolve_command("background").name == "background"
+
+    def test_alias_resolves_to_canonical(self):
+        assert resolve_command("bg").name == "background"
+        assert resolve_command("reset").name == "new"
+        assert resolve_command("q").name == "quit"
+        assert resolve_command("exit").name == "quit"
+        assert resolve_command("gateway").name == "platforms"
+        assert resolve_command("set-home").name == "sethome"
+        assert resolve_command("reload_mcp").name == "reload-mcp"
+
+    def test_leading_slash_stripped(self):
+        assert resolve_command("/help").name == "help"
+        assert resolve_command("/bg").name == "background"
+
+    def test_unknown_returns_none(self):
+        assert resolve_command("nonexistent") is None
+        assert resolve_command("") is None
+
+
+# ---------------------------------------------------------------------------
+# Derived dicts (backwards compat)
+# ---------------------------------------------------------------------------
+
+class TestDerivedDicts:
+    def test_commands_dict_excludes_gateway_only(self):
+        """gateway_only commands should NOT appear in the CLI COMMANDS dict."""
+        for cmd in COMMAND_REGISTRY:
+            if cmd.gateway_only:
+                assert f"/{cmd.name}" not in COMMANDS, \
+                    f"gateway_only command /{cmd.name} should not be in COMMANDS"
+
+    def test_commands_dict_includes_all_cli_commands(self):
+        for cmd in COMMAND_REGISTRY:
+            if not cmd.gateway_only:
+                assert f"/{cmd.name}" in COMMANDS, \
+                    f"/{cmd.name} missing from COMMANDS dict"
+
+    def test_commands_dict_includes_aliases(self):
+        assert "/bg" in COMMANDS
+        assert "/reset" in COMMANDS
+        assert "/q" in COMMANDS
+        assert "/exit" in COMMANDS
+        assert "/reload_mcp" in COMMANDS
+        assert "/gateway" in COMMANDS
+
+    def test_commands_by_category_covers_all_categories(self):
+        registry_categories = {cmd.category for cmd in COMMAND_REGISTRY if not cmd.gateway_only}
+        assert set(COMMANDS_BY_CATEGORY.keys()) == registry_categories

    def test_every_command_has_nonempty_description(self):
        for cmd, desc in COMMANDS.items():
            assert isinstance(desc, str) and len(desc) > 0, f"{cmd} has empty description"


+# ---------------------------------------------------------------------------
+# Gateway helpers
+# ---------------------------------------------------------------------------
+
+class TestGatewayKnownCommands:
+    def test_excludes_cli_only(self):
+        for cmd in COMMAND_REGISTRY:
+            if cmd.cli_only:
+                assert cmd.name not in GATEWAY_KNOWN_COMMANDS, \
+                    f"cli_only command '{cmd.name}' should not be in GATEWAY_KNOWN_COMMANDS"
+
+    def test_includes_gateway_commands(self):
+        for cmd in COMMAND_REGISTRY:
+            if not cmd.cli_only:
+                assert cmd.name in GATEWAY_KNOWN_COMMANDS
+                for alias in cmd.aliases:
+                    assert alias in GATEWAY_KNOWN_COMMANDS
+
+    def test_bg_alias_in_gateway(self):
+        assert "bg" in GATEWAY_KNOWN_COMMANDS
+        assert "background" in GATEWAY_KNOWN_COMMANDS
+
+    def test_is_frozenset(self):
+        assert isinstance(GATEWAY_KNOWN_COMMANDS, frozenset)
+
+
+class TestGatewayHelpLines:
+    def test_returns_nonempty_list(self):
+        lines = gateway_help_lines()
+        assert len(lines) > 10
+
+    def test_excludes_cli_only_commands(self):
+        lines = gateway_help_lines()
+        joined = "\n".join(lines)
+        for cmd in COMMAND_REGISTRY:
+            if cmd.cli_only:
+                assert f"`/{cmd.name}" not in joined, \
+                    f"cli_only command /{cmd.name} should not be in gateway help"
+
+    def test_includes_alias_note_for_bg(self):
+        lines = gateway_help_lines()
+        bg_line = [l for l in lines if "/background" in l]
+        assert len(bg_line) == 1
+        assert "/bg" in bg_line[0]
+
+
+class TestTelegramBotCommands:
+    def test_returns_list_of_tuples(self):
+        cmds = telegram_bot_commands()
+        assert len(cmds) > 10
+        for name, desc in cmds:
+            assert isinstance(name, str)
+            assert isinstance(desc, str)
+
+    def test_no_hyphens_in_command_names(self):
+        """Telegram does not support hyphens in command names."""
+        for name, _ in telegram_bot_commands():
+            assert "-" not in name, f"Telegram command '{name}' contains a hyphen"
+
+    def test_excludes_cli_only(self):
+        names = {name for name, _ in telegram_bot_commands()}
+        for cmd in COMMAND_REGISTRY:
+            if cmd.cli_only:
+                tg_name = cmd.name.replace("-", "_")
+                assert tg_name not in names
+
+
+class TestSlackSubcommandMap:
+    def test_returns_dict(self):
+        mapping = slack_subcommand_map()
+        assert isinstance(mapping, dict)
+        assert len(mapping) > 10
+
+    def test_values_are_slash_prefixed(self):
+        for key, val in slack_subcommand_map().items():
+            assert val.startswith("/"), f"Slack mapping for '{key}' should start with /"
+
+    def test_includes_aliases(self):
+        mapping = slack_subcommand_map()
+        assert "bg" in mapping
+        assert "reset" in mapping
+
+    def test_excludes_cli_only(self):
+        mapping = slack_subcommand_map()
+        for cmd in COMMAND_REGISTRY:
+            if cmd.cli_only:
+                assert cmd.name not in mapping
+
+
+# ---------------------------------------------------------------------------
+# Autocomplete (SlashCommandCompleter)
+# ---------------------------------------------------------------------------
+
 class TestSlashCommandCompleter:
    # -- basic prefix completion -----------------------------------------

@@ -54,7 +236,7 @@ class TestSlashCommandCompleter:
    def test_builtin_completion_display_meta_shows_description(self):
        completions = _completions(SlashCommandCompleter(), "/help")
        assert len(completions) == 1
-        assert completions[0].display_meta_text == "Show this help message"
+        assert completions[0].display_meta_text == "Show available commands"

    # -- exact-match trailing space --------------------------------------

@@ -143,3 +325,182 @@ class TestSlashCommandCompleter:
        completions = _completions(completer, "/no-desc")
        assert len(completions) == 1
        assert "Skill command" in completions[0].display_meta_text
+
+
+# ── SUBCOMMANDS extraction ──────────────────────────────────────────────
+
+
+class TestSubcommands:
+    def test_explicit_subcommands_extracted(self):
+        """Commands with explicit subcommands on CommandDef are extracted."""
+        assert "/prompt" in SUBCOMMANDS
+        assert "clear" in SUBCOMMANDS["/prompt"]
+
+    def test_reasoning_has_subcommands(self):
+        assert "/reasoning" in SUBCOMMANDS
+        subs = SUBCOMMANDS["/reasoning"]
+        assert "high" in subs
+        assert "show" in subs
+        assert "hide" in subs
+
+    def test_voice_has_subcommands(self):
+        assert "/voice" in SUBCOMMANDS
+        assert "on" in SUBCOMMANDS["/voice"]
+        assert "off" in SUBCOMMANDS["/voice"]
+
+    def test_cron_has_subcommands(self):
+        assert "/cron" in SUBCOMMANDS
+        assert "list" in SUBCOMMANDS["/cron"]
+        assert "add" in SUBCOMMANDS["/cron"]
+
+    def test_commands_without_subcommands_not_in_dict(self):
+        """Plain commands should not appear in SUBCOMMANDS."""
+        assert "/help" not in SUBCOMMANDS
+        assert "/quit" not in SUBCOMMANDS
+        assert "/clear" not in SUBCOMMANDS
+
+
+# ── Subcommand tab completion ───────────────────────────────────────────
+
+
+class TestSubcommandCompletion:
+    def test_subcommand_completion_after_space(self):
+        """Typing '/reasoning ' then Tab should show subcommands."""
+        completions = _completions(SlashCommandCompleter(), "/reasoning ")
+        texts = {c.text for c in completions}
+        assert "high" in texts
+        assert "show" in texts
+
+    def test_subcommand_prefix_filters(self):
+        """Typing '/reasoning sh' should only show 'show'."""
+        completions = _completions(SlashCommandCompleter(), "/reasoning sh")
+        texts = {c.text for c in completions}
+        assert texts == {"show"}
+
+    def test_subcommand_exact_match_suppressed(self):
+        """Typing the full subcommand shouldn't re-suggest it."""
+        completions = _completions(SlashCommandCompleter(), "/reasoning show")
+        texts = {c.text for c in completions}
+        assert "show" not in texts
+
+    def test_no_subcommands_for_plain_command(self):
+        """Commands without subcommands yield nothing after space."""
+        completions = _completions(SlashCommandCompleter(), "/help ")
+        assert completions == []
+
+
+# ── Two-stage /model completion ─────────────────────────────────────────
+
+
+def _model_completer() -> SlashCommandCompleter:
+    """Build a completer with mock model/provider info."""
+    return SlashCommandCompleter(
+        model_completer_provider=lambda: {
+            "current_provider": "openrouter",
+            "providers": {
+                "anthropic": "Anthropic",
+                "openrouter": "OpenRouter",
+                "nous": "Nous Research",
+            },
+            "models_for": lambda p: {
+                "anthropic": ["claude-sonnet-4-20250514", "claude-opus-4-20250414"],
+                "openrouter": ["anthropic/claude-sonnet-4", "google/gemini-2.5-pro"],
+                "nous": ["hermes-3-llama-3.1-405b"],
+            }.get(p, []),
+        }
+    )
+
+
+class TestModelCompletion:
+    def test_stage1_shows_providers(self):
+        completions = _completions(_model_completer(), "/model ")
+        texts = {c.text for c in completions}
+        assert "anthropic:" in texts
+        assert "openrouter:" in texts
+        assert "nous:" in texts
+
+    def test_stage1_current_provider_last(self):
+        completions = _completions(_model_completer(), "/model ")
+        texts = [c.text for c in completions]
+        assert texts[-1] == "openrouter:"
+
+    def test_stage1_current_provider_labeled(self):
+        completions = _completions(_model_completer(), "/model ")
+        for c in completions:
+            if c.text == "openrouter:":
+                assert "current" in c.display_meta_text.lower()
+                break
+        else:
+            raise AssertionError("openrouter: not found in completions")
+
+    def test_stage1_prefix_filters(self):
+        completions = _completions(_model_completer(), "/model an")
+        texts = {c.text for c in completions}
+        assert texts == {"anthropic:"}
+
+    def test_stage2_shows_models(self):
+        completions = _completions(_model_completer(), "/model anthropic:")
+        texts = {c.text for c in completions}
+        assert "anthropic:claude-sonnet-4-20250514" in texts
+        assert "anthropic:claude-opus-4-20250414" in texts
+
+    def test_stage2_prefix_filters_models(self):
+        completions = _completions(_model_completer(), "/model anthropic:claude-s")
+        texts = {c.text for c in completions}
+        assert "anthropic:claude-sonnet-4-20250514" in texts
+        assert "anthropic:claude-opus-4-20250414" not in texts
+
+    def test_stage2_no_model_provider_returns_empty(self):
+        completions = _completions(SlashCommandCompleter(), "/model ")
+        assert completions == []
+
+
+# ── Ghost text (SlashCommandAutoSuggest) ────────────────────────────────
+
+
+def _suggestion(text: str, completer=None) -> str | None:
+    """Get ghost text suggestion for given input."""
+    suggest = SlashCommandAutoSuggest(completer=completer)
+    doc = Document(text=text)
+
+    class FakeBuffer:
+        pass
+
+    result = suggest.get_suggestion(FakeBuffer(), doc)
+    return result.text if result else None
+
+
+class TestGhostText:
+    def test_command_name_suggestion(self):
+        """/he → 'lp'"""
+        assert _suggestion("/he") == "lp"
+
+    def test_command_name_suggestion_reasoning(self):
+        """/rea → 'soning'"""
+        assert _suggestion("/rea") == "soning"
+
+    def test_no_suggestion_for_complete_command(self):
+        assert _suggestion("/help") is None
+
+    def test_subcommand_suggestion(self):
+        """/reasoning h → 'igh'"""
+        assert _suggestion("/reasoning h") == "igh"
+
+    def test_subcommand_suggestion_show(self):
+        """/reasoning sh → 'ow'"""
+        assert _suggestion("/reasoning sh") == "ow"
+
+    def test_no_suggestion_for_non_slash(self):
+        assert _suggestion("hello") is None
+
+    def test_model_stage1_ghost_text(self):
+        """/model a → 'nthropic:'"""
+        completer = _model_completer()
+        assert _suggestion("/model a", completer=completer) == "nthropic:"
+
+    def test_model_stage2_ghost_text(self):
+        """/model anthropic:cl → rest of first matching model"""
+        completer = _model_completer()
+        s = _suggestion("/model anthropic:cl", completer=completer)
+        assert s is not None
+        assert s.startswith("aude-")
@@ -12,9 +12,12 @@ from hermes_cli.config import (
    ensure_hermes_home,
    load_config,
    load_env,
+    migrate_config,
    save_config,
    save_env_value,
    save_env_value_secure,
+    sanitize_env_file,
+    _sanitize_env_lines,
 )


@@ -203,3 +206,142 @@ class TestSaveConfigAtomicity:
                raw = yaml.safe_load(f)
            assert raw["model"] == "test/atomic-model"
            assert raw["agent"]["max_turns"] == 77
+
+
+class TestSanitizeEnvLines:
+    """Tests for .env file corruption repair."""
+
+    def test_splits_concatenated_keys(self):
+        """Two KEY=VALUE pairs jammed on one line get split."""
+        lines = ["ANTHROPIC_API_KEY=sk-ant-xxxOPENAI_BASE_URL=https://api.openai.com/v1\n"]
+        result = _sanitize_env_lines(lines)
+        assert result == [
+            "ANTHROPIC_API_KEY=sk-ant-xxx\n",
+            "OPENAI_BASE_URL=https://api.openai.com/v1\n",
+        ]
+
+    def test_preserves_clean_file(self):
+        """A well-formed .env file passes through unchanged (modulo trailing newlines)."""
+        lines = [
+            "OPENROUTER_API_KEY=sk-or-xxx\n",
+            "FIRECRAWL_API_KEY=fc-xxx\n",
+            "# a comment\n",
+            "\n",
+        ]
+        result = _sanitize_env_lines(lines)
+        assert result == lines
+
+    def test_preserves_comments_and_blanks(self):
+        lines = ["# comment\n", "\n", "KEY=val\n"]
+        result = _sanitize_env_lines(lines)
+        assert result == lines
+
+    def test_adds_missing_trailing_newline(self):
+        """Lines missing trailing newline get one added."""
+        lines = ["FOO_BAR=baz"]
+        result = _sanitize_env_lines(lines)
+        assert result == ["FOO_BAR=baz\n"]
+
+    def test_three_concatenated_keys(self):
+        """Three known keys on one line all get separated."""
+        lines = ["FAL_KEY=111FIRECRAWL_API_KEY=222GITHUB_TOKEN=333\n"]
+        result = _sanitize_env_lines(lines)
+        assert result == [
+            "FAL_KEY=111\n",
+            "FIRECRAWL_API_KEY=222\n",
+            "GITHUB_TOKEN=333\n",
+        ]
+
+    def test_value_with_equals_sign_not_split(self):
+        """A value containing '=' shouldn't be falsely split (lowercase in value)."""
+        lines = ["OPENAI_BASE_URL=https://api.example.com/v1?key=abc123\n"]
+        result = _sanitize_env_lines(lines)
+        assert result == lines
+
+    def test_unknown_keys_not_split(self):
+        """Unknown key names on one line are NOT split (avoids false positives)."""
+        lines = ["CUSTOM_VAR=value123OTHER_THING=value456\n"]
+        result = _sanitize_env_lines(lines)
+        # Unknown keys stay on one line — no false split
+        assert len(result) == 1
+
+    def test_value_ending_with_digits_still_splits(self):
+        """Concatenation is detected even when value ends with digits."""
+        lines = ["OPENROUTER_API_KEY=sk-or-v1-abc123OPENAI_BASE_URL=https://api.openai.com/v1\n"]
+        result = _sanitize_env_lines(lines)
+        assert len(result) == 2
+        assert result[0].startswith("OPENROUTER_API_KEY=")
+        assert result[1].startswith("OPENAI_BASE_URL=")
+
+    def test_save_env_value_fixes_corruption_on_write(self, tmp_path):
+        """save_env_value sanitizes corrupted lines when writing a new key."""
+        env_file = tmp_path / ".env"
+        env_file.write_text(
+            "ANTHROPIC_API_KEY=sk-antOPENAI_BASE_URL=https://api.openai.com/v1\n"
+            "FAL_KEY=existing\n"
+        )
+        with patch.dict(os.environ, {"HERMES_HOME": str(tmp_path)}):
+            save_env_value("MESSAGING_CWD", "/tmp")
+
+            content = env_file.read_text()
+            lines = content.strip().split("\n")
+
+            # Corrupted line should be split, new key added
+            assert "ANTHROPIC_API_KEY=sk-ant" in lines
+            assert "OPENAI_BASE_URL=https://api.openai.com/v1" in lines
+            assert "MESSAGING_CWD=/tmp" in lines
+
+    def test_sanitize_env_file_returns_fix_count(self, tmp_path):
+        """sanitize_env_file reports how many entries were fixed."""
+        env_file = tmp_path / ".env"
+        env_file.write_text(
+            "FAL_KEY=good\n"
+            "OPENROUTER_API_KEY=valFIRECRAWL_API_KEY=val2\n"
+        )
+        with patch.dict(os.environ, {"HERMES_HOME": str(tmp_path)}):
+            fixes = sanitize_env_file()
+            assert fixes > 0
+
+            # Verify file is now clean
+            content = env_file.read_text()
+            assert "OPENROUTER_API_KEY=val\n" in content
+            assert "FIRECRAWL_API_KEY=val2\n" in content
+
+    def test_sanitize_env_file_noop_on_clean_file(self, tmp_path):
+        """No changes when file is already clean."""
+        env_file = tmp_path / ".env"
+        env_file.write_text("GOOD_KEY=good\nOTHER_KEY=other\n")
+        with patch.dict(os.environ, {"HERMES_HOME": str(tmp_path)}):
+            fixes = sanitize_env_file()
+            assert fixes == 0
+
+
+class TestAnthropicTokenMigration:
+    """Test that config version 8→9 clears ANTHROPIC_TOKEN."""
+
+    def _write_config_version(self, tmp_path, version):
+        config_path = tmp_path / "config.yaml"
+        import yaml
+        config_path.write_text(yaml.safe_dump({"_config_version": version}))
+
+    def test_clears_token_on_upgrade_to_v9(self, tmp_path):
+        """ANTHROPIC_TOKEN is cleared unconditionally when upgrading to v9."""
+        self._write_config_version(tmp_path, 8)
+        (tmp_path / ".env").write_text("ANTHROPIC_TOKEN=old-token\n")
+        with patch.dict(os.environ, {
+            "HERMES_HOME": str(tmp_path),
+            "ANTHROPIC_TOKEN": "old-token",
+        }):
+            migrate_config(interactive=False, quiet=True)
+            assert load_env().get("ANTHROPIC_TOKEN") == ""
+
+    def test_skips_on_version_9_or_later(self, tmp_path):
+        """Already at v9 — ANTHROPIC_TOKEN is not touched."""
+        self._write_config_version(tmp_path, 9)
+        (tmp_path / ".env").write_text("ANTHROPIC_TOKEN=current-token\n")
+        with patch.dict(os.environ, {
+            "HERMES_HOME": str(tmp_path),
+            "ANTHROPIC_TOKEN": "current-token",
+        }):
+            migrate_config(interactive=False, quiet=True)
+            assert load_env().get("ANTHROPIC_TOKEN") == "current-token"
@@ -1,11 +1,35 @@
 """Tests for gateway service management helpers."""

+import os
 from types import SimpleNamespace

 import hermes_cli.gateway as gateway_cli


 class TestSystemdServiceRefresh:
+    def test_systemd_install_repairs_outdated_unit_without_force(self, tmp_path, monkeypatch):
+        unit_path = tmp_path / "hermes-gateway.service"
+        unit_path.write_text("old unit\n", encoding="utf-8")
+
+        monkeypatch.setattr(gateway_cli, "get_systemd_unit_path", lambda system=False: unit_path)
+        monkeypatch.setattr(gateway_cli, "generate_systemd_unit", lambda system=False, run_as_user=None: "new unit\n")
+
+        calls = []
+
+        def fake_run(cmd, check=True, **kwargs):
+            calls.append(cmd)
+            return SimpleNamespace(returncode=0, stdout="", stderr="")
+
+        monkeypatch.setattr(gateway_cli.subprocess, "run", fake_run)
+
+        gateway_cli.systemd_install()
+
+        assert unit_path.read_text(encoding="utf-8") == "new unit\n"
+        assert calls[:2] == [
+            ["systemctl", "--user", "daemon-reload"],
+            ["systemctl", "--user", "enable", gateway_cli.get_service_name()],
+        ]
+
    def test_systemd_start_refreshes_outdated_unit(self, tmp_path, monkeypatch):
        unit_path = tmp_path / "hermes-gateway.service"
        unit_path.write_text("old unit\n", encoding="utf-8")
@@ -53,6 +77,23 @@ class TestSystemdServiceRefresh:
        ]


+class TestGeneratedSystemdUnits:
+    def test_user_unit_avoids_recursive_execstop_and_uses_extended_stop_timeout(self):
+        unit = gateway_cli.generate_systemd_unit(system=False)
+
+        assert "ExecStart=" in unit
+        assert "ExecStop=" not in unit
+        assert "TimeoutStopSec=60" in unit
+
+    def test_system_unit_avoids_recursive_execstop_and_uses_extended_stop_timeout(self):
+        unit = gateway_cli.generate_systemd_unit(system=True)
+
+        assert "ExecStart=" in unit
+        assert "ExecStop=" not in unit
+        assert "TimeoutStopSec=60" in unit
+        assert "WantedBy=multi-user.target" in unit
+
+
 class TestGatewayStopCleanup:
    def test_stop_sweeps_manual_gateway_processes_after_service_stop(self, tmp_path, monkeypatch):
        unit_path = tmp_path / "hermes-gateway.service"
@@ -78,6 +119,71 @@ class TestGatewayStopCleanup:
        assert kill_calls == [False]


+class TestLaunchdServiceRecovery:
+    def test_launchd_install_repairs_outdated_plist_without_force(self, tmp_path, monkeypatch):
+        plist_path = tmp_path / "ai.hermes.gateway.plist"
+        plist_path.write_text("<plist>old content</plist>", encoding="utf-8")
+
+        monkeypatch.setattr(gateway_cli, "get_launchd_plist_path", lambda: plist_path)
+
+        calls = []
+
+        def fake_run(cmd, check=False, **kwargs):
+            calls.append(cmd)
+            return SimpleNamespace(returncode=0, stdout="", stderr="")
+
+        monkeypatch.setattr(gateway_cli.subprocess, "run", fake_run)
+
+        gateway_cli.launchd_install()
+
+        assert "--replace" in plist_path.read_text(encoding="utf-8")
+        assert calls[:2] == [
+            ["launchctl", "unload", str(plist_path)],
+            ["launchctl", "load", str(plist_path)],
+        ]
+
+    def test_launchd_start_reloads_unloaded_job_and_retries(self, tmp_path, monkeypatch):
+        plist_path = tmp_path / "ai.hermes.gateway.plist"
+        plist_path.write_text(gateway_cli.generate_launchd_plist(), encoding="utf-8")
+
+        calls = []
+
+        def fake_run(cmd, check=False, **kwargs):
+            calls.append(cmd)
+            if cmd == ["launchctl", "start", "ai.hermes.gateway"] and calls.count(cmd) == 1:
+                raise gateway_cli.subprocess.CalledProcessError(3, cmd, stderr="Could not find service")
+            return SimpleNamespace(returncode=0, stdout="", stderr="")
+
+        monkeypatch.setattr(gateway_cli, "get_launchd_plist_path", lambda: plist_path)
+        monkeypatch.setattr(gateway_cli.subprocess, "run", fake_run)
+
+        gateway_cli.launchd_start()
+
+        assert calls == [
+            ["launchctl", "start", "ai.hermes.gateway"],
+            ["launchctl", "load", str(plist_path)],
+            ["launchctl", "start", "ai.hermes.gateway"],
+        ]
+
+    def test_launchd_status_reports_local_stale_plist_when_unloaded(self, tmp_path, monkeypatch, capsys):
+        plist_path = tmp_path / "ai.hermes.gateway.plist"
+        plist_path.write_text("<plist>old content</plist>", encoding="utf-8")
+
+        monkeypatch.setattr(gateway_cli, "get_launchd_plist_path", lambda: plist_path)
+        monkeypatch.setattr(
+            gateway_cli.subprocess,
+            "run",
+            lambda *args, **kwargs: SimpleNamespace(returncode=113, stdout="", stderr="Could not find service"),
+        )
+
+        gateway_cli.launchd_status()
+
+        output = capsys.readouterr().out
+        assert str(plist_path) in output
+        assert "stale" in output.lower()
+        assert "not loaded" in output.lower()
+
+
 class TestGatewayServiceDetection:
    def test_is_service_running_checks_system_scope_when_user_scope_is_inactive(self, monkeypatch):
        user_unit = SimpleNamespace(exists=lambda: True)
@@ -139,3 +245,109 @@ class TestGatewaySystemServiceRouting:
        gateway_cli.gateway_command(SimpleNamespace(gateway_command="status", deep=False, system=False))

        assert calls == [(False, False)]
+
+    def test_gateway_restart_does_not_fallback_to_foreground_when_launchd_restart_fails(self, tmp_path, monkeypatch):
+        plist_path = tmp_path / "ai.hermes.gateway.plist"
+        plist_path.write_text("plist\n", encoding="utf-8")
+
+        monkeypatch.setattr(gateway_cli, "is_linux", lambda: False)
+        monkeypatch.setattr(gateway_cli, "is_macos", lambda: True)
+        monkeypatch.setattr(gateway_cli, "get_launchd_plist_path", lambda: plist_path)
+        monkeypatch.setattr(
+            gateway_cli,
+            "launchd_restart",
+            lambda: (_ for _ in ()).throw(
+                gateway_cli.subprocess.CalledProcessError(5, ["launchctl", "start", "ai.hermes.gateway"])
+            ),
+        )
+
+        run_calls = []
+        monkeypatch.setattr(gateway_cli, "run_gateway", lambda verbose=False, replace=False: run_calls.append((verbose, replace)))
+        monkeypatch.setattr(gateway_cli, "kill_gateway_processes", lambda force=False: 0)
+
+        try:
+            gateway_cli.gateway_command(SimpleNamespace(gateway_command="restart", system=False))
+        except SystemExit as exc:
+            assert exc.code == 1
+        else:
+            raise AssertionError("Expected gateway_command to exit when service restart fails")
+
+        assert run_calls == []
+
+
+class TestEnsureUserSystemdEnv:
+    """Tests for _ensure_user_systemd_env() D-Bus session bus auto-detection."""
+
+    def test_sets_xdg_runtime_dir_when_missing(self, tmp_path, monkeypatch):
+        monkeypatch.delenv("XDG_RUNTIME_DIR", raising=False)
+        monkeypatch.delenv("DBUS_SESSION_BUS_ADDRESS", raising=False)
+        monkeypatch.setattr(os, "getuid", lambda: 42)
+
+        # Patch Path so /run/user/42 resolves to our tmp dir (which exists)
+        from pathlib import Path as RealPath
+
+        class FakePath(type(RealPath())):
+            def __new__(cls, *args):
+                p = str(args[0]) if args else ""
+                if p == "/run/user/42":
+                    return RealPath.__new__(cls, str(tmp_path))
+                return RealPath.__new__(cls, *args)
+
+        monkeypatch.setattr(gateway_cli, "Path", FakePath)
+
+        gateway_cli._ensure_user_systemd_env()
+
+        # Function sets the canonical string, not the fake path
+        assert os.environ.get("XDG_RUNTIME_DIR") == "/run/user/42"
+
+    def test_sets_dbus_address_when_bus_socket_exists(self, tmp_path, monkeypatch):
+        runtime = tmp_path / "runtime"
+        runtime.mkdir()
+        bus_socket = runtime / "bus"
+        bus_socket.touch()  # simulate the socket file
+
+        monkeypatch.setenv("XDG_RUNTIME_DIR", str(runtime))
+        monkeypatch.delenv("DBUS_SESSION_BUS_ADDRESS", raising=False)
+        monkeypatch.setattr(os, "getuid", lambda: 99)
+
+        gateway_cli._ensure_user_systemd_env()
+
+        assert os.environ["DBUS_SESSION_BUS_ADDRESS"] == f"unix:path={bus_socket}"
+
+    def test_preserves_existing_env_vars(self, monkeypatch):
+        monkeypatch.setenv("XDG_RUNTIME_DIR", "/custom/runtime")
+        monkeypatch.setenv("DBUS_SESSION_BUS_ADDRESS", "unix:path=/custom/bus")
+
+        gateway_cli._ensure_user_systemd_env()
+
+        assert os.environ["XDG_RUNTIME_DIR"] == "/custom/runtime"
+        assert os.environ["DBUS_SESSION_BUS_ADDRESS"] == "unix:path=/custom/bus"
+
+    def test_no_dbus_when_bus_socket_missing(self, tmp_path, monkeypatch):
+        runtime = tmp_path / "runtime"
+        runtime.mkdir()
+        # no bus socket created
+
+        monkeypatch.setenv("XDG_RUNTIME_DIR", str(runtime))
+        monkeypatch.delenv("DBUS_SESSION_BUS_ADDRESS", raising=False)
+        monkeypatch.setattr(os, "getuid", lambda: 99)
+
+        gateway_cli._ensure_user_systemd_env()
+
+        assert "DBUS_SESSION_BUS_ADDRESS" not in os.environ
+
+    def test_systemctl_cmd_calls_ensure_for_user_mode(self, monkeypatch):
+        calls = []
+        monkeypatch.setattr(gateway_cli, "_ensure_user_systemd_env", lambda: calls.append("called"))
+
+        result = gateway_cli._systemctl_cmd(system=False)
+        assert result == ["systemctl", "--user"]
+        assert calls == ["called"]
+
+    def test_systemctl_cmd_skips_ensure_for_system_mode(self, monkeypatch):
+        calls = []
+        monkeypatch.setattr(gateway_cli, "_ensure_user_systemd_env", lambda: calls.append("called"))
+
+        result = gateway_cli._systemctl_cmd(system=True)
+        assert result == ["systemctl"]
+        assert calls == []
@@ -0,0 +1,184 @@
+"""Tests for file path autocomplete in the CLI completer."""
+
+import os
+from unittest.mock import MagicMock
+
+import pytest
+from prompt_toolkit.document import Document
+from prompt_toolkit.formatted_text import to_plain_text
+
+from hermes_cli.commands import SlashCommandCompleter, _file_size_label
+
+
+def _display_names(completions):
+    """Extract plain-text display names from a list of Completion objects."""
+    return [to_plain_text(c.display) for c in completions]
+
+
+def _display_metas(completions):
+    """Extract plain-text display_meta from a list of Completion objects."""
+    return [to_plain_text(c.display_meta) if c.display_meta else "" for c in completions]
+
+
+@pytest.fixture
+def completer():
+    return SlashCommandCompleter()
+
+
+class TestExtractPathWord:
+    def test_relative_path(self):
+        assert SlashCommandCompleter._extract_path_word("look at ./src/main.py") == "./src/main.py"
+
+    def test_home_path(self):
+        assert SlashCommandCompleter._extract_path_word("edit ~/docs/") == "~/docs/"
+
+    def test_absolute_path(self):
+        assert SlashCommandCompleter._extract_path_word("read /etc/hosts") == "/etc/hosts"
+
+    def test_parent_path(self):
+        assert SlashCommandCompleter._extract_path_word("check ../config.yaml") == "../config.yaml"
+
+    def test_path_with_slash_in_middle(self):
+        assert SlashCommandCompleter._extract_path_word("open src/utils/helpers.py") == "src/utils/helpers.py"
+
+    def test_plain_word_not_path(self):
+        assert SlashCommandCompleter._extract_path_word("hello world") is None
+
+    def test_empty_string(self):
+        assert SlashCommandCompleter._extract_path_word("") is None
+
+    def test_single_word_no_slash(self):
+        assert SlashCommandCompleter._extract_path_word("README.md") is None
+
+    def test_word_after_space(self):
+        assert SlashCommandCompleter._extract_path_word("fix the bug in ./tools/") == "./tools/"
+
+    def test_just_dot_slash(self):
+        assert SlashCommandCompleter._extract_path_word("./") == "./"
+
+    def test_just_tilde_slash(self):
+        assert SlashCommandCompleter._extract_path_word("~/") == "~/"
+
+
+class TestPathCompletions:
+    def test_lists_current_directory(self, tmp_path):
+        (tmp_path / "file_a.py").touch()
+        (tmp_path / "file_b.txt").touch()
+        (tmp_path / "subdir").mkdir()
+
+        old_cwd = os.getcwd()
+        os.chdir(tmp_path)
+        try:
+            completions = list(SlashCommandCompleter._path_completions("./"))
+            names = _display_names(completions)
+            assert "file_a.py" in names
+            assert "file_b.txt" in names
+            assert "subdir/" in names
+        finally:
+            os.chdir(old_cwd)
+
+    def test_filters_by_prefix(self, tmp_path):
+        (tmp_path / "alpha.py").touch()
+        (tmp_path / "beta.py").touch()
+        (tmp_path / "alpha_test.py").touch()
+
+        completions = list(SlashCommandCompleter._path_completions(f"{tmp_path}/alpha"))
+        names = _display_names(completions)
+        assert "alpha.py" in names
+        assert "alpha_test.py" in names
+        assert "beta.py" not in names
+
+    def test_directories_have_trailing_slash(self, tmp_path):
+        (tmp_path / "mydir").mkdir()
+        (tmp_path / "myfile.txt").touch()
+
+        completions = list(SlashCommandCompleter._path_completions(f"{tmp_path}/"))
+        names = _display_names(completions)
+        metas = _display_metas(completions)
+        assert "mydir/" in names
+        idx = names.index("mydir/")
+        assert metas[idx] == "dir"
+
+    def test_home_expansion(self, tmp_path, monkeypatch):
+        monkeypatch.setenv("HOME", str(tmp_path))
+        (tmp_path / "testfile.md").touch()
+
+        completions = list(SlashCommandCompleter._path_completions("~/test"))
+        names = _display_names(completions)
+        assert "testfile.md" in names
+
+    def test_nonexistent_dir_returns_empty(self):
+        completions = list(SlashCommandCompleter._path_completions("/nonexistent_dir_xyz/"))
+        assert completions == []
+
+    def test_respects_limit(self, tmp_path):
+        for i in range(50):
+            (tmp_path / f"file_{i:03d}.txt").touch()
+
+        completions = list(SlashCommandCompleter._path_completions(f"{tmp_path}/", limit=10))
+        assert len(completions) == 10
+
+    def test_case_insensitive_prefix(self, tmp_path):
+        (tmp_path / "README.md").touch()
+
+        completions = list(SlashCommandCompleter._path_completions(f"{tmp_path}/read"))
+        names = _display_names(completions)
+        assert "README.md" in names
+
+
+class TestIntegration:
+    """Test the completer produces path completions via the prompt_toolkit API."""
+
+    def test_slash_commands_still_work(self, completer):
+        doc = Document("/hel", cursor_position=4)
+        event = MagicMock()
+        completions = list(completer.get_completions(doc, event))
+        names = _display_names(completions)
+        assert "/help" in names
+
+    def test_path_completion_triggers_on_dot_slash(self, completer, tmp_path):
+        (tmp_path / "test.py").touch()
+        old_cwd = os.getcwd()
+        os.chdir(tmp_path)
+        try:
+            doc = Document("edit ./te", cursor_position=9)
+            event = MagicMock()
+            completions = list(completer.get_completions(doc, event))
+            names = _display_names(completions)
+            assert "test.py" in names
+        finally:
+            os.chdir(old_cwd)
+
+    def test_no_completion_for_plain_words(self, completer):
+        doc = Document("hello world", cursor_position=11)
+        event = MagicMock()
+        completions = list(completer.get_completions(doc, event))
+        assert completions == []
+
+    def test_absolute_path_triggers_completion(self, completer):
+        doc = Document("check /etc/hos", cursor_position=14)
+        event = MagicMock()
+        completions = list(completer.get_completions(doc, event))
+        names = _display_names(completions)
+        # /etc/hosts should exist on Linux
+        assert any("host" in n.lower() for n in names)
+
+
+class TestFileSizeLabel:
+    def test_bytes(self, tmp_path):
+        f = tmp_path / "small.txt"
+        f.write_text("hi")
+        assert _file_size_label(str(f)) == "2B"
+
+    def test_kilobytes(self, tmp_path):
+        f = tmp_path / "medium.txt"
+        f.write_bytes(b"x" * 2048)
+        assert _file_size_label(str(f)) == "2K"
+
+    def test_megabytes(self, tmp_path):
+        f = tmp_path / "large.bin"
+        f.write_bytes(b"x" * (2 * 1024 * 1024))
+        assert _file_size_label(str(f)) == "2.0M"
+
+    def test_nonexistent(self):
+        assert _file_size_label("/nonexistent_xyz") == ""
@@ -1,8 +1,18 @@
+"""
+Tests for --yes / --force flag separation in `hermes skills install`.
+
+--yes / -y  → skip_confirm (bypass interactive prompt, needed in TUI mode)
+--force     → force (install despite blocked scan verdict)
+
+Based on PR #1595 by 333Alden333 (salvaged).
+"""
+
 import sys
 from types import SimpleNamespace


-def test_cli_skills_install_accepts_yes_alias(monkeypatch):
+def test_cli_skills_install_yes_sets_skip_confirm(monkeypatch):
+    """--yes should set skip_confirm=True but NOT force."""
    from hermes_cli.main import main

    captured = {}
@@ -10,6 +20,7 @@ def test_cli_skills_install_accepts_yes_alias(monkeypatch):
    def fake_skills_command(args):
        captured["identifier"] = args.identifier
        captured["force"] = args.force
+        captured["yes"] = args.yes

    monkeypatch.setattr("hermes_cli.skills_hub.skills_command", fake_skills_command)
    monkeypatch.setattr(
@@ -20,7 +31,98 @@ def test_cli_skills_install_accepts_yes_alias(monkeypatch):

    main()

-    assert captured == {
-        "identifier": "official/email/agentmail",
-        "force": True,
-    }
+    assert captured["identifier"] == "official/email/agentmail"
+    assert captured["yes"] is True
+    assert captured["force"] is False
+
+
+def test_cli_skills_install_y_alias(monkeypatch):
+    """-y should behave the same as --yes."""
+    from hermes_cli.main import main
+
+    captured = {}
+
+    def fake_skills_command(args):
+        captured["yes"] = args.yes
+        captured["force"] = args.force
+
+    monkeypatch.setattr("hermes_cli.skills_hub.skills_command", fake_skills_command)
+    monkeypatch.setattr(
+        sys,
+        "argv",
+        ["hermes", "skills", "install", "test/skill", "-y"],
+    )
+
+    main()
+
+    assert captured["yes"] is True
+    assert captured["force"] is False
+
+
+def test_cli_skills_install_force_sets_force(monkeypatch):
+    """--force should set force=True but NOT yes."""
+    from hermes_cli.main import main
+
+    captured = {}
+
+    def fake_skills_command(args):
+        captured["force"] = args.force
+        captured["yes"] = args.yes
+
+    monkeypatch.setattr("hermes_cli.skills_hub.skills_command", fake_skills_command)
+    monkeypatch.setattr(
+        sys,
+        "argv",
+        ["hermes", "skills", "install", "test/skill", "--force"],
+    )
+
+    main()
+
+    assert captured["force"] is True
+    assert captured["yes"] is False
+
+
+def test_cli_skills_install_force_and_yes_together(monkeypatch):
+    """--force --yes should set both flags."""
+    from hermes_cli.main import main
+
+    captured = {}
+
+    def fake_skills_command(args):
+        captured["force"] = args.force
+        captured["yes"] = args.yes
+
+    monkeypatch.setattr("hermes_cli.skills_hub.skills_command", fake_skills_command)
+    monkeypatch.setattr(
+        sys,
+        "argv",
+        ["hermes", "skills", "install", "test/skill", "--force", "--yes"],
+    )
+
+    main()
+
+    assert captured["force"] is True
+    assert captured["yes"] is True
+
+
+def test_cli_skills_install_no_flags(monkeypatch):
+    """Without flags, both force and yes should be False."""
+    from hermes_cli.main import main
+
+    captured = {}
+
+    def fake_skills_command(args):
+        captured["force"] = args.force
+        captured["yes"] = args.yes
+
+    monkeypatch.setattr("hermes_cli.skills_hub.skills_command", fake_skills_command)
+    monkeypatch.setattr(
+        sys,
+        "argv",
+        ["hermes", "skills", "install", "test/skill"],
+    )
+
+    main()
+
+    assert captured["force"] is False
+    assert captured["yes"] is False
@@ -0,0 +1,132 @@
+"""
+Tests for skip_confirm behavior in /skills install and /skills uninstall.
+
+Verifies that --yes / -y bypasses the interactive confirmation prompt
+that hangs inside prompt_toolkit's TUI.
+
+Based on PR #1595 by 333Alden333 (salvaged).
+"""
+
+from unittest.mock import patch, MagicMock
+
+import pytest
+
+
+class TestHandleSkillsSlashInstallFlags:
+    """Test flag parsing in handle_skills_slash for install."""
+
+    def test_yes_flag_sets_skip_confirm(self):
+        from hermes_cli.skills_hub import handle_skills_slash
+        with patch("hermes_cli.skills_hub.do_install") as mock_install:
+            handle_skills_slash("/skills install test/skill --yes")
+            mock_install.assert_called_once()
+            _, kwargs = mock_install.call_args
+            assert kwargs.get("skip_confirm") is True
+            assert kwargs.get("force") is False
+
+    def test_y_flag_sets_skip_confirm(self):
+        from hermes_cli.skills_hub import handle_skills_slash
+        with patch("hermes_cli.skills_hub.do_install") as mock_install:
+            handle_skills_slash("/skills install test/skill -y")
+            mock_install.assert_called_once()
+            _, kwargs = mock_install.call_args
+            assert kwargs.get("skip_confirm") is True
+
+    def test_force_flag_sets_force_not_skip(self):
+        from hermes_cli.skills_hub import handle_skills_slash
+        with patch("hermes_cli.skills_hub.do_install") as mock_install:
+            handle_skills_slash("/skills install test/skill --force")
+            mock_install.assert_called_once()
+            _, kwargs = mock_install.call_args
+            assert kwargs.get("force") is True
+            assert kwargs.get("skip_confirm") is False
+
+    def test_no_flags(self):
+        from hermes_cli.skills_hub import handle_skills_slash
+        with patch("hermes_cli.skills_hub.do_install") as mock_install:
+            handle_skills_slash("/skills install test/skill")
+            mock_install.assert_called_once()
+            _, kwargs = mock_install.call_args
+            assert kwargs.get("force") is False
+            assert kwargs.get("skip_confirm") is False
+
+
+class TestHandleSkillsSlashUninstallFlags:
+    """Test flag parsing in handle_skills_slash for uninstall."""
+
+    def test_yes_flag_sets_skip_confirm(self):
+        from hermes_cli.skills_hub import handle_skills_slash
+        with patch("hermes_cli.skills_hub.do_uninstall") as mock_uninstall:
+            handle_skills_slash("/skills uninstall test-skill --yes")
+            mock_uninstall.assert_called_once()
+            _, kwargs = mock_uninstall.call_args
+            assert kwargs.get("skip_confirm") is True
+
+    def test_y_flag_sets_skip_confirm(self):
+        from hermes_cli.skills_hub import handle_skills_slash
+        with patch("hermes_cli.skills_hub.do_uninstall") as mock_uninstall:
+            handle_skills_slash("/skills uninstall test-skill -y")
+            mock_uninstall.assert_called_once()
+            _, kwargs = mock_uninstall.call_args
+            assert kwargs.get("skip_confirm") is True
+
+    def test_no_flags(self):
+        from hermes_cli.skills_hub import handle_skills_slash
+        with patch("hermes_cli.skills_hub.do_uninstall") as mock_uninstall:
+            handle_skills_slash("/skills uninstall test-skill")
+            mock_uninstall.assert_called_once()
+            _, kwargs = mock_uninstall.call_args
+            assert kwargs.get("skip_confirm", False) is False
+
+
+class TestDoInstallSkipConfirm:
+    """Test that do_install respects skip_confirm parameter."""
+
+    @patch("hermes_cli.skills_hub.input", return_value="n")
+    def test_without_skip_confirm_prompts_user(self, mock_input):
+        """Without skip_confirm, input() is called for confirmation."""
+        from hermes_cli.skills_hub import do_install
+        with patch("hermes_cli.skills_hub._console"), \
+             patch("tools.skills_hub.ensure_hub_dirs"), \
+             patch("tools.skills_hub.GitHubAuth"), \
+             patch("tools.skills_hub.create_source_router") as mock_router, \
+             patch("hermes_cli.skills_hub._resolve_short_name", return_value="test/skill"), \
+             patch("hermes_cli.skills_hub._resolve_source_meta_and_bundle") as mock_resolve:
+
+            # Make it return None so we exit early
+            mock_resolve.return_value = (None, None, None)
+            do_install("test-skill", skip_confirm=False)
+            # We don't get to the input() call because resolve returns None,
+            # but the parameter wiring is correct
+
+
+class TestDoUninstallSkipConfirm:
+    """Test that do_uninstall respects skip_confirm parameter."""
+
+    def test_skip_confirm_bypasses_input(self):
+        """With skip_confirm=True, input() should not be called."""
+        from hermes_cli.skills_hub import do_uninstall
+        with patch("hermes_cli.skills_hub._console") as mock_console, \
+             patch("tools.skills_hub.uninstall_skill", return_value=(True, "Removed")) as mock_uninstall, \
+             patch("builtins.input") as mock_input:
+            do_uninstall("test-skill", skip_confirm=True)
+            mock_input.assert_not_called()
+            mock_uninstall.assert_called_once_with("test-skill")
+
+    def test_without_skip_confirm_calls_input(self):
+        """Without skip_confirm, input() should be called."""
+        from hermes_cli.skills_hub import do_uninstall
+        with patch("hermes_cli.skills_hub._console"), \
+             patch("tools.skills_hub.uninstall_skill", return_value=(True, "Removed")), \
+             patch("builtins.input", return_value="y") as mock_input:
+            do_uninstall("test-skill", skip_confirm=False)
+            mock_input.assert_called_once()
+
+    def test_without_skip_confirm_cancel(self):
+        """Without skip_confirm, answering 'n' should cancel."""
+        from hermes_cli.skills_hub import do_uninstall
+        with patch("hermes_cli.skills_hub._console"), \
+             patch("tools.skills_hub.uninstall_skill") as mock_uninstall, \
+             patch("builtins.input", return_value="n"):
+            do_uninstall("test-skill", skip_confirm=False)
+            mock_uninstall.assert_not_called()
@@ -0,0 +1,305 @@
+"""Tests for cmd_update gateway auto-restart — systemd + launchd coverage.
+
+Ensures ``hermes update`` correctly detects running gateways managed by
+systemd (Linux) or launchd (macOS) and restarts/informs the user properly,
+rather than leaving zombie processes or telling users to manually restart
+when launchd will auto-respawn.
+"""
+
+import subprocess
+from types import SimpleNamespace
+from unittest.mock import patch, MagicMock
+
+import pytest
+
+import hermes_cli.gateway as gateway_cli
+from hermes_cli.main import cmd_update
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _make_run_side_effect(
+    branch="main",
+    verify_ok=True,
+    commit_count="3",
+    systemd_active=False,
+    launchctl_loaded=False,
+):
+    """Build a subprocess.run side_effect that simulates git + service commands."""
+
+    def side_effect(cmd, **kwargs):
+        joined = " ".join(str(c) for c in cmd)
+
+        # git rev-parse --abbrev-ref HEAD
+        if "rev-parse" in joined and "--abbrev-ref" in joined:
+            return subprocess.CompletedProcess(cmd, 0, stdout=f"{branch}\n", stderr="")
+
+        # git rev-parse --verify origin/{branch}
+        if "rev-parse" in joined and "--verify" in joined:
+            rc = 0 if verify_ok else 128
+            return subprocess.CompletedProcess(cmd, rc, stdout="", stderr="")
+
+        # git rev-list HEAD..origin/{branch} --count
+        if "rev-list" in joined:
+            return subprocess.CompletedProcess(cmd, 0, stdout=f"{commit_count}\n", stderr="")
+
+        # systemctl --user is-active
+        if "systemctl" in joined and "is-active" in joined:
+            if systemd_active:
+                return subprocess.CompletedProcess(cmd, 0, stdout="active\n", stderr="")
+            return subprocess.CompletedProcess(cmd, 3, stdout="inactive\n", stderr="")
+
+        # systemctl --user restart
+        if "systemctl" in joined and "restart" in joined:
+            return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
+
+        # launchctl list ai.hermes.gateway
+        if "launchctl" in joined and "list" in joined:
+            if launchctl_loaded:
+                return subprocess.CompletedProcess(cmd, 0, stdout="PID\tStatus\tLabel\n123\t0\tai.hermes.gateway\n", stderr="")
+            return subprocess.CompletedProcess(cmd, 113, stdout="", stderr="Could not find service")
+
+        return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
+
+    return side_effect
+
+
+@pytest.fixture
+def mock_args():
+    return SimpleNamespace()
+
+
+# ---------------------------------------------------------------------------
+# Launchd plist includes --replace
+# ---------------------------------------------------------------------------
+
+
+class TestLaunchdPlistReplace:
+    """The generated launchd plist must include --replace so respawned
+    gateways kill stale instances."""
+
+    def test_plist_contains_replace_flag(self):
+        plist = gateway_cli.generate_launchd_plist()
+        assert "--replace" in plist
+
+    def test_plist_program_arguments_order(self):
+        """--replace comes after 'run' in the ProgramArguments."""
+        plist = gateway_cli.generate_launchd_plist()
+        lines = [line.strip() for line in plist.splitlines()]
+        # Find 'run' and '--replace' in the string entries
+        string_values = [
+            line.replace("<string>", "").replace("</string>", "")
+            for line in lines
+            if "<string>" in line and "</string>" in line
+        ]
+        assert "run" in string_values
+        assert "--replace" in string_values
+        run_idx = string_values.index("run")
+        replace_idx = string_values.index("--replace")
+        assert replace_idx == run_idx + 1
+
+
+# ---------------------------------------------------------------------------
+# cmd_update — macOS launchd detection
+# ---------------------------------------------------------------------------
+
+
+class TestLaunchdPlistRefresh:
+    """refresh_launchd_plist_if_needed rewrites stale plists (like systemd's
+    refresh_systemd_unit_if_needed)."""
+
+    def test_refresh_rewrites_stale_plist(self, tmp_path, monkeypatch):
+        plist_path = tmp_path / "ai.hermes.gateway.plist"
+        plist_path.write_text("<plist>old content</plist>")
+
+        monkeypatch.setattr(gateway_cli, "get_launchd_plist_path", lambda: plist_path)
+
+        calls = []
+        def fake_run(cmd, check=False, **kwargs):
+            calls.append(cmd)
+            return SimpleNamespace(returncode=0, stdout="", stderr="")
+
+        monkeypatch.setattr(gateway_cli.subprocess, "run", fake_run)
+
+        result = gateway_cli.refresh_launchd_plist_if_needed()
+
+        assert result is True
+        # Plist should now contain the generated content (which includes --replace)
+        assert "--replace" in plist_path.read_text()
+        # Should have unloaded then reloaded
+        assert any("unload" in str(c) for c in calls)
+        assert any("load" in str(c) for c in calls)
+
+    def test_refresh_skips_when_current(self, tmp_path, monkeypatch):
+        plist_path = tmp_path / "ai.hermes.gateway.plist"
+        monkeypatch.setattr(gateway_cli, "get_launchd_plist_path", lambda: plist_path)
+
+        # Write the current expected content
+        plist_path.write_text(gateway_cli.generate_launchd_plist())
+
+        calls = []
+        monkeypatch.setattr(
+            gateway_cli.subprocess, "run",
+            lambda cmd, **kw: calls.append(cmd) or SimpleNamespace(returncode=0),
+        )
+
+        result = gateway_cli.refresh_launchd_plist_if_needed()
+
+        assert result is False
+        assert len(calls) == 0  # No launchctl calls needed
+
+    def test_refresh_skips_when_no_plist(self, tmp_path, monkeypatch):
+        plist_path = tmp_path / "nonexistent.plist"
+        monkeypatch.setattr(gateway_cli, "get_launchd_plist_path", lambda: plist_path)
+
+        result = gateway_cli.refresh_launchd_plist_if_needed()
+        assert result is False
+
+    def test_launchd_start_calls_refresh(self, tmp_path, monkeypatch):
+        """launchd_start refreshes the plist before starting."""
+        plist_path = tmp_path / "ai.hermes.gateway.plist"
+        plist_path.write_text("<plist>old</plist>")
+        monkeypatch.setattr(gateway_cli, "get_launchd_plist_path", lambda: plist_path)
+
+        calls = []
+        def fake_run(cmd, check=False, **kwargs):
+            calls.append(cmd)
+            return SimpleNamespace(returncode=0, stdout="", stderr="")
+
+        monkeypatch.setattr(gateway_cli.subprocess, "run", fake_run)
+
+        gateway_cli.launchd_start()
+
+        # First calls should be refresh (unload/load), then start
+        cmd_strs = [" ".join(c) for c in calls]
+        assert any("unload" in s for s in cmd_strs)
+        assert any("start" in s for s in cmd_strs)
+
+
+class TestCmdUpdateLaunchdRestart:
+    """cmd_update correctly detects and handles launchd on macOS."""
+
+    @patch("shutil.which", return_value=None)
+    @patch("subprocess.run")
+    def test_update_detects_launchd_and_skips_manual_restart_message(
+        self, mock_run, _mock_which, mock_args, capsys, tmp_path, monkeypatch,
+    ):
+        """When launchd is running the gateway, update should print
+        'auto-restart via launchd' instead of 'Restart it with: hermes gateway run'."""
+        # Create a fake launchd plist so is_macos + plist.exists() passes
+        plist_path = tmp_path / "ai.hermes.gateway.plist"
+        plist_path.write_text("<plist/>")
+
+        monkeypatch.setattr(
+            gateway_cli, "is_macos", lambda: True,
+        )
+        monkeypatch.setattr(
+            gateway_cli, "get_launchd_plist_path", lambda: plist_path,
+        )
+
+        mock_run.side_effect = _make_run_side_effect(
+            commit_count="3",
+            launchctl_loaded=True,
+        )
+
+        # Mock get_running_pid to return a PID
+        with patch("gateway.status.get_running_pid", return_value=12345), \
+             patch("gateway.status.remove_pid_file"):
+            cmd_update(mock_args)
+
+        captured = capsys.readouterr().out
+        assert "Gateway restarted via launchd" in captured
+        assert "Restart it with: hermes gateway run" not in captured
+        # Verify launchctl stop + start were called (not manual SIGTERM)
+        launchctl_calls = [
+            c for c in mock_run.call_args_list
+            if len(c.args[0]) > 0 and c.args[0][0] == "launchctl"
+        ]
+        stop_calls = [c for c in launchctl_calls if "stop" in c.args[0]]
+        start_calls = [c for c in launchctl_calls if "start" in c.args[0]]
+        assert len(stop_calls) >= 1
+        assert len(start_calls) >= 1
+
+    @patch("shutil.which", return_value=None)
+    @patch("subprocess.run")
+    def test_update_without_launchd_shows_manual_restart(
+        self, mock_run, _mock_which, mock_args, capsys, tmp_path, monkeypatch,
+    ):
+        """When no service manager is running, update should show the manual restart hint."""
+        monkeypatch.setattr(
+            gateway_cli, "is_macos", lambda: True,
+        )
+        plist_path = tmp_path / "ai.hermes.gateway.plist"
+        # plist does NOT exist — no launchd service
+        monkeypatch.setattr(
+            gateway_cli, "get_launchd_plist_path", lambda: plist_path,
+        )
+
+        mock_run.side_effect = _make_run_side_effect(
+            commit_count="3",
+            launchctl_loaded=False,
+        )
+
+        with patch("gateway.status.get_running_pid", return_value=12345), \
+             patch("gateway.status.remove_pid_file"), \
+             patch("os.kill"):
+            cmd_update(mock_args)
+
+        captured = capsys.readouterr().out
+        assert "Restart it with: hermes gateway run" in captured
+        assert "Gateway restarted via launchd" not in captured
+
+    @patch("shutil.which", return_value=None)
+    @patch("subprocess.run")
+    def test_update_with_systemd_still_restarts_via_systemd(
+        self, mock_run, _mock_which, mock_args, capsys, monkeypatch,
+    ):
+        """On Linux with systemd active, update should restart via systemctl."""
+        monkeypatch.setattr(
+            gateway_cli, "is_macos", lambda: False,
+        )
+
+        mock_run.side_effect = _make_run_side_effect(
+            commit_count="3",
+            systemd_active=True,
+        )
+
+        with patch("gateway.status.get_running_pid", return_value=12345), \
+             patch("gateway.status.remove_pid_file"), \
+             patch("os.kill"):
+            cmd_update(mock_args)
+
+        captured = capsys.readouterr().out
+        assert "Gateway restarted" in captured
+        # Verify systemctl restart was called
+        restart_calls = [
+            c for c in mock_run.call_args_list
+            if "restart" in " ".join(str(a) for a in c.args[0])
+            and "systemctl" in " ".join(str(a) for a in c.args[0])
+        ]
+        assert len(restart_calls) == 1
+
+    @patch("shutil.which", return_value=None)
+    @patch("subprocess.run")
+    def test_update_no_gateway_running_skips_restart(
+        self, mock_run, _mock_which, mock_args, capsys, monkeypatch,
+    ):
+        """When no gateway is running, update should skip the restart section entirely."""
+        monkeypatch.setattr(
+            gateway_cli, "is_macos", lambda: False,
+        )
+
+        mock_run.side_effect = _make_run_side_effect(
+            commit_count="3",
+            systemd_active=False,
+        )
+
+        with patch("gateway.status.get_running_pid", return_value=None):
+            cmd_update(mock_args)
+
+        captured = capsys.readouterr().out
+        assert "Stopped gateway" not in captured
+        assert "Gateway restarted" not in captured
+        assert "Gateway restarted via launchd" not in captured
@@ -0,0 +1,268 @@
+"""Tests for #1630 — gateway infinite 400 failure loop prevention.
+
+Verifies that:
+1. Generic 400 errors with large sessions are treated as context-length errors
+   and trigger compression instead of aborting.
+2. The gateway does not persist messages when the agent fails early, preventing
+   the session from growing on each failure.
+3. Context-overflow failures produce helpful error messages suggesting /compact.
+"""
+
+import pytest
+from types import SimpleNamespace
+from unittest.mock import MagicMock, patch
+
+
+# ---------------------------------------------------------------------------
+# Test 1: Agent heuristic — generic 400 with large session → compression
+# ---------------------------------------------------------------------------
+
+
+class TestGeneric400Heuristic:
+    """The agent should treat a generic 400 with a large session as a
+    probable context-length error and trigger compression, not abort."""
+
+    def _make_agent(self):
+        """Create a minimal AIAgent for testing error handling."""
+        with (
+            patch("run_agent.get_tool_definitions", return_value=[]),
+            patch("run_agent.check_toolset_requirements", return_value={}),
+            patch("run_agent.OpenAI"),
+        ):
+            from run_agent import AIAgent
+            a = AIAgent(
+                api_key="test-key-12345",
+                quiet_mode=True,
+                skip_context_files=True,
+                skip_memory=True,
+            )
+            a.client = MagicMock()
+            a._cached_system_prompt = "You are helpful."
+            a._use_prompt_caching = False
+            a.tool_delay = 0
+            a.compression_enabled = False
+            return a
+
+    def test_generic_400_with_small_session_is_client_error(self):
+        """A generic 400 with a small session should still be treated
+        as a non-retryable client error (not context overflow)."""
+        error_msg = "error"
+        status_code = 400
+        approx_tokens = 1000  # Small session
+        api_messages = [{"role": "user", "content": "hi"}]
+
+        # Simulate the phrase matching
+        is_context_length_error = any(phrase in error_msg for phrase in [
+            'context length', 'context size', 'maximum context',
+            'token limit', 'too many tokens', 'reduce the length',
+            'exceeds the limit', 'context window',
+            'request entity too large',
+            'prompt is too long',
+        ])
+        assert not is_context_length_error
+
+        # The heuristic should NOT trigger for small sessions
+        ctx_len = 200000
+        is_large_session = approx_tokens > ctx_len * 0.4 or len(api_messages) > 80
+        is_generic_error = len(error_msg.strip()) < 30
+        assert not is_large_session  # Small session → heuristic doesn't fire
+
+    def test_generic_400_with_large_token_count_triggers_heuristic(self):
+        """A generic 400 with high token count should be treated as
+        probable context overflow."""
+        error_msg = "error"
+        status_code = 400
+        ctx_len = 200000
+        approx_tokens = 100000  # > 40% of 200k
+        api_messages = [{"role": "user", "content": "hi"}] * 20
+
+        is_context_length_error = any(phrase in error_msg for phrase in [
+            'context length', 'context size', 'maximum context',
+        ])
+        assert not is_context_length_error
+
+        # Heuristic check
+        is_large_session = approx_tokens > ctx_len * 0.4 or len(api_messages) > 80
+        is_generic_error = len(error_msg.strip()) < 30
+        assert is_large_session
+        assert is_generic_error
+        # Both conditions true → should be treated as context overflow
+
+    def test_generic_400_with_many_messages_triggers_heuristic(self):
+        """A generic 400 with >80 messages should trigger the heuristic
+        even if estimated tokens are low."""
+        error_msg = "error"
+        status_code = 400
+        ctx_len = 200000
+        approx_tokens = 5000  # Low token estimate
+        api_messages = [{"role": "user", "content": "x"}] * 100  # > 80 messages
+
+        is_large_session = approx_tokens > ctx_len * 0.4 or len(api_messages) > 80
+        is_generic_error = len(error_msg.strip()) < 30
+        assert is_large_session
+        assert is_generic_error
+
+    def test_specific_error_message_bypasses_heuristic(self):
+        """A 400 with a specific, long error message should NOT trigger
+        the heuristic even with a large session."""
+        error_msg = "invalid model: anthropic/claude-nonexistent-model is not available"
+        status_code = 400
+        ctx_len = 200000
+        approx_tokens = 100000
+
+        is_generic_error = len(error_msg.strip()) < 30
+        assert not is_generic_error  # Long specific message → heuristic doesn't fire
+
+    def test_descriptive_context_error_caught_by_phrases(self):
+        """Descriptive context-length errors should still be caught by
+        the existing phrase matching (not the heuristic)."""
+        error_msg = "prompt is too long: 250000 tokens > 200000 maximum"
+        is_context_length_error = any(phrase in error_msg for phrase in [
+            'context length', 'context size', 'maximum context',
+            'token limit', 'too many tokens', 'reduce the length',
+            'exceeds the limit', 'context window',
+            'request entity too large',
+            'prompt is too long',
+        ])
+        assert is_context_length_error
+
+
+# ---------------------------------------------------------------------------
+# Test 2: Gateway skips persistence on failed agent results
+# ---------------------------------------------------------------------------
+
+class TestGatewaySkipsPersistenceOnFailure:
+    """When the agent returns failed=True with no final_response,
+    the gateway should NOT persist messages to the transcript."""
+
+    def test_agent_failed_early_detected(self):
+        """The agent_failed_early flag is True when failed=True and
+        no final_response."""
+        agent_result = {
+            "failed": True,
+            "final_response": None,
+            "messages": [],
+            "error": "Non-retryable client error",
+        }
+        agent_failed_early = (
+            agent_result.get("failed")
+            and not agent_result.get("final_response")
+        )
+        assert agent_failed_early
+
+    def test_agent_with_response_not_failed_early(self):
+        """When the agent has a final_response, it's not a failed-early
+        scenario even if failed=True."""
+        agent_result = {
+            "failed": True,
+            "final_response": "Here is a partial response",
+            "messages": [],
+        }
+        agent_failed_early = (
+            agent_result.get("failed")
+            and not agent_result.get("final_response")
+        )
+        assert not agent_failed_early
+
+    def test_successful_agent_not_failed_early(self):
+        """A successful agent result should not trigger skip."""
+        agent_result = {
+            "final_response": "Hello!",
+            "messages": [{"role": "assistant", "content": "Hello!"}],
+        }
+        agent_failed_early = (
+            agent_result.get("failed")
+            and not agent_result.get("final_response")
+        )
+        assert not agent_failed_early
+
+
+# ---------------------------------------------------------------------------
+# Test 3: Context-overflow error messages
+# ---------------------------------------------------------------------------
+
+class TestContextOverflowErrorMessages:
+    """The gateway should produce helpful error messages when the failure
+    looks like a context overflow."""
+
+    def test_detects_context_keywords(self):
+        """Error messages containing context-related keywords should be
+        identified as context failures."""
+        keywords = [
+            "context length exceeded",
+            "too many tokens in the prompt",
+            "request entity too large",
+            "payload too large for model",
+            "context window exceeded",
+        ]
+        for error_str in keywords:
+            _is_ctx_fail = any(p in error_str.lower() for p in (
+                "context", "token", "too large", "too long",
+                "exceed", "payload",
+            ))
+            assert _is_ctx_fail, f"Should detect: {error_str}"
+
+    def test_detects_generic_400_with_large_history(self):
+        """A generic 400 error code in the string with a large history
+        should be flagged as context failure."""
+        error_str = "error code: 400 - {'type': 'error', 'message': 'Error'}"
+        history_len = 100  # Large session
+
+        _is_ctx_fail = any(p in error_str.lower() for p in (
+            "context", "token", "too large", "too long",
+            "exceed", "payload",
+        )) or (
+            "400" in error_str.lower()
+            and history_len > 50
+        )
+        assert _is_ctx_fail
+
+    def test_unrelated_error_not_flagged(self):
+        """Unrelated errors should not be flagged as context failures."""
+        error_str = "invalid api key: authentication failed"
+        history_len = 10
+
+        _is_ctx_fail = any(p in error_str.lower() for p in (
+            "context", "token", "too large", "too long",
+            "exceed", "payload",
+        )) or (
+            "400" in error_str.lower()
+            and history_len > 50
+        )
+        assert not _is_ctx_fail
+
+
+# ---------------------------------------------------------------------------
+# Test 4: Agent skips persistence for large failed sessions
+# ---------------------------------------------------------------------------
+
+class TestAgentSkipsPersistenceForLargeFailedSessions:
+    """When a 400 error occurs and the session is large, the agent
+    should skip persisting to prevent the growth loop."""
+
+    def test_large_session_400_skips_persistence(self):
+        """Status 400 + high token count should skip persistence."""
+        status_code = 400
+        approx_tokens = 60000  # > 50000 threshold
+        api_messages = [{"role": "user", "content": "x"}] * 10
+
+        should_skip = status_code == 400 and (approx_tokens > 50000 or len(api_messages) > 80)
+        assert should_skip
+
+    def test_small_session_400_persists_normally(self):
+        """Status 400 + small session should still persist."""
+        status_code = 400
+        approx_tokens = 5000  # < 50000
+        api_messages = [{"role": "user", "content": "x"}] * 10  # < 80
+
+        should_skip = status_code == 400 and (approx_tokens > 50000 or len(api_messages) > 80)
+        assert not should_skip
+
+    def test_non_400_error_persists_normally(self):
+        """Non-400 errors should always persist normally."""
+        status_code = 401  # Auth error
+        approx_tokens = 100000  # Large session, but not a 400
+        api_messages = [{"role": "user", "content": "x"}] * 100
+
+        should_skip = status_code == 400 and (approx_tokens > 50000 or len(api_messages) > 80)
+        assert not should_skip
@@ -144,9 +144,11 @@ class TestIsClaudeCodeTokenValid:


 class TestResolveAnthropicToken:
-    def test_prefers_oauth_token_over_api_key(self, monkeypatch):
+    def test_prefers_oauth_token_over_api_key(self, monkeypatch, tmp_path):
        monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-mykey")
        monkeypatch.setenv("ANTHROPIC_TOKEN", "sk-ant-oat01-mytoken")
+        monkeypatch.delenv("CLAUDE_CODE_OAUTH_TOKEN", raising=False)
+        monkeypatch.setattr("agent.anthropic_adapter.Path.home", lambda: tmp_path)
        assert resolve_anthropic_token() == "sk-ant-oat01-mytoken"

    def test_reports_claude_json_primary_key_source(self, monkeypatch, tmp_path):
@@ -174,9 +176,11 @@ class TestResolveAnthropicToken:
        monkeypatch.setattr("agent.anthropic_adapter.Path.home", lambda: tmp_path)
        assert resolve_anthropic_token() == "sk-ant-api03-mykey"

-    def test_falls_back_to_token(self, monkeypatch):
+    def test_falls_back_to_token(self, monkeypatch, tmp_path):
        monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
        monkeypatch.setenv("ANTHROPIC_TOKEN", "sk-ant-oat01-mytoken")
+        monkeypatch.delenv("CLAUDE_CODE_OAUTH_TOKEN", raising=False)
+        monkeypatch.setattr("agent.anthropic_adapter.Path.home", lambda: tmp_path)
        assert resolve_anthropic_token() == "sk-ant-oat01-mytoken"

    def test_returns_none_with_no_creds(self, monkeypatch, tmp_path):
@@ -0,0 +1,480 @@
+"""Tests for Anthropic error handling in the agent retry loop.
+
+Covers all error paths in run_agent.py's run_conversation() for api_mode=anthropic_messages:
+- 429 rate limit → retried with backoff
+- 529 overloaded → retried with backoff
+- 400 bad request → non-retryable, immediate fail
+- 401 unauthorized → credential refresh + retry
+- 500 server error → retried with backoff
+- "prompt is too long" → context length error triggers compression
+"""
+
+import asyncio
+import sys
+import types
+from types import SimpleNamespace
+from unittest.mock import MagicMock, AsyncMock
+
+import pytest
+
+sys.modules.setdefault("fire", types.SimpleNamespace(Fire=lambda *a, **k: None))
+sys.modules.setdefault("firecrawl", types.SimpleNamespace(Firecrawl=object))
+sys.modules.setdefault("fal_client", types.SimpleNamespace())
+
+import gateway.run as gateway_run
+import run_agent
+from gateway.config import Platform
+from gateway.session import SessionSource
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _patch_agent_bootstrap(monkeypatch):
+    monkeypatch.setattr(
+        run_agent,
+        "get_tool_definitions",
+        lambda **kwargs: [
+            {
+                "type": "function",
+                "function": {
+                    "name": "terminal",
+                    "description": "Run shell commands.",
+                    "parameters": {"type": "object", "properties": {}},
+                },
+            }
+        ],
+    )
+    monkeypatch.setattr(run_agent, "check_toolset_requirements", lambda: {})
+
+
+def _anthropic_response(text: str):
+    """Simulate an Anthropic messages.create() response object."""
+    return SimpleNamespace(
+        content=[SimpleNamespace(type="text", text=text)],
+        stop_reason="end_turn",
+        usage=SimpleNamespace(input_tokens=10, output_tokens=5),
+        model="claude-sonnet-4-6-20250514",
+    )
+
+
+class _RateLimitError(Exception):
+    """Simulates Anthropic 429 rate limit error."""
+    def __init__(self):
+        super().__init__("Error code: 429 - Rate limit exceeded. Please retry after 30s.")
+        self.status_code = 429
+
+
+class _OverloadedError(Exception):
+    """Simulates Anthropic 529 overloaded error."""
+    def __init__(self):
+        super().__init__("Error code: 529 - API is temporarily overloaded.")
+        self.status_code = 529
+
+
+class _BadRequestError(Exception):
+    """Simulates Anthropic 400 bad request error (non-retryable)."""
+    def __init__(self):
+        super().__init__("Error code: 400 - Invalid model specified.")
+        self.status_code = 400
+
+
+class _UnauthorizedError(Exception):
+    """Simulates Anthropic 401 unauthorized error."""
+    def __init__(self):
+        super().__init__("Error code: 401 - Unauthorized. Invalid API key.")
+        self.status_code = 401
+
+
+class _ServerError(Exception):
+    """Simulates Anthropic 500 internal server error."""
+    def __init__(self):
+        super().__init__("Error code: 500 - Internal server error.")
+        self.status_code = 500
+
+
+class _PromptTooLongError(Exception):
+    """Simulates Anthropic prompt-too-long error (triggers context compression)."""
+    def __init__(self):
+        super().__init__("prompt is too long: 250000 tokens > 200000 maximum")
+        self.status_code = 400
+
+
+class _FakeAnthropicClient:
+    def close(self):
+        pass
+
+
+def _fake_build_anthropic_client(key, base_url=None):
+    return _FakeAnthropicClient()
+
+
+def _make_agent_cls(error_cls, recover_after=None):
+    """Create an AIAgent subclass that raises error_cls on API calls.
+
+    If recover_after is set, the agent succeeds after that many failures.
+    """
+
+    class _Agent(run_agent.AIAgent):
+        def __init__(self, *args, **kwargs):
+            kwargs.setdefault("skip_context_files", True)
+            kwargs.setdefault("skip_memory", True)
+            kwargs.setdefault("max_iterations", 4)
+            super().__init__(*args, **kwargs)
+            self._cleanup_task_resources = lambda task_id: None
+            self._persist_session = lambda messages, history=None: None
+            self._save_trajectory = lambda messages, user_message, completed: None
+            self._save_session_log = lambda messages: None
+
+        def run_conversation(self, user_message, conversation_history=None, task_id=None):
+            calls = {"n": 0}
+
+            def _fake_api_call(api_kwargs):
+                calls["n"] += 1
+                if recover_after is not None and calls["n"] > recover_after:
+                    return _anthropic_response("Recovered")
+                raise error_cls()
+
+            self._interruptible_api_call = _fake_api_call
+            return super().run_conversation(
+                user_message, conversation_history=conversation_history, task_id=task_id
+            )
+
+    return _Agent
+
+
+def _run_with_agent(monkeypatch, agent_cls):
+    """Run _run_agent through the gateway with the given agent class."""
+    _patch_agent_bootstrap(monkeypatch)
+    monkeypatch.setattr(
+        "agent.anthropic_adapter.build_anthropic_client", _fake_build_anthropic_client
+    )
+    monkeypatch.setattr(run_agent, "AIAgent", agent_cls)
+    monkeypatch.setattr(
+        gateway_run,
+        "_resolve_runtime_agent_kwargs",
+        lambda: {
+            "provider": "anthropic",
+            "api_mode": "anthropic_messages",
+            "base_url": "https://api.anthropic.com",
+            "api_key": "sk-ant-api03-test-key",
+        },
+    )
+    monkeypatch.setenv("HERMES_TOOL_PROGRESS", "false")
+
+    runner = gateway_run.GatewayRunner.__new__(gateway_run.GatewayRunner)
+    runner.adapters = {}
+    runner._ephemeral_system_prompt = ""
+    runner._prefill_messages = []
+    runner._reasoning_config = None
+    runner._provider_routing = {}
+    runner._fallback_model = None
+    runner._running_agents = {}
+    runner.hooks = MagicMock()
+    runner.hooks.emit = AsyncMock()
+    runner.hooks.loaded_hooks = []
+    runner._session_db = None
+
+    source = SessionSource(
+        platform=Platform.LOCAL,
+        chat_id="cli",
+        chat_name="CLI",
+        chat_type="dm",
+        user_id="test-user-1",
+    )
+
+    return asyncio.run(
+        runner._run_agent(
+            message="hello",
+            context_prompt="",
+            history=[],
+            source=source,
+            session_id="test-session",
+            session_key="agent:main:local:dm",
+        )
+    )
+
+
+# ---------------------------------------------------------------------------
+# Tests
+# ---------------------------------------------------------------------------
+
+
+def test_429_rate_limit_is_retried_and_recovers(monkeypatch):
+    """429 should be retried with backoff. First call fails, second succeeds."""
+    agent_cls = _make_agent_cls(_RateLimitError, recover_after=1)
+    result = _run_with_agent(monkeypatch, agent_cls)
+    assert result["final_response"] == "Recovered"
+
+
+def test_529_overloaded_is_retried_and_recovers(monkeypatch):
+    """529 should be retried with backoff. First call fails, second succeeds."""
+    agent_cls = _make_agent_cls(_OverloadedError, recover_after=1)
+    result = _run_with_agent(monkeypatch, agent_cls)
+    assert result["final_response"] == "Recovered"
+
+
+def test_429_exhausts_all_retries_before_raising(monkeypatch):
+    """429 must retry max_retries times, not abort on first attempt."""
+    agent_cls = _make_agent_cls(_RateLimitError)  # always fails
+    with pytest.raises(_RateLimitError):
+        _run_with_agent(monkeypatch, agent_cls)
+
+
+def test_400_bad_request_is_non_retryable(monkeypatch):
+    """400 should fail immediately with only 1 API call (regression guard)."""
+    agent_cls = _make_agent_cls(_BadRequestError)
+    result = _run_with_agent(monkeypatch, agent_cls)
+    assert result["api_calls"] == 1
+    assert "400" in str(result.get("final_response", ""))
+
+
+def test_500_server_error_is_retried_and_recovers(monkeypatch):
+    """500 should be retried with backoff. First call fails, second succeeds."""
+    agent_cls = _make_agent_cls(_ServerError, recover_after=1)
+    result = _run_with_agent(monkeypatch, agent_cls)
+    assert result["final_response"] == "Recovered"
+
+
+def test_401_credential_refresh_recovers(monkeypatch):
+    """401 should trigger credential refresh and retry once."""
+    _patch_agent_bootstrap(monkeypatch)
+    monkeypatch.setattr(
+        "agent.anthropic_adapter.build_anthropic_client", _fake_build_anthropic_client
+    )
+    monkeypatch.setenv("HERMES_TOOL_PROGRESS", "false")
+
+    refresh_count = {"n": 0}
+
+    class _Auth401ThenSuccessAgent(run_agent.AIAgent):
+        def __init__(self, *args, **kwargs):
+            kwargs.setdefault("skip_context_files", True)
+            kwargs.setdefault("skip_memory", True)
+            kwargs.setdefault("max_iterations", 4)
+            super().__init__(*args, **kwargs)
+            self._cleanup_task_resources = lambda task_id: None
+            self._persist_session = lambda messages, history=None: None
+            self._save_trajectory = lambda messages, user_message, completed: None
+            self._save_session_log = lambda messages: None
+
+        def _try_refresh_anthropic_client_credentials(self) -> bool:
+            refresh_count["n"] += 1
+            return True  # Simulate successful credential refresh
+
+        def run_conversation(self, user_message, conversation_history=None, task_id=None):
+            calls = {"n": 0}
+
+            def _fake_api_call(api_kwargs):
+                calls["n"] += 1
+                if calls["n"] == 1:
+                    raise _UnauthorizedError()
+                return _anthropic_response("Auth refreshed")
+
+            self._interruptible_api_call = _fake_api_call
+            return super().run_conversation(
+                user_message, conversation_history=conversation_history, task_id=task_id
+            )
+
+    monkeypatch.setattr(run_agent, "AIAgent", _Auth401ThenSuccessAgent)
+    monkeypatch.setattr(
+        gateway_run,
+        "_resolve_runtime_agent_kwargs",
+        lambda: {
+            "provider": "anthropic",
+            "api_mode": "anthropic_messages",
+            "base_url": "https://api.anthropic.com",
+            "api_key": "sk-ant-api03-test-key",
+        },
+    )
+
+    runner = gateway_run.GatewayRunner.__new__(gateway_run.GatewayRunner)
+    runner.adapters = {}
+    runner._ephemeral_system_prompt = ""
+    runner._prefill_messages = []
+    runner._reasoning_config = None
+    runner._provider_routing = {}
+    runner._fallback_model = None
+    runner._running_agents = {}
+    runner.hooks = MagicMock()
+    runner.hooks.emit = AsyncMock()
+    runner.hooks.loaded_hooks = []
+    runner._session_db = None
+
+    source = SessionSource(
+        platform=Platform.LOCAL, chat_id="cli", chat_name="CLI",
+        chat_type="dm", user_id="test-user-1",
+    )
+
+    result = asyncio.run(
+        runner._run_agent(
+            message="hello", context_prompt="", history=[],
+            source=source, session_id="session-401",
+            session_key="agent:main:local:dm",
+        )
+    )
+
+    assert result["final_response"] == "Auth refreshed"
+    assert refresh_count["n"] == 1
+
+
+def test_401_refresh_fails_is_non_retryable(monkeypatch):
+    """401 with failed credential refresh should be treated as non-retryable."""
+    _patch_agent_bootstrap(monkeypatch)
+    monkeypatch.setattr(
+        "agent.anthropic_adapter.build_anthropic_client", _fake_build_anthropic_client
+    )
+    monkeypatch.setenv("HERMES_TOOL_PROGRESS", "false")
+
+    class _Auth401AlwaysFailAgent(run_agent.AIAgent):
+        def __init__(self, *args, **kwargs):
+            kwargs.setdefault("skip_context_files", True)
+            kwargs.setdefault("skip_memory", True)
+            kwargs.setdefault("max_iterations", 4)
+            super().__init__(*args, **kwargs)
+            self._cleanup_task_resources = lambda task_id: None
+            self._persist_session = lambda messages, history=None: None
+            self._save_trajectory = lambda messages, user_message, completed: None
+            self._save_session_log = lambda messages: None
+
+        def _try_refresh_anthropic_client_credentials(self) -> bool:
+            return False  # Simulate failed credential refresh
+
+        def run_conversation(self, user_message, conversation_history=None, task_id=None):
+            def _fake_api_call(api_kwargs):
+                raise _UnauthorizedError()
+
+            self._interruptible_api_call = _fake_api_call
+            return super().run_conversation(
+                user_message, conversation_history=conversation_history, task_id=task_id
+            )
+
+    monkeypatch.setattr(run_agent, "AIAgent", _Auth401AlwaysFailAgent)
+    monkeypatch.setattr(
+        gateway_run,
+        "_resolve_runtime_agent_kwargs",
+        lambda: {
+            "provider": "anthropic",
+            "api_mode": "anthropic_messages",
+            "base_url": "https://api.anthropic.com",
+            "api_key": "sk-ant-api03-test-key",
+        },
+    )
+
+    runner = gateway_run.GatewayRunner.__new__(gateway_run.GatewayRunner)
+    runner.adapters = {}
+    runner._ephemeral_system_prompt = ""
+    runner._prefill_messages = []
+    runner._reasoning_config = None
+    runner._provider_routing = {}
+    runner._fallback_model = None
+    runner._running_agents = {}
+    runner.hooks = MagicMock()
+    runner.hooks.emit = AsyncMock()
+    runner.hooks.loaded_hooks = []
+    runner._session_db = None
+
+    source = SessionSource(
+        platform=Platform.LOCAL, chat_id="cli", chat_name="CLI",
+        chat_type="dm", user_id="test-user-1",
+    )
+
+    result = asyncio.run(
+        runner._run_agent(
+            message="hello", context_prompt="", history=[],
+            source=source, session_id="session-401-fail",
+            session_key="agent:main:local:dm",
+        )
+    )
+
+    # 401 after failed refresh → non-retryable (falls through to is_client_error)
+    assert result["api_calls"] == 1
+    assert "401" in str(result.get("final_response", "")) or "unauthorized" in str(result.get("final_response", "")).lower()
+
+
+def test_prompt_too_long_triggers_compression(monkeypatch):
+    """Anthropic 'prompt is too long' error should trigger context compression, not immediate fail."""
+    _patch_agent_bootstrap(monkeypatch)
+    monkeypatch.setattr(
+        "agent.anthropic_adapter.build_anthropic_client", _fake_build_anthropic_client
+    )
+    monkeypatch.setenv("HERMES_TOOL_PROGRESS", "false")
+
+    class _PromptTooLongThenSuccessAgent(run_agent.AIAgent):
+        compress_called = 0
+
+        def __init__(self, *args, **kwargs):
+            kwargs.setdefault("skip_context_files", True)
+            kwargs.setdefault("skip_memory", True)
+            kwargs.setdefault("max_iterations", 4)
+            super().__init__(*args, **kwargs)
+            self._cleanup_task_resources = lambda task_id: None
+            self._persist_session = lambda messages, history=None: None
+            self._save_trajectory = lambda messages, user_message, completed: None
+            self._save_session_log = lambda messages: None
+
+        def _compress_context(self, messages, system_message, approx_tokens=0, task_id=None):
+            type(self).compress_called += 1
+            # Simulate compression by dropping oldest non-system message
+            if len(messages) > 2:
+                compressed = [messages[0]] + messages[2:]
+            else:
+                compressed = messages
+            return compressed, system_message
+
+        def run_conversation(self, user_message, conversation_history=None, task_id=None):
+            calls = {"n": 0}
+
+            def _fake_api_call(api_kwargs):
+                calls["n"] += 1
+                if calls["n"] == 1:
+                    raise _PromptTooLongError()
+                return _anthropic_response("Compressed and recovered")
+
+            self._interruptible_api_call = _fake_api_call
+            return super().run_conversation(
+                user_message, conversation_history=conversation_history, task_id=task_id
+            )
+
+    _PromptTooLongThenSuccessAgent.compress_called = 0
+    monkeypatch.setattr(run_agent, "AIAgent", _PromptTooLongThenSuccessAgent)
+    monkeypatch.setattr(
+        gateway_run,
+        "_resolve_runtime_agent_kwargs",
+        lambda: {
+            "provider": "anthropic",
+            "api_mode": "anthropic_messages",
+            "base_url": "https://api.anthropic.com",
+            "api_key": "sk-ant-api03-test-key",
+        },
+    )
+
+    runner = gateway_run.GatewayRunner.__new__(gateway_run.GatewayRunner)
+    runner.adapters = {}
+    runner._ephemeral_system_prompt = ""
+    runner._prefill_messages = []
+    runner._reasoning_config = None
+    runner._provider_routing = {}
+    runner._fallback_model = None
+    runner._running_agents = {}
+    runner.hooks = MagicMock()
+    runner.hooks.emit = AsyncMock()
+    runner.hooks.loaded_hooks = []
+    runner._session_db = None
+
+    source = SessionSource(
+        platform=Platform.LOCAL, chat_id="cli", chat_name="CLI",
+        chat_type="dm", user_id="test-user-1",
+    )
+
+    result = asyncio.run(
+        runner._run_agent(
+            message="hello", context_prompt="", history=[],
+            source=source, session_id="session-prompt-long",
+            session_key="agent:main:local:dm",
+        )
+    )
+
+    assert result["final_response"] == "Compressed and recovered"
+    assert _PromptTooLongThenSuccessAgent.compress_called >= 1
@@ -1,4 +1,4 @@
-"""Tests for API-key provider support (z.ai/GLM, Kimi, MiniMax)."""
+"""Tests for API-key provider support (z.ai/GLM, Kimi, MiniMax, AI Gateway)."""

 import os
 import sys
@@ -37,6 +37,7 @@ class TestProviderRegistry:
        ("kimi-coding", "Kimi / Moonshot", "api_key"),
        ("minimax", "MiniMax", "api_key"),
        ("minimax-cn", "MiniMax (China)", "api_key"),
+        ("ai-gateway", "AI Gateway", "api_key"),
    ])
    def test_provider_registered(self, provider_id, name, auth_type):
        assert provider_id in PROVIDER_REGISTRY
@@ -65,11 +66,17 @@ class TestProviderRegistry:
        assert pconfig.api_key_env_vars == ("MINIMAX_CN_API_KEY",)
        assert pconfig.base_url_env_var == "MINIMAX_CN_BASE_URL"

+    def test_ai_gateway_env_vars(self):
+        pconfig = PROVIDER_REGISTRY["ai-gateway"]
+        assert pconfig.api_key_env_vars == ("AI_GATEWAY_API_KEY",)
+        assert pconfig.base_url_env_var == "AI_GATEWAY_BASE_URL"
+
    def test_base_urls(self):
        assert PROVIDER_REGISTRY["zai"].inference_base_url == "https://api.z.ai/api/paas/v4"
        assert PROVIDER_REGISTRY["kimi-coding"].inference_base_url == "https://api.moonshot.ai/v1"
        assert PROVIDER_REGISTRY["minimax"].inference_base_url == "https://api.minimax.io/v1"
        assert PROVIDER_REGISTRY["minimax-cn"].inference_base_url == "https://api.minimaxi.com/v1"
+        assert PROVIDER_REGISTRY["ai-gateway"].inference_base_url == "https://ai-gateway.vercel.sh/v1"

    def test_oauth_providers_unchanged(self):
        """Ensure we didn't break the existing OAuth providers."""
@@ -87,6 +94,7 @@ PROVIDER_ENV_VARS = (
    "OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
    "GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY",
    "KIMI_API_KEY", "KIMI_BASE_URL", "MINIMAX_API_KEY", "MINIMAX_CN_API_KEY",
+    "AI_GATEWAY_API_KEY", "AI_GATEWAY_BASE_URL",
    "OPENAI_BASE_URL",
 )

@@ -112,6 +120,9 @@ class TestResolveProvider:
    def test_explicit_minimax_cn(self):
        assert resolve_provider("minimax-cn") == "minimax-cn"

+    def test_explicit_ai_gateway(self):
+        assert resolve_provider("ai-gateway") == "ai-gateway"
+
    def test_alias_glm(self):
        assert resolve_provider("glm") == "zai"

@@ -130,6 +141,12 @@ class TestResolveProvider:
    def test_alias_minimax_underscore(self):
        assert resolve_provider("minimax_cn") == "minimax-cn"

+    def test_alias_aigateway(self):
+        assert resolve_provider("aigateway") == "ai-gateway"
+
+    def test_alias_vercel(self):
+        assert resolve_provider("vercel") == "ai-gateway"
+
    def test_alias_case_insensitive(self):
        assert resolve_provider("GLM") == "zai"
        assert resolve_provider("Z-AI") == "zai"
@@ -163,6 +180,10 @@ class TestResolveProvider:
        monkeypatch.setenv("MINIMAX_CN_API_KEY", "test-mm-cn-key")
        assert resolve_provider("auto") == "minimax-cn"

+    def test_auto_detects_ai_gateway_key(self, monkeypatch):
+        monkeypatch.setenv("AI_GATEWAY_API_KEY", "test-gw-key")
+        assert resolve_provider("auto") == "ai-gateway"
+
    def test_openrouter_takes_priority_over_glm(self, monkeypatch):
        """OpenRouter API key should win over GLM in auto-detection."""
        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
@@ -248,6 +269,13 @@ class TestResolveApiKeyProviderCredentials:
        assert creds["api_key"] == "mmcn-secret-key"
        assert creds["base_url"] == "https://api.minimaxi.com/v1"

+    def test_resolve_ai_gateway_with_key(self, monkeypatch):
+        monkeypatch.setenv("AI_GATEWAY_API_KEY", "gw-secret-key")
+        creds = resolve_api_key_provider_credentials("ai-gateway")
+        assert creds["provider"] == "ai-gateway"
+        assert creds["api_key"] == "gw-secret-key"
+        assert creds["base_url"] == "https://ai-gateway.vercel.sh/v1"
+
    def test_resolve_with_custom_base_url(self, monkeypatch):
        monkeypatch.setenv("GLM_API_KEY", "glm-key")
        monkeypatch.setenv("GLM_BASE_URL", "https://custom.glm.example/v4")
@@ -309,6 +337,15 @@ class TestRuntimeProviderResolution:
        assert result["provider"] == "minimax"
        assert result["api_key"] == "mm-key"

+    def test_runtime_ai_gateway(self, monkeypatch):
+        monkeypatch.setenv("AI_GATEWAY_API_KEY", "gw-key")
+        from hermes_cli.runtime_provider import resolve_runtime_provider
+        result = resolve_runtime_provider(requested="ai-gateway")
+        assert result["provider"] == "ai-gateway"
+        assert result["api_mode"] == "chat_completions"
+        assert result["api_key"] == "gw-key"
+        assert "ai-gateway.vercel.sh" in result["base_url"]
+
    def test_runtime_auto_detects_api_key_provider(self, monkeypatch):
        monkeypatch.setenv("KIMI_API_KEY", "auto-kimi-key")
        from hermes_cli.runtime_provider import resolve_runtime_provider
@@ -35,7 +35,9 @@ class TestSlashCommandPrefixMatching:
                raise RecursionError("process_command called too many times")
            return original(self_inner, cmd)

-        with patch.object(type(cli_obj), 'process_command', counting_process_command):
+        # Mock show_config since the test is about recursion, not config display
+        with patch.object(type(cli_obj), 'process_command', counting_process_command), \
+             patch.object(cli_obj, 'show_config'):
            try:
                cli_obj.process_command("/con set key value")
            except RecursionError:
@@ -57,7 +59,9 @@ class TestSlashCommandPrefixMatching:
                raise RecursionError("Infinite recursion detected")
            return original_pc(self_inner, cmd)

-        with patch.object(HermesCLI, 'process_command', guarded):
+        # Mock show_config since the test is about recursion, not config display
+        with patch.object(HermesCLI, 'process_command', guarded), \
+             patch.object(cli_obj, 'show_config'):
            try:
                cli_obj.process_command("/config set key value")
            except RecursionError:
@@ -68,15 +72,17 @@ class TestSlashCommandPrefixMatching:
    def test_ambiguous_prefix_shows_suggestions(self):
        """/re matches multiple commands — should show ambiguous message."""
        cli_obj = _make_cli()
-        cli_obj.process_command("/re")
-        printed = " ".join(str(c) for c in cli_obj.console.print.call_args_list)
+        with patch("cli._cprint") as mock_cprint:
+            cli_obj.process_command("/re")
+            printed = " ".join(str(c) for c in mock_cprint.call_args_list)
        assert "Ambiguous" in printed or "Did you mean" in printed

    def test_unknown_command_shows_error(self):
        """/xyz should show unknown command error."""
        cli_obj = _make_cli()
-        cli_obj.process_command("/xyz")
-        printed = " ".join(str(c) for c in cli_obj.console.print.call_args_list)
+        with patch("cli._cprint") as mock_cprint:
+            cli_obj.process_command("/xyz")
+            printed = " ".join(str(c) for c in mock_cprint.call_args_list)
        assert "Unknown command" in printed

    def test_exact_command_still_works(self):
@@ -65,24 +65,39 @@ class TestCLIStatusBar:
        assert "claude-sonnet-4-20250514" in text
        assert "12.4K/200K" in text
        assert "6%" in text
-        assert "$0.06" in text
+        assert "$0.06" not in text  # cost hidden by default
        assert "15m" in text

+    def test_build_status_bar_text_shows_cost_when_enabled(self):
+        cli_obj = _attach_agent(
+            _make_cli(),
+            prompt_tokens=10000,
+            completion_tokens=2400,
+            total_tokens=12400,
+            api_calls=7,
+            context_tokens=12400,
+            context_length=200_000,
+        )
+        cli_obj.show_cost = True
+
+        text = cli_obj._build_status_bar_text(width=120)
+        assert "$" in text  # cost is shown when enabled
+
    def test_build_status_bar_text_collapses_for_narrow_terminal(self):
        cli_obj = _attach_agent(
            _make_cli(),
-            prompt_tokens=10_230,
-            completion_tokens=2_220,
-            total_tokens=12_450,
+            prompt_tokens=10000,
+            completion_tokens=2400,
+            total_tokens=12400,
            api_calls=7,
-            context_tokens=12_450,
+            context_tokens=12400,
            context_length=200_000,
        )

        text = cli_obj._build_status_bar_text(width=60)

        assert "⚕" in text
-        assert "$0.06" in text
+        assert "$0.06" not in text  # cost hidden by default
        assert "15m" in text
        assert "200K" not in text

@@ -0,0 +1,115 @@
+"""Tests for context token tracking in run_agent.py's usage extraction.
+
+The context counter (status bar) must show the TOTAL prompt tokens including
+Anthropic's cached portions. This is an integration test for the token
+extraction in run_conversation(), not the ContextCompressor itself (which
+is tested in tests/agent/test_context_compressor.py).
+"""
+
+import sys
+import types
+from types import SimpleNamespace
+
+sys.modules.setdefault("fire", types.SimpleNamespace(Fire=lambda *a, **k: None))
+sys.modules.setdefault("firecrawl", types.SimpleNamespace(Firecrawl=object))
+sys.modules.setdefault("fal_client", types.SimpleNamespace())
+
+import run_agent
+
+
+def _patch_bootstrap(monkeypatch):
+    monkeypatch.setattr(run_agent, "get_tool_definitions", lambda **kwargs: [{
+        "type": "function",
+        "function": {"name": "t", "description": "t", "parameters": {"type": "object", "properties": {}}},
+    }])
+    monkeypatch.setattr(run_agent, "check_toolset_requirements", lambda: {})
+
+
+class _FakeAnthropicClient:
+    def close(self):
+        pass
+
+
+def _make_agent(monkeypatch, api_mode, provider, response_fn):
+    _patch_bootstrap(monkeypatch)
+    if api_mode == "anthropic_messages":
+        monkeypatch.setattr("agent.anthropic_adapter.build_anthropic_client", lambda k, b=None: _FakeAnthropicClient())
+
+    class _A(run_agent.AIAgent):
+        def __init__(self, *a, **kw):
+            kw.update(skip_context_files=True, skip_memory=True, max_iterations=4)
+            super().__init__(*a, **kw)
+            self._cleanup_task_resources = self._persist_session = lambda *a, **k: None
+            self._save_trajectory = self._save_session_log = lambda *a, **k: None
+
+        def run_conversation(self, msg, conversation_history=None, task_id=None):
+            self._interruptible_api_call = lambda kw: response_fn()
+            return super().run_conversation(msg, conversation_history=conversation_history, task_id=task_id)
+
+    return _A(model="test-model", api_key="test-key", provider=provider, api_mode=api_mode)
+
+
+def _anthropic_resp(input_tok, output_tok, cache_read=0, cache_creation=0):
+    usage_fields = {"input_tokens": input_tok, "output_tokens": output_tok}
+    if cache_read:
+        usage_fields["cache_read_input_tokens"] = cache_read
+    if cache_creation:
+        usage_fields["cache_creation_input_tokens"] = cache_creation
+    return SimpleNamespace(
+        content=[SimpleNamespace(type="text", text="ok")],
+        stop_reason="end_turn",
+        usage=SimpleNamespace(**usage_fields),
+        model="claude-sonnet-4-6",
+    )
+
+
+# -- Anthropic: cached tokens must be included --
+
+def test_anthropic_cache_read_and_creation_added(monkeypatch):
+    agent = _make_agent(monkeypatch, "anthropic_messages", "anthropic",
+                        lambda: _anthropic_resp(3, 10, cache_read=15000, cache_creation=2000))
+    agent.run_conversation("hi")
+    assert agent.context_compressor.last_prompt_tokens == 17003  # 3+15000+2000
+    assert agent.session_prompt_tokens == 17003
+
+
+def test_anthropic_no_cache_fields(monkeypatch):
+    agent = _make_agent(monkeypatch, "anthropic_messages", "anthropic",
+                        lambda: _anthropic_resp(500, 20))
+    agent.run_conversation("hi")
+    assert agent.context_compressor.last_prompt_tokens == 500
+
+
+def test_anthropic_cache_read_only(monkeypatch):
+    agent = _make_agent(monkeypatch, "anthropic_messages", "anthropic",
+                        lambda: _anthropic_resp(5, 15, cache_read=17666, cache_creation=15))
+    agent.run_conversation("hi")
+    assert agent.context_compressor.last_prompt_tokens == 17686  # 5+17666+15
+
+
+# -- OpenAI: prompt_tokens already total --
+
+def test_openai_prompt_tokens_unchanged(monkeypatch):
+    resp = lambda: SimpleNamespace(
+        choices=[SimpleNamespace(index=0, message=SimpleNamespace(
+            role="assistant", content="ok", tool_calls=None, reasoning_content=None,
+        ), finish_reason="stop")],
+        usage=SimpleNamespace(prompt_tokens=5000, completion_tokens=100, total_tokens=5100),
+        model="gpt-4o",
+    )
+    agent = _make_agent(monkeypatch, "chat_completions", "openrouter", resp)
+    agent.run_conversation("hi")
+    assert agent.context_compressor.last_prompt_tokens == 5000
+
+
+# -- Codex: no cache fields, getattr returns 0 --
+
+def test_codex_no_cache_fields(monkeypatch):
+    resp = lambda: SimpleNamespace(
+        output=[SimpleNamespace(type="message", content=[SimpleNamespace(type="output_text", text="ok")])],
+        usage=SimpleNamespace(input_tokens=3000, output_tokens=50, total_tokens=3050),
+        status="completed", model="gpt-5-codex",
+    )
+    agent = _make_agent(monkeypatch, "codex_responses", "openai-codex", resp)
+    agent.run_conversation("hi")
+    assert agent.context_compressor.last_prompt_tokens == 3000
@@ -59,8 +59,11 @@ def _build_agent(shared_client=None):
    agent._interrupt_requested = False
    agent._interrupt_message = None
    agent._client_lock = threading.RLock()
-    agent._client_kwargs = {"api_key": "test-key", "base_url": agent.base_url}
+    agent._client_kwargs = {"api_key": "***", "base_url": agent.base_url}
    agent.client = shared_client or FakeSharedClient(lambda **kwargs: {"shared": True})
+    agent.stream_delta_callback = None
+    agent._stream_callback = None
+    agent.reasoning_callback = None
    return agent


@@ -173,7 +176,11 @@ def test_streaming_call_recreates_closed_shared_client_before_request(monkeypatc
    monkeypatch.setattr(run_agent, "OpenAI", factory)

    agent = _build_agent(shared_client=stale_shared)
-    response = agent._streaming_api_call({"model": agent.model, "messages": []}, lambda _delta: None)
+    agent.stream_delta_callback = lambda _delta: None
+    # Force chat_completions mode so the streaming path uses
+    # chat.completions.create(stream=True) instead of Codex responses.stream()
+    agent.api_mode = "chat_completions"
+    response = agent._interruptible_streaming_api_call({"model": agent.model, "messages": []})

    assert response.choices[0].message.content == "Hello world"
    assert agent.client is replacement_shared
@@ -0,0 +1,340 @@
+"""Tests for the Hermes plugin system (hermes_cli.plugins)."""
+
+import logging
+import os
+import sys
+import types
+from pathlib import Path
+from unittest.mock import MagicMock, patch
+
+import pytest
+import yaml
+
+from hermes_cli.plugins import (
+    ENTRY_POINTS_GROUP,
+    VALID_HOOKS,
+    LoadedPlugin,
+    PluginContext,
+    PluginManager,
+    PluginManifest,
+    get_plugin_manager,
+    get_plugin_tool_names,
+    discover_plugins,
+    invoke_hook,
+)
+
+
+# ── Helpers ────────────────────────────────────────────────────────────────
+
+
+def _make_plugin_dir(base: Path, name: str, *, register_body: str = "pass",
+                     manifest_extra: dict | None = None) -> Path:
+    """Create a minimal plugin directory with plugin.yaml + __init__.py."""
+    plugin_dir = base / name
+    plugin_dir.mkdir(parents=True, exist_ok=True)
+
+    manifest = {"name": name, "version": "0.1.0", "description": f"Test plugin {name}"}
+    if manifest_extra:
+        manifest.update(manifest_extra)
+
+    (plugin_dir / "plugin.yaml").write_text(yaml.dump(manifest))
+    (plugin_dir / "__init__.py").write_text(
+        f"def register(ctx):\n    {register_body}\n"
+    )
+    return plugin_dir
+
+
+# ── TestPluginDiscovery ────────────────────────────────────────────────────
+
+
+class TestPluginDiscovery:
+    """Tests for plugin discovery from directories and entry points."""
+
+    def test_discover_user_plugins(self, tmp_path, monkeypatch):
+        """Plugins in ~/.hermes/plugins/ are discovered."""
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        _make_plugin_dir(plugins_dir, "hello_plugin")
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        assert "hello_plugin" in mgr._plugins
+        assert mgr._plugins["hello_plugin"].enabled
+
+    def test_discover_project_plugins(self, tmp_path, monkeypatch):
+        """Plugins in ./.hermes/plugins/ are discovered."""
+        project_dir = tmp_path / "project"
+        project_dir.mkdir()
+        monkeypatch.chdir(project_dir)
+        plugins_dir = project_dir / ".hermes" / "plugins"
+        _make_plugin_dir(plugins_dir, "proj_plugin")
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        assert "proj_plugin" in mgr._plugins
+        assert mgr._plugins["proj_plugin"].enabled
+
+    def test_discover_is_idempotent(self, tmp_path, monkeypatch):
+        """Calling discover_and_load() twice does not duplicate plugins."""
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        _make_plugin_dir(plugins_dir, "once_plugin")
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+        mgr.discover_and_load()  # second call should no-op
+
+        assert len(mgr._plugins) == 1
+
+    def test_discover_skips_dir_without_manifest(self, tmp_path, monkeypatch):
+        """Directories without plugin.yaml are silently skipped."""
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        (plugins_dir / "no_manifest").mkdir(parents=True)
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        assert len(mgr._plugins) == 0
+
+    def test_entry_points_scanned(self, tmp_path, monkeypatch):
+        """Entry-point based plugins are discovered (mocked)."""
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        fake_module = types.ModuleType("fake_ep_plugin")
+        fake_module.register = lambda ctx: None  # type: ignore[attr-defined]
+
+        fake_ep = MagicMock()
+        fake_ep.name = "ep_plugin"
+        fake_ep.value = "fake_ep_plugin:register"
+        fake_ep.group = ENTRY_POINTS_GROUP
+        fake_ep.load.return_value = fake_module
+
+        def fake_entry_points():
+            result = MagicMock()
+            result.select = MagicMock(return_value=[fake_ep])
+            return result
+
+        with patch("importlib.metadata.entry_points", fake_entry_points):
+            mgr = PluginManager()
+            mgr.discover_and_load()
+
+        assert "ep_plugin" in mgr._plugins
+
+
+# ── TestPluginLoading ──────────────────────────────────────────────────────
+
+
+class TestPluginLoading:
+    """Tests for plugin module loading."""
+
+    def test_load_missing_init(self, tmp_path, monkeypatch):
+        """Plugin dir without __init__.py records an error."""
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        plugin_dir = plugins_dir / "bad_plugin"
+        plugin_dir.mkdir(parents=True)
+        (plugin_dir / "plugin.yaml").write_text(yaml.dump({"name": "bad_plugin"}))
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        assert "bad_plugin" in mgr._plugins
+        assert not mgr._plugins["bad_plugin"].enabled
+        assert mgr._plugins["bad_plugin"].error is not None
+
+    def test_load_missing_register_fn(self, tmp_path, monkeypatch):
+        """Plugin without register() function records an error."""
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        plugin_dir = plugins_dir / "no_reg"
+        plugin_dir.mkdir(parents=True)
+        (plugin_dir / "plugin.yaml").write_text(yaml.dump({"name": "no_reg"}))
+        (plugin_dir / "__init__.py").write_text("# no register function\n")
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        assert "no_reg" in mgr._plugins
+        assert not mgr._plugins["no_reg"].enabled
+        assert "no register()" in mgr._plugins["no_reg"].error
+
+    def test_load_registers_namespace_module(self, tmp_path, monkeypatch):
+        """Directory plugins are importable under hermes_plugins.<name>."""
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        _make_plugin_dir(plugins_dir, "ns_plugin")
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        # Clean up any prior namespace module
+        sys.modules.pop("hermes_plugins.ns_plugin", None)
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        assert "hermes_plugins.ns_plugin" in sys.modules
+
+
+# ── TestPluginHooks ────────────────────────────────────────────────────────
+
+
+class TestPluginHooks:
+    """Tests for lifecycle hook registration and invocation."""
+
+    def test_register_and_invoke_hook(self, tmp_path, monkeypatch):
+        """Registered hooks are called on invoke_hook()."""
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        _make_plugin_dir(
+            plugins_dir, "hook_plugin",
+            register_body='ctx.register_hook("pre_tool_call", lambda **kw: None)',
+        )
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        # Should not raise
+        mgr.invoke_hook("pre_tool_call", tool_name="test", args={}, task_id="t1")
+
+    def test_hook_exception_does_not_propagate(self, tmp_path, monkeypatch):
+        """A hook callback that raises does NOT crash the caller."""
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        _make_plugin_dir(
+            plugins_dir, "bad_hook",
+            register_body='ctx.register_hook("post_tool_call", lambda **kw: 1/0)',
+        )
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        # Should not raise despite 1/0
+        mgr.invoke_hook("post_tool_call", tool_name="x", args={}, result="r", task_id="")
+
+    def test_invalid_hook_name_warns(self, tmp_path, monkeypatch, caplog):
+        """Registering an unknown hook name logs a warning."""
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        _make_plugin_dir(
+            plugins_dir, "warn_plugin",
+            register_body='ctx.register_hook("on_banana", lambda **kw: None)',
+        )
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        with caplog.at_level(logging.WARNING, logger="hermes_cli.plugins"):
+            mgr = PluginManager()
+            mgr.discover_and_load()
+
+        assert any("on_banana" in record.message for record in caplog.records)
+
+
+# ── TestPluginContext ──────────────────────────────────────────────────────
+
+
+class TestPluginContext:
+    """Tests for the PluginContext facade."""
+
+    def test_register_tool_adds_to_registry(self, tmp_path, monkeypatch):
+        """PluginContext.register_tool() puts the tool in the global registry."""
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        plugin_dir = plugins_dir / "tool_plugin"
+        plugin_dir.mkdir(parents=True)
+        (plugin_dir / "plugin.yaml").write_text(yaml.dump({"name": "tool_plugin"}))
+        (plugin_dir / "__init__.py").write_text(
+            'def register(ctx):\n'
+            '    ctx.register_tool(\n'
+            '        name="plugin_echo",\n'
+            '        toolset="plugin_tool_plugin",\n'
+            '        schema={"name": "plugin_echo", "description": "Echo", "parameters": {"type": "object", "properties": {}}},\n'
+            '        handler=lambda args, **kw: "echo",\n'
+            '    )\n'
+        )
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        assert "plugin_echo" in mgr._plugin_tool_names
+
+        from tools.registry import registry
+        assert "plugin_echo" in registry._tools
+
+
+# ── TestPluginToolVisibility ───────────────────────────────────────────────
+
+
+class TestPluginToolVisibility:
+    """Plugin-registered tools appear in get_tool_definitions()."""
+
+    def test_plugin_tools_in_definitions(self, tmp_path, monkeypatch):
+        """Tools from plugins bypass the toolset filter."""
+        import hermes_cli.plugins as plugins_mod
+
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        plugin_dir = plugins_dir / "vis_plugin"
+        plugin_dir.mkdir(parents=True)
+        (plugin_dir / "plugin.yaml").write_text(yaml.dump({"name": "vis_plugin"}))
+        (plugin_dir / "__init__.py").write_text(
+            'def register(ctx):\n'
+            '    ctx.register_tool(\n'
+            '        name="vis_tool",\n'
+            '        toolset="plugin_vis_plugin",\n'
+            '        schema={"name": "vis_tool", "description": "Visible", "parameters": {"type": "object", "properties": {}}},\n'
+            '        handler=lambda args, **kw: "ok",\n'
+            '    )\n'
+        )
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+        monkeypatch.setattr(plugins_mod, "_plugin_manager", mgr)
+
+        from model_tools import get_tool_definitions
+        tools = get_tool_definitions(enabled_toolsets=["terminal"], quiet_mode=True)
+        tool_names = [t["function"]["name"] for t in tools]
+        assert "vis_tool" in tool_names
+
+
+# ── TestPluginManagerList ──────────────────────────────────────────────────
+
+
+class TestPluginManagerList:
+    """Tests for PluginManager.list_plugins()."""
+
+    def test_list_empty(self):
+        """Empty manager returns empty list."""
+        mgr = PluginManager()
+        assert mgr.list_plugins() == []
+
+    def test_list_returns_sorted(self, tmp_path, monkeypatch):
+        """list_plugins() returns results sorted by name."""
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        _make_plugin_dir(plugins_dir, "zulu")
+        _make_plugin_dir(plugins_dir, "alpha")
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        listing = mgr.list_plugins()
+        names = [p["name"] for p in listing]
+        assert names == sorted(names)
+
+    def test_list_with_plugins(self, tmp_path, monkeypatch):
+        """list_plugins() returns info dicts for each discovered plugin."""
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        _make_plugin_dir(plugins_dir, "alpha")
+        _make_plugin_dir(plugins_dir, "beta")
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        listing = mgr.list_plugins()
+        names = [p["name"] for p in listing]
+        assert "alpha" in names
+        assert "beta" in names
+        for p in listing:
+            assert "enabled" in p
+            assert "tools" in p
+            assert "hooks" in p
@@ -137,6 +137,40 @@ class TestBuildApiKwargsOpenRouter:
        assert "codex_reasoning_items" in messages[1]


+class TestBuildApiKwargsAIGateway:
+    def test_uses_chat_completions_format(self, monkeypatch):
+        agent = _make_agent(monkeypatch, "ai-gateway", base_url="https://ai-gateway.vercel.sh/v1")
+        messages = [{"role": "user", "content": "hi"}]
+        kwargs = agent._build_api_kwargs(messages)
+        assert "messages" in kwargs
+        assert "model" in kwargs
+        assert kwargs["messages"][-1]["content"] == "hi"
+
+    def test_no_responses_api_fields(self, monkeypatch):
+        agent = _make_agent(monkeypatch, "ai-gateway", base_url="https://ai-gateway.vercel.sh/v1")
+        messages = [{"role": "user", "content": "hi"}]
+        kwargs = agent._build_api_kwargs(messages)
+        assert "input" not in kwargs
+        assert "instructions" not in kwargs
+        assert "store" not in kwargs
+
+    def test_includes_reasoning_in_extra_body(self, monkeypatch):
+        agent = _make_agent(monkeypatch, "ai-gateway", base_url="https://ai-gateway.vercel.sh/v1")
+        messages = [{"role": "user", "content": "hi"}]
+        kwargs = agent._build_api_kwargs(messages)
+        extra = kwargs.get("extra_body", {})
+        assert "reasoning" in extra
+        assert extra["reasoning"]["enabled"] is True
+
+    def test_includes_tools(self, monkeypatch):
+        agent = _make_agent(monkeypatch, "ai-gateway", base_url="https://ai-gateway.vercel.sh/v1")
+        messages = [{"role": "user", "content": "hi"}]
+        kwargs = agent._build_api_kwargs(messages)
+        assert "tools" in kwargs
+        tool_names = [t["function"]["name"] for t in kwargs["tools"]]
+        assert "web_search" in tool_names
+
+
 class TestBuildApiKwargsNousPortal:
    def test_includes_nous_product_tags(self, monkeypatch):
        agent = _make_agent(monkeypatch, "nous", base_url="https://inference-api.nousresearch.com/v1")
@@ -72,10 +72,11 @@ class TestCLIQuickCommands:

    def test_unknown_command_still_shows_error(self):
        cli = self._make_cli({})
-        cli.process_command("/nonexistent")
-        cli.console.print.assert_called()
-        args = cli.console.print.call_args_list[0][0][0]
-        assert "unknown command" in args.lower()
+        with patch("cli._cprint") as mock_cprint:
+            cli.process_command("/nonexistent")
+            mock_cprint.assert_called()
+            printed = " ".join(str(c) for c in mock_cprint.call_args_list)
+            assert "unknown command" in printed.lower()

    def test_timeout_shows_error(self):
        cli = self._make_cli({"slow": {"type": "exec", "command": "sleep 100"}})
@@ -2329,8 +2329,9 @@ class TestStreamingApiCall:
        ]
        agent.client.chat.completions.create.return_value = iter(chunks)
        callback = MagicMock()
+        agent.stream_delta_callback = callback

-        resp = agent._streaming_api_call({"messages": []}, callback)
+        resp = agent._interruptible_streaming_api_call({"messages": []})

        assert resp.choices[0].message.content == "Hello World"
        assert resp.choices[0].finish_reason == "stop"
@@ -2347,7 +2348,7 @@ class TestStreamingApiCall:
        ]
        agent.client.chat.completions.create.return_value = iter(chunks)

-        resp = agent._streaming_api_call({"messages": []}, MagicMock())
+        resp = agent._interruptible_streaming_api_call({"messages": []})

        tc = resp.choices[0].message.tool_calls
        assert len(tc) == 1
@@ -2363,7 +2364,7 @@ class TestStreamingApiCall:
        ]
        agent.client.chat.completions.create.return_value = iter(chunks)

-        resp = agent._streaming_api_call({"messages": []}, MagicMock())
+        resp = agent._interruptible_streaming_api_call({"messages": []})

        tc = resp.choices[0].message.tool_calls
        assert len(tc) == 2
@@ -2378,7 +2379,7 @@ class TestStreamingApiCall:
        ]
        agent.client.chat.completions.create.return_value = iter(chunks)

-        resp = agent._streaming_api_call({"messages": []}, MagicMock())
+        resp = agent._interruptible_streaming_api_call({"messages": []})

        assert resp.choices[0].message.content == "I'll search"
        assert len(resp.choices[0].message.tool_calls) == 1
@@ -2387,7 +2388,7 @@ class TestStreamingApiCall:
        chunks = [_make_chunk(finish_reason="stop")]
        agent.client.chat.completions.create.return_value = iter(chunks)

-        resp = agent._streaming_api_call({"messages": []}, MagicMock())
+        resp = agent._interruptible_streaming_api_call({"messages": []})

        assert resp.choices[0].message.content is None
        assert resp.choices[0].message.tool_calls is None
@@ -2399,9 +2400,9 @@ class TestStreamingApiCall:
            _make_chunk(finish_reason="stop"),
        ]
        agent.client.chat.completions.create.return_value = iter(chunks)
-        callback = MagicMock(side_effect=ValueError("boom"))
+        agent.stream_delta_callback = MagicMock(side_effect=ValueError("boom"))

-        resp = agent._streaming_api_call({"messages": []}, callback)
+        resp = agent._interruptible_streaming_api_call({"messages": []})

        assert resp.choices[0].message.content == "Hello World"

@@ -2412,7 +2413,7 @@ class TestStreamingApiCall:
        ]
        agent.client.chat.completions.create.return_value = iter(chunks)

-        resp = agent._streaming_api_call({"messages": []}, MagicMock())
+        resp = agent._interruptible_streaming_api_call({"messages": []})

        assert resp.model == "gpt-4o"

@@ -2420,22 +2421,23 @@ class TestStreamingApiCall:
        chunks = [_make_chunk(content="x"), _make_chunk(finish_reason="stop")]
        agent.client.chat.completions.create.return_value = iter(chunks)

-        agent._streaming_api_call({"messages": [], "model": "test"}, MagicMock())
+        agent._interruptible_streaming_api_call({"messages": [], "model": "test"})

        call_kwargs = agent.client.chat.completions.create.call_args
        assert call_kwargs[1].get("stream") is True or call_kwargs.kwargs.get("stream") is True

-    def test_api_exception_propagated(self, agent):
+    def test_api_exception_falls_back_to_non_streaming(self, agent):
+        """When streaming fails before any deltas, fallback to non-streaming is attempted."""
        agent.client.chat.completions.create.side_effect = ConnectionError("fail")
-
+        # The fallback also uses the same client, so it'll fail too
        with pytest.raises(ConnectionError, match="fail"):
-            agent._streaming_api_call({"messages": []}, MagicMock())
+            agent._interruptible_streaming_api_call({"messages": []})

    def test_response_has_uuid_id(self, agent):
        chunks = [_make_chunk(content="x"), _make_chunk(finish_reason="stop")]
        agent.client.chat.completions.create.return_value = iter(chunks)

-        resp = agent._streaming_api_call({"messages": []}, MagicMock())
+        resp = agent._interruptible_streaming_api_call({"messages": []})

        assert resp.id.startswith("stream-")
        assert len(resp.id) > len("stream-")
@@ -2449,7 +2451,7 @@ class TestStreamingApiCall:
        ]
        agent.client.chat.completions.create.return_value = iter(chunks)

-        resp = agent._streaming_api_call({"messages": []}, MagicMock())
+        resp = agent._interruptible_streaming_api_call({"messages": []})

        assert resp.choices[0].message.content == "Hello"
        assert resp.model == "gpt-4"
@@ -2505,7 +2507,7 @@ class TestAnthropicInterruptHandler:
    def test_streaming_has_anthropic_branch(self):
        """_streaming_api_call must also handle Anthropic interrupt."""
        import inspect
-        source = inspect.getsource(AIAgent._streaming_api_call)
+        source = inspect.getsource(AIAgent._interruptible_streaming_api_call)
        assert "anthropic_messages" in source, \
            "_streaming_api_call must handle Anthropic interrupt"

@@ -26,6 +26,20 @@ def test_resolve_runtime_provider_codex(monkeypatch):
    assert resolved["requested_provider"] == "openai-codex"


+def test_resolve_runtime_provider_ai_gateway(monkeypatch):
+    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "ai-gateway")
+    monkeypatch.setattr(rp, "_get_model_config", lambda: {})
+    monkeypatch.setenv("AI_GATEWAY_API_KEY", "test-ai-gw-key")
+
+    resolved = rp.resolve_runtime_provider(requested="ai-gateway")
+
+    assert resolved["provider"] == "ai-gateway"
+    assert resolved["api_mode"] == "chat_completions"
+    assert resolved["base_url"] == "https://ai-gateway.vercel.sh/v1"
+    assert resolved["api_key"] == "test-ai-gw-key"
+    assert resolved["requested_provider"] == "ai-gateway"
+
+
 def test_resolve_runtime_provider_openrouter_explicit(monkeypatch):
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "openrouter")
    monkeypatch.setattr(rp, "_get_model_config", lambda: {})
@@ -0,0 +1,571 @@
+"""Tests for streaming token delivery infrastructure.
+
+Tests the unified streaming API call, delta callbacks, tool-call
+suppression, provider fallback, and CLI streaming display.
+"""
+import json
+import threading
+import uuid
+from types import SimpleNamespace
+from unittest.mock import MagicMock, patch, PropertyMock
+
+import pytest
+
+
+# ── Helpers ──────────────────────────────────────────────────────────────
+
+
+def _make_stream_chunk(
+    content=None, tool_calls=None, finish_reason=None,
+    model=None, reasoning_content=None, usage=None,
+):
+    """Build a mock streaming chunk matching OpenAI's ChatCompletionChunk shape."""
+    delta = SimpleNamespace(
+        content=content,
+        tool_calls=tool_calls,
+        reasoning_content=reasoning_content,
+        reasoning=None,
+    )
+    choice = SimpleNamespace(
+        index=0,
+        delta=delta,
+        finish_reason=finish_reason,
+    )
+    chunk = SimpleNamespace(
+        choices=[choice],
+        model=model,
+        usage=usage,
+    )
+    return chunk
+
+
+def _make_tool_call_delta(index=0, tc_id=None, name=None, arguments=None):
+    """Build a mock tool call delta."""
+    func = SimpleNamespace(name=name, arguments=arguments)
+    return SimpleNamespace(index=index, id=tc_id, function=func)
+
+
+def _make_empty_chunk(model=None, usage=None):
+    """Build a chunk with no choices (usage-only final chunk)."""
+    return SimpleNamespace(choices=[], model=model, usage=usage)
+
+
+# ── Test: Streaming Accumulator ──────────────────────────────────────────
+
+
+class TestStreamingAccumulator:
+    """Verify that _interruptible_streaming_api_call accumulates content
+    and tool calls into a response matching the non-streaming shape."""
+
+    @patch("run_agent.AIAgent._create_request_openai_client")
+    @patch("run_agent.AIAgent._close_request_openai_client")
+    def test_text_only_response(self, mock_close, mock_create):
+        """Text-only stream produces correct response shape."""
+        from run_agent import AIAgent
+
+        chunks = [
+            _make_stream_chunk(content="Hello"),
+            _make_stream_chunk(content=" world"),
+            _make_stream_chunk(content="!", finish_reason="stop", model="test-model"),
+            _make_empty_chunk(usage=SimpleNamespace(prompt_tokens=10, completion_tokens=3)),
+        ]
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create.return_value = iter(chunks)
+        mock_create.return_value = mock_client
+
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+        )
+        agent.api_mode = "chat_completions"
+        agent._interrupt_requested = False
+
+        response = agent._interruptible_streaming_api_call({})
+
+        assert response.choices[0].message.content == "Hello world!"
+        assert response.choices[0].message.tool_calls is None
+        assert response.choices[0].finish_reason == "stop"
+        assert response.usage is not None
+        assert response.usage.completion_tokens == 3
+
+    @patch("run_agent.AIAgent._create_request_openai_client")
+    @patch("run_agent.AIAgent._close_request_openai_client")
+    def test_tool_call_response(self, mock_close, mock_create):
+        """Tool call stream accumulates ID, name, and arguments."""
+        from run_agent import AIAgent
+
+        chunks = [
+            _make_stream_chunk(tool_calls=[
+                _make_tool_call_delta(index=0, tc_id="call_123", name="terminal")
+            ]),
+            _make_stream_chunk(tool_calls=[
+                _make_tool_call_delta(index=0, arguments='{"command":')
+            ]),
+            _make_stream_chunk(tool_calls=[
+                _make_tool_call_delta(index=0, arguments=' "ls"}')
+            ]),
+            _make_stream_chunk(finish_reason="tool_calls"),
+        ]
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create.return_value = iter(chunks)
+        mock_create.return_value = mock_client
+
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+        )
+        agent.api_mode = "chat_completions"
+        agent._interrupt_requested = False
+
+        response = agent._interruptible_streaming_api_call({})
+
+        tc = response.choices[0].message.tool_calls
+        assert tc is not None
+        assert len(tc) == 1
+        assert tc[0].id == "call_123"
+        assert tc[0].function.name == "terminal"
+        assert tc[0].function.arguments == '{"command": "ls"}'
+
+    @patch("run_agent.AIAgent._create_request_openai_client")
+    @patch("run_agent.AIAgent._close_request_openai_client")
+    def test_mixed_content_and_tool_calls(self, mock_close, mock_create):
+        """Stream with both text and tool calls accumulates both."""
+        from run_agent import AIAgent
+
+        chunks = [
+            _make_stream_chunk(content="Let me check"),
+            _make_stream_chunk(tool_calls=[
+                _make_tool_call_delta(index=0, tc_id="call_456", name="web_search")
+            ]),
+            _make_stream_chunk(tool_calls=[
+                _make_tool_call_delta(index=0, arguments='{"query": "test"}')
+            ]),
+            _make_stream_chunk(finish_reason="tool_calls"),
+        ]
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create.return_value = iter(chunks)
+        mock_create.return_value = mock_client
+
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+        )
+        agent.api_mode = "chat_completions"
+        agent._interrupt_requested = False
+
+        response = agent._interruptible_streaming_api_call({})
+
+        assert response.choices[0].message.content == "Let me check"
+        assert len(response.choices[0].message.tool_calls) == 1
+
+
+# ── Test: Streaming Callbacks ────────────────────────────────────────────
+
+
+class TestStreamingCallbacks:
+    """Verify that delta callbacks fire correctly."""
+
+    @patch("run_agent.AIAgent._create_request_openai_client")
+    @patch("run_agent.AIAgent._close_request_openai_client")
+    def test_deltas_fire_in_order(self, mock_close, mock_create):
+        """Callbacks receive text deltas in order."""
+        from run_agent import AIAgent
+
+        chunks = [
+            _make_stream_chunk(content="a"),
+            _make_stream_chunk(content="b"),
+            _make_stream_chunk(content="c"),
+            _make_stream_chunk(finish_reason="stop"),
+        ]
+
+        deltas = []
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create.return_value = iter(chunks)
+        mock_create.return_value = mock_client
+
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+            stream_delta_callback=lambda t: deltas.append(t),
+        )
+        agent.api_mode = "chat_completions"
+        agent._interrupt_requested = False
+
+        agent._interruptible_streaming_api_call({})
+
+        assert deltas == ["a", "b", "c"]
+
+    @patch("run_agent.AIAgent._create_request_openai_client")
+    @patch("run_agent.AIAgent._close_request_openai_client")
+    def test_on_first_delta_fires_once(self, mock_close, mock_create):
+        """on_first_delta callback fires exactly once."""
+        from run_agent import AIAgent
+
+        chunks = [
+            _make_stream_chunk(content="a"),
+            _make_stream_chunk(content="b"),
+            _make_stream_chunk(finish_reason="stop"),
+        ]
+
+        first_delta_calls = []
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create.return_value = iter(chunks)
+        mock_create.return_value = mock_client
+
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+        )
+        agent.api_mode = "chat_completions"
+        agent._interrupt_requested = False
+
+        agent._interruptible_streaming_api_call(
+            {}, on_first_delta=lambda: first_delta_calls.append(True)
+        )
+
+        assert len(first_delta_calls) == 1
+
+    @patch("run_agent.AIAgent._create_request_openai_client")
+    @patch("run_agent.AIAgent._close_request_openai_client")
+    def test_tool_only_does_not_fire_callback(self, mock_close, mock_create):
+        """Tool-call-only stream does not fire the delta callback."""
+        from run_agent import AIAgent
+
+        chunks = [
+            _make_stream_chunk(tool_calls=[
+                _make_tool_call_delta(index=0, tc_id="call_789", name="terminal")
+            ]),
+            _make_stream_chunk(tool_calls=[
+                _make_tool_call_delta(index=0, arguments='{"command": "ls"}')
+            ]),
+            _make_stream_chunk(finish_reason="tool_calls"),
+        ]
+
+        deltas = []
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create.return_value = iter(chunks)
+        mock_create.return_value = mock_client
+
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+            stream_delta_callback=lambda t: deltas.append(t),
+        )
+        agent.api_mode = "chat_completions"
+        agent._interrupt_requested = False
+
+        agent._interruptible_streaming_api_call({})
+
+        assert deltas == []
+
+    @patch("run_agent.AIAgent._create_request_openai_client")
+    @patch("run_agent.AIAgent._close_request_openai_client")
+    def test_text_suppressed_when_tool_calls_present(self, mock_close, mock_create):
+        """Text deltas are suppressed when tool calls are also in the stream."""
+        from run_agent import AIAgent
+
+        chunks = [
+            _make_stream_chunk(content="thinking..."),
+            _make_stream_chunk(tool_calls=[
+                _make_tool_call_delta(index=0, tc_id="call_abc", name="read_file")
+            ]),
+            _make_stream_chunk(content=" more text"),
+            _make_stream_chunk(finish_reason="tool_calls"),
+        ]
+
+        deltas = []
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create.return_value = iter(chunks)
+        mock_create.return_value = mock_client
+
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+            stream_delta_callback=lambda t: deltas.append(t),
+        )
+        agent.api_mode = "chat_completions"
+        agent._interrupt_requested = False
+
+        response = agent._interruptible_streaming_api_call({})
+
+        # Text before tool call IS fired (we don't know yet it will have tools)
+        assert "thinking..." in deltas
+        # Text after tool call is NOT fired
+        assert " more text" not in deltas
+        # But content is still accumulated in the response
+        assert response.choices[0].message.content == "thinking... more text"
+
+
+# ── Test: Streaming Fallback ────────────────────────────────────────────
+
+
+class TestStreamingFallback:
+    """Verify fallback to non-streaming on ANY streaming error."""
+
+    @patch("run_agent.AIAgent._interruptible_api_call")
+    @patch("run_agent.AIAgent._create_request_openai_client")
+    @patch("run_agent.AIAgent._close_request_openai_client")
+    def test_stream_error_falls_back(self, mock_close, mock_create, mock_non_stream):
+        """'not supported' error triggers fallback to non-streaming."""
+        from run_agent import AIAgent
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create.side_effect = Exception(
+            "Streaming is not supported for this model"
+        )
+        mock_create.return_value = mock_client
+
+        fallback_response = SimpleNamespace(
+            id="fallback",
+            model="test",
+            choices=[SimpleNamespace(
+                index=0,
+                message=SimpleNamespace(
+                    role="assistant",
+                    content="fallback response",
+                    tool_calls=None,
+                    reasoning_content=None,
+                ),
+                finish_reason="stop",
+            )],
+            usage=None,
+        )
+        mock_non_stream.return_value = fallback_response
+
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+        )
+        agent.api_mode = "chat_completions"
+        agent._interrupt_requested = False
+
+        response = agent._interruptible_streaming_api_call({})
+
+        assert response.choices[0].message.content == "fallback response"
+        mock_non_stream.assert_called_once()
+
+    @patch("run_agent.AIAgent._interruptible_api_call")
+    @patch("run_agent.AIAgent._create_request_openai_client")
+    @patch("run_agent.AIAgent._close_request_openai_client")
+    def test_any_stream_error_falls_back(self, mock_close, mock_create, mock_non_stream):
+        """ANY streaming error triggers fallback — not just specific messages."""
+        from run_agent import AIAgent
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create.side_effect = Exception(
+            "Connection reset by peer"
+        )
+        mock_create.return_value = mock_client
+
+        fallback_response = SimpleNamespace(
+            id="fallback",
+            model="test",
+            choices=[SimpleNamespace(
+                index=0,
+                message=SimpleNamespace(
+                    role="assistant",
+                    content="fallback after connection error",
+                    tool_calls=None,
+                    reasoning_content=None,
+                ),
+                finish_reason="stop",
+            )],
+            usage=None,
+        )
+        mock_non_stream.return_value = fallback_response
+
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+        )
+        agent.api_mode = "chat_completions"
+        agent._interrupt_requested = False
+
+        response = agent._interruptible_streaming_api_call({})
+
+        assert response.choices[0].message.content == "fallback after connection error"
+        mock_non_stream.assert_called_once()
+
+    @patch("run_agent.AIAgent._interruptible_api_call")
+    @patch("run_agent.AIAgent._create_request_openai_client")
+    @patch("run_agent.AIAgent._close_request_openai_client")
+    def test_fallback_error_propagates(self, mock_close, mock_create, mock_non_stream):
+        """When both streaming AND fallback fail, the fallback error propagates."""
+        from run_agent import AIAgent
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create.side_effect = Exception("stream broke")
+        mock_create.return_value = mock_client
+
+        mock_non_stream.side_effect = Exception("Rate limit exceeded")
+
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+        )
+        agent.api_mode = "chat_completions"
+        agent._interrupt_requested = False
+
+        with pytest.raises(Exception, match="Rate limit exceeded"):
+            agent._interruptible_streaming_api_call({})
+
+
+# ── Test: Reasoning Streaming ────────────────────────────────────────────
+
+
+class TestReasoningStreaming:
+    """Verify reasoning content is accumulated and callback fires."""
+
+    @patch("run_agent.AIAgent._create_request_openai_client")
+    @patch("run_agent.AIAgent._close_request_openai_client")
+    def test_reasoning_callback_fires(self, mock_close, mock_create):
+        """Reasoning deltas fire the reasoning_callback."""
+        from run_agent import AIAgent
+
+        chunks = [
+            _make_stream_chunk(reasoning_content="Let me think"),
+            _make_stream_chunk(reasoning_content=" about this"),
+            _make_stream_chunk(content="The answer is 42"),
+            _make_stream_chunk(finish_reason="stop"),
+        ]
+
+        reasoning_deltas = []
+        text_deltas = []
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create.return_value = iter(chunks)
+        mock_create.return_value = mock_client
+
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+            stream_delta_callback=lambda t: text_deltas.append(t),
+            reasoning_callback=lambda t: reasoning_deltas.append(t),
+        )
+        agent.api_mode = "chat_completions"
+        agent._interrupt_requested = False
+
+        response = agent._interruptible_streaming_api_call({})
+
+        assert reasoning_deltas == ["Let me think", " about this"]
+        assert text_deltas == ["The answer is 42"]
+        assert response.choices[0].message.reasoning_content == "Let me think about this"
+        assert response.choices[0].message.content == "The answer is 42"
+
+
+# ── Test: _has_stream_consumers ──────────────────────────────────────────
+
+
+class TestHasStreamConsumers:
+    """Verify _has_stream_consumers() detects registered callbacks."""
+
+    def test_no_consumers(self):
+        from run_agent import AIAgent
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+        )
+        assert agent._has_stream_consumers() is False
+
+    def test_delta_callback_set(self):
+        from run_agent import AIAgent
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+            stream_delta_callback=lambda t: None,
+        )
+        assert agent._has_stream_consumers() is True
+
+    def test_stream_callback_set(self):
+        from run_agent import AIAgent
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+        )
+        agent._stream_callback = lambda t: None
+        assert agent._has_stream_consumers() is True
+
+
+# ── Test: Codex stream fires callbacks ────────────────────────────────
+
+
+class TestCodexStreamCallbacks:
+    """Verify _run_codex_stream fires delta callbacks."""
+
+    def test_codex_text_delta_fires_callback(self):
+        from run_agent import AIAgent
+
+        deltas = []
+
+        agent = AIAgent(
+            model="test/model",
+            quiet_mode=True,
+            skip_context_files=True,
+            skip_memory=True,
+            stream_delta_callback=lambda t: deltas.append(t),
+        )
+        agent.api_mode = "codex_responses"
+        agent._interrupt_requested = False
+
+        # Mock the stream context manager
+        mock_event_text = SimpleNamespace(
+            type="response.output_text.delta",
+            delta="Hello from Codex!",
+        )
+        mock_event_done = SimpleNamespace(
+            type="response.completed",
+            delta="",
+        )
+
+        mock_stream = MagicMock()
+        mock_stream.__enter__ = MagicMock(return_value=mock_stream)
+        mock_stream.__exit__ = MagicMock(return_value=False)
+        mock_stream.__iter__ = MagicMock(return_value=iter([mock_event_text, mock_event_done]))
+        mock_stream.get_final_response.return_value = SimpleNamespace(
+            output=[SimpleNamespace(
+                type="message",
+                content=[SimpleNamespace(type="output_text", text="Hello from Codex!")],
+            )],
+            status="completed",
+        )
+
+        mock_client = MagicMock()
+        mock_client.responses.stream.return_value = mock_stream
+
+        response = agent._run_codex_stream({}, client=mock_client)
+        assert "Hello from Codex!" in deltas
@@ -241,7 +241,7 @@ class TestCronTimezone:
        job = create_job(prompt="Test job", schedule="every 1h")
        jobs = load_jobs()
        # Force a naive (no timezone) past timestamp
-        naive_past = (datetime.now() - timedelta(minutes=5)).isoformat()
+        naive_past = (datetime.now() - timedelta(seconds=30)).isoformat()
        jobs[0]["next_run_at"] = naive_past
        save_jobs(jobs)

@@ -318,7 +318,7 @@ class TestCronTimezone:

        # Simulate a naive timestamp that was written by datetime.now() on a
        # system running in UTC+5:30 — 5 minutes in the past (local time)
-        naive_past = (datetime.now() - timedelta(minutes=5)).isoformat()
+        naive_past = (datetime.now() - timedelta(seconds=30)).isoformat()
        jobs[0]["next_run_at"] = naive_past
        save_jobs(jobs)

@@ -347,7 +347,7 @@ class TestCronTimezone:
        jobs = load_jobs()

        # Force a naive past timestamp (system-local wall time, 10 min ago)
-        naive_past = (datetime.now() - timedelta(minutes=10)).isoformat()
+        naive_past = (datetime.now() - timedelta(seconds=30)).isoformat()
        jobs[0]["next_run_at"] = naive_past
        save_jobs(jobs)

@@ -62,22 +62,44 @@ class TestScanCronPrompt:


 class TestCronjobRequirements:
-    def test_requires_crontab_binary_even_in_interactive_mode(self, monkeypatch):
+    def test_requires_no_crontab_binary(self, monkeypatch):
+        """Cron is internal (JSON-based scheduler), no system crontab needed."""
        monkeypatch.setenv("HERMES_INTERACTIVE", "1")
        monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
        monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
-        monkeypatch.setattr("shutil.which", lambda name: None)
+        # Even with no crontab in PATH, the cronjob tool should be available
+        # because hermes uses an internal scheduler, not system crontab.
+        assert check_cronjob_requirements() is True

-        assert check_cronjob_requirements() is False
-
-    def test_accepts_interactive_mode_when_crontab_exists(self, monkeypatch):
+    def test_accepts_interactive_mode(self, monkeypatch):
        monkeypatch.setenv("HERMES_INTERACTIVE", "1")
        monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
        monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
-        monkeypatch.setattr("shutil.which", lambda name: "/usr/bin/crontab")

        assert check_cronjob_requirements() is True

+    def test_accepts_gateway_session(self, monkeypatch):
+        monkeypatch.delenv("HERMES_INTERACTIVE", raising=False)
+        monkeypatch.setenv("HERMES_GATEWAY_SESSION", "1")
+        monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
+
+        assert check_cronjob_requirements() is True
+
+    def test_accepts_exec_ask(self, monkeypatch):
+        monkeypatch.delenv("HERMES_INTERACTIVE", raising=False)
+        monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
+        monkeypatch.setenv("HERMES_EXEC_ASK", "1")
+
+        assert check_cronjob_requirements() is True
+
+    def test_rejects_when_no_session_env(self, monkeypatch):
+        """Without any session env vars, cronjob tool should not be available."""
+        monkeypatch.delenv("HERMES_INTERACTIVE", raising=False)
+        monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
+        monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
+
+        assert check_cronjob_requirements() is False
+

 # =========================================================================
 # schedule_cronjob
@@ -9,7 +9,7 @@ from types import SimpleNamespace
 from unittest.mock import AsyncMock, MagicMock, patch

 from gateway.config import Platform
-from tools.send_message_tool import _send_telegram, send_message_tool
+from tools.send_message_tool import _send_telegram, _send_to_platform, send_message_tool


 def _run_async_immediately(coro):
@@ -25,8 +25,11 @@ def _make_config():


 def _install_telegram_mock(monkeypatch, bot):
-    telegram_mod = SimpleNamespace(Bot=lambda token: bot)
+    parse_mode = SimpleNamespace(MARKDOWN_V2="MarkdownV2")
+    constants_mod = SimpleNamespace(ParseMode=parse_mode)
+    telegram_mod = SimpleNamespace(Bot=lambda token: bot, constants=constants_mod)
    monkeypatch.setitem(sys.modules, "telegram", telegram_mod)
+    monkeypatch.setitem(sys.modules, "telegram.constants", constants_mod)


 class TestSendMessageTool:
@@ -342,3 +345,49 @@ class TestSendTelegramMediaDelivery:
        assert "error" in result
        assert "No deliverable text or media remained" in result["error"]
        bot.send_message.assert_not_awaited()
+
+
+# ---------------------------------------------------------------------------
+# Regression: long messages are chunked before platform dispatch
+# ---------------------------------------------------------------------------
+
+
+class TestSendToPlatformChunking:
+    def test_long_message_is_chunked(self):
+        """Messages exceeding the platform limit are split into multiple sends."""
+        send = AsyncMock(return_value={"success": True, "message_id": "1"})
+        long_msg = "word " * 1000  # ~5000 chars, well over Discord's 2000 limit
+        with patch("tools.send_message_tool._send_discord", send):
+            result = asyncio.run(
+                _send_to_platform(
+                    Platform.DISCORD,
+                    SimpleNamespace(enabled=True, token="tok", extra={}),
+                    "ch", long_msg,
+                )
+            )
+        assert result["success"] is True
+        assert send.await_count >= 3
+        for call in send.await_args_list:
+            assert len(call.args[2]) <= 2020  # each chunk fits the limit
+
+    def test_telegram_media_attaches_to_last_chunk(self):
+        """When chunked, media files are sent only with the last chunk."""
+        sent_calls = []
+
+        async def fake_send(token, chat_id, message, media_files=None, thread_id=None):
+            sent_calls.append(media_files or [])
+            return {"success": True, "platform": "telegram", "chat_id": chat_id, "message_id": str(len(sent_calls))}
+
+        long_msg = "word " * 2000  # ~10000 chars, well over 4096
+        media = [("/tmp/photo.png", False)]
+        with patch("tools.send_message_tool._send_telegram", fake_send):
+            asyncio.run(
+                _send_to_platform(
+                    Platform.TELEGRAM,
+                    SimpleNamespace(enabled=True, token="tok", extra={}),
+                    "123", long_msg, media_files=media,
+                )
+            )
+        assert len(sent_calls) >= 3
+        assert all(call == [] for call in sent_calls[:-1])
+        assert sent_calls[-1] == media
@@ -0,0 +1,77 @@
+"""Tests for Singularity/Apptainer preflight availability check.
+
+Verifies that a clear error is raised when neither apptainer nor
+singularity is installed, instead of a cryptic FileNotFoundError.
+
+See: https://github.com/NousResearch/hermes-agent/issues/1511
+"""
+
+import subprocess
+from unittest.mock import patch, MagicMock
+
+import pytest
+
+from tools.environments.singularity import (
+    _find_singularity_executable,
+    _ensure_singularity_available,
+)
+
+
+class TestFindSingularityExecutable:
+    """_find_singularity_executable resolution tests."""
+
+    def test_prefers_apptainer(self):
+        """When both are available, apptainer should be preferred."""
+        def which_both(name):
+            return f"/usr/bin/{name}" if name in ("apptainer", "singularity") else None
+
+        with patch("shutil.which", side_effect=which_both):
+            assert _find_singularity_executable() == "apptainer"
+
+    def test_falls_back_to_singularity(self):
+        """When only singularity is available, use it."""
+        def which_singularity_only(name):
+            return "/usr/bin/singularity" if name == "singularity" else None
+
+        with patch("shutil.which", side_effect=which_singularity_only):
+            assert _find_singularity_executable() == "singularity"
+
+    def test_raises_when_neither_found(self):
+        """Must raise RuntimeError with install instructions."""
+        with patch("shutil.which", return_value=None):
+            with pytest.raises(RuntimeError, match="Neither.*apptainer.*nor.*singularity"):
+                _find_singularity_executable()
+
+
+class TestEnsureSingularityAvailable:
+    """_ensure_singularity_available preflight tests."""
+
+    def test_returns_executable_on_success(self):
+        """Returns the executable name when version check passes."""
+        fake_result = MagicMock(returncode=0, stderr="")
+
+        with patch("shutil.which", side_effect=lambda n: "/usr/bin/apptainer" if n == "apptainer" else None), \
+             patch("subprocess.run", return_value=fake_result):
+            assert _ensure_singularity_available() == "apptainer"
+
+    def test_raises_on_version_failure(self):
+        """Raises RuntimeError when version command fails."""
+        fake_result = MagicMock(returncode=1, stderr="unknown flag")
+
+        with patch("shutil.which", side_effect=lambda n: "/usr/bin/apptainer" if n == "apptainer" else None), \
+             patch("subprocess.run", return_value=fake_result):
+            with pytest.raises(RuntimeError, match="version.*failed"):
+                _ensure_singularity_available()
+
+    def test_raises_on_timeout(self):
+        """Raises RuntimeError when version command times out."""
+        with patch("shutil.which", side_effect=lambda n: "/usr/bin/apptainer" if n == "apptainer" else None), \
+             patch("subprocess.run", side_effect=subprocess.TimeoutExpired("apptainer", 10)):
+            with pytest.raises(RuntimeError, match="timed out"):
+                _ensure_singularity_available()
+
+    def test_raises_when_not_installed(self):
+        """Raises RuntimeError when neither executable exists."""
+        with patch("shutil.which", return_value=None):
+            with pytest.raises(RuntimeError, match="Neither.*apptainer.*nor.*singularity"):
+                _ensure_singularity_available()
@@ -522,50 +522,59 @@ class TestCosignVerification:
        assert path is None
        assert reason == "cosign_verification_failed"

+    @patch("tools.tirith_security.tarfile.open")
+    @patch("tools.tirith_security._verify_checksum", return_value=True)
    @patch("tools.tirith_security.shutil.which", return_value=None)
    @patch("tools.tirith_security._download_file")
    @patch("tools.tirith_security._detect_target", return_value="aarch64-apple-darwin")
-    def test_install_aborts_when_cosign_missing(self, mock_target, mock_dl,
-                                                 mock_which):
-        """_install_tirith returns cosign_missing when cosign is not on PATH."""
+    def test_install_proceeds_without_cosign(self, mock_target, mock_dl,
+                                              mock_which, mock_checksum,
+                                              mock_tarfile):
+        """_install_tirith proceeds with SHA-256 only when cosign is not on PATH."""
        from tools.tirith_security import _install_tirith
+        mock_tar = MagicMock()
+        mock_tar.__enter__ = MagicMock(return_value=mock_tar)
+        mock_tar.__exit__ = MagicMock(return_value=False)
+        mock_tar.getmembers.return_value = []
+        mock_tarfile.return_value = mock_tar
+
        path, reason = _install_tirith()
+        # Reaches extraction (no binary in mock archive), but got past cosign
        assert path is None
-        assert reason == "cosign_missing"
-
-    @patch("tools.tirith_security.logger.debug")
-    @patch("tools.tirith_security.logger.warning")
-    @patch("tools.tirith_security.shutil.which", return_value=None)
-    @patch("tools.tirith_security._download_file")
-    @patch("tools.tirith_security._detect_target", return_value="aarch64-apple-darwin")
-    def test_install_quiet_mode_downgrades_cosign_missing_log(self, mock_target, mock_dl,
-                                                              mock_which, mock_warning,
-                                                              mock_debug):
-        """Startup prefetch should not surface cosign-missing as a warning."""
-        from tools.tirith_security import _install_tirith
-        path, reason = _install_tirith(log_failures=False)
-        assert path is None
-        assert reason == "cosign_missing"
-        mock_warning.assert_not_called()
-        mock_debug.assert_called()
+        assert reason == "binary_not_in_archive"
+        assert mock_checksum.called  # SHA-256 verification ran

+    @patch("tools.tirith_security.tarfile.open")
+    @patch("tools.tirith_security._verify_checksum", return_value=True)
    @patch("tools.tirith_security._verify_cosign", return_value=None)
    @patch("tools.tirith_security.shutil.which", return_value="/usr/local/bin/cosign")
    @patch("tools.tirith_security._download_file")
    @patch("tools.tirith_security._detect_target", return_value="aarch64-apple-darwin")
-    def test_install_aborts_when_cosign_exec_fails(self, mock_target, mock_dl,
-                                                     mock_which, mock_cosign):
-        """_install_tirith returns cosign_exec_failed when cosign exists but fails."""
+    def test_install_proceeds_when_cosign_exec_fails(self, mock_target, mock_dl,
+                                                       mock_which, mock_cosign,
+                                                       mock_checksum, mock_tarfile):
+        """_install_tirith falls back to SHA-256 when cosign exists but fails to execute."""
        from tools.tirith_security import _install_tirith
+        mock_tar = MagicMock()
+        mock_tar.__enter__ = MagicMock(return_value=mock_tar)
+        mock_tar.__exit__ = MagicMock(return_value=False)
+        mock_tar.getmembers.return_value = []
+        mock_tarfile.return_value = mock_tar
+
        path, reason = _install_tirith()
        assert path is None
-        assert reason == "cosign_exec_failed"
+        assert reason == "binary_not_in_archive"  # got past cosign
+        assert mock_checksum.called

+    @patch("tools.tirith_security.tarfile.open")
+    @patch("tools.tirith_security._verify_checksum", return_value=True)
+    @patch("tools.tirith_security.shutil.which", return_value="/usr/local/bin/cosign")
    @patch("tools.tirith_security._download_file")
    @patch("tools.tirith_security._detect_target", return_value="aarch64-apple-darwin")
-    def test_install_aborts_when_cosign_artifacts_missing(self, mock_target,
-                                                           mock_dl):
-        """_install_tirith returns None when .sig/.pem downloads fail (404)."""
+    def test_install_proceeds_when_cosign_artifacts_missing(self, mock_target,
+                                                              mock_dl, mock_which,
+                                                              mock_checksum, mock_tarfile):
+        """_install_tirith proceeds with SHA-256 when .sig/.pem downloads fail."""
        from tools.tirith_security import _install_tirith
        import urllib.request

@@ -574,10 +583,16 @@ class TestCosignVerification:
                raise urllib.request.URLError("404 Not Found")

        mock_dl.side_effect = _dl_side_effect
+        mock_tar = MagicMock()
+        mock_tar.__enter__ = MagicMock(return_value=mock_tar)
+        mock_tar.__exit__ = MagicMock(return_value=False)
+        mock_tar.getmembers.return_value = []
+        mock_tarfile.return_value = mock_tar

        path, reason = _install_tirith()
        assert path is None
-        assert reason == "cosign_artifacts_unavailable"
+        assert reason == "binary_not_in_archive"  # got past cosign
+        assert mock_checksum.called

    @patch("tools.tirith_security.tarfile.open")
    @patch("tools.tirith_security._verify_checksum", return_value=True)
@@ -4,6 +4,7 @@ This module is the single source of truth for the dangerous command system:
 - Pattern detection (DANGEROUS_PATTERNS, detect_dangerous_command)
 - Per-session approval state (thread-safe, keyed by session_key)
 - Approval prompting (CLI interactive + gateway async)
+- Smart approval via auxiliary LLM (auto-approve low-risk commands)
 - Permanent allowlist persistence (config.yaml)
 """

@@ -283,6 +284,68 @@ def prompt_dangerous_approval(command: str, description: str,
        sys.stdout.flush()


+def _get_approval_mode() -> str:
+    """Read the approval mode from config. Returns 'manual', 'smart', or 'off'."""
+    try:
+        from hermes_cli.config import load_config
+        config = load_config()
+        return config.get("approvals", {}).get("mode", "manual")
+    except Exception:
+        return "manual"
+
+
+def _smart_approve(command: str, description: str) -> str:
+    """Use the auxiliary LLM to assess risk and decide approval.
+
+    Returns 'approve' if the LLM determines the command is safe,
+    'deny' if genuinely dangerous, or 'escalate' if uncertain.
+
+    Inspired by OpenAI Codex's Smart Approvals guardian subagent
+    (openai/codex#13860).
+    """
+    try:
+        from agent.auxiliary_client import get_text_auxiliary_client, auxiliary_max_tokens_param
+
+        client, model = get_text_auxiliary_client(task="approval")
+        if not client or not model:
+            logger.debug("Smart approvals: no aux client available, escalating")
+            return "escalate"
+
+        prompt = f"""You are a security reviewer for an AI coding agent. A terminal command was flagged by pattern matching as potentially dangerous.
+
+Command: {command}
+Flagged reason: {description}
+
+Assess the ACTUAL risk of this command. Many flagged commands are false positives — for example, `python -c "print('hello')"` is flagged as "script execution via -c flag" but is completely harmless.
+
+Rules:
+- APPROVE if the command is clearly safe (benign script execution, safe file operations, development tools, package installs, git operations, etc.)
+- DENY if the command could genuinely damage the system (recursive delete of important paths, overwriting system files, fork bombs, wiping disks, dropping databases, etc.)
+- ESCALATE if you're uncertain
+
+Respond with exactly one word: APPROVE, DENY, or ESCALATE"""
+
+        response = client.chat.completions.create(
+            model=model,
+            messages=[{"role": "user", "content": prompt}],
+            **auxiliary_max_tokens_param(16),
+            temperature=0,
+        )
+
+        answer = (response.choices[0].message.content or "").strip().upper()
+
+        if "APPROVE" in answer:
+            return "approve"
+        elif "DENY" in answer:
+            return "deny"
+        else:
+            return "escalate"
+
+    except Exception as e:
+        logger.debug("Smart approvals: LLM call failed (%s), escalating", e)
+        return "escalate"
+
+
 def check_dangerous_command(command: str, env_type: str,
                            approval_callback=None) -> dict:
    """Check if a command is dangerous and handle approval.
@@ -372,8 +435,9 @@ def check_all_command_guards(command: str, env_type: str,
    if env_type in ("docker", "singularity", "modal", "daytona"):
        return {"approved": True, "message": None}

-    # --yolo: bypass all approval prompts and pre-exec guard checks
-    if os.getenv("HERMES_YOLO_MODE"):
+    # --yolo or approvals.mode=off: bypass all approval prompts
+    approval_mode = _get_approval_mode()
+    if os.getenv("HERMES_YOLO_MODE") or approval_mode == "off":
        return {"approved": True, "message": None}

    is_cli = os.getenv("HERMES_INTERACTIVE")
@@ -430,6 +494,31 @@ def check_all_command_guards(command: str, env_type: str,
    if not warnings:
        return {"approved": True, "message": None}

+    # --- Phase 2.5: Smart approval (auxiliary LLM risk assessment) ---
+    # When approvals.mode=smart, ask the aux LLM before prompting the user.
+    # Inspired by OpenAI Codex's Smart Approvals guardian subagent
+    # (openai/codex#13860).
+    if approval_mode == "smart":
+        combined_desc_for_llm = "; ".join(desc for _, desc, _ in warnings)
+        verdict = _smart_approve(command, combined_desc_for_llm)
+        if verdict == "approve":
+            # Auto-approve and grant session-level approval for these patterns
+            for key, _, _ in warnings:
+                approve_session(session_key, key)
+            logger.debug("Smart approval: auto-approved '%s' (%s)",
+                         command[:60], combined_desc_for_llm)
+            return {"approved": True, "message": None,
+                    "smart_approved": True}
+        elif verdict == "deny":
+            combined_desc_for_llm = "; ".join(desc for _, desc, _ in warnings)
+            return {
+                "approved": False,
+                "message": f"BLOCKED by smart approval: {combined_desc_for_llm}. "
+                           "The command was assessed as genuinely dangerous. Do NOT retry.",
+                "smart_denied": True,
+            }
+        # verdict == "escalate" → fall through to manual prompt
+
    # --- Phase 3: Approval ---

    # Combine descriptions for a single approval prompt
@@ -0,0 +1,10 @@
+"""Cloud browser provider abstraction.
+
+Import the ABC so callers can do::
+
+    from tools.browser_providers import CloudBrowserProvider
+"""
+
+from tools.browser_providers.base import CloudBrowserProvider
+
+__all__ = ["CloudBrowserProvider"]
@@ -0,0 +1,59 @@
+"""Abstract base class for cloud browser providers."""
+
+from abc import ABC, abstractmethod
+from typing import Dict
+
+
+class CloudBrowserProvider(ABC):
+    """Interface for cloud browser backends (Browserbase, Steel, etc.).
+
+    Implementations live in sibling modules and are registered in
+    ``browser_tool._PROVIDER_REGISTRY``.  The user selects a provider via
+    ``hermes setup`` / ``hermes tools``; the choice is persisted as
+    ``config["browser"]["cloud_provider"]``.
+    """
+
+    @abstractmethod
+    def provider_name(self) -> str:
+        """Short, human-readable name shown in logs and diagnostics."""
+
+    @abstractmethod
+    def is_configured(self) -> bool:
+        """Return True when all required env vars / credentials are present.
+
+        Called at tool-registration time (``check_browser_requirements``) to
+        gate availability.  Must be cheap — no network calls.
+        """
+
+    @abstractmethod
+    def create_session(self, task_id: str) -> Dict[str, object]:
+        """Create a cloud browser session and return session metadata.
+
+        Must return a dict with at least::
+
+            {
+                "session_name": str,   # unique name for agent-browser --session
+                "bb_session_id": str,  # provider session ID (for close/cleanup)
+                "cdp_url": str,        # CDP websocket URL
+                "features": dict,      # feature flags that were enabled
+            }
+
+        ``bb_session_id`` is a legacy key name kept for backward compat with
+        the rest of browser_tool.py — it holds the provider's session ID
+        regardless of which provider is in use.
+        """
+
+    @abstractmethod
+    def close_session(self, session_id: str) -> bool:
+        """Release / terminate a cloud session by its provider session ID.
+
+        Returns True on success, False on failure.  Should not raise.
+        """
+
+    @abstractmethod
+    def emergency_cleanup(self, session_id: str) -> None:
+        """Best-effort session teardown during process exit.
+
+        Called from atexit / signal handlers.  Must tolerate missing
+        credentials, network errors, etc. — log and move on.
+        """
@@ -0,0 +1,107 @@
+"""Browser Use cloud browser provider."""
+
+import logging
+import os
+import uuid
+from typing import Dict
+
+import requests
+
+from tools.browser_providers.base import CloudBrowserProvider
+
+logger = logging.getLogger(__name__)
+
+_BASE_URL = "https://api.browser-use.com/api/v2"
+
+
+class BrowserUseProvider(CloudBrowserProvider):
+    """Browser Use (https://browser-use.com) cloud browser backend."""
+
+    def provider_name(self) -> str:
+        return "Browser Use"
+
+    def is_configured(self) -> bool:
+        return bool(os.environ.get("BROWSER_USE_API_KEY"))
+
+    # ------------------------------------------------------------------
+    # Session lifecycle
+    # ------------------------------------------------------------------
+
+    def _headers(self) -> Dict[str, str]:
+        api_key = os.environ.get("BROWSER_USE_API_KEY")
+        if not api_key:
+            raise ValueError(
+                "BROWSER_USE_API_KEY environment variable is required. "
+                "Get your key at https://browser-use.com"
+            )
+        return {
+            "Content-Type": "application/json",
+            "X-Browser-Use-API-Key": api_key,
+        }
+
+    def create_session(self, task_id: str) -> Dict[str, object]:
+        response = requests.post(
+            f"{_BASE_URL}/browsers",
+            headers=self._headers(),
+            json={},
+            timeout=30,
+        )
+
+        if not response.ok:
+            raise RuntimeError(
+                f"Failed to create Browser Use session: "
+                f"{response.status_code} {response.text}"
+            )
+
+        session_data = response.json()
+        session_name = f"hermes_{task_id}_{uuid.uuid4().hex[:8]}"
+
+        logger.info("Created Browser Use session %s", session_name)
+
+        return {
+            "session_name": session_name,
+            "bb_session_id": session_data["id"],
+            "cdp_url": session_data["cdpUrl"],
+            "features": {"browser_use": True},
+        }
+
+    def close_session(self, session_id: str) -> bool:
+        try:
+            response = requests.patch(
+                f"{_BASE_URL}/browsers/{session_id}",
+                headers=self._headers(),
+                json={"action": "stop"},
+                timeout=10,
+            )
+            if response.status_code in (200, 201, 204):
+                logger.debug("Successfully closed Browser Use session %s", session_id)
+                return True
+            else:
+                logger.warning(
+                    "Failed to close Browser Use session %s: HTTP %s - %s",
+                    session_id,
+                    response.status_code,
+                    response.text[:200],
+                )
+                return False
+        except Exception as e:
+            logger.error("Exception closing Browser Use session %s: %s", session_id, e)
+            return False
+
+    def emergency_cleanup(self, session_id: str) -> None:
+        api_key = os.environ.get("BROWSER_USE_API_KEY")
+        if not api_key:
+            logger.warning("Cannot emergency-cleanup Browser Use session %s — missing credentials", session_id)
+            return
+        try:
+            requests.patch(
+                f"{_BASE_URL}/browsers/{session_id}",
+                headers={
+                    "Content-Type": "application/json",
+                    "X-Browser-Use-API-Key": api_key,
+                },
+                json={"action": "stop"},
+                timeout=5,
+            )
+        except Exception as e:
+            logger.debug("Emergency cleanup failed for Browser Use session %s: %s", session_id, e)
@@ -0,0 +1,206 @@
+"""Browserbase cloud browser provider."""
+
+import logging
+import os
+import uuid
+from typing import Dict
+
+import requests
+
+from tools.browser_providers.base import CloudBrowserProvider
+
+logger = logging.getLogger(__name__)
+
+
+class BrowserbaseProvider(CloudBrowserProvider):
+    """Browserbase (https://browserbase.com) cloud browser backend."""
+
+    def provider_name(self) -> str:
+        return "Browserbase"
+
+    def is_configured(self) -> bool:
+        return bool(
+            os.environ.get("BROWSERBASE_API_KEY")
+            and os.environ.get("BROWSERBASE_PROJECT_ID")
+        )
+
+    # ------------------------------------------------------------------
+    # Session lifecycle
+    # ------------------------------------------------------------------
+
+    def _get_config(self) -> Dict[str, str]:
+        api_key = os.environ.get("BROWSERBASE_API_KEY")
+        project_id = os.environ.get("BROWSERBASE_PROJECT_ID")
+        if not api_key or not project_id:
+            raise ValueError(
+                "BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID environment "
+                "variables are required.  Get your credentials at "
+                "https://browserbase.com"
+            )
+        return {"api_key": api_key, "project_id": project_id}
+
+    def create_session(self, task_id: str) -> Dict[str, object]:
+        config = self._get_config()
+
+        # Optional env-var knobs
+        enable_proxies = os.environ.get("BROWSERBASE_PROXIES", "true").lower() != "false"
+        enable_advanced_stealth = os.environ.get("BROWSERBASE_ADVANCED_STEALTH", "false").lower() == "true"
+        enable_keep_alive = os.environ.get("BROWSERBASE_KEEP_ALIVE", "true").lower() != "false"
+        custom_timeout_ms = os.environ.get("BROWSERBASE_SESSION_TIMEOUT")
+
+        features_enabled = {
+            "basic_stealth": True,
+            "proxies": False,
+            "advanced_stealth": False,
+            "keep_alive": False,
+            "custom_timeout": False,
+        }
+
+        session_config: Dict[str, object] = {"projectId": config["project_id"]}
+
+        if enable_keep_alive:
+            session_config["keepAlive"] = True
+
+        if custom_timeout_ms:
+            try:
+                timeout_val = int(custom_timeout_ms)
+                if timeout_val > 0:
+                    session_config["timeout"] = timeout_val
+            except ValueError:
+                logger.warning("Invalid BROWSERBASE_SESSION_TIMEOUT value: %s", custom_timeout_ms)
+
+        if enable_proxies:
+            session_config["proxies"] = True
+
+        if enable_advanced_stealth:
+            session_config["browserSettings"] = {"advancedStealth": True}
+
+        # --- Create session via API ---
+        headers = {
+            "Content-Type": "application/json",
+            "X-BB-API-Key": config["api_key"],
+        }
+        response = requests.post(
+            "https://api.browserbase.com/v1/sessions",
+            headers=headers,
+            json=session_config,
+            timeout=30,
+        )
+
+        proxies_fallback = False
+        keepalive_fallback = False
+
+        # Handle 402 — paid features unavailable
+        if response.status_code == 402:
+            if enable_keep_alive:
+                keepalive_fallback = True
+                logger.warning(
+                    "keepAlive may require paid plan (402), retrying without it. "
+                    "Sessions may timeout during long operations."
+                )
+                session_config.pop("keepAlive", None)
+                response = requests.post(
+                    "https://api.browserbase.com/v1/sessions",
+                    headers=headers,
+                    json=session_config,
+                    timeout=30,
+                )
+
+            if response.status_code == 402 and enable_proxies:
+                proxies_fallback = True
+                logger.warning(
+                    "Proxies unavailable (402), retrying without proxies. "
+                    "Bot detection may be less effective."
+                )
+                session_config.pop("proxies", None)
+                response = requests.post(
+                    "https://api.browserbase.com/v1/sessions",
+                    headers=headers,
+                    json=session_config,
+                    timeout=30,
+                )
+
+        if not response.ok:
+            raise RuntimeError(
+                f"Failed to create Browserbase session: "
+                f"{response.status_code} {response.text}"
+            )
+
+        session_data = response.json()
+        session_name = f"hermes_{task_id}_{uuid.uuid4().hex[:8]}"
+
+        if enable_proxies and not proxies_fallback:
+            features_enabled["proxies"] = True
+        if enable_advanced_stealth:
+            features_enabled["advanced_stealth"] = True
+        if enable_keep_alive and not keepalive_fallback:
+            features_enabled["keep_alive"] = True
+        if custom_timeout_ms and "timeout" in session_config:
+            features_enabled["custom_timeout"] = True
+
+        feature_str = ", ".join(k for k, v in features_enabled.items() if v)
+        logger.info("Created Browserbase session %s with features: %s", session_name, feature_str)
+
+        return {
+            "session_name": session_name,
+            "bb_session_id": session_data["id"],
+            "cdp_url": session_data["connectUrl"],
+            "features": features_enabled,
+        }
+
+    def close_session(self, session_id: str) -> bool:
+        try:
+            config = self._get_config()
+        except ValueError:
+            logger.warning("Cannot close Browserbase session %s — missing credentials", session_id)
+            return False
+
+        try:
+            response = requests.post(
+                f"https://api.browserbase.com/v1/sessions/{session_id}",
+                headers={
+                    "X-BB-API-Key": config["api_key"],
+                    "Content-Type": "application/json",
+                },
+                json={
+                    "projectId": config["project_id"],
+                    "status": "REQUEST_RELEASE",
+                },
+                timeout=10,
+            )
+            if response.status_code in (200, 201, 204):
+                logger.debug("Successfully closed Browserbase session %s", session_id)
+                return True
+            else:
+                logger.warning(
+                    "Failed to close session %s: HTTP %s - %s",
+                    session_id,
+                    response.status_code,
+                    response.text[:200],
+                )
+                return False
+        except Exception as e:
+            logger.error("Exception closing Browserbase session %s: %s", session_id, e)
+            return False
+
+    def emergency_cleanup(self, session_id: str) -> None:
+        api_key = os.environ.get("BROWSERBASE_API_KEY")
+        project_id = os.environ.get("BROWSERBASE_PROJECT_ID")
+        if not api_key or not project_id:
+            logger.warning("Cannot emergency-cleanup Browserbase session %s — missing credentials", session_id)
+            return
+        try:
+            requests.post(
+                f"https://api.browserbase.com/v1/sessions/{session_id}",
+                headers={
+                    "X-BB-API-Key": api_key,
+                    "Content-Type": "application/json",
+                },
+                json={
+                    "projectId": project_id,
+                    "status": "REQUEST_RELEASE",
+                },
+                timeout=5,
+            )
+        except Exception as e:
+            logger.debug("Emergency cleanup failed for Browserbase session %s: %s", session_id, e)
@@ -65,6 +65,9 @@ import requests
 from typing import Dict, Any, Optional, List
 from pathlib import Path
 from agent.auxiliary_client import call_llm
+from tools.browser_providers.base import CloudBrowserProvider
+from tools.browser_providers.browserbase import BrowserbaseProvider
+from tools.browser_providers.browser_use import BrowserUseProvider

 logger = logging.getLogger(__name__)

@@ -98,14 +101,53 @@ def _get_extraction_model() -> Optional[str]:
    return os.getenv("AUXILIARY_WEB_EXTRACT_MODEL", "").strip() or None


-def _is_local_mode() -> bool:
-    """Return True when no Browserbase credentials are configured.
+def _get_cdp_override() -> str:
+    """Return a user-supplied CDP URL override, or empty string.

-    In local mode the browser tools launch a headless Chromium instance via
-    ``agent-browser --session`` instead of connecting to a remote Browserbase
-    session via ``--cdp``.
+    When ``BROWSER_CDP_URL`` is set (e.g. via ``/browser connect``), we skip
+    both Browserbase and the local headless launcher and connect directly to
+    the supplied Chrome DevTools Protocol endpoint.
    """
-    return not (os.environ.get("BROWSERBASE_API_KEY") and os.environ.get("BROWSERBASE_PROJECT_ID"))
+    return os.environ.get("BROWSER_CDP_URL", "").strip()
+
+
+# ============================================================================
+# Cloud Provider Registry
+# ============================================================================
+
+_PROVIDER_REGISTRY: Dict[str, type] = {
+    "browserbase": BrowserbaseProvider,
+    "browser-use": BrowserUseProvider,
+}
+
+_cached_cloud_provider: Optional[CloudBrowserProvider] = None
+_cloud_provider_resolved = False
+
+
+def _get_cloud_provider() -> Optional[CloudBrowserProvider]:
+    """Return the configured cloud browser provider, or None for local mode.
+
+    Reads ``config["browser"]["cloud_provider"]`` once and caches the result
+    for the process lifetime.  If unset → local mode (None).
+    """
+    global _cached_cloud_provider, _cloud_provider_resolved
+    if _cloud_provider_resolved:
+        return _cached_cloud_provider
+
+    _cloud_provider_resolved = True
+    try:
+        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
+        config_path = hermes_home / "config.yaml"
+        if config_path.exists():
+            import yaml
+            with open(config_path) as f:
+                cfg = yaml.safe_load(f) or {}
+            provider_key = cfg.get("browser", {}).get("cloud_provider")
+            if provider_key and provider_key in _PROVIDER_REGISTRY:
+                _cached_cloud_provider = _PROVIDER_REGISTRY[provider_key]()
+    except Exception as e:
+        logger.debug("Could not read cloud_provider from config: %s", e)
+    return _cached_cloud_provider


 def _socket_safe_tmpdir() -> str:
@@ -440,161 +482,6 @@ BROWSER_TOOL_SCHEMAS = [
 # Utility Functions
 # ============================================================================

-def _create_browserbase_session(task_id: str) -> Dict[str, str]:
-    """
-    Create a Browserbase session with stealth features.
-    
-    Browserbase Stealth Modes:
-    - Basic Stealth: ALWAYS enabled automatically. Generates random fingerprints,
-      viewports, and solves visual CAPTCHAs. No configuration needed.
-    - Advanced Stealth: Uses custom Chromium build for better bot detection avoidance.
-      Requires Scale Plan. Enable via BROWSERBASE_ADVANCED_STEALTH=true.
-    
-    Proxies are enabled by default to route traffic through residential IPs,
-    which significantly improves CAPTCHA solving rates. Can be disabled via
-    BROWSERBASE_PROXIES=false if needed.
-    
-    Args:
-        task_id: Unique identifier for the task
-        
-    Returns:
-        Dict with session_name, bb_session_id, cdp_url, and feature flags
-    """
-    import uuid
-    import sys
-    
-    config = _get_browserbase_config()
-    
-    # Check for optional settings from environment
-    # Proxies: enabled by default for better CAPTCHA solving
-    enable_proxies = os.environ.get("BROWSERBASE_PROXIES", "true").lower() != "false"
-    # Advanced Stealth: requires Scale Plan, disabled by default
-    enable_advanced_stealth = os.environ.get("BROWSERBASE_ADVANCED_STEALTH", "false").lower() == "true"
-    # keepAlive: enabled by default (requires paid plan) - allows reconnection after disconnects
-    enable_keep_alive = os.environ.get("BROWSERBASE_KEEP_ALIVE", "true").lower() != "false"
-    # Custom session timeout in milliseconds (optional) - extends session beyond project default
-    custom_timeout_ms = os.environ.get("BROWSERBASE_SESSION_TIMEOUT")
-    
-    # Track which features are actually enabled for logging/debugging
-    features_enabled = {
-        "basic_stealth": True,  # Always on
-        "proxies": False,
-        "advanced_stealth": False,
-        "keep_alive": False,
-        "custom_timeout": False,
-    }
-    
-    # Build session configuration
-    # Note: Basic stealth mode is ALWAYS active - no configuration needed
-    session_config = {
-        "projectId": config["project_id"],
-    }
-    
-    # Enable keepAlive for session reconnection (default: true, requires paid plan)
-    # Allows reconnecting to the same session after network hiccups
-    if enable_keep_alive:
-        session_config["keepAlive"] = True
-    
-    # Add custom timeout if specified (in milliseconds)
-    # This extends session duration beyond project's default timeout
-    if custom_timeout_ms:
-        try:
-            timeout_val = int(custom_timeout_ms)
-            if timeout_val > 0:
-                session_config["timeout"] = timeout_val
-        except ValueError:
-            logger.warning("Invalid BROWSERBASE_SESSION_TIMEOUT value: %s", custom_timeout_ms)
-    
-    # Enable proxies for better CAPTCHA solving (default: true)
-    # Routes traffic through residential IPs for more reliable access
-    if enable_proxies:
-        session_config["proxies"] = True
-    
-    # Add advanced stealth if enabled (requires Scale Plan)
-    # Uses custom Chromium build to avoid bot detection altogether
-    if enable_advanced_stealth:
-        session_config["browserSettings"] = {
-            "advancedStealth": True,
-        }
-    
-    # Create session via Browserbase API
-    response = requests.post(
-        "https://api.browserbase.com/v1/sessions",
-        headers={
-            "Content-Type": "application/json",
-            "X-BB-API-Key": config["api_key"],
-        },
-        json=session_config,
-        timeout=30
-    )
-    
-    # Track if we fell back from paid features
-    proxies_fallback = False
-    keepalive_fallback = False
-    
-    # Handle 402 Payment Required - likely paid features not available
-    # Try to identify which feature caused the issue and retry without it
-    if response.status_code == 402:
-        # First try without keepAlive (most likely culprit for paid plan requirement)
-        if enable_keep_alive:
-            keepalive_fallback = True
-            logger.warning("keepAlive may require paid plan (402), retrying without it. "
-                          "Sessions may timeout during long operations.")
-            session_config.pop("keepAlive", None)
-            response = requests.post(
-                "https://api.browserbase.com/v1/sessions",
-                headers={
-                    "Content-Type": "application/json",
-                    "X-BB-API-Key": config["api_key"],
-                },
-                json=session_config,
-                timeout=30
-            )
-        
-        # If still 402, try without proxies too
-        if response.status_code == 402 and enable_proxies:
-            proxies_fallback = True
-            logger.warning("Proxies unavailable (402), retrying without proxies. "
-                          "Bot detection may be less effective.")
-            session_config.pop("proxies", None)
-            response = requests.post(
-                "https://api.browserbase.com/v1/sessions",
-                headers={
-                    "Content-Type": "application/json",
-                    "X-BB-API-Key": config["api_key"],
-                },
-                json=session_config,
-                timeout=30
-            )
-    
-    if not response.ok:
-        raise RuntimeError(f"Failed to create Browserbase session: {response.status_code} {response.text}")
-    
-    session_data = response.json()
-    session_name = f"hermes_{task_id}_{uuid.uuid4().hex[:8]}"
-    
-    # Update features based on what actually succeeded
-    if enable_proxies and not proxies_fallback:
-        features_enabled["proxies"] = True
-    if enable_advanced_stealth:
-        features_enabled["advanced_stealth"] = True
-    if enable_keep_alive and not keepalive_fallback:
-        features_enabled["keep_alive"] = True
-    if custom_timeout_ms and "timeout" in session_config:
-        features_enabled["custom_timeout"] = True
-    
-    # Log session info for debugging
-    feature_str = ", ".join(k for k, v in features_enabled.items() if v)
-    logger.info("Created session %s with features: %s", session_name, feature_str)
-    
-    return {
-        "session_name": session_name,
-        "bb_session_id": session_data["id"],
-        "cdp_url": session_data["connectUrl"],
-        "features": features_enabled,
-    }
-
-
 def _create_local_session(task_id: str) -> Dict[str, str]:
    import uuid
    session_name = f"h_{uuid.uuid4().hex[:10]}"
@@ -608,6 +495,20 @@ def _create_local_session(task_id: str) -> Dict[str, str]:
    }


+def _create_cdp_session(task_id: str, cdp_url: str) -> Dict[str, str]:
+    """Create a session that connects to a user-supplied CDP endpoint."""
+    import uuid
+    session_name = f"cdp_{uuid.uuid4().hex[:10]}"
+    logger.info("Created CDP browser session %s → %s for task %s",
+                session_name, cdp_url, task_id)
+    return {
+        "session_name": session_name,
+        "bb_session_id": None,
+        "cdp_url": cdp_url,
+        "features": {"cdp_override": True},
+    }
+
+
 def _get_session_info(task_id: Optional[str] = None) -> Dict[str, str]:
    """
    Get or create session info for the given task.
@@ -638,10 +539,15 @@ def _get_session_info(task_id: Optional[str] = None) -> Dict[str, str]:
            return _active_sessions[task_id]
    
    # Create session outside the lock (network call in cloud mode)
-    if _is_local_mode():
-        session_info = _create_local_session(task_id)
+    cdp_override = _get_cdp_override()
+    if cdp_override:
+        session_info = _create_cdp_session(task_id, cdp_override)
    else:
-        session_info = _create_browserbase_session(task_id)
+        provider = _get_cloud_provider()
+        if provider is None:
+            session_info = _create_local_session(task_id)
+        else:
+            session_info = provider.create_session(task_id)
    
    with _cleanup_lock:
        _active_sessions[task_id] = session_info
@@ -663,31 +569,6 @@ def _get_session_name(task_id: Optional[str] = None) -> str:
    return session_info["session_name"]


-def _get_browserbase_config() -> Dict[str, str]:
-    """
-    Get Browserbase configuration from environment.
-    
-    Returns:
-        Dict with api_key and project_id
-        
-    Raises:
-        ValueError: If required env vars are not set
-    """
-    api_key = os.environ.get("BROWSERBASE_API_KEY")
-    project_id = os.environ.get("BROWSERBASE_PROJECT_ID")
-    
-    if not api_key or not project_id:
-        raise ValueError(
-            "BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID environment variables are required. "
-            "Get your credentials at https://browserbase.com"
-        )
-    
-    return {
-        "api_key": api_key,
-        "project_id": project_id
-    }
-
-
 def _find_agent_browser() -> str:
    """
    Find the agent-browser CLI executable.
@@ -830,27 +711,62 @@ def _run_browser_command(
        browser_env["PATH"] = ":".join(path_parts)
        browser_env["AGENT_BROWSER_SOCKET_DIR"] = task_socket_dir
        
-        result = subprocess.run(
-            cmd_parts,
-            capture_output=True,
-            text=True,
-            timeout=timeout,
-            env=browser_env,
-        )
-        
+        # Use temp files for stdout/stderr instead of pipes.
+        # agent-browser starts a background daemon that inherits file
+        # descriptors.  With capture_output=True (pipes), the daemon keeps
+        # the pipe fds open after the CLI exits, so communicate() never
+        # sees EOF and blocks until the timeout fires.
+        stdout_path = os.path.join(task_socket_dir, f"_stdout_{command}")
+        stderr_path = os.path.join(task_socket_dir, f"_stderr_{command}")
+        stdout_fd = os.open(stdout_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
+        stderr_fd = os.open(stderr_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
+        try:
+            proc = subprocess.Popen(
+                cmd_parts,
+                stdout=stdout_fd,
+                stderr=stderr_fd,
+                stdin=subprocess.DEVNULL,
+                env=browser_env,
+            )
+        finally:
+            os.close(stdout_fd)
+            os.close(stderr_fd)
+
+        try:
+            proc.wait(timeout=timeout)
+        except subprocess.TimeoutExpired:
+            proc.kill()
+            proc.wait()
+            logger.warning("browser '%s' timed out after %ds (task=%s, socket_dir=%s)",
+                           command, timeout, task_id, task_socket_dir)
+            return {"success": False, "error": f"Command timed out after {timeout} seconds"}
+
+        with open(stdout_path, "r") as f:
+            stdout = f.read()
+        with open(stderr_path, "r") as f:
+            stderr = f.read()
+        returncode = proc.returncode
+
+        # Clean up temp files (best-effort)
+        for p in (stdout_path, stderr_path):
+            try:
+                os.unlink(p)
+            except OSError:
+                pass
+
        # Log stderr for diagnostics — use warning level on failure so it's visible
-        if result.stderr and result.stderr.strip():
-            level = logging.WARNING if result.returncode != 0 else logging.DEBUG
-            logger.log(level, "browser '%s' stderr: %s", command, result.stderr.strip()[:500])
+        if stderr and stderr.strip():
+            level = logging.WARNING if returncode != 0 else logging.DEBUG
+            logger.log(level, "browser '%s' stderr: %s", command, stderr.strip()[:500])
        
        # Log empty output as warning — common sign of broken agent-browser
-        if not result.stdout.strip() and result.returncode == 0:
+        if not stdout.strip() and returncode == 0:
            logger.warning("browser '%s' returned empty stdout with rc=0. "
                           "cmd=%s stderr=%s",
                           command, " ".join(cmd_parts[:4]) + "...",
-                           (result.stderr or "")[:200])
+                           (stderr or "")[:200])

-        stdout_text = result.stdout.strip()
+        stdout_text = stdout.strip()

        if stdout_text:
            try:
@@ -861,15 +777,15 @@ def _run_browser_command(
                    if not snap_data.get("snapshot") and not snap_data.get("refs"):
                        logger.warning("snapshot returned empty content. "
                                       "Possible stale daemon or CDP connection issue. "
-                                       "returncode=%s", result.returncode)
+                                       "returncode=%s", returncode)
                return parsed
            except json.JSONDecodeError:
                raw = stdout_text[:2000]
                logger.warning("browser '%s' returned non-JSON output (rc=%s): %s",
-                               command, result.returncode, raw[:500])
+                               command, returncode, raw[:500])

                if command == "screenshot":
-                    stderr_text = (result.stderr or "").strip()
+                    stderr_text = (stderr or "").strip()
                    combined_text = "\n".join(
                        part for part in [stdout_text, stderr_text] if part
                    )
@@ -894,17 +810,13 @@ def _run_browser_command(
                }
        
        # Check for errors
-        if result.returncode != 0:
-            error_msg = result.stderr.strip() if result.stderr else f"Command failed with code {result.returncode}"
-            logger.warning("browser '%s' failed (rc=%s): %s", command, result.returncode, error_msg[:300])
+        if returncode != 0:
+            error_msg = stderr.strip() if stderr else f"Command failed with code {returncode}"
+            logger.warning("browser '%s' failed (rc=%s): %s", command, returncode, error_msg[:300])
            return {"success": False, "error": error_msg}
        
        return {"success": True, "data": {}}
        
-    except subprocess.TimeoutExpired:
-        logger.warning("browser '%s' timed out after %ds (task=%s, socket_dir=%s)",
-                       command, timeout, task_id, task_socket_dir)
-        return {"success": False, "error": f"Command timed out after {timeout} seconds"}
    except Exception as e:
        logger.warning("browser '%s' exception: %s", command, e, exc_info=True)
        return {"success": False, "error": str(e)}
@@ -1480,7 +1392,8 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
        
        if not result.get("success"):
            error_detail = result.get("error", "Unknown error")
-            mode = "local" if _is_local_mode() else "cloud"
+            _cp = _get_cloud_provider()
+            mode = "local" if _cp is None else f"cloud ({_cp.provider_name()})"
            return json.dumps({
                "success": False,
                "error": f"Failed to take screenshot ({mode} mode): {error_detail}"
@@ -1492,7 +1405,8 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]

        # Check if screenshot file was created
        if not screenshot_path.exists():
-            mode = "local" if _is_local_mode() else "cloud"
+            _cp = _get_cloud_provider()
+            mode = "local" if _cp is None else f"cloud ({_cp.provider_name()})"
            return json.dumps({
                "success": False,
                "error": (
@@ -1610,48 +1524,6 @@ def _cleanup_old_recordings(max_age_hours=72):
 # Cleanup and Management Functions
 # ============================================================================

-def _close_browserbase_session(session_id: str, api_key: str, project_id: str) -> bool:
-    """
-    Close a Browserbase session immediately via the API.
-    
-    Uses POST /v1/sessions/{id} with status=REQUEST_RELEASE to immediately
-    terminate the session without waiting for keepAlive timeout.
-    
-    Args:
-        session_id: The Browserbase session ID
-        api_key: Browserbase API key
-        project_id: Browserbase project ID
-        
-    Returns:
-        True if session was successfully closed, False otherwise
-    """
-    try:
-        # POST to update session status to REQUEST_RELEASE
-        response = requests.post(
-            f"https://api.browserbase.com/v1/sessions/{session_id}",
-            headers={
-                "X-BB-API-Key": api_key,
-                "Content-Type": "application/json"
-            },
-            json={
-                "projectId": project_id,
-                "status": "REQUEST_RELEASE"
-            },
-            timeout=10
-        )
-        
-        if response.status_code in (200, 201, 204):
-            logger.debug("Successfully closed BrowserBase session %s", session_id)
-            return True
-        else:
-            logger.warning("Failed to close session %s: HTTP %s - %s", session_id, response.status_code, response.text[:200])
-            return False
-                
-    except Exception as e:
-        logger.error("Exception closing session %s: %s", session_id, e)
-        return False
-
-
 def cleanup_browser(task_id: Optional[str] = None) -> None:
    """
    Clean up browser session for a task.
@@ -1692,15 +1564,14 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
            _active_sessions.pop(task_id, None)
            _session_last_activity.pop(task_id, None)
        
-        # Cloud mode: close the Browserbase session via API
-        if bb_session_id and not _is_local_mode():
-            try:
-                config = _get_browserbase_config()
-                success = _close_browserbase_session(bb_session_id, config["api_key"], config["project_id"])
-                if not success:
-                    logger.warning("Could not close BrowserBase session %s", bb_session_id)
-            except Exception as e:
-                logger.error("Exception during BrowserBase session close: %s", e)
+        # Cloud mode: close the cloud browser session via provider API
+        if bb_session_id:
+            provider = _get_cloud_provider()
+            if provider is not None:
+                try:
+                    provider.close_session(bb_session_id)
+                except Exception as e:
+                    logger.warning("Could not close cloud browser session: %s", e)
        
        # Kill the daemon process and clean up socket directory
        session_name = session_info.get("session_name", "")
@@ -1769,12 +1640,10 @@ def check_browser_requirements() -> bool:
    except FileNotFoundError:
        return False

-    # In cloud mode, also require Browserbase credentials
-    if not _is_local_mode():
-        api_key = os.environ.get("BROWSERBASE_API_KEY")
-        project_id = os.environ.get("BROWSERBASE_PROJECT_ID")
-        if not api_key or not project_id:
-            return False
+    # In cloud mode, also require provider credentials
+    provider = _get_cloud_provider()
+    if provider is not None and not provider.is_configured():
+        return False

    return True

@@ -1790,7 +1659,8 @@ if __name__ == "__main__":
    print("🌐 Browser Tool Module")
    print("=" * 40)

-    mode = "local" if _is_local_mode() else "cloud (Browserbase)"
+    _cp = _get_cloud_provider()
+    mode = "local" if _cp is None else f"cloud ({_cp.provider_name()})"
    print(f"   Mode: {mode}")
    
    # Check requirements
@@ -1803,12 +1673,9 @@ if __name__ == "__main__":
        except FileNotFoundError:
            print("   - agent-browser CLI not found")
            print("     Install: npm install -g agent-browser && agent-browser install --with-deps")
-        if not _is_local_mode():
-            if not os.environ.get("BROWSERBASE_API_KEY"):
-                print("   - BROWSERBASE_API_KEY not set (required for cloud mode)")
-            if not os.environ.get("BROWSERBASE_PROJECT_ID"):
-                print("   - BROWSERBASE_PROJECT_ID not set (required for cloud mode)")
-            print("   Tip: unset BROWSERBASE_API_KEY to use free local mode instead")
+        if _cp is not None and not _cp.is_configured():
+            print(f"   - {_cp.provider_name()} credentials not configured")
+            print("   Tip: remove cloud_provider from config to use free local mode instead")
    
    print("\n📋 Available Browser Tools:")
    for schema in BROWSER_TOOL_SCHEMAS:
@@ -395,6 +395,7 @@ def execute_code(
    tool_call_log: list = []
    tool_call_counter = [0]  # mutable so the RPC thread can increment
    exec_start = time.monotonic()
+    server_sock = None

    try:
        # Write the auto-generated hermes_tools module
@@ -598,7 +599,14 @@ def execute_code(

    except Exception as exc:
        duration = round(time.monotonic() - exec_start, 2)
-        logging.exception("execute_code failed")
+        logger.error(
+            "execute_code failed after %ss with %d tool calls: %s: %s",
+            duration,
+            tool_call_counter[0],
+            type(exc).__name__,
+            exc,
+            exc_info=True,
+        )
        return json.dumps({
            "status": "error",
            "error": str(exc),
@@ -608,19 +616,17 @@ def execute_code(

    finally:
        # Cleanup temp dir and socket
-        try:
-            server_sock.close()
-        except Exception as e:
-            logger.debug("Server socket close error: %s", e)
-        try:
-            import shutil
-            shutil.rmtree(tmpdir, ignore_errors=True)
-        except Exception as e:
-            logger.debug("Could not clean temp dir: %s", e, exc_info=True)
+        if server_sock is not None:
+            try:
+                server_sock.close()
+            except OSError as e:
+                logger.debug("Server socket close error: %s", e)
+        import shutil
+        shutil.rmtree(tmpdir, ignore_errors=True)
        try:
            os.unlink(sock_path)
-        except OSError as e:
-            logger.debug("Could not remove socket file: %s", e, exc_info=True)
+        except OSError:
+            pass  # already cleaned up or never created


 def _kill_process_group(proc, escalate: bool = False):
@@ -8,7 +8,6 @@ Compatibility wrappers remain for direct Python callers and legacy tests.
 import json
 import os
 import re
-import shutil
 import sys
 from pathlib import Path
 from typing import Any, Dict, List, Optional
@@ -414,13 +413,10 @@ def check_cronjob_requirements() -> bool:
    """
    Check if cronjob tools can be used.

-    Requires 'crontab' executable to be present in the system PATH.
    Available in interactive CLI mode and gateway/messaging platforms.
+    The cron system is internal (JSON file-based scheduler ticked by the gateway),
+    so no external crontab executable is required.
    """
-    # Ensure the system can actually install and manage cron entries.
-    if not shutil.which("crontab"):
-        return False
-
    return bool(
        os.getenv("HERMES_INTERACTIVE")
        or os.getenv("HERMES_GATEWAY_SESSION")
@@ -25,6 +25,57 @@ logger = logging.getLogger(__name__)
 _SNAPSHOT_STORE = get_hermes_home() / "singularity_snapshots.json"


+def _find_singularity_executable() -> str:
+    """Locate the apptainer or singularity CLI binary.
+
+    Returns the executable name (``"apptainer"`` or ``"singularity"``).
+    Raises ``RuntimeError`` with install instructions if neither is found.
+    """
+    if shutil.which("apptainer"):
+        return "apptainer"
+    if shutil.which("singularity"):
+        return "singularity"
+    raise RuntimeError(
+        "Neither 'apptainer' nor 'singularity' was found in PATH. "
+        "Install Apptainer (https://apptainer.org/docs/admin/main/installation.html) "
+        "or Singularity and ensure the CLI is available."
+    )
+
+
+def _ensure_singularity_available() -> str:
+    """Preflight check: resolve the executable and verify it responds.
+
+    Returns the executable name on success.
+    Raises ``RuntimeError`` with an actionable message on failure.
+    """
+    exe = _find_singularity_executable()
+
+    try:
+        result = subprocess.run(
+            [exe, "version"],
+            capture_output=True,
+            text=True,
+            timeout=10,
+        )
+    except FileNotFoundError:
+        raise RuntimeError(
+            f"Singularity backend selected but the resolved executable '{exe}' "
+            "could not be executed. Check your installation."
+        )
+    except subprocess.TimeoutExpired:
+        raise RuntimeError(
+            f"'{exe} version' timed out. The runtime may be misconfigured."
+        )
+
+    if result.returncode != 0:
+        stderr = result.stderr.strip()[:200]
+        raise RuntimeError(
+            f"'{exe} version' failed (exit code {result.returncode}): {stderr}"
+        )
+
+    return exe
+
+
 def _load_snapshots() -> Dict[str, str]:
    if _SNAPSHOT_STORE.exists():
        try:
@@ -169,7 +220,7 @@ class SingularityEnvironment(BaseEnvironment):
        task_id: str = "default",
    ):
        super().__init__(cwd=cwd, timeout=timeout)
-        self.executable = "apptainer" if shutil.which("apptainer") else "singularity"
+        self.executable = _ensure_singularity_available()
        self.image = _get_or_build_sif(image, self.executable)
        self.instance_id = f"hermes_{uuid.uuid4().hex[:12]}"
        self._instance_started = False
--- a/Show More
+++ b/Show More