feat: /model command for mid-chat model switching

Rebuilt from scratch with aggregator-aware resolution that fixes the core problem: bare model names on OpenRouter were getting hijacked to wrong providers (e.g. 'claude-sonnet-4' → opencode-zen instead of staying on OpenRouter as 'anthropic/claude-sonnet-4.6'). Resolution chain (on aggregators like OpenRouter/Nous): 1. Alias table: sonnet → anthropic/claude-sonnet-4.6 2. Vendor:model syntax: openai:gpt-5.4 → openai/gpt-5.4 (stays on OpenRouter, doesn't switch to a nonexistent 'openai' provider) 3. Aggregator-first resolution: a. Exact match on full OpenRouter slug b. Exact match on bare name (gpt-5.4 → openai/gpt-5.4) c. Vendor/model format passthrough (openai/gpt-5.4 → accepted) d. Vendor prefix construction (claude-sonnet-4 → try anthropic/claude-sonnet-4 → fuzzy match to 4.6) e. Fuzzy match on bare names (claude-sonet-4.6 → sonnet-4.6) 4. Only if all aggregator resolution fails → cross-provider detection This prevents detect_provider_for_model() from switching providers when the user just wanted a different model on the same aggregator. opencode-zen's massive static catalog no longer hijacks Claude models. AIAgent.switch_model() does in-place model swap following the existing _try_activate_fallback() pattern: updates primary runtime, invalidates system prompt, rebuilds client for cross-api-mode switches, re-evaluates prompt caching and context compressor. Works across CLI, Telegram, Discord, Slack, Matrix, WhatsApp, Signal. Gateway running-agent guard rejects /model while agent is active. 46 tests. E2E verified with real imports (not mocks) that: - claude-sonnet-4 → anthropic/claude-sonnet-4.6 (was opencode-zen!) - openai:gpt-5.4 → openai/gpt-5.4 on OpenRouter (was parse failure!) - claude-sonet-4.6 (typo) → anthropic/claude-sonnet-4.6 (fuzzy match) - anthropic:claude-opus-4 → switches to native Anthropic provider
fix: regenerate uv.lock to sync with pyproject.toml v0.7.0
2026-04-03 19:03:09 -07:00 · 2026-04-03 12:52:09 -07:00 · 2026-04-03 12:11:29 -07:00 · 2026-04-03 11:14:55 -07:00 · 2026-04-03 10:31:53 -07:00 · 2026-04-03 10:31:11 -07:00
56 changed files with 5682 additions and 446 deletions
@@ -0,0 +1,290 @@
+# Hermes Agent v0.7.0 (v2026.4.3)
+
+**Release Date:** April 3, 2026
+
+> The resilience release — pluggable memory providers, credential pool rotation, Camofox anti-detection browser, inline diff previews, gateway hardening across race conditions and approval routing, and deep security fixes across 168 PRs and 46 resolved issues.
+
+---
+
+## ✨ Highlights
+
+- **Pluggable Memory Provider Interface** — Memory is now an extensible plugin system. Third-party memory backends (Honcho, vector stores, custom DBs) implement a simple provider ABC and register via the plugin system. Built-in memory is the default provider. Honcho integration restored to full parity as the reference plugin with profile-scoped host/peer resolution. ([#4623](https://github.com/NousResearch/hermes-agent/pull/4623), [#4616](https://github.com/NousResearch/hermes-agent/pull/4616), [#4355](https://github.com/NousResearch/hermes-agent/pull/4355))
+
+- **Same-Provider Credential Pools** — Configure multiple API keys for the same provider with automatic rotation. Thread-safe `least_used` strategy distributes load across keys, and 401 failures trigger automatic rotation to the next credential. Set up via the setup wizard or `credential_pool` config. ([#4188](https://github.com/NousResearch/hermes-agent/pull/4188), [#4300](https://github.com/NousResearch/hermes-agent/pull/4300), [#4361](https://github.com/NousResearch/hermes-agent/pull/4361))
+
+- **Camofox Anti-Detection Browser Backend** — New local browser backend using Camoufox for stealth browsing. Persistent sessions with VNC URL discovery for visual debugging, configurable SSRF bypass for local backends, auto-install via `hermes tools`. ([#4008](https://github.com/NousResearch/hermes-agent/pull/4008), [#4419](https://github.com/NousResearch/hermes-agent/pull/4419), [#4292](https://github.com/NousResearch/hermes-agent/pull/4292))
+
+- **Inline Diff Previews** — File write and patch operations now show inline diffs in the tool activity feed, giving you visual confirmation of what changed before the agent moves on. ([#4411](https://github.com/NousResearch/hermes-agent/pull/4411), [#4423](https://github.com/NousResearch/hermes-agent/pull/4423))
+
+- **API Server Session Continuity & Tool Streaming** — The API server (Open WebUI integration) now streams tool progress events in real-time and supports `X-Hermes-Session-Id` headers for persistent sessions across requests. Sessions persist to the shared SessionDB. ([#4092](https://github.com/NousResearch/hermes-agent/pull/4092), [#4478](https://github.com/NousResearch/hermes-agent/pull/4478), [#4802](https://github.com/NousResearch/hermes-agent/pull/4802))
+
+- **ACP: Client-Provided MCP Servers** — Editor integrations (VS Code, Zed, JetBrains) can now register their own MCP servers, which Hermes picks up as additional agent tools. Your editor's MCP ecosystem flows directly into the agent. ([#4705](https://github.com/NousResearch/hermes-agent/pull/4705))
+
+- **Gateway Hardening** — Major stability pass across race conditions, photo media delivery, flood control, stuck sessions, approval routing, and compression death spirals. The gateway is substantially more reliable in production. ([#4727](https://github.com/NousResearch/hermes-agent/pull/4727), [#4750](https://github.com/NousResearch/hermes-agent/pull/4750), [#4798](https://github.com/NousResearch/hermes-agent/pull/4798), [#4557](https://github.com/NousResearch/hermes-agent/pull/4557))
+
+- **Security: Secret Exfiltration Blocking** — Browser URLs and LLM responses are now scanned for secret patterns, blocking exfiltration attempts via URL encoding, base64, or prompt injection. Credential directory protections expanded to `.docker`, `.azure`, `.config/gh`. Execute_code sandbox output is redacted. ([#4483](https://github.com/NousResearch/hermes-agent/pull/4483), [#4360](https://github.com/NousResearch/hermes-agent/pull/4360), [#4305](https://github.com/NousResearch/hermes-agent/pull/4305), [#4327](https://github.com/NousResearch/hermes-agent/pull/4327))
+
+---
+
+## 🏗️ Core Agent & Architecture
+
+### Provider & Model Support
+- **Same-provider credential pools** — configure multiple API keys with automatic `least_used` rotation and 401 failover ([#4188](https://github.com/NousResearch/hermes-agent/pull/4188), [#4300](https://github.com/NousResearch/hermes-agent/pull/4300))
+- **Credential pool preserved through smart routing** — pool state survives fallback provider switches and defers eager fallback on 429 ([#4361](https://github.com/NousResearch/hermes-agent/pull/4361))
+- **Per-turn primary runtime restoration** — after fallback provider use, the agent automatically restores the primary provider on the next turn with transport recovery ([#4624](https://github.com/NousResearch/hermes-agent/pull/4624))
+- **`developer` role for GPT-5 and Codex models** — uses OpenAI's recommended system message role for newer models ([#4498](https://github.com/NousResearch/hermes-agent/pull/4498))
+- **Google model operational guidance** — Gemini and Gemma models get provider-specific prompting guidance ([#4641](https://github.com/NousResearch/hermes-agent/pull/4641))
+- **Anthropic long-context tier 429 handling** — automatically reduces context to 200k when hitting tier limits ([#4747](https://github.com/NousResearch/hermes-agent/pull/4747))
+- **URL-based auth for third-party Anthropic endpoints** + CI test fixes ([#4148](https://github.com/NousResearch/hermes-agent/pull/4148))
+- **Bearer auth for MiniMax Anthropic endpoints** ([#4028](https://github.com/NousResearch/hermes-agent/pull/4028))
+- **Fireworks context length detection** ([#4158](https://github.com/NousResearch/hermes-agent/pull/4158))
+- **Standard DashScope international endpoint** for Alibaba provider ([#4133](https://github.com/NousResearch/hermes-agent/pull/4133), closes [#3912](https://github.com/NousResearch/hermes-agent/issues/3912))
+- **Custom providers context_length** honored in hygiene compression ([#4085](https://github.com/NousResearch/hermes-agent/pull/4085))
+- **Non-sk-ant keys** treated as regular API keys, not OAuth tokens ([#4093](https://github.com/NousResearch/hermes-agent/pull/4093))
+- **Claude-sonnet-4.6** added to OpenRouter and Nous model lists ([#4157](https://github.com/NousResearch/hermes-agent/pull/4157))
+- **Qwen 3.6 Plus Preview** added to model lists ([#4376](https://github.com/NousResearch/hermes-agent/pull/4376))
+- **MiniMax M2.7** added to hermes model picker and OpenCode ([#4208](https://github.com/NousResearch/hermes-agent/pull/4208))
+- **Auto-detect models from server probe** in custom endpoint setup ([#4218](https://github.com/NousResearch/hermes-agent/pull/4218))
+- **Config.yaml single source of truth** for endpoint URLs — no more env var vs config.yaml conflicts ([#4165](https://github.com/NousResearch/hermes-agent/pull/4165))
+- **Setup wizard no longer overwrites** custom endpoint config ([#4180](https://github.com/NousResearch/hermes-agent/pull/4180), closes [#4172](https://github.com/NousResearch/hermes-agent/issues/4172))
+- **Unified setup wizard provider selection** with `hermes model` — single code path for both flows ([#4200](https://github.com/NousResearch/hermes-agent/pull/4200))
+- **Root-level provider config** no longer overrides `model.provider` ([#4329](https://github.com/NousResearch/hermes-agent/pull/4329))
+- **Rate-limit pairing rejection messages** to prevent spam ([#4081](https://github.com/NousResearch/hermes-agent/pull/4081))
+
+### Agent Loop & Conversation
+- **Preserve Anthropic thinking block signatures** across tool-use turns ([#4626](https://github.com/NousResearch/hermes-agent/pull/4626))
+- **Classify think-only empty responses** before retrying — prevents infinite retry loops on models that produce thinking blocks without content ([#4645](https://github.com/NousResearch/hermes-agent/pull/4645))
+- **Prevent compression death spiral** from API disconnects — stops the loop where compression triggers, fails, compresses again ([#4750](https://github.com/NousResearch/hermes-agent/pull/4750), closes [#2153](https://github.com/NousResearch/hermes-agent/issues/2153))
+- **Persist compressed context** to gateway session after mid-run compression ([#4095](https://github.com/NousResearch/hermes-agent/pull/4095))
+- **Context-exceeded error messages** now include actionable guidance ([#4155](https://github.com/NousResearch/hermes-agent/pull/4155), closes [#4061](https://github.com/NousResearch/hermes-agent/issues/4061))
+- **Strip orphaned think/reasoning tags** from user-facing responses ([#4311](https://github.com/NousResearch/hermes-agent/pull/4311), closes [#4285](https://github.com/NousResearch/hermes-agent/issues/4285))
+- **Harden Codex responses preflight** and stream error handling ([#4313](https://github.com/NousResearch/hermes-agent/pull/4313))
+- **Deterministic call_id fallbacks** instead of random UUIDs for prompt cache consistency ([#3991](https://github.com/NousResearch/hermes-agent/pull/3991))
+- **Context pressure warning spam** prevented after compression ([#4012](https://github.com/NousResearch/hermes-agent/pull/4012))
+- **AsyncOpenAI created lazily** in trajectory compressor to avoid closed event loop errors ([#4013](https://github.com/NousResearch/hermes-agent/pull/4013))
+
+### Memory & Sessions
+- **Pluggable memory provider interface** — ABC-based plugin system for custom memory backends with profile isolation ([#4623](https://github.com/NousResearch/hermes-agent/pull/4623))
+- **Honcho full integration parity** restored as reference memory provider plugin ([#4355](https://github.com/NousResearch/hermes-agent/pull/4355)) — @erosika
+- **Honcho profile-scoped** host and peer resolution ([#4616](https://github.com/NousResearch/hermes-agent/pull/4616))
+- **Memory flush state persisted** to prevent redundant re-flushes on gateway restart ([#4481](https://github.com/NousResearch/hermes-agent/pull/4481))
+- **Memory provider tools** routed through sequential execution path ([#4803](https://github.com/NousResearch/hermes-agent/pull/4803))
+- **Honcho config** written to instance-local path for profile isolation ([#4037](https://github.com/NousResearch/hermes-agent/pull/4037))
+- **API server sessions** persist to shared SessionDB ([#4802](https://github.com/NousResearch/hermes-agent/pull/4802))
+- **Token usage persisted** for non-CLI sessions ([#4627](https://github.com/NousResearch/hermes-agent/pull/4627))
+- **Quote dotted terms in FTS5 queries** — fixes session search for terms containing dots ([#4549](https://github.com/NousResearch/hermes-agent/pull/4549))
+
+---
+
+## 📱 Messaging Platforms (Gateway)
+
+### Gateway Core
+- **Race condition fixes** — photo media loss, flood control, stuck sessions, and STT config issues resolved in one hardening pass ([#4727](https://github.com/NousResearch/hermes-agent/pull/4727))
+- **Approval routing through running-agent guard** — `/approve` and `/deny` now route correctly when the agent is blocked waiting for approval instead of being swallowed as interrupts ([#4798](https://github.com/NousResearch/hermes-agent/pull/4798), [#4557](https://github.com/NousResearch/hermes-agent/pull/4557), closes [#4542](https://github.com/NousResearch/hermes-agent/issues/4542))
+- **Resume agent after /approve** — tool result is no longer lost when executing blocked commands ([#4418](https://github.com/NousResearch/hermes-agent/pull/4418))
+- **DM thread sessions seeded** with parent transcript to preserve context ([#4559](https://github.com/NousResearch/hermes-agent/pull/4559))
+- **Skill-aware slash commands** — gateway dynamically registers installed skills as slash commands with paginated `/commands` list and Telegram 100-command cap ([#3934](https://github.com/NousResearch/hermes-agent/pull/3934), [#4005](https://github.com/NousResearch/hermes-agent/pull/4005), [#4006](https://github.com/NousResearch/hermes-agent/pull/4006), [#4010](https://github.com/NousResearch/hermes-agent/pull/4010), [#4023](https://github.com/NousResearch/hermes-agent/pull/4023))
+- **Per-platform disabled skills** respected in Telegram menu and gateway dispatch ([#4799](https://github.com/NousResearch/hermes-agent/pull/4799))
+- **Remove user-facing compression warnings** — cleaner message flow ([#4139](https://github.com/NousResearch/hermes-agent/pull/4139))
+- **`-v/-q` flags wired to stderr logging** for gateway service ([#4474](https://github.com/NousResearch/hermes-agent/pull/4474))
+- **HERMES_HOME remapped** to target user in system service unit ([#4456](https://github.com/NousResearch/hermes-agent/pull/4456))
+- **Honor default for invalid bool-like config values** ([#4029](https://github.com/NousResearch/hermes-agent/pull/4029))
+- **setsid instead of systemd-run** for `/update` command to avoid systemd permission issues ([#4104](https://github.com/NousResearch/hermes-agent/pull/4104), closes [#4017](https://github.com/NousResearch/hermes-agent/issues/4017))
+- **'Initializing agent...'** shown on first message for better UX ([#4086](https://github.com/NousResearch/hermes-agent/pull/4086))
+- **Allow running gateway service as root** for LXC/container environments ([#4732](https://github.com/NousResearch/hermes-agent/pull/4732))
+
+### Telegram
+- **32-char limit on command names** with collision avoidance ([#4211](https://github.com/NousResearch/hermes-agent/pull/4211))
+- **Priority order enforced** in menu — core > plugins > skills ([#4023](https://github.com/NousResearch/hermes-agent/pull/4023))
+- **Capped at 50 commands** — API rejects above ~60 ([#4006](https://github.com/NousResearch/hermes-agent/pull/4006))
+- **Skip empty/whitespace text** to prevent 400 errors ([#4388](https://github.com/NousResearch/hermes-agent/pull/4388))
+- **E2E gateway tests** added ([#4497](https://github.com/NousResearch/hermes-agent/pull/4497)) — @pefontana
+
+### Discord
+- **Button-based approval UI** — register `/approve` and `/deny` slash commands with interactive button prompts ([#4800](https://github.com/NousResearch/hermes-agent/pull/4800))
+- **Configurable reactions** — `discord.reactions` config option to disable message processing reactions ([#4199](https://github.com/NousResearch/hermes-agent/pull/4199))
+- **Skip reactions and auto-threading** for unauthorized users ([#4387](https://github.com/NousResearch/hermes-agent/pull/4387))
+
+### Slack
+- **Reply in thread** — `slack.reply_in_thread` config option for threaded responses ([#4643](https://github.com/NousResearch/hermes-agent/pull/4643), closes [#2662](https://github.com/NousResearch/hermes-agent/issues/2662))
+
+### WhatsApp
+- **Enforce require_mention in group chats** ([#4730](https://github.com/NousResearch/hermes-agent/pull/4730))
+
+### Webhook
+- **Platform support fixes** — skip home channel prompt, disable tool progress for webhook adapters ([#4660](https://github.com/NousResearch/hermes-agent/pull/4660))
+
+### Matrix
+- **E2EE decryption hardening** — request missing keys, auto-trust devices, retry buffered events ([#4083](https://github.com/NousResearch/hermes-agent/pull/4083))
+
+---
+
+## 🖥️ CLI & User Experience
+
+### New Slash Commands
+- **`/yolo`** — toggle dangerous command approvals on/off for the session ([#3990](https://github.com/NousResearch/hermes-agent/pull/3990))
+- **`/btw`** — ephemeral side questions that don't affect the main conversation context ([#4161](https://github.com/NousResearch/hermes-agent/pull/4161))
+- **`/profile`** — show active profile info without leaving the chat session ([#4027](https://github.com/NousResearch/hermes-agent/pull/4027))
+
+### Interactive CLI
+- **Inline diff previews** for write and patch operations in the tool activity feed ([#4411](https://github.com/NousResearch/hermes-agent/pull/4411), [#4423](https://github.com/NousResearch/hermes-agent/pull/4423))
+- **TUI pinned to bottom** on startup — no more large blank spaces between response and input ([#4412](https://github.com/NousResearch/hermes-agent/pull/4412), [#4359](https://github.com/NousResearch/hermes-agent/pull/4359), closes [#4398](https://github.com/NousResearch/hermes-agent/issues/4398), [#4421](https://github.com/NousResearch/hermes-agent/issues/4421))
+- **`/history` and `/resume`** now surface recent sessions directly instead of requiring search ([#4728](https://github.com/NousResearch/hermes-agent/pull/4728))
+- **Cache tokens shown** in `/insights` overview so total adds up ([#4428](https://github.com/NousResearch/hermes-agent/pull/4428))
+- **`--max-turns` CLI flag** for `hermes chat` to limit agent iterations ([#4314](https://github.com/NousResearch/hermes-agent/pull/4314))
+- **Detect dragged file paths** instead of treating them as slash commands ([#4533](https://github.com/NousResearch/hermes-agent/pull/4533)) — @rolme
+- **Allow empty strings and falsy values** in `config set` ([#4310](https://github.com/NousResearch/hermes-agent/pull/4310), closes [#4277](https://github.com/NousResearch/hermes-agent/issues/4277))
+- **Voice mode in WSL** when PulseAudio bridge is configured ([#4317](https://github.com/NousResearch/hermes-agent/pull/4317))
+- **Respect `NO_COLOR` env var** and `TERM=dumb` for accessibility ([#4079](https://github.com/NousResearch/hermes-agent/pull/4079), closes [#4066](https://github.com/NousResearch/hermes-agent/issues/4066)) — @SHL0MS
+- **Correct shell reload instruction** for macOS/zsh users ([#4025](https://github.com/NousResearch/hermes-agent/pull/4025))
+- **Zero exit code** on successful quiet mode queries ([#4613](https://github.com/NousResearch/hermes-agent/pull/4613), closes [#4601](https://github.com/NousResearch/hermes-agent/issues/4601)) — @devorun
+- **on_session_end hook fires** on interrupted exits ([#4159](https://github.com/NousResearch/hermes-agent/pull/4159))
+- **Profile list display** reads `model.default` key correctly ([#4160](https://github.com/NousResearch/hermes-agent/pull/4160))
+- **Browser and TTS** shown in reconfigure menu ([#4041](https://github.com/NousResearch/hermes-agent/pull/4041))
+- **Web backend priority** detection simplified ([#4036](https://github.com/NousResearch/hermes-agent/pull/4036))
+
+### Setup & Configuration
+- **Allowed_users preserved** during setup and quiet unconfigured provider warnings ([#4551](https://github.com/NousResearch/hermes-agent/pull/4551)) — @kshitijk4poor
+- **Save API key to model config** for custom endpoints ([#4202](https://github.com/NousResearch/hermes-agent/pull/4202), closes [#4182](https://github.com/NousResearch/hermes-agent/issues/4182))
+- **Claude Code credentials gated** behind explicit Hermes config in wizard trigger ([#4210](https://github.com/NousResearch/hermes-agent/pull/4210))
+- **Atomic writes in save_config_value** to prevent config loss on interrupt ([#4298](https://github.com/NousResearch/hermes-agent/pull/4298), [#4320](https://github.com/NousResearch/hermes-agent/pull/4320))
+- **Scopes field written** to Claude Code credentials on token refresh ([#4126](https://github.com/NousResearch/hermes-agent/pull/4126))
+
+### Update System
+- **Fork detection and upstream sync** in `hermes update` ([#4744](https://github.com/NousResearch/hermes-agent/pull/4744))
+- **Preserve working optional extras** when one extra fails during update ([#4550](https://github.com/NousResearch/hermes-agent/pull/4550))
+- **Handle conflicted git index** during hermes update ([#4735](https://github.com/NousResearch/hermes-agent/pull/4735))
+- **Avoid launchd restart race** on macOS ([#4736](https://github.com/NousResearch/hermes-agent/pull/4736))
+- **Missing subprocess.run() timeouts** added to doctor and status commands ([#4009](https://github.com/NousResearch/hermes-agent/pull/4009))
+
+---
+
+## 🔧 Tool System
+
+### Browser
+- **Camofox anti-detection browser backend** — local stealth browsing with auto-install via `hermes tools` ([#4008](https://github.com/NousResearch/hermes-agent/pull/4008))
+- **Persistent Camofox sessions** with VNC URL discovery for visual debugging ([#4419](https://github.com/NousResearch/hermes-agent/pull/4419))
+- **Skip SSRF check for local backends** (Camofox, headless Chromium) ([#4292](https://github.com/NousResearch/hermes-agent/pull/4292))
+- **Configurable SSRF check** via `browser.allow_private_urls` ([#4198](https://github.com/NousResearch/hermes-agent/pull/4198)) — @nils010485
+- **CAMOFOX_PORT=9377** added to Docker commands ([#4340](https://github.com/NousResearch/hermes-agent/pull/4340))
+
+### File Operations
+- **Inline diff previews** on write and patch actions ([#4411](https://github.com/NousResearch/hermes-agent/pull/4411), [#4423](https://github.com/NousResearch/hermes-agent/pull/4423))
+- **Stale file detection** on write and patch — warns when file was modified externally since last read ([#4345](https://github.com/NousResearch/hermes-agent/pull/4345))
+- **Staleness timestamp refreshed** after writes ([#4390](https://github.com/NousResearch/hermes-agent/pull/4390))
+- **Size guard, dedup, and device blocking** on read_file ([#4315](https://github.com/NousResearch/hermes-agent/pull/4315))
+
+### MCP
+- **Stability fix pack** — reload timeout, shutdown cleanup, event loop handler, OAuth non-blocking ([#4757](https://github.com/NousResearch/hermes-agent/pull/4757), closes [#4462](https://github.com/NousResearch/hermes-agent/issues/4462), [#2537](https://github.com/NousResearch/hermes-agent/issues/2537))
+
+### ACP (Editor Integration)
+- **Client-provided MCP servers** registered as agent tools — editors pass their MCP servers to Hermes ([#4705](https://github.com/NousResearch/hermes-agent/pull/4705))
+
+### Skills System
+- **Size limits for agent writes** and **fuzzy matching for skill patch** — prevents oversized skill writes and improves edit reliability ([#4414](https://github.com/NousResearch/hermes-agent/pull/4414))
+- **Validate hub bundle paths** before install — blocks path traversal in skill bundles ([#3986](https://github.com/NousResearch/hermes-agent/pull/3986))
+- **Unified hermes-agent and hermes-agent-setup** into single skill ([#4332](https://github.com/NousResearch/hermes-agent/pull/4332))
+- **Skill metadata type check** in extract_skill_conditions ([#4479](https://github.com/NousResearch/hermes-agent/pull/4479))
+
+### New/Updated Skills
+- **research-paper-writing** — full end-to-end research pipeline (replaced ml-paper-writing) ([#4654](https://github.com/NousResearch/hermes-agent/pull/4654)) — @SHL0MS
+- **ascii-video** — text readability techniques and external layout oracle ([#4054](https://github.com/NousResearch/hermes-agent/pull/4054)) — @SHL0MS
+- **youtube-transcript** updated for youtube-transcript-api v1.x ([#4455](https://github.com/NousResearch/hermes-agent/pull/4455)) — @el-analista
+- **Skills browse and search page** added to documentation site ([#4500](https://github.com/NousResearch/hermes-agent/pull/4500)) — @IAvecilla
+
+---
+
+## 🔒 Security & Reliability
+
+### Security Hardening
+- **Block secret exfiltration** via browser URLs and LLM responses — scans for secret patterns in URL encoding, base64, and prompt injection vectors ([#4483](https://github.com/NousResearch/hermes-agent/pull/4483))
+- **Redact secrets from execute_code sandbox output** ([#4360](https://github.com/NousResearch/hermes-agent/pull/4360))
+- **Protect `.docker`, `.azure`, `.config/gh` credential directories** from read/write via file tools and terminal ([#4305](https://github.com/NousResearch/hermes-agent/pull/4305), [#4327](https://github.com/NousResearch/hermes-agent/pull/4327)) — @memosr
+- **GitHub OAuth token patterns** added to redaction + snapshot redact flag ([#4295](https://github.com/NousResearch/hermes-agent/pull/4295))
+- **Reject private and loopback IPs** in Telegram DoH fallback ([#4129](https://github.com/NousResearch/hermes-agent/pull/4129))
+- **Reject path traversal** in credential file registration ([#4316](https://github.com/NousResearch/hermes-agent/pull/4316))
+- **Validate tar archive member paths** on profile import — blocks zip-slip attacks ([#4318](https://github.com/NousResearch/hermes-agent/pull/4318))
+- **Exclude auth.json and .env** from profile exports ([#4475](https://github.com/NousResearch/hermes-agent/pull/4475))
+
+### Reliability
+- **Prevent compression death spiral** from API disconnects ([#4750](https://github.com/NousResearch/hermes-agent/pull/4750), closes [#2153](https://github.com/NousResearch/hermes-agent/issues/2153))
+- **Handle `is_closed` as method** in OpenAI SDK — prevents false positive client closure detection ([#4416](https://github.com/NousResearch/hermes-agent/pull/4416), closes [#4377](https://github.com/NousResearch/hermes-agent/issues/4377))
+- **Exclude matrix from [all] extras** — python-olm is upstream-broken, prevents install failures ([#4615](https://github.com/NousResearch/hermes-agent/pull/4615), closes [#4178](https://github.com/NousResearch/hermes-agent/issues/4178))
+- **OpenCode model routing** repaired ([#4508](https://github.com/NousResearch/hermes-agent/pull/4508))
+- **Docker container image** optimized ([#4034](https://github.com/NousResearch/hermes-agent/pull/4034)) — @bcross
+
+### Windows & Cross-Platform
+- **Voice mode in WSL** with PulseAudio bridge ([#4317](https://github.com/NousResearch/hermes-agent/pull/4317))
+- **Homebrew packaging** preparation ([#4099](https://github.com/NousResearch/hermes-agent/pull/4099))
+- **CI fork conditionals** to prevent workflow failures on forks ([#4107](https://github.com/NousResearch/hermes-agent/pull/4107))
+
+---
+
+## 🐛 Notable Bug Fixes
+
+- **Gateway approval blocked agent thread** — approval now blocks the agent thread like CLI does, preventing tool result loss ([#4557](https://github.com/NousResearch/hermes-agent/pull/4557), closes [#4542](https://github.com/NousResearch/hermes-agent/issues/4542))
+- **Compression death spiral** from API disconnects — detected and halted instead of looping ([#4750](https://github.com/NousResearch/hermes-agent/pull/4750), closes [#2153](https://github.com/NousResearch/hermes-agent/issues/2153))
+- **Anthropic thinking blocks lost** across tool-use turns ([#4626](https://github.com/NousResearch/hermes-agent/pull/4626))
+- **Profile model config ignored** with `-p` flag — model.model now promoted to model.default correctly ([#4160](https://github.com/NousResearch/hermes-agent/pull/4160), closes [#4486](https://github.com/NousResearch/hermes-agent/issues/4486))
+- **CLI blank space** between response and input area ([#4412](https://github.com/NousResearch/hermes-agent/pull/4412), [#4359](https://github.com/NousResearch/hermes-agent/pull/4359), closes [#4398](https://github.com/NousResearch/hermes-agent/issues/4398))
+- **Dragged file paths** treated as slash commands instead of file references ([#4533](https://github.com/NousResearch/hermes-agent/pull/4533)) — @rolme
+- **Orphaned `</think>` tags** leaking into user-facing responses ([#4311](https://github.com/NousResearch/hermes-agent/pull/4311), closes [#4285](https://github.com/NousResearch/hermes-agent/issues/4285))
+- **OpenAI SDK `is_closed`** is a method not property — false positive client closure ([#4416](https://github.com/NousResearch/hermes-agent/pull/4416), closes [#4377](https://github.com/NousResearch/hermes-agent/issues/4377))
+- **MCP OAuth server** could block Hermes startup instead of degrading gracefully ([#4757](https://github.com/NousResearch/hermes-agent/pull/4757), closes [#4462](https://github.com/NousResearch/hermes-agent/issues/4462))
+- **MCP event loop closed** on shutdown with HTTP servers ([#4757](https://github.com/NousResearch/hermes-agent/pull/4757), closes [#2537](https://github.com/NousResearch/hermes-agent/issues/2537))
+- **Alibaba provider** hardcoded to wrong endpoint ([#4133](https://github.com/NousResearch/hermes-agent/pull/4133), closes [#3912](https://github.com/NousResearch/hermes-agent/issues/3912))
+- **Slack reply_in_thread** missing config option ([#4643](https://github.com/NousResearch/hermes-agent/pull/4643), closes [#2662](https://github.com/NousResearch/hermes-agent/issues/2662))
+- **Quiet mode exit code** — successful `-q` queries no longer exit nonzero ([#4613](https://github.com/NousResearch/hermes-agent/pull/4613), closes [#4601](https://github.com/NousResearch/hermes-agent/issues/4601))
+- **Mobile sidebar** shows only close button due to backdrop-filter issue in docs site ([#4207](https://github.com/NousResearch/hermes-agent/pull/4207)) — @xsmyile
+- **Config restore reverted** by stale-branch squash merge — `_config_version` fixed ([#4440](https://github.com/NousResearch/hermes-agent/pull/4440))
+
+---
+
+## 🧪 Testing
+
+- **Telegram gateway E2E tests** — full integration test suite for the Telegram adapter ([#4497](https://github.com/NousResearch/hermes-agent/pull/4497)) — @pefontana
+- **11 real test failures fixed** plus sys.modules cascade poisoner resolved ([#4570](https://github.com/NousResearch/hermes-agent/pull/4570))
+- **7 CI failures resolved** across hooks, plugins, and skill tests ([#3936](https://github.com/NousResearch/hermes-agent/pull/3936))
+- **Codex 401 refresh tests** updated for CI compatibility ([#4166](https://github.com/NousResearch/hermes-agent/pull/4166))
+- **Stale OPENAI_BASE_URL test** fixed ([#4217](https://github.com/NousResearch/hermes-agent/pull/4217))
+
+---
+
+## 📚 Documentation
+
+- **Comprehensive documentation audit** — 9 HIGH and 20+ MEDIUM gaps fixed across 21 files ([#4087](https://github.com/NousResearch/hermes-agent/pull/4087))
+- **Site navigation restructured** — features and platforms promoted to top-level ([#4116](https://github.com/NousResearch/hermes-agent/pull/4116))
+- **Tool progress streaming** documented for API server and Open WebUI ([#4138](https://github.com/NousResearch/hermes-agent/pull/4138))
+- **Telegram webhook mode** documentation ([#4089](https://github.com/NousResearch/hermes-agent/pull/4089))
+- **Local LLM provider guides** — comprehensive setup guides with context length warnings ([#4294](https://github.com/NousResearch/hermes-agent/pull/4294))
+- **WhatsApp allowlist behavior** clarified with `WHATSAPP_ALLOW_ALL_USERS` documentation ([#4293](https://github.com/NousResearch/hermes-agent/pull/4293))
+- **Slack configuration options** — new config section in Slack docs ([#4644](https://github.com/NousResearch/hermes-agent/pull/4644))
+- **Terminal backends section** expanded + docs build fixes ([#4016](https://github.com/NousResearch/hermes-agent/pull/4016))
+- **Adding-providers guide** updated for unified setup flow ([#4201](https://github.com/NousResearch/hermes-agent/pull/4201))
+- **ACP Zed config** fixed ([#4743](https://github.com/NousResearch/hermes-agent/pull/4743))
+- **Community FAQ** entries for common workflows and troubleshooting ([#4797](https://github.com/NousResearch/hermes-agent/pull/4797))
+- **Skills browse and search page** on docs site ([#4500](https://github.com/NousResearch/hermes-agent/pull/4500)) — @IAvecilla
+
+---
+
+## 👥 Contributors
+
+### Core
+- **@teknium1** — 135 commits across all subsystems
+
+### Top Community Contributors
+- **@kshitijk4poor** — 13 commits: preserve allowed_users during setup ([#4551](https://github.com/NousResearch/hermes-agent/pull/4551)), and various fixes
+- **@erosika** — 12 commits: Honcho full integration parity restored as memory provider plugin ([#4355](https://github.com/NousResearch/hermes-agent/pull/4355))
+- **@pefontana** — 9 commits: Telegram gateway E2E test suite ([#4497](https://github.com/NousResearch/hermes-agent/pull/4497))
+- **@bcross** — 5 commits: Docker container image optimization ([#4034](https://github.com/NousResearch/hermes-agent/pull/4034))
+- **@SHL0MS** — 4 commits: NO_COLOR/TERM=dumb support ([#4079](https://github.com/NousResearch/hermes-agent/pull/4079)), ascii-video skill updates ([#4054](https://github.com/NousResearch/hermes-agent/pull/4054)), research-paper-writing skill ([#4654](https://github.com/NousResearch/hermes-agent/pull/4654))
+
+### All Contributors
+@0xbyt4, @arasovic, @Bartok9, @bcross, @binhnt92, @camden-lowrance, @curtitoo, @Dakota, @Dave Tist, @Dean Kerr, @devorun, @dieutx, @Dilee, @el-analista, @erosika, @Gutslabs, @IAvecilla, @Jack, @Johannnnn506, @kshitijk4poor, @Laura Batalha, @Leegenux, @Lume, @MacroAnarchy, @maymuneth, @memosr, @NexVeridian, @Nick, @nils010485, @pefontana, @Penov, @rolme, @SHL0MS, @txchen, @xsmyile
+
+### Issues Resolved from Community
+@acsezen ([#2537](https://github.com/NousResearch/hermes-agent/issues/2537)), @arasovic ([#4285](https://github.com/NousResearch/hermes-agent/issues/4285)), @camden-lowrance ([#4462](https://github.com/NousResearch/hermes-agent/issues/4462)), @devorun ([#4601](https://github.com/NousResearch/hermes-agent/issues/4601)), @eloklam ([#4486](https://github.com/NousResearch/hermes-agent/issues/4486)), @HenkDz ([#3719](https://github.com/NousResearch/hermes-agent/issues/3719)), @hypotyposis ([#2153](https://github.com/NousResearch/hermes-agent/issues/2153)), @kazamak ([#4178](https://github.com/NousResearch/hermes-agent/issues/4178)), @lstep ([#4366](https://github.com/NousResearch/hermes-agent/issues/4366)), @Mark-Lok ([#4542](https://github.com/NousResearch/hermes-agent/issues/4542)), @NoJster ([#4421](https://github.com/NousResearch/hermes-agent/issues/4421)), @patp ([#2662](https://github.com/NousResearch/hermes-agent/issues/2662)), @pr0n ([#4601](https://github.com/NousResearch/hermes-agent/issues/4601)), @saulmc ([#4377](https://github.com/NousResearch/hermes-agent/issues/4377)), @SHL0MS ([#4060](https://github.com/NousResearch/hermes-agent/issues/4060), [#4061](https://github.com/NousResearch/hermes-agent/issues/4061), [#4066](https://github.com/NousResearch/hermes-agent/issues/4066), [#4172](https://github.com/NousResearch/hermes-agent/issues/4172), [#4277](https://github.com/NousResearch/hermes-agent/issues/4277)), @Z-Mackintosh ([#4398](https://github.com/NousResearch/hermes-agent/issues/4398))
+
+---
+
+**Full Changelog**: [v2026.3.30...v2026.4.3](https://github.com/NousResearch/hermes-agent/compare/v2026.3.30...v2026.4.3)
@@ -22,6 +22,9 @@ from acp.schema import (
    InitializeResponse,
    ListSessionsResponse,
    LoadSessionResponse,
+    McpServerHttp,
+    McpServerSse,
+    McpServerStdio,
    NewSessionResponse,
    PromptResponse,
    ResumeSessionResponse,
@@ -93,6 +96,71 @@ class HermesACPAgent(acp.Agent):
        self._conn = conn
        logger.info("ACP client connected")

+    async def _register_session_mcp_servers(
+        self,
+        state: SessionState,
+        mcp_servers: list[McpServerStdio | McpServerHttp | McpServerSse] | None,
+    ) -> None:
+        """Register ACP-provided MCP servers and refresh the agent tool surface."""
+        if not mcp_servers:
+            return
+
+        try:
+            from tools.mcp_tool import register_mcp_servers
+
+            config_map: dict[str, dict] = {}
+            for server in mcp_servers:
+                name = server.name
+                if isinstance(server, McpServerStdio):
+                    config = {
+                        "command": server.command,
+                        "args": list(server.args),
+                        "env": {item.name: item.value for item in server.env},
+                    }
+                else:
+                    config = {
+                        "url": server.url,
+                        "headers": {item.name: item.value for item in server.headers},
+                    }
+                config_map[name] = config
+
+            await asyncio.to_thread(register_mcp_servers, config_map)
+        except Exception:
+            logger.warning(
+                "Session %s: failed to register ACP MCP servers",
+                state.session_id,
+                exc_info=True,
+            )
+            return
+
+        try:
+            from model_tools import get_tool_definitions
+
+            enabled_toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
+            disabled_toolsets = getattr(state.agent, "disabled_toolsets", None)
+            state.agent.tools = get_tool_definitions(
+                enabled_toolsets=enabled_toolsets,
+                disabled_toolsets=disabled_toolsets,
+                quiet_mode=True,
+            )
+            state.agent.valid_tool_names = {
+                tool["function"]["name"] for tool in state.agent.tools or []
+            }
+            invalidate = getattr(state.agent, "_invalidate_system_prompt", None)
+            if callable(invalidate):
+                invalidate()
+            logger.info(
+                "Session %s: refreshed tool surface after ACP MCP registration (%d tools)",
+                state.session_id,
+                len(state.agent.tools or []),
+            )
+        except Exception:
+            logger.warning(
+                "Session %s: failed to refresh tool surface after ACP MCP registration",
+                state.session_id,
+                exc_info=True,
+            )
+
    # ---- ACP lifecycle ------------------------------------------------------

    async def initialize(
@@ -149,6 +217,7 @@ class HermesACPAgent(acp.Agent):
        **kwargs: Any,
    ) -> NewSessionResponse:
        state = self.session_manager.create_session(cwd=cwd)
+        await self._register_session_mcp_servers(state, mcp_servers)
        logger.info("New session %s (cwd=%s)", state.session_id, cwd)
        return NewSessionResponse(session_id=state.session_id)

@@ -163,6 +232,7 @@ class HermesACPAgent(acp.Agent):
        if state is None:
            logger.warning("load_session: session %s not found", session_id)
            return None
+        await self._register_session_mcp_servers(state, mcp_servers)
        logger.info("Loaded session %s", session_id)
        return LoadSessionResponse()

@@ -177,6 +247,7 @@ class HermesACPAgent(acp.Agent):
        if state is None:
            logger.warning("resume_session: session %s not found, creating new", session_id)
            state = self.session_manager.create_session(cwd=cwd)
+        await self._register_session_mcp_servers(state, mcp_servers)
        logger.info("Resumed session %s", state.session_id)
        return ResumeSessionResponse()

@@ -200,6 +271,8 @@ class HermesACPAgent(acp.Agent):
    ) -> ForkSessionResponse:
        state = self.session_manager.fork_session(session_id, cwd=cwd)
        new_id = state.session_id if state else ""
+        if state is not None:
+            await self._register_session_mcp_servers(state, mcp_servers)
        logger.info("Forked session %s -> %s", session_id, new_id)
        return ForkSessionResponse(session_id=new_id)

@@ -488,11 +488,19 @@ def build_skills_system_prompt(
        return ""

    # ── Layer 1: in-process LRU cache ─────────────────────────────────
+    # Include the resolved platform so per-platform disabled-skill lists
+    # produce distinct cache entries (gateway serves multiple platforms).
+    _platform_hint = (
+        os.environ.get("HERMES_PLATFORM")
+        or os.environ.get("HERMES_SESSION_PLATFORM")
+        or ""
+    )
    cache_key = (
        str(skills_dir.resolve()),
        tuple(str(d) for d in external_dirs),
        tuple(sorted(str(t) for t in (available_tools or set()))),
        tuple(sorted(str(ts) for ts in (available_toolsets or set()))),
+        _platform_hint,
    )
    with _SKILLS_PROMPT_CACHE_LOCK:
        cached = _SKILLS_PROMPT_CACHE.get(cache_key)
@@ -118,12 +118,17 @@ def skill_matches_platform(frontmatter: Dict[str, Any]) -> bool:
 # ── Disabled skills ───────────────────────────────────────────────────────


-def get_disabled_skill_names() -> Set[str]:
+def get_disabled_skill_names(platform: str | None = None) -> Set[str]:
    """Read disabled skill names from config.yaml.

-    Resolves platform from ``HERMES_PLATFORM`` env var, falls back to
-    the global disabled list.  Reads the config file directly (no CLI
-    config imports) to stay lightweight.
+    Args:
+        platform: Explicit platform name (e.g. ``"telegram"``).  When
+            *None*, resolves from ``HERMES_PLATFORM`` or
+            ``HERMES_SESSION_PLATFORM`` env vars.  Falls back to the
+            global disabled list when no platform is determined.
+
+    Reads the config file directly (no CLI config imports) to stay
+    lightweight.
    """
    config_path = get_hermes_home() / "config.yaml"
    if not config_path.exists():
@@ -140,7 +145,11 @@ def get_disabled_skill_names() -> Set[str]:
    if not isinstance(skills_cfg, dict):
        return set()

-    resolved_platform = os.getenv("HERMES_PLATFORM")
+    resolved_platform = (
+        platform
+        or os.getenv("HERMES_PLATFORM")
+        or os.getenv("HERMES_SESSION_PLATFORM")
+    )
    if resolved_platform:
        platform_disabled = (skills_cfg.get("platform_disabled") or {}).get(
            resolved_platform
@@ -3052,10 +3052,54 @@ class HermesCLI:
        print(f"  Config File: {config_path} {config_status}")
        print()
    
+    def _list_recent_sessions(self, limit: int = 10) -> list[dict[str, Any]]:
+        """Return recent CLI sessions for in-chat browsing/resume affordances."""
+        if not self._session_db:
+            return []
+        try:
+            sessions = self._session_db.list_sessions_rich(
+                source="cli",
+                exclude_sources=["tool"],
+                limit=limit,
+            )
+        except Exception:
+            return []
+        return [s for s in sessions if s.get("id") != self.session_id]
+
+    def _show_recent_sessions(self, *, reason: str = "history", limit: int = 10) -> bool:
+        """Render recent sessions inline from the active chat TUI.
+
+        Returns True when something was shown, False if no session list was available.
+        """
+        sessions = self._list_recent_sessions(limit=limit)
+        if not sessions:
+            return False
+
+        from hermes_cli.main import _relative_time
+
+        print()
+        if reason == "history":
+            print("(._.) No messages in the current chat yet — here are recent sessions you can resume:")
+        else:
+            print("  Recent sessions:")
+        print()
+        print(f"  {'Title':<32} {'Preview':<40} {'Last Active':<13} {'ID'}")
+        print(f"  {'─' * 32} {'─' * 40} {'─' * 13} {'─' * 24}")
+        for session in sessions:
+            title = (session.get("title") or "—")[:30]
+            preview = (session.get("preview") or "")[:38]
+            last_active = _relative_time(session.get("last_active"))
+            print(f"  {title:<32} {preview:<40} {last_active:<13} {session['id']}")
+        print()
+        print("  Use /resume <session id or title> to continue where you left off.")
+        print()
+        return True
+
    def show_history(self):
        """Display conversation history."""
        if not self.conversation_history:
-            print("(._.) No conversation history yet.")
+            if not self._show_recent_sessions(reason="history"):
+                print("(._.) No conversation history yet.")
            return

        preview_limit = 400
@@ -3180,6 +3224,8 @@ class HermesCLI:

        if not target:
            _cprint("  Usage: /resume <session_id_or_title>")
+            if self._show_recent_sessions(reason="resume"):
+                return
            _cprint("  Tip:   Use /history or `hermes sessions list` to find sessions.")
            return

@@ -3401,7 +3447,122 @@ class HermesCLI:
            print("  Run: hermes setup")
            print()

-        print("  To change model or provider, use: hermes model")
+        print("  Switch mid-chat: /model <provider:model>")
+        print("  Full picker: hermes model")
+
+    def _handle_model_switch(self, cmd: str):
+        """Handle /model command — switch model mid-session.
+
+        Syntax:
+            /model                          → show current model + usage
+            /model sonnet                   → alias for claude-sonnet-4.6
+            /model claude-sonnet-4          → auto-detect provider
+            /model openai:gpt-5            → explicit provider
+            /model custom                   → switch to custom endpoint
+        """
+        from hermes_cli.models import _PROVIDER_LABELS, normalize_provider
+        from hermes_cli.model_switch import (
+            switch_model, switch_to_custom_provider,
+            MODEL_ALIASES, suggest_models,
+        )
+
+        parts = cmd.split(maxsplit=1)
+        raw_input = parts[1].strip() if len(parts) > 1 else ""
+
+        # No argument → show current model and how to switch
+        if not raw_input:
+            provider_label = _PROVIDER_LABELS.get(self.provider, self.provider)
+            print(f"\n  Current: {self.model} via {provider_label}")
+            print()
+            print("  Switch with aliases:")
+            print("    /model sonnet       /model opus        /model haiku")
+            print("    /model gpt5         /model gpt5-mini   /model codex")
+            print("    /model gemini       /model deepseek    /model grok")
+            print()
+            print("  Or full names:  /model anthropic/claude-sonnet-4.5")
+            print("  Direct provider:  /model anthropic:claude-opus-4")
+            print("  Custom endpoint:  /model custom:my-local-model")
+            print()
+            return
+
+        # Handle bare "custom" → auto-detect custom endpoint
+        if raw_input.lower() == "custom":
+            result = switch_to_custom_provider()
+            if not result.success:
+                print(f"\n  Error: {result.error_message}")
+                return
+            raw_input = f"custom:{result.model}"
+
+        # Same model check (quick path)
+        if raw_input == self.model:
+            print(f"\n  Already using {self.model}")
+            return
+
+        # Run the shared switch pipeline
+        result = switch_model(
+            raw_input,
+            current_provider=self.provider,
+            current_model=self.model,
+            current_base_url=self.base_url,
+            current_api_key=getattr(self, 'api_key', ''),
+        )
+
+        if not result.success:
+            # On failure, try to suggest alternatives
+            suggestions = suggest_models(raw_input)
+            print(f"\n  Error: {result.error_message}")
+            if suggestions:
+                sug_str = ", ".join(suggestions)
+                print(f"  Did you mean: {sug_str}?")
+            print()
+            return
+
+        # Same model after resolution (e.g. alias resolved to current)
+        if result.new_model == self.model and not result.provider_changed:
+            print(f"\n  Already using {self.model}")
+            return
+
+        old_model = self.model
+        old_provider = self.provider
+
+        # Apply the switch to the live agent (if one exists)
+        if self.agent is not None:
+            self.agent.switch_model(
+                new_model=result.new_model,
+                new_provider=result.target_provider,
+                api_key=result.api_key,
+                base_url=result.base_url,
+                api_mode=result.api_mode,
+            )
+
+        # Update CLI-level state so the next _init_agent() (if agent is None)
+        # also picks up the new model
+        self.model = result.new_model
+        self.provider = result.target_provider
+        self.api_key = result.api_key
+        self.base_url = result.base_url
+        self.api_mode = result.api_mode
+
+        # Persist to config.yaml so future sessions use the new model
+        if result.persist:
+            save_config_value("model.default", result.new_model)
+            save_config_value("model.provider", result.target_provider)
+            if result.base_url:
+                save_config_value("model.base_url", result.base_url)
+
+        # Format output
+        new_label = _PROVIDER_LABELS.get(result.target_provider, result.target_provider)
+        if result.resolved_via_alias:
+            print(f"\n  {result.resolved_via_alias} → {result.new_model} via {new_label}")
+        else:
+            print(f"\n  Switched to {result.new_model} via {new_label}")
+        if result.provider_changed:
+            old_label = _PROVIDER_LABELS.get(old_provider, old_provider)
+            print(f"  Provider: {old_label} → {new_label}")
+        if result.warning_message:
+            print(f"  Note: {result.warning_message}")
+        print(f"  Prompt cache reset (new model).")
+        print()

    def _handle_prompt_command(self, cmd: str):
        """Handle the /prompt command to view or set system prompt."""
@@ -3952,6 +4113,8 @@ class HermesCLI:
            self.new_session()
        elif canonical == "resume":
            self._handle_resume_command(cmd_original)
+        elif canonical == "model":
+            self._handle_model_switch(cmd_original)
        elif canonical == "provider":
            self._show_model_and_providers()
        elif canonical == "prompt":
@@ -4970,11 +5133,18 @@ class HermesCLI:
            return  # mcp_servers unchanged (some other section was edited)

        self._config_mcp_servers = new_mcp
-        # Notify user and reload
+        # Notify user and reload.  Run in a separate thread with a hard
+        # timeout so a hung MCP server cannot block the process_loop
+        # indefinitely (which would freeze the entire TUI).
        print()
        print("🔄 MCP server config changed — reloading connections...")
-        with self._busy_command(self._slow_command_status("/reload-mcp")):
-            self._reload_mcp()
+        _reload_thread = threading.Thread(
+            target=self._reload_mcp, daemon=True
+        )
+        _reload_thread.start()
+        _reload_thread.join(timeout=30)
+        if _reload_thread.is_alive():
+            print("  ⚠️  MCP reload timed out (30s). Some servers may not have reconnected.")

    def _reload_mcp(self):
        """Reload MCP servers: disconnect all, re-read config.yaml, reconnect.
@@ -9,6 +9,7 @@ runs at a time if multiple processes overlap.
 """

 import asyncio
+import concurrent.futures
 import json
 import logging
 import os
@@ -443,8 +444,30 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
            session_db=_session_db,
        )
        
-        result = agent.run_conversation(prompt)
-        
+        # Run the agent with a timeout so a hung API call or tool doesn't
+        # block the cron ticker thread indefinitely.  Default 10 minutes;
+        # override via env var.  Uses a separate thread because
+        # run_conversation is synchronous.
+        _cron_timeout = float(os.getenv("HERMES_CRON_TIMEOUT", 600))
+        _cron_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
+        _cron_future = _cron_pool.submit(agent.run_conversation, prompt)
+        try:
+            result = _cron_future.result(timeout=_cron_timeout)
+        except concurrent.futures.TimeoutError:
+            logger.error(
+                "Job '%s' timed out after %.0fs — interrupting agent",
+                job_name, _cron_timeout,
+            )
+            if hasattr(agent, "interrupt"):
+                agent.interrupt("Cron job timed out")
+            _cron_pool.shutdown(wait=False, cancel_futures=True)
+            raise TimeoutError(
+                f"Cron job '{job_name}' timed out after "
+                f"{int(_cron_timeout // 60)} minutes"
+            )
+        finally:
+            _cron_pool.shutdown(wait=False)
+
        final_response = result.get("final_response", "") or ""
        # Use a separate variable for log display; keep final_response clean
        # for delivery logic (empty response = no delivery).
@@ -76,14 +76,13 @@ Open Zed settings (`Cmd+,` on macOS or `Ctrl+,` on Linux) and add to your

 ```json
 {
-  "acp": {
-    "agents": [
-      {
-        "name": "hermes-agent",
-        "registry_dir": "/path/to/hermes-agent/acp_registry"
-      }
-    ]
-  }
+  "agent_servers": {
+    "hermes-agent": {
+      "type": "custom",
+      "command": "hermes",
+      "args": ["acp"],
+    },
+  },
 }
 ```

@@ -563,6 +563,18 @@ def load_gateway_config() -> GatewayConfig:
                    if isinstance(frc, list):
                        frc = ",".join(str(v) for v in frc)
                    os.environ["TELEGRAM_FREE_RESPONSE_CHATS"] = str(frc)
+
+            whatsapp_cfg = yaml_cfg.get("whatsapp", {})
+            if isinstance(whatsapp_cfg, dict):
+                if "require_mention" in whatsapp_cfg and not os.getenv("WHATSAPP_REQUIRE_MENTION"):
+                    os.environ["WHATSAPP_REQUIRE_MENTION"] = str(whatsapp_cfg["require_mention"]).lower()
+                if "mention_patterns" in whatsapp_cfg and not os.getenv("WHATSAPP_MENTION_PATTERNS"):
+                    os.environ["WHATSAPP_MENTION_PATTERNS"] = json.dumps(whatsapp_cfg["mention_patterns"])
+                frc = whatsapp_cfg.get("free_response_chats")
+                if frc is not None and not os.getenv("WHATSAPP_FREE_RESPONSE_CHATS"):
+                    if isinstance(frc, list):
+                        frc = ",".join(str(v) for v in frc)
+                    os.environ["WHATSAPP_FREE_RESPONSE_CHATS"] = str(frc)
    except Exception as e:
        logger.warning(
            "Failed to process config.yaml — falling back to .env / gateway.json values. "
@@ -372,6 +372,24 @@ class APIServerAdapter(BasePlatformAdapter):
            status=401,
        )

+    # ------------------------------------------------------------------
+    # Session DB helper
+    # ------------------------------------------------------------------
+
+    def _ensure_session_db(self):
+        """Lazily initialise and return the shared SessionDB instance.
+
+        Sessions are persisted to ``state.db`` so that ``hermes sessions list``
+        shows API-server conversations alongside CLI and gateway ones.
+        """
+        if self._session_db is None:
+            try:
+                from hermes_state import SessionDB
+                self._session_db = SessionDB()
+            except Exception as e:
+                logger.debug("SessionDB unavailable for API server: %s", e)
+        return self._session_db
+
    # ------------------------------------------------------------------
    # Agent creation helper
    # ------------------------------------------------------------------
@@ -415,6 +433,7 @@ class APIServerAdapter(BasePlatformAdapter):
            platform="api_server",
            stream_delta_callback=stream_delta_callback,
            tool_progress_callback=tool_progress_callback,
+            session_db=self._ensure_session_db(),
        )
        return agent

@@ -503,10 +522,9 @@ class APIServerAdapter(BasePlatformAdapter):
        if provided_session_id:
            session_id = provided_session_id
            try:
-                if self._session_db is None:
-                    from hermes_state import SessionDB
-                    self._session_db = SessionDB()
-                history = self._session_db.get_messages_as_conversation(session_id)
+                db = self._ensure_session_db()
+                if db is not None:
+                    history = db.get_messages_as_conversation(session_id)
            except Exception as e:
                logger.warning("Failed to load session history for %s: %s", session_id, e)
                history = []
@@ -1046,6 +1046,13 @@ class BasePlatformAdapter(ABC):
            self._active_sessions[session_key].set()
            return  # Don't process now - will be handled after current task finishes
        
+        # Mark session as active BEFORE spawning background task to close
+        # the race window where a second message arriving before the task
+        # starts would also pass the _active_sessions check and spawn a
+        # duplicate task.  (grammY sequentialize / aiogram EventIsolation
+        # pattern — set the guard synchronously, not inside the task.)
+        self._active_sessions[session_key] = asyncio.Event()
+
        # Spawn background task to process this message
        task = asyncio.create_task(self._process_message_background(event, session_key))
        try:
@@ -1092,8 +1099,10 @@ class BasePlatformAdapter(ABC):
            if getattr(result, "success", False):
                delivery_succeeded = True

-        # Create interrupt event for this session
-        interrupt_event = asyncio.Event()
+        # Reuse the interrupt event set by handle_message() (which marks
+        # the session active before spawning this task to prevent races).
+        # Fall back to a new Event only if the entry was removed externally.
+        interrupt_event = self._active_sessions.get(session_key) or asyncio.Event()
        self._active_sessions[session_key] = interrupt_event
        
        # Start continuous typing indicator (refreshes every 2 seconds)
@@ -1106,9 +1115,12 @@ class BasePlatformAdapter(ABC):
            # Call the handler (this can take a while with tool calls)
            response = await self._message_handler(event)
            
-            # Send response if any
+            # Send response if any.  A None/empty response is normal when
+            # streaming already delivered the text (already_sent=True) or
+            # when the message was queued behind an active agent.  Log at
+            # DEBUG to avoid noisy warnings for expected behavior.
            if not response:
-                logger.warning("[%s] Handler returned empty/None response for %s", self.name, event.source.chat_id)
+                logger.debug("[%s] Handler returned empty/None response for %s", self.name, event.source.chat_id)
            if response:
                # Extract MEDIA:<path> tags (from TTS tool) before other processing
                media_files, response = self.extract_media(response)
@@ -1617,6 +1617,16 @@ class DiscordAdapter(BasePlatformAdapter):
        async def slash_update(interaction: discord.Interaction):
            await self._run_simple_slash(interaction, "/update", "Update initiated~")

+        @tree.command(name="approve", description="Approve a pending dangerous command")
+        @discord.app_commands.describe(scope="Optional: 'all', 'session', 'always', 'all session', 'all always'")
+        async def slash_approve(interaction: discord.Interaction, scope: str = ""):
+            await self._run_simple_slash(interaction, f"/approve {scope}".strip())
+
+        @tree.command(name="deny", description="Deny a pending dangerous command")
+        @discord.app_commands.describe(scope="Optional: 'all' to deny all pending commands")
+        async def slash_deny(interaction: discord.Interaction, scope: str = ""):
+            await self._run_simple_slash(interaction, f"/deny {scope}".strip())
+
        @tree.command(name="thread", description="Create a new thread and start a Hermes session in it")
        @discord.app_commands.describe(
            name="Thread name",
@@ -1860,33 +1870,41 @@ class DiscordAdapter(BasePlatformAdapter):
            return None

    async def send_exec_approval(
-        self, chat_id: str, command: str, approval_id: str
+        self, chat_id: str, command: str, session_key: str,
+        description: str = "dangerous command",
+        metadata: Optional[dict] = None,
    ) -> SendResult:
        """
        Send a button-based exec approval prompt for a dangerous command.

-        Returns SendResult. The approval is resolved when a user clicks a button.
+        The buttons call ``resolve_gateway_approval()`` to unblock the waiting
+        agent thread — this replaces the text-based ``/approve`` flow on Discord.
        """
        if not self._client or not DISCORD_AVAILABLE:
            return SendResult(success=False, error="Not connected")

        try:
-            channel = self._client.get_channel(int(chat_id))
+            # Resolve channel — use thread_id from metadata if present
+            target_id = chat_id
+            if metadata and metadata.get("thread_id"):
+                target_id = metadata["thread_id"]
+
+            channel = self._client.get_channel(int(target_id))
            if not channel:
-                channel = await self._client.fetch_channel(int(chat_id))
+                channel = await self._client.fetch_channel(int(target_id))

            # Discord embed description limit is 4096; show full command up to that
            max_desc = 4088
            cmd_display = command if len(command) <= max_desc else command[: max_desc - 3] + "..."
            embed = discord.Embed(
-                title="Command Approval Required",
+                title="⚠️ Command Approval Required",
                description=f"```\n{cmd_display}\n```",
                color=discord.Color.orange(),
            )
-            embed.set_footer(text=f"Approval ID: {approval_id}")
+            embed.add_field(name="Reason", value=description, inline=False)

            view = ExecApprovalView(
-                approval_id=approval_id,
+                session_key=session_key,
                allowed_user_ids=self._allowed_user_ids,
            )

@@ -2219,13 +2237,15 @@ if DISCORD_AVAILABLE:
        """
        Interactive button view for exec approval of dangerous commands.

-        Shows three buttons: Allow Once (green), Always Allow (blue), Deny (red).
-        Only users in the allowed list can click. The view times out after 5 minutes.
+        Shows four buttons: Allow Once, Allow Session, Always Allow, Deny.
+        Clicking a button calls ``resolve_gateway_approval()`` to unblock the
+        waiting agent thread — the same mechanism as the text ``/approve`` flow.
+        Only users in the allowed list can click.  Times out after 5 minutes.
        """

-        def __init__(self, approval_id: str, allowed_user_ids: set):
+        def __init__(self, session_key: str, allowed_user_ids: set):
            super().__init__(timeout=300)  # 5-minute timeout
-            self.approval_id = approval_id
+            self.session_key = session_key
            self.allowed_user_ids = allowed_user_ids
            self.resolved = False

@@ -2236,9 +2256,10 @@ if DISCORD_AVAILABLE:
            return str(interaction.user.id) in self.allowed_user_ids

        async def _resolve(
-            self, interaction: discord.Interaction, action: str, color: discord.Color
+            self, interaction: discord.Interaction, choice: str,
+            color: discord.Color, label: str,
        ):
-            """Resolve the approval and update the message."""
+            """Resolve the approval via the gateway approval queue and update the embed."""
            if self.resolved:
                await interaction.response.send_message(
                    "This approval has already been resolved~", ephemeral=True
@@ -2257,7 +2278,7 @@ if DISCORD_AVAILABLE:
            embed = interaction.message.embeds[0] if interaction.message.embeds else None
            if embed:
                embed.color = color
-                embed.set_footer(text=f"{action} by {interaction.user.display_name}")
+                embed.set_footer(text=f"{label} by {interaction.user.display_name}")

            # Disable all buttons
            for child in self.children:
@@ -2265,33 +2286,40 @@ if DISCORD_AVAILABLE:

            await interaction.response.edit_message(embed=embed, view=self)

-            # Store the approval decision
+            # Unblock the waiting agent thread via the gateway approval queue
            try:
-                from tools.approval import approve_permanent
-                if action == "allow_once":
-                    pass  # One-time approval handled by gateway
-                elif action == "allow_always":
-                    approve_permanent(self.approval_id)
-            except ImportError:
-                pass
+                from tools.approval import resolve_gateway_approval
+                count = resolve_gateway_approval(self.session_key, choice)
+                logger.info(
+                    "Discord button resolved %d approval(s) for session %s (choice=%s, user=%s)",
+                    count, self.session_key, choice, interaction.user.display_name,
+                )
+            except Exception as exc:
+                logger.error("Failed to resolve gateway approval from button: %s", exc)

        @discord.ui.button(label="Allow Once", style=discord.ButtonStyle.green)
        async def allow_once(
            self, interaction: discord.Interaction, button: discord.ui.Button
        ):
-            await self._resolve(interaction, "allow_once", discord.Color.green())
+            await self._resolve(interaction, "once", discord.Color.green(), "Approved once")
+
+        @discord.ui.button(label="Allow Session", style=discord.ButtonStyle.grey)
+        async def allow_session(
+            self, interaction: discord.Interaction, button: discord.ui.Button
+        ):
+            await self._resolve(interaction, "session", discord.Color.blue(), "Approved for session")

        @discord.ui.button(label="Always Allow", style=discord.ButtonStyle.blurple)
        async def allow_always(
            self, interaction: discord.Interaction, button: discord.ui.Button
        ):
-            await self._resolve(interaction, "allow_always", discord.Color.blue())
+            await self._resolve(interaction, "always", discord.Color.purple(), "Approved permanently")

        @discord.ui.button(label="Deny", style=discord.ButtonStyle.red)
        async def deny(
            self, interaction: discord.Interaction, button: discord.ui.Button
        ):
-            await self._resolve(interaction, "deny", discord.Color.red())
+            await self._resolve(interaction, "deny", discord.Color.red(), "Denied")

        async def on_timeout(self):
            """Handle view timeout -- disable buttons and mark as expired."""
@@ -900,7 +900,9 @@ class TelegramAdapter(BasePlatformAdapter):
                except Exception:
                    pass  # best-effort truncation
                return SendResult(success=True, message_id=message_id)
-            # Flood control / RetryAfter — back off and retry once
+            # Flood control / RetryAfter — short waits are retried inline,
+            # long waits return a failure immediately so streaming can fall back
+            # to a normal final send instead of leaving a truncated partial.
            retry_after = getattr(e, "retry_after", None)
            if retry_after is not None or "retry after" in err_str:
                wait = retry_after if retry_after else 1.0
@@ -908,6 +910,8 @@ class TelegramAdapter(BasePlatformAdapter):
                    "[%s] Telegram flood control, waiting %.1fs",
                    self.name, wait,
                )
+                if wait > 5.0:
+                    return SendResult(success=False, error=f"flood_control:{wait}")
                await asyncio.sleep(wait)
                try:
                    await self._bot.edit_message_text(
@@ -16,9 +16,11 @@ with different backends via a bridge pattern.
 """

 import asyncio
+import json
 import logging
 import os
 import platform
+import re
 import subprocess

 _IS_WINDOWS = platform.system() == "Windows"
@@ -138,12 +140,137 @@ class WhatsAppAdapter(BasePlatformAdapter):
            get_hermes_dir("platforms/whatsapp/session", "whatsapp/session")
        ))
        self._reply_prefix: Optional[str] = config.extra.get("reply_prefix")
+        self._mention_patterns = self._compile_mention_patterns()
        self._message_queue: asyncio.Queue = asyncio.Queue()
        self._bridge_log_fh = None
        self._bridge_log: Optional[Path] = None
        self._poll_task: Optional[asyncio.Task] = None
        self._http_session: Optional["aiohttp.ClientSession"] = None
        self._session_lock_identity: Optional[str] = None
+
+    def _whatsapp_require_mention(self) -> bool:
+        configured = self.config.extra.get("require_mention")
+        if configured is not None:
+            if isinstance(configured, str):
+                return configured.lower() in ("true", "1", "yes", "on")
+            return bool(configured)
+        return os.getenv("WHATSAPP_REQUIRE_MENTION", "false").lower() in ("true", "1", "yes", "on")
+
+    def _whatsapp_free_response_chats(self) -> set[str]:
+        raw = self.config.extra.get("free_response_chats")
+        if raw is None:
+            raw = os.getenv("WHATSAPP_FREE_RESPONSE_CHATS", "")
+        if isinstance(raw, list):
+            return {str(part).strip() for part in raw if str(part).strip()}
+        return {part.strip() for part in str(raw).split(",") if part.strip()}
+
+    def _compile_mention_patterns(self):
+        patterns = self.config.extra.get("mention_patterns")
+        if patterns is None:
+            raw = os.getenv("WHATSAPP_MENTION_PATTERNS", "").strip()
+            if raw:
+                try:
+                    patterns = json.loads(raw)
+                except Exception:
+                    patterns = [part.strip() for part in raw.splitlines() if part.strip()]
+                    if not patterns:
+                        patterns = [part.strip() for part in raw.split(",") if part.strip()]
+        if patterns is None:
+            return []
+        if isinstance(patterns, str):
+            patterns = [patterns]
+        if not isinstance(patterns, list):
+            logger.warning("[%s] whatsapp mention_patterns must be a list or string; got %s", self.name, type(patterns).__name__)
+            return []
+
+        compiled = []
+        for pattern in patterns:
+            if not isinstance(pattern, str) or not pattern.strip():
+                continue
+            try:
+                compiled.append(re.compile(pattern, re.IGNORECASE))
+            except re.error as exc:
+                logger.warning("[%s] Invalid WhatsApp mention pattern %r: %s", self.name, pattern, exc)
+        if compiled:
+            logger.info("[%s] Loaded %d WhatsApp mention pattern(s)", self.name, len(compiled))
+        return compiled
+
+    @staticmethod
+    def _normalize_whatsapp_id(value: Optional[str]) -> str:
+        if not value:
+            return ""
+        normalized = str(value).strip()
+        if ":" in normalized and "@" in normalized:
+            normalized = normalized.replace(":", "@", 1)
+        return normalized
+
+    def _bot_ids_from_message(self, data: Dict[str, Any]) -> set[str]:
+        bot_ids = set()
+        for candidate in data.get("botIds") or []:
+            normalized = self._normalize_whatsapp_id(candidate)
+            if normalized:
+                bot_ids.add(normalized)
+        return bot_ids
+
+    def _message_is_reply_to_bot(self, data: Dict[str, Any]) -> bool:
+        quoted_participant = self._normalize_whatsapp_id(data.get("quotedParticipant"))
+        if not quoted_participant:
+            return False
+        return quoted_participant in self._bot_ids_from_message(data)
+
+    def _message_mentions_bot(self, data: Dict[str, Any]) -> bool:
+        bot_ids = self._bot_ids_from_message(data)
+        if not bot_ids:
+            return False
+        mentioned_ids = {
+            nid
+            for candidate in (data.get("mentionedIds") or [])
+            if (nid := self._normalize_whatsapp_id(candidate))
+        }
+        if mentioned_ids & bot_ids:
+            return True
+
+        body = str(data.get("body") or "")
+        lower_body = body.lower()
+        for bot_id in bot_ids:
+            bare_id = bot_id.split("@", 1)[0].lower()
+            if bare_id and (f"@{bare_id}" in lower_body or bare_id in lower_body):
+                return True
+        return False
+
+    def _message_matches_mention_patterns(self, data: Dict[str, Any]) -> bool:
+        if not self._mention_patterns:
+            return False
+        body = str(data.get("body") or "")
+        return any(pattern.search(body) for pattern in self._mention_patterns)
+
+    def _clean_bot_mention_text(self, text: str, data: Dict[str, Any]) -> str:
+        if not text:
+            return text
+        bot_ids = self._bot_ids_from_message(data)
+        cleaned = text
+        for bot_id in bot_ids:
+            bare_id = bot_id.split("@", 1)[0]
+            if bare_id:
+                cleaned = re.sub(rf"@{re.escape(bare_id)}\b[,:\-]*\s*", "", cleaned)
+        return cleaned.strip() or text
+
+    def _should_process_message(self, data: Dict[str, Any]) -> bool:
+        if not data.get("isGroup"):
+            return True
+        chat_id = str(data.get("chatId") or "")
+        if chat_id in self._whatsapp_free_response_chats():
+            return True
+        if not self._whatsapp_require_mention():
+            return True
+        body = str(data.get("body") or "").strip()
+        if body.startswith("/"):
+            return True
+        if self._message_is_reply_to_bot(data):
+            return True
+        if self._message_mentions_bot(data):
+            return True
+        return self._message_matches_mention_patterns(data)
    
    async def connect(self) -> bool:
        """
@@ -687,6 +814,9 @@ class WhatsAppAdapter(BasePlatformAdapter):
    async def _build_message_event(self, data: Dict[str, Any]) -> Optional[MessageEvent]:
        """Build a MessageEvent from bridge message data, downloading images to cache."""
        try:
+            if not self._should_process_message(data):
+                return None
+
            # Determine message type
            msg_type = MessageType.TEXT
            if data.get("hasMedia"):
@@ -768,6 +898,8 @@ class WhatsAppAdapter(BasePlatformAdapter):
            # the message text so the agent can read it inline.
            # Cap at 100KB to match Telegram/Discord/Slack behaviour.
            body = data.get("body", "")
+            if data.get("isGroup"):
+                body = self._clean_bot_mention_text(body, data)
            MAX_TEXT_INJECT_BYTES = 100 * 1024
            if msg_type == MessageType.DOCUMENT and cached_urls:
                for doc_path in cached_urls:
@@ -303,6 +303,43 @@ def _resolve_runtime_agent_kwargs() -> dict:
    }


+def _build_media_placeholder(event) -> str:
+    """Build a text placeholder for media-only events so they aren't dropped.
+
+    When a photo/document is queued during active processing and later
+    dequeued, only .text is extracted.  If the event has no caption,
+    the media would be silently lost.  This builds a placeholder that
+    the vision enrichment pipeline will replace with a real description.
+    """
+    parts = []
+    media_urls = getattr(event, "media_urls", None) or []
+    media_types = getattr(event, "media_types", None) or []
+    for i, url in enumerate(media_urls):
+        mtype = media_types[i] if i < len(media_types) else ""
+        if mtype.startswith("image/") or getattr(event, "message_type", None) == MessageType.PHOTO:
+            parts.append(f"[User sent an image: {url}]")
+        elif mtype.startswith("audio/"):
+            parts.append(f"[User sent audio: {url}]")
+        else:
+            parts.append(f"[User sent a file: {url}]")
+    return "\n".join(parts)
+
+
+def _dequeue_pending_text(adapter, session_key: str) -> str | None:
+    """Consume and return the text of a pending queued message.
+
+    Preserves media context for captionless photo/document events by
+    building a placeholder so the message isn't silently dropped.
+    """
+    event = adapter.get_pending_message(session_key)
+    if not event:
+        return None
+    text = event.text
+    if not text and getattr(event, "media_urls", None):
+        text = _build_media_placeholder(event)
+    return text
+
+
 def _check_unavailable_skill(command_name: str) -> str | None:
    """Check if a command matches a known-but-inactive skill.

@@ -411,10 +448,14 @@ def _resolve_hermes_bin() -> Optional[list[str]]:
 class GatewayRunner:
    """
    Main gateway controller.
-    
+
    Manages the lifecycle of all platform adapters and routes
    messages to/from the agent.
    """
+
+    # Class-level defaults so partial construction in tests doesn't
+    # blow up on attribute access.
+    _running_agents_ts: Dict[str, float] = {}
    
    def __init__(self, config: Optional[GatewayConfig] = None):
        self.config = config or load_gateway_config()
@@ -446,6 +487,7 @@ class GatewayRunner:
        # Track running agents per session for interrupt support
        # Key: session_key, Value: AIAgent instance
        self._running_agents: Dict[str, Any] = {}
+        self._running_agents_ts: Dict[str, float] = {}  # start timestamp per session
        self._pending_messages: Dict[str, str] = {}  # Queued messages during interrupt

        # Cache AIAgent instances per session to preserve prompt caching.
@@ -1698,6 +1740,20 @@ class GatewayRunner:
        # simultaneous updates. Do NOT interrupt for photo-only follow-ups here;
        # let the adapter-level batching/queueing logic absorb them.
        _quick_key = self._session_key_for_source(source)
+
+        # Staleness eviction: if an entry has been in _running_agents for
+        # longer than the agent timeout, it's a leaked lock from a hung or
+        # crashed handler.  Evict it so the session isn't permanently stuck.
+        _STALE_TTL = float(os.getenv("HERMES_AGENT_TIMEOUT", 600)) + 60  # timeout + 1 min grace
+        _stale_ts = self._running_agents_ts.get(_quick_key, 0)
+        if _quick_key in self._running_agents and _stale_ts and (time.time() - _stale_ts) > _STALE_TTL:
+            logger.warning(
+                "Evicting stale _running_agents entry for %s (age: %.0fs)",
+                _quick_key[:30], time.time() - _stale_ts,
+            )
+            del self._running_agents[_quick_key]
+            self._running_agents_ts.pop(_quick_key, None)
+
        if _quick_key in self._running_agents:
            if event.get_command() == "status":
                return await self._handle_status_command(event)
@@ -1765,6 +1821,20 @@ class GatewayRunner:
                    adapter._pending_messages[_quick_key] = queued_event
                return "Queued for the next turn."

+            # /model must not be queued as an interrupt — it's a config change
+            # that requires no agent to be running.  Return a clear message.
+            if _cmd_def_inner and _cmd_def_inner.name == "model":
+                return "⏳ Agent is running — wait for it to finish or `/stop` first, then switch models."
+
+            # /approve and /deny must bypass the running-agent interrupt path.
+            # The agent thread is blocked on a threading.Event inside
+            # tools/approval.py — sending an interrupt won't unblock it.
+            # Route directly to the approval handler so the event is signalled.
+            if _cmd_def_inner and _cmd_def_inner.name in ("approve", "deny"):
+                if _cmd_def_inner.name == "approve":
+                    return await self._handle_approve_command(event)
+                return await self._handle_deny_command(event)
+
            if event.message_type == MessageType.PHOTO:
                logger.debug("PRIORITY photo follow-up for session %s — queueing without interrupt", _quick_key[:20])
                adapter = self.adapters.get(source.platform)
@@ -1855,6 +1925,9 @@ class GatewayRunner:
        if canonical == "yolo":
            return await self._handle_yolo_command(event)

+        if canonical == "model":
+            return await self._handle_model_command(event)
+
        if canonical == "provider":
            return await self._handle_provider_command(event)
        
@@ -1995,6 +2068,19 @@ class GatewayRunner:
                skill_cmds = get_skill_commands()
                cmd_key = f"/{command}"
                if cmd_key in skill_cmds:
+                    # Check per-platform disabled status before executing.
+                    # get_skill_commands() only applies the *global* disabled
+                    # list at scan time; per-platform overrides need checking
+                    # here because the cache is process-global across platforms.
+                    _skill_name = skill_cmds[cmd_key].get("name", "")
+                    _plat = source.platform.value if source.platform else None
+                    if _plat and _skill_name:
+                        from agent.skill_utils import get_disabled_skill_names as _get_plat_disabled
+                        if _skill_name in _get_plat_disabled(platform=_plat):
+                            return (
+                                f"The **{_skill_name}** skill is disabled for {_plat}.\n"
+                                f"Enable it with: `hermes skills config`"
+                            )
                    user_instruction = event.get_command_args().strip()
                    msg = build_skill_invocation_message(
                        cmd_key, user_instruction, task_id=_quick_key
@@ -2023,6 +2109,7 @@ class GatewayRunner:
        # "already running" guard and spin up a duplicate agent for the
        # same session — corrupting the transcript.
        self._running_agents[_quick_key] = _AGENT_PENDING_SENTINEL
+        self._running_agents_ts[_quick_key] = time.time()

        try:
            return await self._handle_message_with_agent(event, source, _quick_key)
@@ -2033,6 +2120,7 @@ class GatewayRunner:
            # not linger or the session would be permanently locked out.
            if self._running_agents.get(_quick_key) is _AGENT_PENDING_SENTINEL:
                del self._running_agents[_quick_key]
+            self._running_agents_ts.pop(_quick_key, None)

    async def _handle_message_with_agent(self, event, source, _quick_key: str):
        """Inner handler that runs under the _running_agents sentinel guard."""
@@ -2303,7 +2391,18 @@ class GatewayRunner:
                    # 85% * 1.4 = 119% of context — which exceeds the model's limit
                    # and prevented hygiene from ever firing for ~200K models (GLM-5).

-                _needs_compress = _approx_tokens >= _compress_token_threshold
+                # Hard safety valve: force compression if message count is
+                # extreme, regardless of token estimates.  This breaks the
+                # death spiral where API disconnects prevent token data
+                # collection, which prevents compression, which causes more
+                # disconnects.  400 messages is well above normal sessions
+                # but catches runaway growth before it becomes unrecoverable.
+                # (#2153)
+                _HARD_MSG_LIMIT = 400
+                _needs_compress = (
+                    _approx_tokens >= _compress_token_threshold
+                    or _msg_count >= _HARD_MSG_LIMIT
+                )

                if _needs_compress:
                    logger.info(
@@ -3136,6 +3235,130 @@ class GatewayRunner:
            lines.append(f"_(Requested page {requested_page} was out of range, showing page {page}.)_")
        return "\n".join(lines)
    
+    async def _handle_model_command(self, event: MessageEvent) -> str:
+        """Handle /model command — switch model mid-session.
+
+        Works across all gateway platforms (Telegram, Discord, Slack,
+        Matrix, WhatsApp, etc.) since they all route through the same
+        gateway command dispatch.
+        """
+        import yaml
+        from hermes_cli.models import _PROVIDER_LABELS, normalize_provider
+        from hermes_cli.model_switch import (
+            switch_model, switch_to_custom_provider,
+            MODEL_ALIASES, suggest_models,
+        )
+        from hermes_cli.config import save_config
+
+        raw_input = event.get_command_args().strip()
+
+        # Resolve current provider/model from config
+        config_path = _hermes_home / "config.yaml"
+        current_provider = "openrouter"
+        current_model = ""
+        current_base_url = ""
+        current_api_key = ""
+        try:
+            if config_path.exists():
+                with open(config_path, encoding="utf-8") as f:
+                    cfg = yaml.safe_load(f) or {}
+                model_cfg = cfg.get("model", {})
+                if isinstance(model_cfg, dict):
+                    current_provider = model_cfg.get("provider", "openrouter")
+                    current_model = model_cfg.get("default") or model_cfg.get("model", "")
+                    current_base_url = model_cfg.get("base_url", "")
+                elif isinstance(model_cfg, str):
+                    current_model = model_cfg
+        except Exception:
+            pass
+
+        current_provider = normalize_provider(current_provider)
+
+        # No argument → show current model and how to switch
+        if not raw_input:
+            provider_label = _PROVIDER_LABELS.get(current_provider, current_provider)
+            return (
+                f"🤖 **Current:** `{current_model}` via {provider_label}\n\n"
+                "**Aliases:** sonnet, opus, haiku, gpt5, gpt5-mini, codex, "
+                "gemini, deepseek, grok, qwen, minimax\n\n"
+                "**Full names:** `/model anthropic/claude-sonnet-4.5`\n"
+                "**Direct provider:** `/model anthropic:claude-opus-4`\n"
+                "**Custom endpoint:** `/model custom:my-local-model`"
+            )
+
+        # Handle bare "custom"
+        if raw_input.lower() == "custom":
+            custom_result = switch_to_custom_provider()
+            if not custom_result.success:
+                return f"❌ {custom_result.error_message}"
+            raw_input = f"custom:{custom_result.model}"
+
+        # Same model check (quick path)
+        if raw_input == current_model:
+            return f"Already using `{current_model}`"
+
+        # Run the shared switch pipeline
+        result = switch_model(
+            raw_input,
+            current_provider=current_provider,
+            current_model=current_model,
+            current_base_url=current_base_url,
+            current_api_key=current_api_key,
+        )
+
+        if not result.success:
+            # Try to suggest alternatives on failure
+            suggestions = suggest_models(raw_input, limit=3)
+            msg = f"❌ {result.error_message}"
+            if suggestions:
+                sug_str = ", ".join(f"`{s}`" for s in suggestions)
+                msg += f"\nDid you mean: {sug_str}?"
+            return msg
+
+        # Same model after resolution
+        if result.new_model == current_model and not result.provider_changed:
+            return f"Already using `{current_model}`"
+
+        # Persist to config.yaml
+        if result.persist:
+            try:
+                with open(config_path, encoding="utf-8") as f:
+                    cfg = yaml.safe_load(f) or {}
+                model_cfg = cfg.get("model", {})
+                if not isinstance(model_cfg, dict):
+                    model_cfg = {"default": model_cfg} if model_cfg else {}
+                model_cfg["default"] = result.new_model
+                model_cfg["provider"] = result.target_provider
+                if result.base_url:
+                    model_cfg["base_url"] = result.base_url
+                elif "base_url" in model_cfg and not result.is_custom_target:
+                    # Clear stale base_url when switching away from custom
+                    del model_cfg["base_url"]
+                cfg["model"] = model_cfg
+                save_config(cfg)
+            except Exception as e:
+                logger.warning("Failed to persist model switch: %s", e)
+
+        # Evict the cached agent so the next message creates a fresh one
+        # with the new model/provider configuration.
+        source = event.source
+        session_key = self._session_key_for_source(source)
+        self._evict_cached_agent(session_key)
+
+        # Format response
+        new_label = _PROVIDER_LABELS.get(result.target_provider, result.target_provider)
+        if result.resolved_via_alias:
+            lines = [f"✅ **{result.resolved_via_alias}** → `{result.new_model}` via {new_label}"]
+        else:
+            lines = [f"✅ Switched to `{result.new_model}` via {new_label}"]
+        if result.provider_changed:
+            old_label = _PROVIDER_LABELS.get(current_provider, current_provider)
+            lines.append(f"Provider: {old_label} → {new_label}")
+        if result.warning_message:
+            lines.append(f"⚠️ {result.warning_message}")
+        lines.append("Prompt cache reset (new model).")
+        return "\n".join(lines)
+
    async def _handle_provider_command(self, event: MessageEvent) -> str:
        """Handle /provider command - show available providers."""
        import yaml
@@ -5384,11 +5607,13 @@ class GatewayRunner:
            progress_lines = []      # Accumulated tool lines
            progress_msg_id = None   # ID of the progress message to edit
            can_edit = True          # False once an edit fails (platform doesn't support it)
+            _last_edit_ts = 0.0      # Throttle edits to avoid Telegram flood control
+            _PROGRESS_EDIT_INTERVAL = 1.5  # Minimum seconds between edits

            while True:
                try:
                    raw = progress_queue.get_nowait()
-                    
+
                    # Handle dedup messages: update last line with repeat counter
                    if isinstance(raw, tuple) and len(raw) == 3 and raw[0] == "__dedup__":
                        _, base_msg, count = raw
@@ -5399,6 +5624,19 @@ class GatewayRunner:
                        msg = raw
                        progress_lines.append(msg)

+                    # Throttle edits: batch rapid tool updates into fewer
+                    # API calls to avoid hitting Telegram flood control.
+                    # (grammY auto-retry pattern: proactively rate-limit
+                    # instead of reacting to 429s.)
+                    _now = time.monotonic()
+                    _remaining = _PROGRESS_EDIT_INTERVAL - (_now - _last_edit_ts)
+                    if _remaining > 0:
+                        # Wait out the throttle interval, then loop back to
+                        # drain any additional queued messages before sending
+                        # a single batched edit.
+                        await asyncio.sleep(_remaining)
+                        continue
+
                    if can_edit and progress_msg_id is not None:
                        # Try to edit the existing progress message
                        full_text = "\n".join(progress_lines)
@@ -5408,8 +5646,15 @@ class GatewayRunner:
                            content=full_text,
                        )
                        if not result.success:
-                            # Platform doesn't support editing — stop trying,
-                            # send just this new line as a separate message
+                            _err = (getattr(result, "error", "") or "").lower()
+                            if "flood" in _err or "retry after" in _err:
+                                # Flood control hit — disable further edits,
+                                # switch to sending new messages only for
+                                # important updates.  Don't block 23s.
+                                logger.info(
+                                    "[%s] Progress edits disabled due to flood control",
+                                    adapter.name,
+                                )
                            can_edit = False
                            await adapter.send(chat_id=source.chat_id, content=msg, metadata=_progress_metadata)
                    else:
@@ -5423,6 +5668,8 @@ class GatewayRunner:
                        if result.success and result.message_id:
                            progress_msg_id = result.message_id

+                    _last_edit_ts = time.monotonic()
+
                    # Restore typing indicator
                    await asyncio.sleep(0.3)
                    await adapter.send_typing(source.chat_id, metadata=_progress_metadata)
@@ -5468,15 +5715,25 @@ class GatewayRunner:
        _loop_for_step = asyncio.get_event_loop()
        _hooks_ref = self.hooks

-        def _step_callback_sync(iteration: int, tool_names: list) -> None:
+        def _step_callback_sync(iteration: int, prev_tools: list) -> None:
            try:
+                # prev_tools may be list[str] or list[dict] with "name"/"result"
+                # keys.  Normalise to keep "tool_names" backward-compatible for
+                # user-authored hooks that do ', '.join(tool_names)'.
+                _names: list[str] = []
+                for _t in (prev_tools or []):
+                    if isinstance(_t, dict):
+                        _names.append(_t.get("name") or "")
+                    else:
+                        _names.append(str(_t))
                asyncio.run_coroutine_threadsafe(
                    _hooks_ref.emit("agent:step", {
                        "platform": source.platform.value if source.platform else "",
                        "user_id": source.user_id,
                        "session_id": session_id,
                        "iteration": iteration,
-                        "tool_names": tool_names,
+                        "tool_names": _names,
+                        "tools": prev_tools,
                    }),
                    _loop_for_step,
                )
@@ -5726,10 +5983,39 @@ class GatewayRunner:
            from tools.approval import register_gateway_notify, unregister_gateway_notify

            def _approval_notify_sync(approval_data: dict) -> None:
-                """Send the approval request to the user from the agent thread."""
+                """Send the approval request to the user from the agent thread.
+
+                If the adapter supports interactive button-based approvals
+                (e.g. Discord's ``send_exec_approval``), use that for a richer
+                UX.  Otherwise fall back to a plain text message with
+                ``/approve`` instructions.
+                """
                cmd = approval_data.get("command", "")
-                cmd_preview = cmd[:200] + "..." if len(cmd) > 200 else cmd
                desc = approval_data.get("description", "dangerous command")
+
+                # Prefer button-based approval when the adapter supports it.
+                # Check the *class* for the method, not the instance — avoids
+                # false positives from MagicMock auto-attribute creation in tests.
+                if getattr(type(_status_adapter), "send_exec_approval", None) is not None:
+                    try:
+                        asyncio.run_coroutine_threadsafe(
+                            _status_adapter.send_exec_approval(
+                                chat_id=_status_chat_id,
+                                command=cmd,
+                                session_key=_approval_session_key,
+                                description=desc,
+                                metadata=_status_thread_metadata,
+                            ),
+                            _loop_for_step,
+                        ).result(timeout=15)
+                        return
+                    except Exception as _e:
+                        logger.warning(
+                            "Button-based approval failed, falling back to text: %s", _e
+                        )
+
+                # Fallback: plain text approval prompt
+                cmd_preview = cmd[:200] + "..." if len(cmd) > 200 else cmd
                msg = (
                    f"⚠️ **Dangerous command requires approval:**\n"
                    f"```\n{cmd_preview}\n```\n"
@@ -5933,9 +6219,38 @@ class GatewayRunner:
        interrupt_monitor = asyncio.create_task(monitor_for_interrupt())
        
        try:
-            # Run in thread pool to not block
+            # Run in thread pool to not block.  Cap total execution time
+            # so a hung API call or runaway tool doesn't permanently lock
+            # the session.  Default 10 minutes; override with env var.
+            _agent_timeout = float(os.getenv("HERMES_AGENT_TIMEOUT", 600))
            loop = asyncio.get_event_loop()
-            response = await loop.run_in_executor(None, run_sync)
+            try:
+                response = await asyncio.wait_for(
+                    loop.run_in_executor(None, run_sync),
+                    timeout=_agent_timeout,
+                )
+            except asyncio.TimeoutError:
+                logger.error(
+                    "Agent execution timed out after %.0fs for session %s",
+                    _agent_timeout, session_key,
+                )
+                # Interrupt the agent if it's still running so the thread
+                # pool worker is freed.
+                _timed_out_agent = agent_holder[0]
+                if _timed_out_agent and hasattr(_timed_out_agent, "interrupt"):
+                    _timed_out_agent.interrupt("Execution timed out")
+                response = {
+                    "final_response": (
+                        f"⏱️ Request timed out after {int(_agent_timeout // 60)} minutes. "
+                        "The agent may have been stuck on a tool or API call.\n"
+                        "Try again, or use /reset to start fresh."
+                    ),
+                    "messages": result_holder[0].get("messages", []) if result_holder[0] else [],
+                    "api_calls": 0,
+                    "tools": tools_holder[0] or [],
+                    "history_offset": 0,
+                    "failed": True,
+                }

            # Track fallback model state: if the agent switched to a
            # fallback model during this run, persist it so /model shows
@@ -5963,18 +6278,12 @@ class GatewayRunner:
            pending = None
            if result and adapter and session_key:
                if result.get("interrupted"):
-                    # Interrupted — consume the interrupt message
-                    pending_event = adapter.get_pending_message(session_key)
-                    if pending_event:
-                        pending = pending_event.text
-                    elif result.get("interrupt_message"):
+                    pending = _dequeue_pending_text(adapter, session_key)
+                    if not pending and result.get("interrupt_message"):
                        pending = result.get("interrupt_message")
                else:
-                    # Normal completion — check for /queue'd messages that were
-                    # stored without triggering an interrupt.
-                    pending_event = adapter.get_pending_message(session_key)
-                    if pending_event:
-                        pending = pending_event.text
+                    pending = _dequeue_pending_text(adapter, session_key)
+                    if pending:
                        logger.debug("Processing queued message after agent completion: '%s...'", pending[:40])
            
            if pending:
@@ -6050,6 +6359,8 @@ class GatewayRunner:
            tracking_task.cancel()
            if session_key and session_key in self._running_agents:
                del self._running_agents[session_key]
+            if session_key:
+                self._running_agents_ts.pop(session_key, None)
            
            # Wait for cancelled tasks
            for task in [progress_task, interrupt_monitor, tracking_task]:
@@ -174,12 +174,12 @@ class GatewayStreamConsumer:
                        self._already_sent = True
                        self._last_sent_text = text
                    else:
-                        # Edit not supported by this adapter — stop streaming,
-                        # let the normal send path handle the final response.
-                        # Without this guard, adapters like Signal/Email would
-                        # flood the chat with a new message every edit_interval.
+                        # If an edit fails mid-stream (especially Telegram flood control),
+                        # stop progressive edits and let the normal final send path deliver
+                        # the complete answer instead of leaving the user with a partial.
                        logger.debug("Edit failed, disabling streaming for this adapter")
                        self._edit_supported = False
+                        self._already_sent = False
                else:
                    # Editing not supported — skip intermediate updates.
                    # The final response will be sent by the normal path.
@@ -11,5 +11,5 @@ Provides subcommands for:
 - hermes cron          - Manage cron jobs
 """

-__version__ = "0.6.0"
-__release_date__ = "2026.3.30"
+__version__ = "0.7.0"
+__release_date__ = "2026.4.3"
@@ -82,6 +82,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
    # Configuration
    CommandDef("config", "Show current configuration", "Configuration",
               cli_only=True),
+    CommandDef("model", "Switch model mid-session (e.g. /model claude-sonnet-4 or /model openai:gpt-5)",
+               "Configuration", args_hint="[provider:model]"),
    CommandDef("provider", "Show available providers and current provider",
               "Configuration"),
    CommandDef("prompt", "View/set custom system prompt", "Configuration",
@@ -414,6 +416,8 @@ def telegram_menu_commands(max_commands: int = 100) -> tuple[list[tuple[str, str

    Skills are the only tier that gets trimmed when the cap is hit.
    User-installed hub skills are excluded — accessible via /skills.
+    Skills disabled for the ``"telegram"`` platform (via ``hermes skills
+    config``) are excluded from the menu entirely.

    Returns:
        (menu_commands, hidden_count) where hidden_count is the number of
@@ -444,6 +448,17 @@ def telegram_menu_commands(max_commands: int = 100) -> tuple[list[tuple[str, str
    reserved_names.update(n for n, _ in plugin_entries)
    all_commands.extend(plugin_entries)

+    # Load per-platform disabled skills so they don't consume menu slots.
+    # get_skill_commands() already filters the *global* disabled list, but
+    # per-platform overrides (skills.platform_disabled.telegram) were never
+    # applied here — that's what this block fixes.
+    _platform_disabled: set[str] = set()
+    try:
+        from agent.skill_utils import get_disabled_skill_names
+        _platform_disabled = get_disabled_skill_names(platform="telegram")
+    except Exception:
+        pass
+
    # Remaining slots go to built-in skill commands (not hub-installed).
    skill_entries: list[tuple[str, str]] = []
    try:
@@ -459,6 +474,10 @@ def telegram_menu_commands(max_commands: int = 100) -> tuple[list[tuple[str, str
                continue
            if skill_path.startswith(_hub_dir):
                continue
+            # Skip skills disabled for telegram
+            skill_name = info.get("name", "")
+            if skill_name in _platform_disabled:
+                continue
            name = cmd_key.lstrip("/").replace("-", "_")
            desc = info.get("description", "")
            # Keep descriptions short — setMyCommands has an undocumented
@@ -258,8 +258,11 @@ def _system_service_identity(run_as_user: str | None = None) -> tuple[str, str,
    username = (run_as_user or os.getenv("SUDO_USER") or os.getenv("USER") or os.getenv("LOGNAME") or getpass.getuser()).strip()
    if not username:
        raise ValueError("Could not determine which user the gateway service should run as")
+    if username == "root" and not run_as_user:
+        raise ValueError("Refusing to install the gateway system service as root; pass --run-as-user root to override (e.g. in LXC containers)")
    if username == "root":
-        raise ValueError("Refusing to install the gateway system service as root; pass --run-as USER")
+        print_warning("Installing gateway service to run as root.")
+        print_info("  This is fine for LXC/container environments but not recommended on bare-metal hosts.")

    try:
        user_info = pwd.getpwnam(username)
@@ -321,9 +324,9 @@ def install_linux_gateway_from_setup(force: bool = False) -> tuple[str | None, b
            while True:
                run_as_user = prompt("  Run the system gateway service as which user?", default="")
                run_as_user = (run_as_user or "").strip()
-                if run_as_user and run_as_user != "root":
+                if run_as_user:
                    break
-                print_error("  Enter a non-root username.")
+                print_error("  Enter a username.")

        systemd_install(force=force, system=True, run_as_user=run_as_user)
        return scope, True
@@ -2682,6 +2682,20 @@ def _stash_local_changes_if_needed(git_cmd: list[str], cwd: Path) -> Optional[st
    if not status.stdout.strip():
        return None

+    # If the index has unmerged entries (e.g. from an interrupted merge/rebase),
+    # git stash will fail with "needs merge / could not write index".  Clear the
+    # conflict state with `git reset` so the stash can proceed.  Working-tree
+    # changes are preserved; only the index conflict markers are dropped.
+    unmerged = subprocess.run(
+        git_cmd + ["ls-files", "--unmerged"],
+        cwd=cwd,
+        capture_output=True,
+        text=True,
+    )
+    if unmerged.stdout.strip():
+        print("→ Clearing unmerged index entries from a previous conflict...")
+        subprocess.run(git_cmd + ["reset"], cwd=cwd, capture_output=True)
+
    from datetime import datetime, timezone

    stash_name = datetime.now(timezone.utc).strftime("hermes-update-autostash-%Y%m%d-%H%M%S")
@@ -2835,6 +2849,231 @@ def _restore_stashed_changes(
    print("  Review `git diff` / `git status` if Hermes behaves unexpectedly.")
    return True

+# =========================================================================
+# Fork detection and upstream management for `hermes update`
+# =========================================================================
+
+OFFICIAL_REPO_URLS = {
+    "https://github.com/NousResearch/hermes-agent.git",
+    "git@github.com:NousResearch/hermes-agent.git",
+    "https://github.com/NousResearch/hermes-agent",
+    "git@github.com:NousResearch/hermes-agent",
+}
+OFFICIAL_REPO_URL = "https://github.com/NousResearch/hermes-agent.git"
+SKIP_UPSTREAM_PROMPT_FILE = ".skip_upstream_prompt"
+
+
+def _get_origin_url(git_cmd: list[str], cwd: Path) -> Optional[str]:
+    """Get the URL of the origin remote, or None if not set."""
+    try:
+        result = subprocess.run(
+            git_cmd + ["remote", "get-url", "origin"],
+            cwd=cwd,
+            capture_output=True,
+            text=True,
+        )
+        if result.returncode == 0:
+            return result.stdout.strip()
+    except Exception:
+        pass
+    return None
+
+
+def _is_fork(origin_url: Optional[str]) -> bool:
+    """Check if the origin remote points to a fork (not the official repo)."""
+    if not origin_url:
+        return False
+    # Normalize URL for comparison (strip trailing .git if present)
+    normalized = origin_url.rstrip("/")
+    if normalized.endswith(".git"):
+        normalized = normalized[:-4]
+    for official in OFFICIAL_REPO_URLS:
+        official_normalized = official.rstrip("/")
+        if official_normalized.endswith(".git"):
+            official_normalized = official_normalized[:-4]
+        if normalized == official_normalized:
+            return False
+    return True
+
+
+def _has_upstream_remote(git_cmd: list[str], cwd: Path) -> bool:
+    """Check if an 'upstream' remote already exists."""
+    try:
+        result = subprocess.run(
+            git_cmd + ["remote", "get-url", "upstream"],
+            cwd=cwd,
+            capture_output=True,
+            text=True,
+        )
+        return result.returncode == 0
+    except Exception:
+        return False
+
+
+def _add_upstream_remote(git_cmd: list[str], cwd: Path) -> bool:
+    """Add the official repo as the 'upstream' remote. Returns True on success."""
+    try:
+        result = subprocess.run(
+            git_cmd + ["remote", "add", "upstream", OFFICIAL_REPO_URL],
+            cwd=cwd,
+            capture_output=True,
+            text=True,
+        )
+        return result.returncode == 0
+    except Exception:
+        return False
+
+
+def _count_commits_between(git_cmd: list[str], cwd: Path, base: str, head: str) -> int:
+    """Count commits on `head` that are not on `base`. Returns -1 on error."""
+    try:
+        result = subprocess.run(
+            git_cmd + ["rev-list", "--count", f"{base}..{head}"],
+            cwd=cwd,
+            capture_output=True,
+            text=True,
+        )
+        if result.returncode == 0:
+            return int(result.stdout.strip())
+    except Exception:
+        pass
+    return -1
+
+
+def _should_skip_upstream_prompt() -> bool:
+    """Check if user previously declined to add upstream."""
+    from hermes_constants import get_hermes_home
+    return (get_hermes_home() / SKIP_UPSTREAM_PROMPT_FILE).exists()
+
+
+def _mark_skip_upstream_prompt():
+    """Create marker file to skip future upstream prompts."""
+    try:
+        from hermes_constants import get_hermes_home
+        (get_hermes_home() / SKIP_UPSTREAM_PROMPT_FILE).touch()
+    except Exception:
+        pass
+
+
+def _sync_fork_with_upstream(git_cmd: list[str], cwd: Path) -> bool:
+    """Attempt to push updated main to origin (sync fork).
+
+    Returns True if push succeeded, False otherwise.
+    """
+    try:
+        result = subprocess.run(
+            git_cmd + ["push", "origin", "main", "--force-with-lease"],
+            cwd=cwd,
+            capture_output=True,
+            text=True,
+        )
+        return result.returncode == 0
+    except Exception:
+        return False
+
+
+def _sync_with_upstream_if_needed(git_cmd: list[str], cwd: Path) -> None:
+    """Check if fork is behind upstream and sync if safe.
+
+    This implements the fork upstream sync logic:
+    - If upstream remote doesn't exist, ask user if they want to add it
+    - Compare origin/main with upstream/main
+    - If origin/main is strictly behind upstream/main, pull from upstream
+    - Try to sync fork back to origin if possible
+    """
+    has_upstream = _has_upstream_remote(git_cmd, cwd)
+
+    if not has_upstream:
+        # Check if user previously declined
+        if _should_skip_upstream_prompt():
+            return
+
+        # Ask user if they want to add upstream
+        print()
+        print("ℹ Your fork is not tracking the official Hermes repository.")
+        print("  This means you may miss updates from NousResearch/hermes-agent.")
+        print()
+        try:
+            response = input("Add official repo as 'upstream' remote? [Y/n]: ").strip().lower()
+        except (EOFError, KeyboardInterrupt):
+            print()
+            response = "n"
+
+        if response in ("", "y", "yes"):
+            print("→ Adding upstream remote...")
+            if _add_upstream_remote(git_cmd, cwd):
+                print("  ✓ Added upstream: https://github.com/NousResearch/hermes-agent.git")
+                has_upstream = True
+            else:
+                print("  ✗ Failed to add upstream remote. Skipping upstream sync.")
+                return
+        else:
+            print("  Skipped. Run 'git remote add upstream https://github.com/NousResearch/hermes-agent.git' to add later.")
+            _mark_skip_upstream_prompt()
+            return
+
+    # Fetch upstream
+    print()
+    print("→ Fetching upstream...")
+    try:
+        subprocess.run(
+            git_cmd + ["fetch", "upstream", "--quiet"],
+            cwd=cwd,
+            capture_output=True,
+            check=True,
+        )
+    except subprocess.CalledProcessError:
+        print("  ✗ Failed to fetch upstream. Skipping upstream sync.")
+        return
+
+    # Compare origin/main with upstream/main
+    origin_ahead = _count_commits_between(git_cmd, cwd, "upstream/main", "origin/main")
+    upstream_ahead = _count_commits_between(git_cmd, cwd, "origin/main", "upstream/main")
+
+    if origin_ahead < 0 or upstream_ahead < 0:
+        print("  ✗ Could not compare branches. Skipping upstream sync.")
+        return
+
+    # If origin/main has commits not on upstream, don't trample
+    if origin_ahead > 0:
+        print()
+        print(f"ℹ Your fork has {origin_ahead} commit(s) not on upstream.")
+        print("  Skipping upstream sync to preserve your changes.")
+        print("  If you want to merge upstream changes, run:")
+        print("    git pull upstream main")
+        return
+
+    # If upstream is not ahead, fork is up to date
+    if upstream_ahead == 0:
+        print("  ✓ Fork is up to date with upstream")
+        return
+
+    # origin/main is strictly behind upstream/main (can fast-forward)
+    print()
+    print(f"→ Fork is {upstream_ahead} commit(s) behind upstream")
+    print("→ Pulling from upstream...")
+
+    try:
+        subprocess.run(
+            git_cmd + ["pull", "--ff-only", "upstream", "main"],
+            cwd=cwd,
+            check=True,
+        )
+    except subprocess.CalledProcessError:
+        print("  ✗ Failed to pull from upstream. You may need to resolve conflicts manually.")
+        return
+
+    print("  ✓ Updated from upstream")
+
+    # Try to sync fork back to origin
+    print("→ Syncing fork...")
+    if _sync_fork_with_upstream(git_cmd, cwd):
+        print("  ✓ Fork synced with upstream")
+    else:
+        print("  ℹ Got updates from upstream but couldn't push to fork (no write access?)")
+        print("    Your local repo is updated, but your fork on GitHub may be behind.")
+
+
 def _invalidate_update_cache():
    """Delete the update-check cache for ALL profiles so no banner
    reports a stale "commits behind" count after a successful update.
@@ -2971,6 +3210,20 @@ def cmd_update(args):
            cwd=PROJECT_ROOT, check=False, capture_output=True
        )

+    # Build git command once — reused for fork detection and the update itself.
+    git_cmd = ["git"]
+    if sys.platform == "win32":
+        git_cmd = ["git", "-c", "windows.appendAtomically=false"]
+
+    # Detect if we're updating from a fork (before any branch logic)
+    origin_url = _get_origin_url(git_cmd, PROJECT_ROOT)
+    is_fork = _is_fork(origin_url)
+
+    if is_fork:
+        print("⚠ Updating from fork:")
+        print(f"  {origin_url}")
+        print()
+
    if use_zip_update:
        # ZIP-based update for Windows when git is broken
        _update_via_zip(args)
@@ -2978,9 +3231,6 @@ def cmd_update(args):

    # Fetch and pull
    try:
-        git_cmd = ["git"]
-        if sys.platform == "win32":
-            git_cmd = ["git", "-c", "windows.appendAtomically=false"]

        print("→ Fetching updates...")
        fetch_result = subprocess.run(
@@ -3111,6 +3361,10 @@ def cmd_update(args):
        removed = _clear_bytecode_cache(PROJECT_ROOT)
        if removed:
            print(f"  ✓ Cleared {removed} stale __pycache__ director{'y' if removed == 1 else 'ies'}")
+
+        # Fork upstream sync logic (only for main branch on forks)
+        if is_fork and branch == "main":
+            _sync_with_upstream_if_needed(git_cmd, PROJECT_ROOT)
        
        # Reinstall Python dependencies. Prefer .[all], but if one optional extra
        # breaks on this machine, keep base deps and reinstall the remaining extras
@@ -3269,8 +3523,8 @@ def cmd_update(args):
            from gateway.status import get_running_pid, remove_pid_file
            from hermes_cli.gateway import (
                get_service_name, get_launchd_plist_path, is_macos, is_linux,
-                refresh_launchd_plist_if_needed,
-                _ensure_user_systemd_env, get_systemd_linger_status,
+                launchd_restart, _ensure_user_systemd_env,
+                get_systemd_linger_status,
            )
            import signal as _signal

@@ -3374,26 +3628,15 @@ def cmd_update(args):
                        print("  System services may require root.  Try:")
                        print(f"    sudo systemctl restart {_gw_service_name}")
                elif has_launchd_service:
-                    # Refresh the plist first (picks up --replace and other
-                    # changes from the update we just pulled).
-                    refresh_launchd_plist_if_needed()
-                    # Explicit stop+start — don't rely on KeepAlive respawn
-                    # after a manual SIGTERM, which would race with the
-                    # PID file cleanup.
+                    # Use the shared launchd restart helper so we wait for the
+                    # old gateway process to fully exit before starting the new
+                    # one. This avoids stop/start races during self-update.
                    print("→ Restarting gateway service...")
-                    _launchd_label = get_launchd_label()
-                    stop = subprocess.run(
-                        ["launchctl", "stop", _launchd_label],
-                        capture_output=True, text=True, timeout=10,
-                    )
-                    start = subprocess.run(
-                        ["launchctl", "start", _launchd_label],
-                        capture_output=True, text=True, timeout=10,
-                    )
-                    if start.returncode == 0:
-                        print("✓ Gateway restarted via launchd.")
-                    else:
-                        print(f"⚠ Gateway restart failed: {start.stderr.strip()}")
+                    try:
+                        launchd_restart()
+                    except subprocess.CalledProcessError as e:
+                        stderr = (getattr(e, "stderr", "") or "").strip()
+                        print(f"⚠ Gateway restart failed: {stderr}")
                        print("  Try manually: hermes gateway restart")
                elif existing_pid:
                    try:
@@ -1,25 +1,90 @@
-"""Shared model-switching logic for CLI and gateway /model commands.
+"""Mid-chat model switching pipeline for CLI and gateway.

-Both the CLI (cli.py) and gateway (gateway/run.py) /model handlers
-share the same core pipeline:
-
-  parse_model_input → is_custom detection → auto-detect provider
-  → credential resolution → validate model → return result
-
-This module extracts that shared pipeline into pure functions that
-return result objects. The callers handle all platform-specific
-concerns: state mutation, config persistence, output formatting.
+Core design: aliases resolve to an abstract model identity, then the
+pipeline formats it for whatever provider you're currently on.  Typing
+'/model sonnet' on OpenRouter gives you 'anthropic/claude-sonnet-4.6'.
+Typing it on native Anthropic gives you 'claude-sonnet-4-6'.  Same
+intent, correct name for each provider.
 """

 from __future__ import annotations

 from dataclasses import dataclass
+from difflib import get_close_matches
+from typing import Optional


+# ═══════════════════════════════════════════════════════════════════════
+# Model aliases — abstract identities, not provider-specific names
+# ═══════════════════════════════════════════════════════════════════════
+
+@dataclass(frozen=True)
+class ModelIdentity:
+    """Abstract model identity resolved dynamically from catalogs.
+
+    ``vendor`` + ``family`` define WHAT model you want.  The actual
+    version is resolved at runtime from the provider's catalog — so
+    "sonnet" always means the latest sonnet, not a hardcoded version.
+    """
+    vendor: str    # openai, anthropic, google, etc.
+    family: str    # prefix to match: "claude-sonnet", "gpt-5", etc.
+
+
+# Maps short alias → model family.  NO version numbers here — the
+# catalog is searched at runtime for the first match, which is the
+# latest/recommended version.
+MODEL_ALIASES: dict[str, ModelIdentity] = {
+    # Anthropic Claude
+    "opus":            ModelIdentity("anthropic", "claude-opus"),
+    "sonnet":          ModelIdentity("anthropic", "claude-sonnet"),
+    "haiku":           ModelIdentity("anthropic", "claude-haiku"),
+    "claude":          ModelIdentity("anthropic", "claude-opus"),
+
+    # OpenAI GPT
+    "gpt5":            ModelIdentity("openai", "gpt-5"),
+    "gpt-5":           ModelIdentity("openai", "gpt-5"),
+    "gpt5-mini":       ModelIdentity("openai", "gpt-5-mini"),    # family suffix narrows it
+    "gpt5-pro":        ModelIdentity("openai", "gpt-5-pro"),
+    "gpt5-nano":       ModelIdentity("openai", "gpt-5-nano"),
+    "codex":           ModelIdentity("openai", "codex"),
+
+    # Google Gemini
+    "gemini":          ModelIdentity("google", "gemini"),
+    "gemini-pro":      ModelIdentity("google", "gemini-pro"),
+    "gemini-flash":    ModelIdentity("google", "gemini-flash"),
+
+    # Others — family is broad enough to pick the latest
+    "deepseek":        ModelIdentity("deepseek", "deepseek-chat"),
+    "qwen":            ModelIdentity("qwen", "qwen"),
+    "grok":            ModelIdentity("x-ai", "grok"),
+    "glm":             ModelIdentity("z-ai", "glm"),
+    "kimi":            ModelIdentity("moonshotai", "kimi"),
+    "minimax":         ModelIdentity("minimax", "minimax-m2"),
+    "mimo":            ModelIdentity("xiaomi", "mimo"),
+    "nemotron":        ModelIdentity("nvidia", "nemotron"),
+}
+
+# Providers that use vendor/model slug format
+_AGGREGATOR_PROVIDERS = {"openrouter", "nous", "ai-gateway", "kilocode"}
+
+# Providers that use hyphens instead of dots in model names
+_HYPHEN_PROVIDERS = {"anthropic", "opencode-zen", "opencode-go"}
+
+# Common vendor prefixes on OpenRouter
+_OPENROUTER_VENDORS = {
+    "openai", "anthropic", "google", "deepseek", "meta", "mistral",
+    "qwen", "minimax", "x-ai", "z-ai", "moonshotai", "nvidia",
+    "xiaomi", "stepfun", "arcee-ai", "cohere", "databricks",
+}
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# Result types
+# ═══════════════════════════════════════════════════════════════════════
+
@dataclass
 class ModelSwitchResult:
    """Result of a model switch attempt."""
-
    success: bool
    new_model: str = ""
    target_provider: str = ""
@@ -32,12 +97,12 @@ class ModelSwitchResult:
    warning_message: str = ""
    is_custom_target: bool = False
    provider_label: str = ""
+    resolved_via_alias: str = ""


@dataclass
 class CustomAutoResult:
-    """Result of switching to bare 'custom' provider with auto-detect."""
-
+    """Result of switching to bare 'custom' with auto-detect."""
    success: bool
    model: str = ""
    base_url: str = ""
@@ -45,158 +110,378 @@ class CustomAutoResult:
    error_message: str = ""


+# ═══════════════════════════════════════════════════════════════════════
+# Provider-aware alias resolution
+# ═══════════════════════════════════════════════════════════════════════
+
+def _find_in_catalog(
+    identity: ModelIdentity,
+    provider: str,
+) -> Optional[str]:
+    """Find the best matching model in a provider's catalog.
+
+    Searches for the first model whose bare name starts with the
+    identity's family prefix.  Catalogs are ordered by recommendation,
+    so the first match is the latest/best version.
+
+    Returns the model name in the provider's native format, or None.
+    """
+    from hermes_cli.models import OPENROUTER_MODELS, _PROVIDER_MODELS
+
+    family = identity.family.lower()
+    vendor = identity.vendor.lower()
+
+    # Split family into tokens for flexible matching.
+    # "gpt-5-mini" → ["gpt", "5", "mini"] — matches "gpt-5.4-mini"
+    family_tokens = [t for t in family.replace(".", "-").split("-") if t]
+
+    def _tokens_match(name: str) -> bool:
+        """Check if all family tokens appear in the model name."""
+        nl = name.lower()
+        return all(t in nl for t in family_tokens)
+
+    if provider in _AGGREGATOR_PROVIDERS:
+        prefix = f"{vendor}/{family}"
+        # 1. Prefix match (strongest)
+        for slug, _ in OPENROUTER_MODELS:
+            if slug.lower().startswith(prefix):
+                return slug
+        # 2. Token match — all family tokens present + correct vendor
+        for slug, _ in OPENROUTER_MODELS:
+            if slug.lower().startswith(f"{vendor}/") and _tokens_match(slug):
+                return slug
+        return None
+
+    # Non-aggregator providers
+    catalog = _PROVIDER_MODELS.get(provider, [])
+    # 1. Prefix match
+    for model_name in catalog:
+        bare = model_name.lower()
+        if "/" in bare:
+            bare = bare.split("/", 1)[1]
+        if bare.startswith(family):
+            return model_name
+    # 2. Token match
+    for model_name in catalog:
+        if _tokens_match(model_name):
+            return model_name
+
+    return None
+
+
+def resolve_alias(
+    raw_input: str,
+    current_provider: str = "openrouter",
+) -> Optional[tuple[str, str, str]]:
+    """Resolve a short alias to (provider, model_name, alias_used).
+
+    Dynamically searches the current provider's catalog for the latest
+    model matching the alias's family prefix:
+    - 'sonnet' on OpenRouter → first catalog entry starting with
+      'anthropic/claude-sonnet' → ('openrouter', 'anthropic/claude-sonnet-4.6', 'sonnet')
+    - 'sonnet' on Anthropic → first entry starting with 'claude-sonnet'
+      → ('anthropic', 'claude-sonnet-4-6', 'sonnet')
+    - 'gpt5' on Anthropic → no GPT in Anthropic catalog → None
+    """
+    key = raw_input.strip().lower()
+    if key not in MODEL_ALIASES:
+        return None
+
+    identity = MODEL_ALIASES[key]
+    match = _find_in_catalog(identity, current_provider)
+
+    if match:
+        return (current_provider, match, key)
+
+    # Not found on current provider — return None so the pipeline
+    # can try fallback providers or cross-provider detection
+    return None
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# Fuzzy suggestions
+# ═══════════════════════════════════════════════════════════════════════
+
+def suggest_models(raw_input: str, limit: int = 3) -> list[str]:
+    """Suggest similar model names when input doesn't match."""
+    from hermes_cli.models import OPENROUTER_MODELS, _PROVIDER_MODELS
+
+    candidates: list[str] = list(MODEL_ALIASES.keys())
+
+    for model_id, _ in OPENROUTER_MODELS:
+        candidates.append(model_id)
+        if "/" in model_id:
+            candidates.append(model_id.split("/", 1)[1])
+
+    for models in _PROVIDER_MODELS.values():
+        for m in models:
+            candidates.append(m)
+            if "/" in m:
+                candidates.append(m.split("/", 1)[1])
+
+    seen: set[str] = set()
+    unique: list[str] = []
+    for c in candidates:
+        cl = c.lower()
+        if cl not in seen:
+            seen.add(cl)
+            unique.append(c)
+
+    query = raw_input.strip().lower()
+    matches = get_close_matches(query, [c.lower() for c in unique], n=limit, cutoff=0.5)
+    lower_to_orig = {c.lower(): c for c in unique}
+    return [lower_to_orig.get(m, m) for m in matches]
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# Aggregator-aware model resolution
+# ═══════════════════════════════════════════════════════════════════════
+
+def _resolve_on_aggregator(
+    raw_model: str,
+    current_provider: str,
+) -> Optional[str]:
+    """Try to resolve a bare model name within an aggregator.
+
+    Prevents bare names from triggering unwanted provider switches.
+    """
+    from hermes_cli.models import OPENROUTER_MODELS
+
+    model_lower = raw_model.lower()
+
+    slugs = [m for m, _ in OPENROUTER_MODELS]
+    slug_lower = {m.lower(): m for m in slugs}
+    bare_to_slug: dict[str, str] = {}
+    for s in slugs:
+        if "/" in s:
+            bare = s.split("/", 1)[1].lower()
+            bare_to_slug[bare] = s
+
+    # Exact match on full slug
+    if model_lower in slug_lower:
+        return slug_lower[model_lower]
+
+    # Exact match on bare name
+    if model_lower in bare_to_slug:
+        return bare_to_slug[model_lower]
+
+    # Already has vendor/ prefix — accept on aggregator
+    if "/" in raw_model:
+        vendor = raw_model.split("/", 1)[0].lower()
+        if vendor in _OPENROUTER_VENDORS:
+            return raw_model
+
+    # Try prepending vendor prefixes
+    for vendor in _OPENROUTER_VENDORS:
+        candidate = f"{vendor}/{raw_model}"
+        if candidate.lower() in slug_lower:
+            return slug_lower[candidate.lower()]
+
+    # Fuzzy match on bare names
+    close = get_close_matches(model_lower, list(bare_to_slug.keys()), n=1, cutoff=0.75)
+    if close:
+        return bare_to_slug[close[0]]
+
+    return None
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# Core switch pipeline
+# ═══════════════════════════════════════════════════════════════════════
+
 def switch_model(
    raw_input: str,
    current_provider: str,
+    current_model: str = "",
    current_base_url: str = "",
    current_api_key: str = "",
 ) -> ModelSwitchResult:
-    """Core model-switching pipeline shared between CLI and gateway.
+    """Core model-switching pipeline.

-    Handles parsing, provider detection, credential resolution, and
-    model validation.  Does NOT handle config persistence, state
-    mutation, or output formatting — those are caller responsibilities.
-
-    Args:
-        raw_input: The user's model input (e.g. "claude-sonnet-4",
-            "zai:glm-5", "custom:local:qwen").
-        current_provider: The currently active provider.
-        current_base_url: The currently active base URL (used for
-            is_custom detection).
-        current_api_key: The currently active API key.
-
-    Returns:
-        ModelSwitchResult with all information the caller needs to
-        apply the switch and format output.
+    Key behavior: aliases and bare names resolve on your CURRENT provider.
+    '/model sonnet' on Anthropic gives you claude-sonnet-4-6 on Anthropic.
+    '/model sonnet' on OpenRouter gives you anthropic/claude-sonnet-4.6.
+    Only explicit provider:model syntax switches providers.
    """
    from hermes_cli.models import (
        parse_model_input,
        detect_provider_for_model,
-        validate_requested_model,
        _PROVIDER_LABELS,
+        _PROVIDER_MODELS,
+        _KNOWN_PROVIDER_NAMES,
+        OPENROUTER_MODELS,
        opencode_model_api_mode,
    )
    from hermes_cli.runtime_provider import resolve_runtime_provider

-    # Step 1: Parse provider:model syntax
-    target_provider, new_model = parse_model_input(raw_input, current_provider)
+    stripped = raw_input.strip()
+    if not stripped:
+        return ModelSwitchResult(
+            success=False,
+            error_message="No model specified. Usage: /model <name> or /model provider:model",
+        )

-    # Step 2: Detect if we're currently on a custom endpoint
+    on_aggregator = current_provider in _AGGREGATOR_PROVIDERS
+
+    # ── Step 1: Alias resolution (provider-aware) ──
+    alias_result = resolve_alias(stripped, current_provider)
+    resolved_alias = ""
+    if alias_result:
+        target_provider, new_model, resolved_alias = alias_result
+    else:
+        # Check if this was an alias that's unavailable on the current provider
+        key = stripped.strip().lower()
+        if key in MODEL_ALIASES:
+            identity = MODEL_ALIASES[key]
+            # Model isn't available on current provider — find one that has it
+            # Try aggregators first (most likely to have everything)
+            for fallback in ["openrouter", "nous"]:
+                if fallback != current_provider:
+                    fallback_match = _find_in_catalog(identity, fallback)
+                    if not fallback_match:
+                        continue
+                    try:
+                        runtime = resolve_runtime_provider(requested=fallback)
+                        if runtime.get("api_key"):
+                            fallback_label = _PROVIDER_LABELS.get(fallback, fallback)
+                            current_label = _PROVIDER_LABELS.get(current_provider, current_provider)
+                            return ModelSwitchResult(
+                                success=True,
+                                new_model=fallback_match,
+                                target_provider=fallback,
+                                provider_changed=True,
+                                api_key=runtime.get("api_key", ""),
+                                base_url=runtime.get("base_url", ""),
+                                api_mode=runtime.get("api_mode", ""),
+                                persist=True,
+                                warning_message=(
+                                    f"{identity.family} isn't available on "
+                                    f"{current_label} — switching to {fallback_label}."
+                                ),
+                                provider_label=fallback_label,
+                                resolved_via_alias=key,
+                            )
+                    except Exception:
+                        continue
+            return ModelSwitchResult(
+                success=False,
+                error_message=(
+                    f"{identity.family} isn't available on {current_provider} "
+                    f"and no fallback provider is configured."
+                ),
+            )
+
+        # ── Step 2: Vendor:model on aggregators ──
+        if on_aggregator and ":" in stripped:
+            left, right = stripped.split(":", 1)
+            left_lower = left.strip().lower()
+            if left_lower in _OPENROUTER_VENDORS and left_lower not in _KNOWN_PROVIDER_NAMES:
+                target_provider = current_provider
+                new_model = f"{left.strip()}/{right.strip()}"
+            else:
+                target_provider, new_model = parse_model_input(stripped, current_provider)
+        else:
+            # ── Step 3: Standard parse ──
+            target_provider, new_model = parse_model_input(stripped, current_provider)
+
+    if not new_model:
+        return ModelSwitchResult(
+            success=False,
+            error_message="No model name provided. Usage: /model <name> or /model provider:model",
+        )
+
+    # ── Step 4: Aggregator-aware resolution ──
    _base = current_base_url or ""
    is_custom = current_provider == "custom" or (
        "localhost" in _base or "127.0.0.1" in _base
    )

-    # Step 3: Auto-detect provider when no explicit provider:model syntax
-    # was used.  Skip for custom providers — the model name might
-    # coincidentally match a known provider's catalog.
-    if target_provider == current_provider and not is_custom:
+    if not alias_result and target_provider == current_provider and on_aggregator:
+        aggregator_slug = _resolve_on_aggregator(new_model, current_provider)
+        if aggregator_slug:
+            new_model = aggregator_slug
+        else:
+            detected = detect_provider_for_model(new_model, current_provider)
+            if detected:
+                target_provider, new_model = detected
+    elif not alias_result and target_provider == current_provider and not is_custom:
        detected = detect_provider_for_model(new_model, current_provider)
        if detected:
            target_provider, new_model = detected

    provider_changed = target_provider != current_provider

-    # Step 4: Resolve credentials for target provider
+    # ── Step 5: Resolve credentials ──
    api_key = current_api_key
    base_url = current_base_url
    api_mode = ""
-    if provider_changed:
-        try:
-            runtime = resolve_runtime_provider(requested=target_provider)
-            api_key = runtime.get("api_key", "")
-            base_url = runtime.get("base_url", "")
-            api_mode = runtime.get("api_mode", "")
-        except Exception as e:
-            provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
-            if target_provider == "custom":
-                return ModelSwitchResult(
-                    success=False,
-                    target_provider=target_provider,
-                    error_message=(
-                        "No custom endpoint configured. Set model.base_url "
-                        "in config.yaml, or set OPENAI_BASE_URL in .env, "
-                        "or run: hermes setup → Custom OpenAI-compatible endpoint"
-                    ),
-                )
+
+    try:
+        runtime = resolve_runtime_provider(requested=target_provider)
+        api_key = runtime.get("api_key", "")
+        base_url = runtime.get("base_url", "")
+        api_mode = runtime.get("api_mode", "")
+    except Exception as e:
+        provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
+        if target_provider == "custom":
            return ModelSwitchResult(
-                success=False,
-                target_provider=target_provider,
+                success=False, target_provider=target_provider,
                error_message=(
-                    f"Could not resolve credentials for provider "
-                    f"'{provider_label}': {e}"
+                    "No custom endpoint configured.\n"
+                    "Set model.base_url in config.yaml or OPENAI_BASE_URL in .env."
                ),
            )
-    else:
-        # Gateway also resolves for unchanged provider to get accurate
-        # base_url for validation probing.
-        try:
-            runtime = resolve_runtime_provider(requested=current_provider)
-            api_key = runtime.get("api_key", "")
-            base_url = runtime.get("base_url", "")
-            api_mode = runtime.get("api_mode", "")
-        except Exception:
-            pass
-
-    # Step 5: Validate the model
-    try:
-        validation = validate_requested_model(
-            new_model,
-            target_provider,
-            api_key=api_key,
-            base_url=base_url,
-        )
-    except Exception:
-        validation = {
-            "accepted": True,
-            "persist": True,
-            "recognized": False,
-            "message": None,
-        }
-
-    if not validation.get("accepted"):
-        msg = validation.get("message", "Invalid model")
        return ModelSwitchResult(
-            success=False,
-            new_model=new_model,
-            target_provider=target_provider,
-            error_message=msg,
+            success=False, target_provider=target_provider,
+            error_message=(
+                f"No credentials for {provider_label}.\n"
+                f"Run `hermes setup` to configure it.\nDetail: {e}"
+            ),
        )

-    # Step 6: Build result
+    # ── Step 6: Catalog validation ──
+    known_models: list[str] = []
+    if target_provider in _AGGREGATOR_PROVIDERS:
+        known_models = [m for m, _ in OPENROUTER_MODELS]
+    elif target_provider in _PROVIDER_MODELS:
+        known_models = list(_PROVIDER_MODELS[target_provider])
+
+    model_lower = new_model.lower()
+    found = any(m.lower() == model_lower for m in known_models)
+
+    warning_message = ""
+    if not found and known_models:
+        close = get_close_matches(model_lower, [m.lower() for m in known_models], n=3, cutoff=0.5)
+        if close:
+            lower_to_orig = {m.lower(): m for m in known_models}
+            suggestions = [lower_to_orig.get(c, c) for c in close]
+            warning_message = f"Not in catalog — did you mean: {', '.join(f'`{s}`' for s in suggestions)}?"
+        else:
+            warning_message = f"`{new_model}` not in catalog — sending as-is."
+    elif not found and not known_models:
+        warning_message = f"No catalog for {target_provider} — accepting as-is."
+
+    # ── Step 7: Build result ──
    provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
    is_custom_target = target_provider == "custom" or (
-        base_url
-        and "openrouter.ai" not in (base_url or "")
+        base_url and "openrouter.ai" not in (base_url or "")
        and ("localhost" in (base_url or "") or "127.0.0.1" in (base_url or ""))
    )

    if target_provider in {"opencode-zen", "opencode-go"}:
-        # Recompute against the requested new model, not the currently-configured
-        # model used during runtime resolution. OpenCode mixes API surfaces by
-        # model family, so a same-provider model switch can change api_mode.
        api_mode = opencode_model_api_mode(target_provider, new_model)

    return ModelSwitchResult(
-        success=True,
-        new_model=new_model,
-        target_provider=target_provider,
-        provider_changed=provider_changed,
-        api_key=api_key,
-        base_url=base_url,
-        api_mode=api_mode,
-        persist=bool(validation.get("persist")),
-        warning_message=validation.get("message") or "",
-        is_custom_target=is_custom_target,
-        provider_label=provider_label,
+        success=True, new_model=new_model, target_provider=target_provider,
+        provider_changed=provider_changed, api_key=api_key, base_url=base_url,
+        api_mode=api_mode, persist=True, warning_message=warning_message,
+        is_custom_target=is_custom_target, provider_label=provider_label,
+        resolved_via_alias=resolved_alias,
    )


 def switch_to_custom_provider() -> CustomAutoResult:
-    """Handle bare '/model custom' — resolve endpoint and auto-detect model.
-
-    Returns a result object; the caller handles persistence and output.
-    """
+    """Handle bare '/model custom' — resolve endpoint and auto-detect model."""
    from hermes_cli.runtime_provider import (
        resolve_runtime_provider,
        _auto_detect_local_model,
@@ -207,7 +492,7 @@ def switch_to_custom_provider() -> CustomAutoResult:
    except Exception as e:
        return CustomAutoResult(
            success=False,
-            error_message=f"Could not resolve custom endpoint: {e}",
+            error_message=f"No custom endpoint configured.\nSet model.base_url in config.yaml or OPENAI_BASE_URL in .env.\nDetail: {e}",
        )

    cust_base = runtime.get("base_url", "")
@@ -216,29 +501,14 @@ def switch_to_custom_provider() -> CustomAutoResult:
    if not cust_base or "openrouter.ai" in cust_base:
        return CustomAutoResult(
            success=False,
-            error_message=(
-                "No custom endpoint configured. "
-                "Set model.base_url in config.yaml, or set OPENAI_BASE_URL "
-                "in .env, or run: hermes setup → Custom OpenAI-compatible endpoint"
-            ),
+            error_message="No custom endpoint configured.\nSet model.base_url in config.yaml or OPENAI_BASE_URL in .env.",
        )

    detected_model = _auto_detect_local_model(cust_base)
    if not detected_model:
        return CustomAutoResult(
-            success=False,
-            base_url=cust_base,
-            api_key=cust_key,
-            error_message=(
-                f"Custom endpoint at {cust_base} is reachable but no single "
-                f"model was auto-detected. Specify the model explicitly: "
-                f"/model custom:<model-name>"
-            ),
+            success=False, base_url=cust_base, api_key=cust_key,
+            error_message=f"Custom endpoint at {cust_base} responded but no model detected.\nSpecify explicitly: /model custom:<model-name>",
        )

-    return CustomAutoResult(
-        success=True,
-        model=detected_model,
-        base_url=cust_base,
-        api_key=cust_key,
-    )
+    return CustomAutoResult(success=True, model=detected_model, base_url=cust_base, api_key=cust_key)
@@ -28,7 +28,7 @@ GITHUB_MODELS_CATALOG_URL = COPILOT_MODELS_URL
 OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("anthropic/claude-opus-4.6",       "recommended"),
    ("anthropic/claude-sonnet-4.6",     ""),
-    ("qwen/qwen3.6-plus-preview:free", "free"),
+    ("qwen/qwen3.6-plus:free", "free"),
    ("anthropic/claude-sonnet-4.5",     ""),
    ("anthropic/claude-haiku-4.5",      ""),
    ("openai/gpt-5.4",                  ""),
@@ -59,7 +59,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
    "nous": [
        "anthropic/claude-opus-4.6",
        "anthropic/claude-sonnet-4.6",
-        "qwen/qwen3.6-plus-preview:free",
+        "qwen/qwen3.6-plus:free",
        "anthropic/claude-sonnet-4.5",
        "anthropic/claude-haiku-4.5",
        "openai/gpt-5.4",
@@ -561,7 +561,7 @@ def _get_platform_tools(
    # MCP servers are expected to be available on all platforms by default.
    # If the platform explicitly lists one or more MCP server names, treat that
    # as an allowlist. Otherwise include every globally enabled MCP server.
-    mcp_servers = config.get("mcp_servers", {})
+    mcp_servers = config.get("mcp_servers") or {}
    enabled_mcp_servers = {
        name
        for name, server_cfg in mcp_servers.items()
@@ -32,7 +32,7 @@ from agent.memory_provider import MemoryProvider
 logger = logging.getLogger(__name__)

 # Timeouts
-_QUERY_TIMEOUT = 30   # brv query — should be fast
+_QUERY_TIMEOUT = 10   # brv query — should be fast
 _CURATE_TIMEOUT = 120  # brv curate — may involve LLM processing

 # Minimum lengths to filter noise
@@ -175,9 +175,6 @@ class ByteRoverMemoryProvider(MemoryProvider):
        self._cwd = ""
        self._session_id = ""
        self._turn_count = 0
-        self._prefetch_result = ""
-        self._prefetch_lock = threading.Lock()
-        self._prefetch_thread: Optional[threading.Thread] = None
        self._sync_thread: Optional[threading.Thread] = None

    @property
@@ -216,37 +213,26 @@ class ByteRoverMemoryProvider(MemoryProvider):
        )

    def prefetch(self, query: str, *, session_id: str = "") -> str:
-        if self._prefetch_thread and self._prefetch_thread.is_alive():
-            self._prefetch_thread.join(timeout=3.0)
-        with self._prefetch_lock:
-            result = self._prefetch_result
-            self._prefetch_result = ""
-        if not result:
+        """Run brv query synchronously before the agent's first LLM call.
+
+        Blocks until the query completes (up to _QUERY_TIMEOUT seconds), ensuring
+        the result is available as context before the model is called.
+        """
+        if not query or len(query.strip()) < _MIN_QUERY_LEN:
            return ""
-        return f"## ByteRover Context\n{result}"
+        result = _run_brv(
+            ["query", "--", query.strip()[:5000]],
+            timeout=_QUERY_TIMEOUT, cwd=self._cwd,
+        )
+        if result["success"] and result.get("output"):
+            output = result["output"].strip()
+            if len(output) > _MIN_OUTPUT_LEN:
+                return f"## ByteRover Context\n{output}"
+        return ""

    def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
-        if not query or len(query.strip()) < _MIN_QUERY_LEN:
-            return
-
-        def _run():
-            try:
-                result = _run_brv(
-                    ["query", "--", query.strip()[:5000]],
-                    timeout=_QUERY_TIMEOUT, cwd=self._cwd,
-                )
-                if result["success"] and result.get("output"):
-                    output = result["output"].strip()
-                    if len(output) > _MIN_OUTPUT_LEN:
-                        with self._prefetch_lock:
-                            self._prefetch_result = output
-            except Exception as e:
-                logger.debug("ByteRover prefetch failed: %s", e)
-
-        self._prefetch_thread = threading.Thread(
-            target=_run, daemon=True, name="brv-prefetch"
-        )
-        self._prefetch_thread.start()
+        """No-op: prefetch() now runs synchronously at turn start."""
+        pass

    def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
        """Curate the conversation turn in background (non-blocking)."""
@@ -338,9 +324,8 @@ class ByteRoverMemoryProvider(MemoryProvider):
        return json.dumps({"error": f"Unknown tool: {tool_name}"})

    def shutdown(self) -> None:
-        for t in (self._sync_thread, self._prefetch_thread):
-            if t and t.is_alive():
-                t.join(timeout=10.0)
+        if self._sync_thread and self._sync_thread.is_alive():
+            self._sync_thread.join(timeout=10.0)

    # -- Tool implementations ------------------------------------------------

@@ -18,6 +18,7 @@ from __future__ import annotations
 import json
 import logging
 import threading
+from pathlib import Path
 from typing import Any, Dict, List, Optional

 from agent.memory_provider import MemoryProvider
@@ -108,6 +109,9 @@ CONCLUDE_SCHEMA = {
 }


+ALL_TOOL_SCHEMAS = [PROFILE_SCHEMA, SEARCH_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]
+
+
 # ---------------------------------------------------------------------------
 # MemoryProvider implementation
 # ---------------------------------------------------------------------------
@@ -124,6 +128,34 @@ class HonchoMemoryProvider(MemoryProvider):
        self._prefetch_thread: Optional[threading.Thread] = None
        self._sync_thread: Optional[threading.Thread] = None

+        # B1: recall_mode — set during initialize from config
+        self._recall_mode = "hybrid"  # "context", "tools", or "hybrid"
+
+        # B4: First-turn context baking
+        self._first_turn_context: Optional[str] = None
+        self._first_turn_lock = threading.Lock()
+
+        # B5: Cost-awareness turn counting and cadence
+        self._turn_count = 0
+        self._injection_frequency = "every-turn"  # or "first-turn"
+        self._context_cadence = 1   # minimum turns between context API calls
+        self._dialectic_cadence = 1  # minimum turns between dialectic API calls
+        self._reasoning_level_cap: Optional[str] = None  # "minimal", "low", "mid", "high"
+        self._last_context_turn = -999
+        self._last_dialectic_turn = -999
+
+        # B2: peer_memory_mode gating (stub)
+        self._suppress_memory = False
+        self._suppress_user_profile = False
+
+        # Port #1957: lazy session init for tools-only mode
+        self._session_initialized = False
+        self._lazy_init_kwargs: Optional[dict] = None
+        self._lazy_init_session_id: Optional[str] = None
+
+        # Port #4053: cron guard — when True, plugin is fully inactive
+        self._cron_skipped = False
+
    @property
    def name(self) -> str:
        return "honcho"
@@ -133,6 +165,7 @@ class HonchoMemoryProvider(MemoryProvider):
        try:
            from plugins.memory.honcho.client import HonchoClientConfig
            cfg = HonchoClientConfig.from_global_config()
+            # Port #2645: baseUrl-only verification — api_key OR base_url suffices
            return cfg.enabled and bool(cfg.api_key or cfg.base_url)
        except Exception:
            return False
@@ -158,8 +191,22 @@ class HonchoMemoryProvider(MemoryProvider):
        ]

    def initialize(self, session_id: str, **kwargs) -> None:
-        """Initialize Honcho session manager."""
+        """Initialize Honcho session manager.
+
+        Handles: cron guard, recall_mode, session name resolution,
+        peer memory mode, SOUL.md ai_peer sync, memory file migration,
+        and pre-warming context at init.
+        """
        try:
+            # ----- Port #4053: cron guard -----
+            agent_context = kwargs.get("agent_context", "")
+            platform = kwargs.get("platform", "cli")
+            if agent_context in ("cron", "flush") or platform == "cron":
+                logger.debug("Honcho skipped: cron/flush context (agent_context=%s, platform=%s)",
+                             agent_context, platform)
+                self._cron_skipped = True
+                return
+
            from plugins.memory.honcho.client import HonchoClientConfig, get_honcho_client
            from plugins.memory.honcho.session import HonchoSessionManager

@@ -169,20 +216,78 @@ class HonchoMemoryProvider(MemoryProvider):
                return

            self._config = cfg
-            client = get_honcho_client(cfg)
-            self._manager = HonchoSessionManager(
-                honcho=client,
-                config=cfg,
-                context_tokens=cfg.context_tokens,
-            )

-            # Build session key from kwargs or session_id
-            platform = kwargs.get("platform", "cli")
-            user_id = kwargs.get("user_id", "")
-            if user_id:
-                self._session_key = f"{platform}:{user_id}"
-            else:
-                self._session_key = session_id
+            # ----- B1: recall_mode from config -----
+            self._recall_mode = cfg.recall_mode  # "context", "tools", or "hybrid"
+            logger.debug("Honcho recall_mode: %s", self._recall_mode)
+
+            # ----- B5: cost-awareness config -----
+            try:
+                raw = cfg.raw or {}
+                self._injection_frequency = raw.get("injectionFrequency", "every-turn")
+                self._context_cadence = int(raw.get("contextCadence", 1))
+                self._dialectic_cadence = int(raw.get("dialecticCadence", 1))
+                cap = raw.get("reasoningLevelCap")
+                if cap and cap in ("minimal", "low", "mid", "high"):
+                    self._reasoning_level_cap = cap
+            except Exception as e:
+                logger.debug("Honcho cost-awareness config parse error: %s", e)
+
+            # ----- Port #1969: aiPeer sync from SOUL.md -----
+            try:
+                hermes_home = kwargs.get("hermes_home", "")
+                if hermes_home and not cfg.raw.get("aiPeer"):
+                    soul_path = Path(hermes_home) / "SOUL.md"
+                    if soul_path.exists():
+                        soul_text = soul_path.read_text(encoding="utf-8").strip()
+                        if soul_text:
+                            # Try YAML frontmatter: "name: Foo"
+                            first_line = soul_text.split("\n")[0].strip()
+                            if first_line.startswith("---"):
+                                # Look for name: in frontmatter
+                                for line in soul_text.split("\n")[1:]:
+                                    line = line.strip()
+                                    if line == "---":
+                                        break
+                                    if line.lower().startswith("name:"):
+                                        name_val = line.split(":", 1)[1].strip().strip("\"'")
+                                        if name_val:
+                                            cfg.ai_peer = name_val
+                                            logger.debug("Honcho ai_peer set from SOUL.md: %s", name_val)
+                                        break
+                            elif first_line.startswith("# "):
+                                # Markdown heading: "# AgentName"
+                                name_val = first_line[2:].strip()
+                                if name_val:
+                                    cfg.ai_peer = name_val
+                                    logger.debug("Honcho ai_peer set from SOUL.md heading: %s", name_val)
+            except Exception as e:
+                logger.debug("Honcho SOUL.md ai_peer sync failed: %s", e)
+
+            # ----- B2: peer_memory_mode gating (stub) -----
+            try:
+                ai_mode = cfg.peer_memory_mode(cfg.ai_peer)
+                user_mode = cfg.peer_memory_mode(cfg.peer_name or "user")
+                # "honcho" means Honcho owns memory; suppress built-in
+                self._suppress_memory = (ai_mode == "honcho")
+                self._suppress_user_profile = (user_mode == "honcho")
+                logger.debug("Honcho peer_memory_mode: ai=%s (suppress_memory=%s), user=%s (suppress_user_profile=%s)",
+                             ai_mode, self._suppress_memory, user_mode, self._suppress_user_profile)
+            except Exception as e:
+                logger.debug("Honcho peer_memory_mode check failed: %s", e)
+
+            # ----- Port #1957: lazy session init for tools-only mode -----
+            if self._recall_mode == "tools":
+                # Defer actual session creation until first tool call
+                self._lazy_init_kwargs = kwargs
+                self._lazy_init_session_id = session_id
+                # Still need a client reference for _ensure_session
+                self._config = cfg
+                logger.debug("Honcho tools-only mode — deferring session init until first tool call")
+                return
+
+            # ----- Eager init (context or hybrid mode) -----
+            self._do_session_init(cfg, session_id, **kwargs)

        except ImportError:
            logger.debug("honcho-ai package not installed — plugin inactive")
@@ -190,19 +295,180 @@ class HonchoMemoryProvider(MemoryProvider):
            logger.warning("Honcho init failed: %s", e)
            self._manager = None

-    def system_prompt_block(self) -> str:
-        if not self._manager or not self._session_key:
-            return ""
-        return (
-            "# Honcho Memory\n"
-            "Active. AI-native cross-session user modeling.\n"
-            "Use honcho_profile for a quick factual snapshot, "
-            "honcho_search for raw excerpts, honcho_context for synthesized answers, "
-            "honcho_conclude to save facts about the user."
+    def _do_session_init(self, cfg, session_id: str, **kwargs) -> None:
+        """Shared session initialization logic for both eager and lazy paths."""
+        from plugins.memory.honcho.client import get_honcho_client
+        from plugins.memory.honcho.session import HonchoSessionManager
+
+        client = get_honcho_client(cfg)
+        self._manager = HonchoSessionManager(
+            honcho=client,
+            config=cfg,
+            context_tokens=cfg.context_tokens,
        )

+        # ----- B3: resolve_session_name -----
+        session_title = kwargs.get("session_title")
+        self._session_key = (
+            cfg.resolve_session_name(session_title=session_title, session_id=session_id)
+            or session_id
+            or "hermes-default"
+        )
+        logger.debug("Honcho session key resolved: %s", self._session_key)
+
+        # Create session eagerly
+        session = self._manager.get_or_create(self._session_key)
+        self._session_initialized = True
+
+        # ----- B6: Memory file migration (one-time, for new sessions) -----
+        try:
+            if not session.messages:
+                from hermes_constants import get_hermes_home
+                mem_dir = str(get_hermes_home() / "memories")
+                self._manager.migrate_memory_files(self._session_key, mem_dir)
+                logger.debug("Honcho memory file migration attempted for new session: %s", self._session_key)
+        except Exception as e:
+            logger.debug("Honcho memory file migration skipped: %s", e)
+
+        # ----- B7: Pre-warming context at init -----
+        if self._recall_mode in ("context", "hybrid"):
+            try:
+                self._manager.prefetch_context(self._session_key)
+                self._manager.prefetch_dialectic(self._session_key, "What should I know about this user?")
+                logger.debug("Honcho pre-warm threads started for session: %s", self._session_key)
+            except Exception as e:
+                logger.debug("Honcho pre-warm failed: %s", e)
+
+    def _ensure_session(self) -> bool:
+        """Lazily initialize the Honcho session (for tools-only mode).
+
+        Returns True if the manager is ready, False otherwise.
+        """
+        if self._manager and self._session_initialized:
+            return True
+        if self._cron_skipped:
+            return False
+        if not self._config or not self._lazy_init_kwargs:
+            return False
+
+        try:
+            self._do_session_init(
+                self._config,
+                self._lazy_init_session_id or "hermes-default",
+                **self._lazy_init_kwargs,
+            )
+            # Clear lazy refs
+            self._lazy_init_kwargs = None
+            self._lazy_init_session_id = None
+            return self._manager is not None
+        except Exception as e:
+            logger.warning("Honcho lazy session init failed: %s", e)
+            return False
+
+    def _format_first_turn_context(self, ctx: dict) -> str:
+        """Format the prefetch context dict into a readable system prompt block."""
+        parts = []
+
+        rep = ctx.get("representation", "")
+        if rep:
+            parts.append(f"## User Representation\n{rep}")
+
+        card = ctx.get("card", "")
+        if card:
+            parts.append(f"## User Peer Card\n{card}")
+
+        ai_rep = ctx.get("ai_representation", "")
+        if ai_rep:
+            parts.append(f"## AI Self-Representation\n{ai_rep}")
+
+        ai_card = ctx.get("ai_card", "")
+        if ai_card:
+            parts.append(f"## AI Identity Card\n{ai_card}")
+
+        if not parts:
+            return ""
+        return "\n\n".join(parts)
+
+    def system_prompt_block(self) -> str:
+        """Return system prompt text, adapted by recall_mode.
+
+        B4: On the FIRST call, fetch and bake the full Honcho context
+        (user representation, peer card, AI representation, continuity synthesis).
+        Subsequent calls return the cached block for prompt caching stability.
+        """
+        if self._cron_skipped:
+            return ""
+        if not self._manager or not self._session_key:
+            # tools-only mode without session yet still returns a minimal block
+            if self._recall_mode == "tools" and self._config:
+                return (
+                    "# Honcho Memory\n"
+                    "Active (tools-only mode). Use honcho_profile, honcho_search, "
+                    "honcho_context, and honcho_conclude tools to access user memory."
+                )
+            return ""
+
+        # ----- B4: First-turn context baking -----
+        first_turn_block = ""
+        if self._recall_mode in ("context", "hybrid"):
+            with self._first_turn_lock:
+                if self._first_turn_context is None:
+                    # First call — fetch and cache
+                    try:
+                        ctx = self._manager.get_prefetch_context(self._session_key)
+                        self._first_turn_context = self._format_first_turn_context(ctx) if ctx else ""
+                    except Exception as e:
+                        logger.debug("Honcho first-turn context fetch failed: %s", e)
+                        self._first_turn_context = ""
+                first_turn_block = self._first_turn_context
+
+        # ----- B1: adapt text based on recall_mode -----
+        if self._recall_mode == "context":
+            header = (
+                "# Honcho Memory\n"
+                "Active (context-injection mode). Relevant user context is automatically "
+                "injected before each turn. No memory tools are available — context is "
+                "managed automatically."
+            )
+        elif self._recall_mode == "tools":
+            header = (
+                "# Honcho Memory\n"
+                "Active (tools-only mode). Use honcho_profile for a quick factual snapshot, "
+                "honcho_search for raw excerpts, honcho_context for synthesized answers, "
+                "honcho_conclude to save facts about the user. "
+                "No automatic context injection — you must use tools to access memory."
+            )
+        else:  # hybrid
+            header = (
+                "# Honcho Memory\n"
+                "Active (hybrid mode). Relevant context is auto-injected AND memory tools are available. "
+                "Use honcho_profile for a quick factual snapshot, "
+                "honcho_search for raw excerpts, honcho_context for synthesized answers, "
+                "honcho_conclude to save facts about the user."
+            )
+
+        if first_turn_block:
+            return f"{header}\n\n{first_turn_block}"
+        return header
+
    def prefetch(self, query: str, *, session_id: str = "") -> str:
-        """Return prefetched dialectic context from background thread."""
+        """Return prefetched dialectic context from background thread.
+
+        B1: Returns empty when recall_mode is "tools" (no injection).
+        B5: Respects injection_frequency — "first-turn" returns cached/empty after turn 0.
+        Port #3265: Truncates to context_tokens budget.
+        """
+        if self._cron_skipped:
+            return ""
+
+        # B1: tools-only mode — no auto-injection
+        if self._recall_mode == "tools":
+            return ""
+
+        # B5: injection_frequency — if "first-turn" and past first turn, return empty
+        if self._injection_frequency == "first-turn" and self._turn_count > 0:
+            return ""
+
        if self._prefetch_thread and self._prefetch_thread.is_alive():
            self._prefetch_thread.join(timeout=3.0)
        with self._prefetch_lock:
@@ -210,13 +476,49 @@ class HonchoMemoryProvider(MemoryProvider):
            self._prefetch_result = ""
        if not result:
            return ""
+
+        # ----- Port #3265: token budget enforcement -----
+        result = self._truncate_to_budget(result)
+
        return f"## Honcho Context\n{result}"

+    def _truncate_to_budget(self, text: str) -> str:
+        """Truncate text to fit within context_tokens budget if set."""
+        if not self._config or not self._config.context_tokens:
+            return text
+        budget_chars = self._config.context_tokens * 4  # conservative char estimate
+        if len(text) <= budget_chars:
+            return text
+        # Truncate at word boundary
+        truncated = text[:budget_chars]
+        last_space = truncated.rfind(" ")
+        if last_space > budget_chars * 0.8:
+            truncated = truncated[:last_space]
+        return truncated + " …"
+
    def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
-        """Fire a background dialectic query for the upcoming turn."""
+        """Fire a background dialectic query for the upcoming turn.
+
+        B5: Checks cadence before firing background threads.
+        """
+        if self._cron_skipped:
+            return
        if not self._manager or not self._session_key or not query:
            return

+        # B1: tools-only mode — no prefetch
+        if self._recall_mode == "tools":
+            return
+
+        # B5: cadence check — skip if too soon since last dialectic call
+        if self._dialectic_cadence > 1:
+            if (self._turn_count - self._last_dialectic_turn) < self._dialectic_cadence:
+                logger.debug("Honcho dialectic prefetch skipped: cadence %d, turns since last: %d",
+                             self._dialectic_cadence, self._turn_count - self._last_dialectic_turn)
+                return
+
+        self._last_dialectic_turn = self._turn_count
+
        def _run():
            try:
                result = self._manager.dialectic_query(
@@ -233,14 +535,28 @@ class HonchoMemoryProvider(MemoryProvider):
        )
        self._prefetch_thread.start()

+        # Also fire context prefetch if cadence allows
+        if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
+            self._last_context_turn = self._turn_count
+            try:
+                self._manager.prefetch_context(self._session_key, query)
+            except Exception as e:
+                logger.debug("Honcho context prefetch failed: %s", e)
+
+    def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
+        """Track turn count for cadence and injection_frequency logic."""
+        self._turn_count = turn_number
+
    def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
        """Record the conversation turn in Honcho (non-blocking)."""
+        if self._cron_skipped:
+            return
        if not self._manager or not self._session_key:
            return

        def _sync():
            try:
-                session = self._manager.get_or_create_session(self._session_key)
+                session = self._manager.get_or_create(self._session_key)
                session.add_message("user", user_content[:4000])
                session.add_message("assistant", assistant_content[:4000])
                # Flush to Honcho API
@@ -259,6 +575,8 @@ class HonchoMemoryProvider(MemoryProvider):
        """Mirror built-in user profile writes as Honcho conclusions."""
        if action != "add" or target != "user" or not content:
            return
+        if self._cron_skipped:
+            return
        if not self._manager or not self._session_key:
            return

@@ -273,6 +591,8 @@ class HonchoMemoryProvider(MemoryProvider):

    def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
        """Flush all pending messages to Honcho on session end."""
+        if self._cron_skipped:
+            return
        if not self._manager:
            return
        # Wait for pending sync
@@ -284,9 +604,26 @@ class HonchoMemoryProvider(MemoryProvider):
            logger.debug("Honcho session-end flush failed: %s", e)

    def get_tool_schemas(self) -> List[Dict[str, Any]]:
-        return [PROFILE_SCHEMA, SEARCH_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]
+        """Return tool schemas, respecting recall_mode.
+
+        B1: context-only mode hides all tools.
+        """
+        if self._cron_skipped:
+            return []
+        if self._recall_mode == "context":
+            return []
+        return list(ALL_TOOL_SCHEMAS)

    def handle_tool_call(self, tool_name: str, args: dict, **kwargs) -> str:
+        """Handle a Honcho tool call, with lazy session init for tools-only mode."""
+        if self._cron_skipped:
+            return json.dumps({"error": "Honcho is not active (cron context)."})
+
+        # Port #1957: ensure session is initialized for tools-only mode
+        if not self._session_initialized:
+            if not self._ensure_session():
+                return json.dumps({"error": "Honcho session could not be initialized."})
+
        if not self._manager or not self._session_key:
            return json.dumps({"error": "Honcho is not active for this session."})

@@ -85,6 +85,16 @@ def _normalize_recall_mode(val: str) -> str:
    return val if val in _VALID_RECALL_MODES else "hybrid"


+_VALID_OBSERVATION_MODES = {"unified", "directional"}
+_OBSERVATION_MODE_ALIASES = {"shared": "unified", "separate": "directional", "cross": "directional"}
+
+
+def _normalize_observation_mode(val: str) -> str:
+    """Normalize observation mode values."""
+    val = _OBSERVATION_MODE_ALIASES.get(val, val)
+    return val if val in _VALID_OBSERVATION_MODES else "unified"
+
+
 def _resolve_memory_mode(
    global_val: str | dict,
    host_val: str | dict | None,
@@ -154,6 +164,10 @@ class HonchoClientConfig:
    # "context" — auto-injected context only, Honcho tools removed
    # "tools"   — Honcho tools only, no auto-injected context
    recall_mode: str = "hybrid"
+    # Observation mode: how Honcho peers observe each other.
+    # "unified"      — user peer observes self; all agents share one observation pool
+    # "directional"  — AI peer observes user; each agent keeps its own view
+    observation_mode: str = "unified"
    # Session resolution
    session_strategy: str = "per-directory"
    session_peer_prefix: bool = False
@@ -313,6 +327,11 @@ class HonchoClientConfig:
                or raw.get("recallMode")
                or "hybrid"
            ),
+            observation_mode=_normalize_observation_mode(
+                host_block.get("observationMode")
+                or raw.get("observationMode")
+                or "unified"
+            ),
            session_strategy=session_strategy,
            session_peer_prefix=session_peer_prefix,
            sessions=raw.get("sessions", {}),
@@ -110,6 +110,9 @@ class HonchoSessionManager:
        self._dialectic_max_chars: int = (
            config.dialectic_max_chars if config else 600
        )
+        self._observation_mode: str = (
+            config.observation_mode if config else "unified"
+        )

        # Async write queue — started lazily on first enqueue
        self._async_queue: queue.Queue | None = None
@@ -159,13 +162,18 @@ class HonchoSessionManager:

        session = self.honcho.session(session_id)

-        # Configure peer observation settings.
-        # observe_me=True for AI peer so Honcho watches what the agent says
-        # and builds its representation over time — enabling identity formation.
+        # Configure peer observation settings based on observation_mode.
+        # Unified: user peer observes self, AI peer passive — all agents share
+        #          one observation pool via user self-observations.
+        # Directional: AI peer observes user — each agent keeps its own view.
        try:
            from honcho.session import SessionPeerConfig
-            user_config = SessionPeerConfig(observe_me=True, observe_others=True)
-            ai_config = SessionPeerConfig(observe_me=True, observe_others=True)
+            if self._observation_mode == "directional":
+                user_config = SessionPeerConfig(observe_me=True, observe_others=False)
+                ai_config = SessionPeerConfig(observe_me=False, observe_others=True)
+            else:  # unified (default)
+                user_config = SessionPeerConfig(observe_me=True, observe_others=False)
+                ai_config = SessionPeerConfig(observe_me=False, observe_others=False)

            session.add_peers([(user_peer, user_config), (assistant_peer, ai_config)])
        except Exception as e:
@@ -493,12 +501,27 @@ class HonchoSessionManager:
        if not session:
            return ""

-        peer_id = session.assistant_peer_id if peer == "ai" else session.user_peer_id
-        target_peer = self._get_or_create_peer(peer_id)
        level = reasoning_level or self._dynamic_reasoning_level(query)

        try:
-            result = target_peer.chat(query, reasoning_level=level) or ""
+            if self._observation_mode == "directional":
+                # AI peer queries about the user (cross-observation)
+                if peer == "ai":
+                    ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
+                    result = ai_peer_obj.chat(query, reasoning_level=level) or ""
+                else:
+                    ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
+                    result = ai_peer_obj.chat(
+                        query,
+                        target=session.user_peer_id,
+                        reasoning_level=level,
+                    ) or ""
+            else:
+                # Unified: user peer queries self, or AI peer queries self
+                peer_id = session.assistant_peer_id if peer == "ai" else session.user_peer_id
+                target_peer = self._get_or_create_peer(peer_id)
+                result = target_peer.chat(query, reasoning_level=level) or ""
+
            # Apply Hermes-side char cap before caching
            if result and self._dialectic_max_chars and len(result) > self._dialectic_max_chars:
                result = result[:self._dialectic_max_chars].rsplit(" ", 1)[0] + " …"
@@ -895,9 +918,16 @@ class HonchoSessionManager:
            logger.warning("No session cached for '%s', skipping conclusion", session_key)
            return False

-        assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
        try:
-            conclusions_scope = assistant_peer.conclusions_of(session.user_peer_id)
+            if self._observation_mode == "directional":
+                # AI peer creates conclusion about user (cross-observation)
+                assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
+                conclusions_scope = assistant_peer.conclusions_of(session.user_peer_id)
+            else:
+                # Unified: user peer creates self-conclusion
+                user_peer = self._get_or_create_peer(session.user_peer_id)
+                conclusions_scope = user_peer.conclusions_of(session.user_peer_id)
+
            conclusions_scope.create([{
                "content": content.strip(),
                "session_id": session.honcho_session_id,
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "hermes-agent"
-version = "0.6.0"
+version = "0.7.0"
 description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -1267,7 +1267,153 @@ class AIAgent:
            self.context_compressor._context_probe_persistable = False
            # Iterative summary from previous session must not bleed into new one (#2635)
            self.context_compressor._previous_summary = None
-    
+
+    # ── Mid-chat model switching ──────────────────────────────────────────
+
+    def switch_model(
+        self,
+        new_model: str,
+        new_provider: str,
+        api_key: str = "",
+        base_url: str = "",
+        api_mode: str = "",
+    ) -> None:
+        """Switch the agent to a different model/provider mid-conversation.
+
+        Follows the same pattern as ``_try_activate_fallback()`` for the
+        client/state swap, but differs in two critical ways:
+
+        1. Updates ``_primary_runtime`` — this is a permanent switch, not
+           a temporary fallback that gets restored next turn.
+        2. Invalidates ``_cached_system_prompt`` — the system prompt
+           contains model-dependent content (tool enforcement guidance,
+           Google model guidance, Alibaba self-identification) that must
+           be rebuilt for the new model.
+
+        The caller (CLI or gateway handler) is responsible for:
+        - Parsing user input via ``model_switch.switch_model()``
+        - Credential resolution (api_key, base_url)
+        - Persisting the change to config.yaml
+        - Formatting output messages
+
+        Args:
+            new_model: The new model slug (e.g. ``"claude-sonnet-4"``).
+            new_provider: The provider ID (e.g. ``"openrouter"``).
+            api_key: API key for the target provider.
+            base_url: Base URL for the target provider.
+            api_mode: Explicit api_mode override.  If empty, auto-detected
+                from provider/base_url.
+        """
+        old_model = self.model
+
+        # ── Determine api_mode ──
+        if not api_mode:
+            api_mode = "chat_completions"
+            if new_provider == "openai-codex":
+                api_mode = "codex_responses"
+            elif new_provider == "anthropic":
+                api_mode = "anthropic_messages"
+            elif base_url:
+                _bu_lower = base_url.rstrip("/").lower()
+                if _bu_lower.endswith("/anthropic"):
+                    api_mode = "anthropic_messages"
+                elif self._is_direct_openai_url(base_url):
+                    api_mode = "codex_responses"
+
+        self.model = new_model
+        self.provider = new_provider
+        self.base_url = base_url
+        self.api_mode = api_mode
+
+        # ── Build new client ──
+        if api_mode == "anthropic_messages":
+            from agent.anthropic_adapter import (
+                build_anthropic_client,
+                resolve_anthropic_token,
+                _is_oauth_token,
+            )
+            effective_key = api_key or (
+                resolve_anthropic_token() if new_provider == "anthropic" else ""
+            )
+            self.api_key = effective_key
+            self._anthropic_api_key = effective_key
+            self._anthropic_base_url = base_url or None
+            self._anthropic_client = build_anthropic_client(
+                effective_key, self._anthropic_base_url,
+            )
+            self._is_anthropic_oauth = _is_oauth_token(effective_key)
+            self.client = None
+            self._client_kwargs = {}
+        else:
+            self.api_key = api_key
+            new_kwargs = {"api_key": api_key, "base_url": base_url}
+            self._client_kwargs = new_kwargs
+            self.client = self._create_openai_client(
+                dict(new_kwargs), reason="model_switch", shared=True,
+            )
+            # Clear anthropic state if we were previously on anthropic
+            self._anthropic_client = None
+
+        # ── Re-evaluate prompt caching for the new model/provider ──
+        is_native_anthropic = api_mode == "anthropic_messages"
+        self._use_prompt_caching = (
+            ("openrouter" in (base_url or "").lower() and "claude" in new_model.lower())
+            or is_native_anthropic
+        )
+
+        # ── Update context compressor for new model's context window ──
+        if hasattr(self, "context_compressor") and self.context_compressor:
+            from agent.model_metadata import get_model_context_length
+            new_context_length = get_model_context_length(
+                new_model,
+                base_url=base_url,
+                api_key=api_key,
+                provider=new_provider,
+            )
+            self.context_compressor.model = new_model
+            self.context_compressor.base_url = base_url
+            self.context_compressor.api_key = api_key
+            self.context_compressor.provider = new_provider
+            self.context_compressor.context_length = new_context_length
+            self.context_compressor.threshold_tokens = int(
+                new_context_length * self.context_compressor.threshold_percent
+            )
+
+        # ── Invalidate system prompt — it contains model-dependent content ──
+        self._invalidate_system_prompt()
+
+        # ── Update _primary_runtime snapshot (permanent switch) ──
+        _cc = self.context_compressor
+        self._primary_runtime = {
+            "model": self.model,
+            "provider": self.provider,
+            "base_url": self.base_url,
+            "api_mode": self.api_mode,
+            "api_key": getattr(self, "api_key", ""),
+            "client_kwargs": dict(self._client_kwargs),
+            "use_prompt_caching": self._use_prompt_caching,
+            "compressor_model": _cc.model,
+            "compressor_base_url": _cc.base_url,
+            "compressor_api_key": getattr(_cc, "api_key", ""),
+            "compressor_provider": _cc.provider,
+            "compressor_context_length": _cc.context_length,
+            "compressor_threshold_tokens": _cc.threshold_tokens,
+        }
+        if self.api_mode == "anthropic_messages":
+            self._primary_runtime.update({
+                "anthropic_api_key": self._anthropic_api_key,
+                "anthropic_base_url": self._anthropic_base_url,
+                "is_anthropic_oauth": self._is_anthropic_oauth,
+            })
+
+        # ── Reset fallback state — new primary means fresh fallback chain ──
+        self._fallback_activated = False
+        self._fallback_index = 0
+
+        logging.info(
+            "Model switched: %s → %s (%s)", old_model, new_model, new_provider,
+        )
+
    def _safe_print(self, *args, **kwargs):
        """Print that silently handles broken pipes / closed stdout.

@@ -6009,6 +6155,30 @@ class AIAgent:
                        spinner.stop(cute_msg)
                    elif self.quiet_mode:
                        self._vprint(f"  {cute_msg}")
+            elif self._memory_manager and self._memory_manager.has_tool(function_name):
+                # Memory provider tools (hindsight_retain, honcho_search, etc.)
+                # These are not in the tool registry — route through MemoryManager.
+                spinner = None
+                if self.quiet_mode and not self.tool_progress_callback:
+                    face = random.choice(KawaiiSpinner.KAWAII_WAITING)
+                    emoji = _get_tool_emoji(function_name)
+                    preview = _build_tool_preview(function_name, function_args) or function_name
+                    spinner = KawaiiSpinner(f"{face} {emoji} {preview}", spinner_type='dots', print_fn=self._print_fn)
+                    spinner.start()
+                _mem_result = None
+                try:
+                    function_result = self._memory_manager.handle_tool_call(function_name, function_args)
+                    _mem_result = function_result
+                except Exception as tool_error:
+                    function_result = json.dumps({"error": f"Memory tool '{function_name}' failed: {tool_error}"})
+                    logger.error("memory_manager.handle_tool_call raised for %s: %s", function_name, tool_error, exc_info=True)
+                finally:
+                    tool_duration = time.time() - tool_start_time
+                    cute_msg = _get_cute_tool_message_impl(function_name, function_args, tool_duration, result=_mem_result)
+                    if spinner:
+                        spinner.stop(cute_msg)
+                    elif self.quiet_mode:
+                        self._vprint(f"  {cute_msg}")
            elif self.quiet_mode:
                spinner = None
                if not self.tool_progress_callback:
@@ -6656,10 +6826,21 @@ class AIAgent:
            if self.step_callback is not None:
                try:
                    prev_tools = []
-                    for _m in reversed(messages):
+                    for _idx, _m in enumerate(reversed(messages)):
                        if _m.get("role") == "assistant" and _m.get("tool_calls"):
+                            _fwd_start = len(messages) - _idx
+                            _results_by_id = {}
+                            for _tm in messages[_fwd_start:]:
+                                if _tm.get("role") != "tool":
+                                    break
+                                _tcid = _tm.get("tool_call_id")
+                                if _tcid:
+                                    _results_by_id[_tcid] = _tm.get("content", "")
                            prev_tools = [
-                                tc["function"]["name"]
+                                {
+                                    "name": tc["function"]["name"],
+                                    "result": _results_by_id.get(tc.get("id")),
+                                }
                                for tc in _m["tool_calls"]
                                if isinstance(tc, dict)
                            ]
@@ -7369,6 +7550,61 @@ class AIAgent:
                    # compress history and retry, not abort immediately.
                    status_code = getattr(api_error, "status_code", None)

+                    # ── Anthropic Sonnet long-context tier gate ───────────
+                    # Anthropic returns HTTP 429 "Extra usage is required for
+                    # long context requests" when a Claude Max (or similar)
+                    # subscription doesn't include the 1M-context tier.  This
+                    # is NOT a transient rate limit — retrying or switching
+                    # credentials won't help.  Reduce context to 200k (the
+                    # standard tier) and compress.
+                    # Only applies to Sonnet — Opus 1M is general access.
+                    _is_long_context_tier_error = (
+                        status_code == 429
+                        and "extra usage" in error_msg
+                        and "long context" in error_msg
+                        and "sonnet" in self.model.lower()
+                    )
+                    if _is_long_context_tier_error:
+                        _reduced_ctx = 200000
+                        compressor = self.context_compressor
+                        old_ctx = compressor.context_length
+                        if old_ctx > _reduced_ctx:
+                            compressor.context_length = _reduced_ctx
+                            compressor.threshold_tokens = int(
+                                _reduced_ctx * compressor.threshold_percent
+                            )
+                            compressor._context_probed = True
+                            # Don't persist — this is a subscription-tier
+                            # limitation, not a model capability.  If the user
+                            # later enables extra usage the 1M limit should
+                            # come back automatically.
+                            compressor._context_probe_persistable = False
+                            self._vprint(
+                                f"{self.log_prefix}⚠️  Anthropic long-context tier "
+                                f"requires extra usage — reducing context: "
+                                f"{old_ctx:,} → {_reduced_ctx:,} tokens",
+                                force=True,
+                            )
+
+                        compression_attempts += 1
+                        if compression_attempts <= max_compression_attempts:
+                            original_len = len(messages)
+                            messages, active_system_prompt = self._compress_context(
+                                messages, system_message,
+                                approx_tokens=approx_tokens,
+                                task_id=effective_task_id,
+                            )
+                            if len(messages) < original_len or old_ctx > _reduced_ctx:
+                                self._emit_status(
+                                    f"🗜️ Context reduced to {_reduced_ctx:,} tokens "
+                                    f"(was {old_ctx:,}), retrying..."
+                                )
+                                time.sleep(2)
+                                restart_with_compressed_messages = True
+                                break
+                        # Fall through to normal error handling if compression
+                        # is exhausted or didn't help.
+
                    # Eager fallback for rate-limit errors (429 or quota exhaustion).
                    # When a fallback model is configured, switch immediately instead
                    # of burning through retries with exponential backoff -- the
@@ -7474,7 +7710,33 @@ class AIAgent:
                                f"treating as probable context overflow.",
                                force=True,
                            )
-                    
+
+                    # Server disconnects on large sessions are often caused by
+                    # the request exceeding the provider's context/payload limit
+                    # without a proper HTTP error response.  Treat these as
+                    # context-length errors to trigger compression rather than
+                    # burning through retries that will all fail the same way.
+                    # This breaks the death spiral: disconnect → no token data
+                    # → no compression → bigger session → more disconnects.
+                    # (#2153)
+                    if not is_context_length_error and not status_code:
+                        _is_server_disconnect = (
+                            'server disconnected' in error_msg
+                            or 'peer closed connection' in error_msg
+                            or error_type in ('ReadError', 'RemoteProtocolError', 'ServerDisconnectedError')
+                        )
+                        if _is_server_disconnect:
+                            ctx_len = getattr(getattr(self, 'context_compressor', None), 'context_length', 200000)
+                            _is_large = approx_tokens > ctx_len * 0.6 or len(api_messages) > 200
+                            if _is_large:
+                                is_context_length_error = True
+                                self._vprint(
+                                    f"{self.log_prefix}⚠️  Server disconnected with large session "
+                                    f"(~{approx_tokens:,} tokens, {len(api_messages)} msgs) — "
+                                    f"treating as context-length error, attempting compression.",
+                                    force=True,
+                                )
+
                    if is_context_length_error:
                        compressor = self.context_compressor
                        old_ctx = compressor.context_length
@@ -8109,11 +8371,20 @@ class AIAgent:
                    # threshold (default 50%) leaves ample headroom; if tool
                    # results push past it, the next API call will report the
                    # real total and trigger compression then.
+                    #
+                    # If last_prompt_tokens is 0 (stale after API disconnect
+                    # or provider returned no usage data), fall back to rough
+                    # estimate to avoid missing compression.  Without this,
+                    # a session can grow unbounded after disconnects because
+                    # should_compress(0) never fires.  (#2153)
                    _compressor = self.context_compressor
-                    _real_tokens = (
-                        _compressor.last_prompt_tokens
-                        + _compressor.last_completion_tokens
-                    )
+                    if _compressor.last_prompt_tokens > 0:
+                        _real_tokens = (
+                            _compressor.last_prompt_tokens
+                            + _compressor.last_completion_tokens
+                        )
+                    else:
+                        _real_tokens = estimate_messages_tokens_rough(messages)

                    # ── Context pressure warnings (user-facing only) ──────────
                    # Notify the user (NOT the LLM) as context approaches the
@@ -62,6 +62,33 @@ function formatOutgoingMessage(message) {
  return REPLY_PREFIX ? `${REPLY_PREFIX}${message}` : message;
 }

+function normalizeWhatsAppId(value) {
+  if (!value) return '';
+  return String(value).replace(':', '@');
+}
+
+function getMessageContent(msg) {
+  const content = msg?.message || {};
+  if (content.ephemeralMessage?.message) return content.ephemeralMessage.message;
+  if (content.viewOnceMessage?.message) return content.viewOnceMessage.message;
+  if (content.viewOnceMessageV2?.message) return content.viewOnceMessageV2.message;
+  if (content.documentWithCaptionMessage?.message) return content.documentWithCaptionMessage.message;
+  if (content.templateMessage?.hydratedTemplate) return content.templateMessage.hydratedTemplate;
+  if (content.buttonsMessage) return content.buttonsMessage;
+  if (content.listMessage) return content.listMessage;
+  return content;
+}
+
+function getContextInfo(messageContent) {
+  if (!messageContent || typeof messageContent !== 'object') return {};
+  for (const value of Object.values(messageContent)) {
+    if (value && typeof value === 'object' && value.contextInfo) {
+      return value.contextInfo;
+    }
+  }
+  return {};
+}
+
 mkdirSync(SESSION_DIR, { recursive: true });

 // Build LID → phone reverse map from session files (lid-mapping-{phone}.json)
@@ -157,6 +184,11 @@ async function startSocket() {
    // than 'notify'. Accept both and filter agent echo-backs below.
    if (type !== 'notify' && type !== 'append') return;

+    const botIds = Array.from(new Set([
+      normalizeWhatsAppId(sock.user?.id),
+      normalizeWhatsAppId(sock.user?.lid),
+    ].filter(Boolean)));
+
    for (const msg of messages) {
      if (!msg.message) continue;

@@ -200,23 +232,28 @@ async function startSocket() {
        continue;
      }

+      const messageContent = getMessageContent(msg);
+      const contextInfo = getContextInfo(messageContent);
+      const mentionedIds = Array.from(new Set((contextInfo?.mentionedJid || []).map(normalizeWhatsAppId).filter(Boolean)));
+      const quotedParticipant = normalizeWhatsAppId(contextInfo?.participant || contextInfo?.remoteJid || '');
+
      // Extract message body
      let body = '';
      let hasMedia = false;
      let mediaType = '';
      const mediaUrls = [];

-      if (msg.message.conversation) {
-        body = msg.message.conversation;
-      } else if (msg.message.extendedTextMessage?.text) {
-        body = msg.message.extendedTextMessage.text;
-      } else if (msg.message.imageMessage) {
-        body = msg.message.imageMessage.caption || '';
+      if (messageContent.conversation) {
+        body = messageContent.conversation;
+      } else if (messageContent.extendedTextMessage?.text) {
+        body = messageContent.extendedTextMessage.text;
+      } else if (messageContent.imageMessage) {
+        body = messageContent.imageMessage.caption || '';
        hasMedia = true;
        mediaType = 'image';
        try {
          const buf = await downloadMediaMessage(msg, 'buffer', {}, { logger, reuploadRequest: sock.updateMediaMessage });
-          const mime = msg.message.imageMessage.mimetype || 'image/jpeg';
+          const mime = messageContent.imageMessage.mimetype || 'image/jpeg';
          const extMap = { 'image/jpeg': '.jpg', 'image/png': '.png', 'image/webp': '.webp', 'image/gif': '.gif' };
          const ext = extMap[mime] || '.jpg';
          mkdirSync(IMAGE_CACHE_DIR, { recursive: true });
@@ -226,13 +263,13 @@ async function startSocket() {
        } catch (err) {
          console.error('[bridge] Failed to download image:', err.message);
        }
-      } else if (msg.message.videoMessage) {
-        body = msg.message.videoMessage.caption || '';
+      } else if (messageContent.videoMessage) {
+        body = messageContent.videoMessage.caption || '';
        hasMedia = true;
        mediaType = 'video';
        try {
          const buf = await downloadMediaMessage(msg, 'buffer', {}, { logger, reuploadRequest: sock.updateMediaMessage });
-          const mime = msg.message.videoMessage.mimetype || 'video/mp4';
+          const mime = messageContent.videoMessage.mimetype || 'video/mp4';
          const ext = mime.includes('mp4') ? '.mp4' : '.mkv';
          mkdirSync(DOCUMENT_CACHE_DIR, { recursive: true });
          const filePath = path.join(DOCUMENT_CACHE_DIR, `vid_${randomBytes(6).toString('hex')}${ext}`);
@@ -241,11 +278,11 @@ async function startSocket() {
        } catch (err) {
          console.error('[bridge] Failed to download video:', err.message);
        }
-      } else if (msg.message.audioMessage || msg.message.pttMessage) {
+      } else if (messageContent.audioMessage || messageContent.pttMessage) {
        hasMedia = true;
-        mediaType = msg.message.pttMessage ? 'ptt' : 'audio';
+        mediaType = messageContent.pttMessage ? 'ptt' : 'audio';
        try {
-          const audioMsg = msg.message.pttMessage || msg.message.audioMessage;
+          const audioMsg = messageContent.pttMessage || messageContent.audioMessage;
          const buf = await downloadMediaMessage(msg, 'buffer', {}, { logger, reuploadRequest: sock.updateMediaMessage });
          const mime = audioMsg.mimetype || 'audio/ogg';
          const ext = mime.includes('ogg') ? '.ogg' : mime.includes('mp4') ? '.m4a' : '.ogg';
@@ -256,11 +293,11 @@ async function startSocket() {
        } catch (err) {
          console.error('[bridge] Failed to download audio:', err.message);
        }
-      } else if (msg.message.documentMessage) {
-        body = msg.message.documentMessage.caption || '';
+      } else if (messageContent.documentMessage) {
+        body = messageContent.documentMessage.caption || '';
        hasMedia = true;
        mediaType = 'document';
-        const fileName = msg.message.documentMessage.fileName || 'document';
+        const fileName = messageContent.documentMessage.fileName || 'document';
        try {
          const buf = await downloadMediaMessage(msg, 'buffer', {}, { logger, reuploadRequest: sock.updateMediaMessage });
          mkdirSync(DOCUMENT_CACHE_DIR, { recursive: true });
@@ -309,6 +346,9 @@ async function startSocket() {
        hasMedia,
        mediaType,
        mediaUrls,
+        mentionedIds,
+        quotedParticipant,
+        botIds,
        timestamp: msg.messageTimestamp,
      };

@@ -205,6 +205,47 @@ class TestStepCallback:
        assert "read_file" not in tool_call_ids
        mock_rcts.assert_called_once()

+    def test_result_passed_to_build_tool_complete(self, mock_conn, event_loop_fixture):
+        """Tool result from prev_tools dict is forwarded to build_tool_complete."""
+        from collections import deque
+
+        tool_call_ids = {"terminal": deque(["tc-xyz789"])}
+        loop = event_loop_fixture
+
+        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids)
+
+        with patch("acp_adapter.events.asyncio.run_coroutine_threadsafe") as mock_rcts, \
+             patch("acp_adapter.events.build_tool_complete") as mock_btc:
+            future = MagicMock(spec=Future)
+            future.result.return_value = None
+            mock_rcts.return_value = future
+
+            # Provide a result string in the tool info dict
+            cb(1, [{"name": "terminal", "result": '{"output": "hello"}'}])
+
+        mock_btc.assert_called_once_with(
+            "tc-xyz789", "terminal", result='{"output": "hello"}'
+        )
+
+    def test_none_result_passed_through(self, mock_conn, event_loop_fixture):
+        """When result is None (e.g. first iteration), None is passed through."""
+        from collections import deque
+
+        tool_call_ids = {"web_search": deque(["tc-aaa"])}
+        loop = event_loop_fixture
+
+        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids)
+
+        with patch("acp_adapter.events.asyncio.run_coroutine_threadsafe") as mock_rcts, \
+             patch("acp_adapter.events.build_tool_complete") as mock_btc:
+            future = MagicMock(spec=Future)
+            future.result.return_value = None
+            mock_rcts.return_value = future
+
+            cb(1, [{"name": "web_search", "result": None}])
+
+        mock_btc.assert_called_once_with("tc-aaa", "web_search", result=None)
+

 # ---------------------------------------------------------------------------
 # Message callback
@@ -0,0 +1,349 @@
+"""End-to-end tests for ACP MCP server registration and tool-result reporting.
+
+Exercises the full flow through the ACP server layer:
+  new_session(mcpServers) → MCP tools registered → prompt() →
+    tool_progress_callback (ToolCallStart) →
+    step_callback with results (ToolCallUpdate with rawOutput) →
+    session_update events arrive at the mock client
+"""
+
+import asyncio
+from collections import deque
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+import acp
+from acp.schema import (
+    EnvVariable,
+    HttpHeader,
+    McpServerHttp,
+    McpServerStdio,
+    NewSessionResponse,
+    PromptResponse,
+    TextContentBlock,
+    ToolCallProgress,
+    ToolCallStart,
+)
+
+from acp_adapter.server import HermesACPAgent
+from acp_adapter.session import SessionManager
+
+
+# ---------------------------------------------------------------------------
+# Fixtures
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture()
+def mock_manager():
+    return SessionManager(agent_factory=lambda: MagicMock(name="MockAIAgent"))
+
+
+@pytest.fixture()
+def acp_agent(mock_manager):
+    return HermesACPAgent(session_manager=mock_manager)
+
+
+# ---------------------------------------------------------------------------
+# E2E: MCP registration → prompt → tool events
+# ---------------------------------------------------------------------------
+
+
+class TestMcpRegistrationE2E:
+    """Full flow: session with MCP servers → prompt with tool calls → ACP events."""
+
+    @pytest.mark.asyncio
+    async def test_session_with_mcp_servers_registers_tools(self, acp_agent, mock_manager):
+        """new_session with mcpServers converts them to Hermes config and registers."""
+        servers = [
+            McpServerStdio(
+                name="test-fs",
+                command="/usr/bin/mcp-fs",
+                args=["--root", "/tmp"],
+                env=[EnvVariable(name="DEBUG", value="1")],
+            ),
+            McpServerHttp(
+                name="test-api",
+                url="https://api.example.com/mcp",
+                headers=[HttpHeader(name="Authorization", value="Bearer tok123")],
+            ),
+        ]
+
+        registered_configs = {}
+
+        def mock_register(config_map):
+            registered_configs.update(config_map)
+            return ["mcp_test_fs_read", "mcp_test_fs_write", "mcp_test_api_search"]
+
+        fake_tools = [
+            {"function": {"name": "mcp_test_fs_read"}},
+            {"function": {"name": "mcp_test_fs_write"}},
+            {"function": {"name": "mcp_test_api_search"}},
+            {"function": {"name": "terminal"}},
+        ]
+
+        with patch("tools.mcp_tool.register_mcp_servers", side_effect=mock_register), \
+             patch("model_tools.get_tool_definitions", return_value=fake_tools):
+            resp = await acp_agent.new_session(cwd="/tmp", mcp_servers=servers)
+
+        assert isinstance(resp, NewSessionResponse)
+        state = mock_manager.get_session(resp.session_id)
+
+        # Verify stdio server was converted correctly
+        assert "test-fs" in registered_configs
+        fs_cfg = registered_configs["test-fs"]
+        assert fs_cfg["command"] == "/usr/bin/mcp-fs"
+        assert fs_cfg["args"] == ["--root", "/tmp"]
+        assert fs_cfg["env"] == {"DEBUG": "1"}
+
+        # Verify HTTP server was converted correctly
+        assert "test-api" in registered_configs
+        api_cfg = registered_configs["test-api"]
+        assert api_cfg["url"] == "https://api.example.com/mcp"
+        assert api_cfg["headers"] == {"Authorization": "Bearer tok123"}
+
+        # Verify agent tool surface was refreshed
+        assert state.agent.tools == fake_tools
+        assert state.agent.valid_tool_names == {
+            "mcp_test_fs_read", "mcp_test_fs_write", "mcp_test_api_search", "terminal"
+        }
+
+    @pytest.mark.asyncio
+    async def test_prompt_with_tool_calls_emits_acp_events(self, acp_agent, mock_manager):
+        """Prompt → agent fires callbacks → ACP ToolCallStart + ToolCallUpdate events."""
+        resp = await acp_agent.new_session(cwd="/tmp")
+        session_id = resp.session_id
+        state = mock_manager.get_session(session_id)
+
+        # Wire up a mock ACP client connection
+        mock_conn = MagicMock(spec=acp.Client)
+        mock_conn.session_update = AsyncMock()
+        mock_conn.request_permission = AsyncMock()
+        acp_agent._conn = mock_conn
+
+        def mock_run_conversation(user_message, conversation_history=None, task_id=None):
+            """Simulate an agent turn that calls terminal, gets a result, then responds."""
+            agent = state.agent
+
+            # 1) Agent fires tool_progress_callback (ToolCallStart)
+            if agent.tool_progress_callback:
+                agent.tool_progress_callback(
+                    "terminal", "$ echo hello", {"command": "echo hello"}
+                )
+
+            # 2) Agent fires step_callback with tool results (ToolCallUpdate)
+            if agent.step_callback:
+                agent.step_callback(1, [
+                    {"name": "terminal", "result": '{"output": "hello\\n", "exit_code": 0}'}
+                ])
+
+            return {
+                "final_response": "The command output 'hello'.",
+                "messages": [
+                    {"role": "user", "content": user_message},
+                    {"role": "assistant", "content": "The command output 'hello'."},
+                ],
+            }
+
+        state.agent.run_conversation = mock_run_conversation
+
+        prompt = [TextContentBlock(type="text", text="run echo hello")]
+        resp = await acp_agent.prompt(prompt=prompt, session_id=session_id)
+
+        assert isinstance(resp, PromptResponse)
+        assert resp.stop_reason == "end_turn"
+
+        # Collect all session_update calls
+        updates = []
+        for call in mock_conn.session_update.call_args_list:
+            # session_update(session_id, update) — grab the update
+            update_arg = call[1].get("update") or call[0][1]
+            updates.append(update_arg)
+
+        # Find tool_call (start) and tool_call_update (completion) events
+        starts = [u for u in updates if getattr(u, "session_update", None) == "tool_call"]
+        completions = [u for u in updates if getattr(u, "session_update", None) == "tool_call_update"]
+
+        # Should have at least one ToolCallStart for "terminal"
+        assert len(starts) >= 1, f"Expected ToolCallStart, got updates: {[getattr(u, 'session_update', '?') for u in updates]}"
+        start_event = starts[0]
+        assert isinstance(start_event, ToolCallStart)
+        assert start_event.title.startswith("terminal:")
+
+        # Should have at least one ToolCallUpdate (completion) with rawOutput
+        assert len(completions) >= 1, f"Expected ToolCallUpdate, got updates: {[getattr(u, 'session_update', '?') for u in updates]}"
+        complete_event = completions[0]
+        assert isinstance(complete_event, ToolCallProgress)
+        assert complete_event.status == "completed"
+        # rawOutput should contain the tool result string
+        assert complete_event.raw_output is not None
+        assert "hello" in str(complete_event.raw_output)
+
+    @pytest.mark.asyncio
+    async def test_prompt_tool_results_paired_by_call_id(self, acp_agent, mock_manager):
+        """The ToolCallUpdate's toolCallId must match the ToolCallStart's."""
+        resp = await acp_agent.new_session(cwd="/tmp")
+        session_id = resp.session_id
+        state = mock_manager.get_session(session_id)
+
+        mock_conn = MagicMock(spec=acp.Client)
+        mock_conn.session_update = AsyncMock()
+        mock_conn.request_permission = AsyncMock()
+        acp_agent._conn = mock_conn
+
+        def mock_run(user_message, conversation_history=None, task_id=None):
+            agent = state.agent
+            # Fire two tool calls
+            if agent.tool_progress_callback:
+                agent.tool_progress_callback("read_file", "read: /etc/hosts", {"path": "/etc/hosts"})
+                agent.tool_progress_callback("web_search", "web search: test", {"query": "test"})
+
+            if agent.step_callback:
+                agent.step_callback(1, [
+                    {"name": "read_file", "result": '{"content": "127.0.0.1 localhost"}'},
+                    {"name": "web_search", "result": '{"data": {"web": []}}'},
+                ])
+
+            return {"final_response": "Done.", "messages": []}
+
+        state.agent.run_conversation = mock_run
+
+        prompt = [TextContentBlock(type="text", text="test")]
+        await acp_agent.prompt(prompt=prompt, session_id=session_id)
+
+        updates = []
+        for call in mock_conn.session_update.call_args_list:
+            update_arg = call[1].get("update") or call[0][1]
+            updates.append(update_arg)
+
+        starts = [u for u in updates if getattr(u, "session_update", None) == "tool_call"]
+        completions = [u for u in updates if getattr(u, "session_update", None) == "tool_call_update"]
+
+        assert len(starts) == 2, f"Expected 2 starts, got {len(starts)}"
+        assert len(completions) == 2, f"Expected 2 completions, got {len(completions)}"
+
+        # Each completion's toolCallId must match a start's toolCallId
+        start_ids = {s.tool_call_id for s in starts}
+        completion_ids = {c.tool_call_id for c in completions}
+        assert start_ids == completion_ids, (
+            f"IDs must match: starts={start_ids}, completions={completion_ids}"
+        )
+
+
+class TestMcpSanitizationE2E:
+    """Verify server names with special chars work end-to-end."""
+
+    @pytest.mark.asyncio
+    async def test_slashed_server_name_registers_cleanly(self, acp_agent, mock_manager):
+        """Server name 'ai.exa/exa' should not crash — tools get sanitized names."""
+        servers = [
+            McpServerHttp(
+                name="ai.exa/exa",
+                url="https://exa.ai/mcp",
+                headers=[],
+            ),
+        ]
+
+        registered_configs = {}
+        def mock_register(config_map):
+            registered_configs.update(config_map)
+            return ["mcp_ai_exa_exa_search"]
+
+        fake_tools = [{"function": {"name": "mcp_ai_exa_exa_search"}}]
+
+        with patch("tools.mcp_tool.register_mcp_servers", side_effect=mock_register), \
+             patch("model_tools.get_tool_definitions", return_value=fake_tools):
+            resp = await acp_agent.new_session(cwd="/tmp", mcp_servers=servers)
+
+        state = mock_manager.get_session(resp.session_id)
+
+        # Raw server name preserved as config key
+        assert "ai.exa/exa" in registered_configs
+        # Agent tools refreshed with sanitized name
+        assert "mcp_ai_exa_exa_search" in state.agent.valid_tool_names
+
+
+class TestSessionLifecycleMcpE2E:
+    """Verify MCP servers are registered on all session lifecycle methods."""
+
+    @pytest.mark.asyncio
+    async def test_load_session_registers_mcp(self, acp_agent, mock_manager):
+        """load_session re-registers MCP servers (spec says agents may not retain them)."""
+        # Create a session first
+        create_resp = await acp_agent.new_session(cwd="/tmp")
+        sid = create_resp.session_id
+
+        servers = [
+            McpServerStdio(name="srv", command="/bin/test", args=[], env=[]),
+        ]
+
+        registered = {}
+        def mock_register(config_map):
+            registered.update(config_map)
+            return []
+
+        state = mock_manager.get_session(sid)
+        state.agent.enabled_toolsets = ["hermes-acp"]
+        state.agent.disabled_toolsets = None
+        state.agent.tools = []
+        state.agent.valid_tool_names = set()
+
+        with patch("tools.mcp_tool.register_mcp_servers", side_effect=mock_register), \
+             patch("model_tools.get_tool_definitions", return_value=[]):
+            await acp_agent.load_session(cwd="/tmp", session_id=sid, mcp_servers=servers)
+
+        assert "srv" in registered
+
+    @pytest.mark.asyncio
+    async def test_resume_session_registers_mcp(self, acp_agent, mock_manager):
+        """resume_session re-registers MCP servers."""
+        create_resp = await acp_agent.new_session(cwd="/tmp")
+        sid = create_resp.session_id
+
+        servers = [
+            McpServerStdio(name="srv2", command="/bin/test2", args=[], env=[]),
+        ]
+
+        registered = {}
+        def mock_register(config_map):
+            registered.update(config_map)
+            return []
+
+        state = mock_manager.get_session(sid)
+        state.agent.enabled_toolsets = ["hermes-acp"]
+        state.agent.disabled_toolsets = None
+        state.agent.tools = []
+        state.agent.valid_tool_names = set()
+
+        with patch("tools.mcp_tool.register_mcp_servers", side_effect=mock_register), \
+             patch("model_tools.get_tool_definitions", return_value=[]):
+            await acp_agent.resume_session(cwd="/tmp", session_id=sid, mcp_servers=servers)
+
+        assert "srv2" in registered
+
+    @pytest.mark.asyncio
+    async def test_fork_session_registers_mcp(self, acp_agent, mock_manager):
+        """fork_session registers MCP servers on the new forked session."""
+        create_resp = await acp_agent.new_session(cwd="/tmp")
+        sid = create_resp.session_id
+
+        servers = [
+            McpServerHttp(name="api", url="https://api.test/mcp", headers=[]),
+        ]
+
+        registered = {}
+        def mock_register(config_map):
+            registered.update(config_map)
+            return []
+
+        # Need to set up the forked session's agent too
+        with patch("tools.mcp_tool.register_mcp_servers", side_effect=mock_register), \
+             patch("model_tools.get_tool_definitions", return_value=[]):
+            fork_resp = await acp_agent.fork_session(
+                cwd="/tmp", session_id=sid, mcp_servers=servers
+            )
+
+        assert fork_resp.session_id != ""
+        assert "api" in registered
@@ -505,3 +505,179 @@ class TestSlashCommands:
        assert state.agent.provider == "anthropic"
        assert state.agent.base_url == "https://anthropic.example/v1"
        assert runtime_calls[-1] == "anthropic"
+
+
+# ---------------------------------------------------------------------------
+# _register_session_mcp_servers
+# ---------------------------------------------------------------------------
+
+
+class TestRegisterSessionMcpServers:
+    """Tests for ACP MCP server registration in session lifecycle."""
+
+    @pytest.mark.asyncio
+    async def test_noop_when_no_servers(self, agent, mock_manager):
+        """No-op when mcp_servers is None or empty."""
+        state = mock_manager.create_session(cwd="/tmp")
+        # Should not raise
+        await agent._register_session_mcp_servers(state, None)
+        await agent._register_session_mcp_servers(state, [])
+
+    @pytest.mark.asyncio
+    async def test_registers_stdio_servers(self, agent, mock_manager):
+        """McpServerStdio servers are converted and passed to register_mcp_servers."""
+        from acp.schema import McpServerStdio, EnvVariable
+
+        state = mock_manager.create_session(cwd="/tmp")
+        # Give the mock agent the attributes _register_session_mcp_servers reads
+        state.agent.enabled_toolsets = ["hermes-acp"]
+        state.agent.disabled_toolsets = None
+        state.agent.tools = []
+        state.agent.valid_tool_names = set()
+
+        server = McpServerStdio(
+            name="test-server",
+            command="/usr/bin/test",
+            args=["--flag"],
+            env=[EnvVariable(name="KEY", value="val")],
+        )
+
+        registered_config = {}
+        def capture_register(config_map):
+            registered_config.update(config_map)
+            return ["mcp_test_server_tool1"]
+
+        with patch("tools.mcp_tool.register_mcp_servers", side_effect=capture_register), \
+             patch("model_tools.get_tool_definitions", return_value=[]):
+            await agent._register_session_mcp_servers(state, [server])
+
+        assert "test-server" in registered_config
+        cfg = registered_config["test-server"]
+        assert cfg["command"] == "/usr/bin/test"
+        assert cfg["args"] == ["--flag"]
+        assert cfg["env"] == {"KEY": "val"}
+
+    @pytest.mark.asyncio
+    async def test_registers_http_servers(self, agent, mock_manager):
+        """McpServerHttp servers are converted correctly."""
+        from acp.schema import McpServerHttp, HttpHeader
+
+        state = mock_manager.create_session(cwd="/tmp")
+        state.agent.enabled_toolsets = ["hermes-acp"]
+        state.agent.disabled_toolsets = None
+        state.agent.tools = []
+        state.agent.valid_tool_names = set()
+
+        server = McpServerHttp(
+            name="http-server",
+            url="https://api.example.com/mcp",
+            headers=[HttpHeader(name="Authorization", value="Bearer tok")],
+        )
+
+        registered_config = {}
+        def capture_register(config_map):
+            registered_config.update(config_map)
+            return []
+
+        with patch("tools.mcp_tool.register_mcp_servers", side_effect=capture_register), \
+             patch("model_tools.get_tool_definitions", return_value=[]):
+            await agent._register_session_mcp_servers(state, [server])
+
+        assert "http-server" in registered_config
+        cfg = registered_config["http-server"]
+        assert cfg["url"] == "https://api.example.com/mcp"
+        assert cfg["headers"] == {"Authorization": "Bearer tok"}
+
+    @pytest.mark.asyncio
+    async def test_refreshes_agent_tool_surface(self, agent, mock_manager):
+        """After MCP registration, agent.tools and valid_tool_names are refreshed."""
+        from acp.schema import McpServerStdio
+
+        state = mock_manager.create_session(cwd="/tmp")
+        state.agent.enabled_toolsets = ["hermes-acp"]
+        state.agent.disabled_toolsets = None
+        state.agent.tools = []
+        state.agent.valid_tool_names = set()
+        state.agent._cached_system_prompt = "old prompt"
+
+        server = McpServerStdio(
+            name="srv",
+            command="/bin/test",
+            args=[],
+            env=[],
+        )
+
+        fake_tools = [
+            {"function": {"name": "mcp_srv_search"}},
+            {"function": {"name": "terminal"}},
+        ]
+
+        with patch("tools.mcp_tool.register_mcp_servers", return_value=["mcp_srv_search"]), \
+             patch("model_tools.get_tool_definitions", return_value=fake_tools):
+            await agent._register_session_mcp_servers(state, [server])
+
+        assert state.agent.tools == fake_tools
+        assert state.agent.valid_tool_names == {"mcp_srv_search", "terminal"}
+        # _invalidate_system_prompt should have been called
+        state.agent._invalidate_system_prompt.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_register_failure_logs_warning(self, agent, mock_manager):
+        """If register_mcp_servers raises, warning is logged but no crash."""
+        from acp.schema import McpServerStdio
+
+        state = mock_manager.create_session(cwd="/tmp")
+        server = McpServerStdio(
+            name="bad",
+            command="/nonexistent",
+            args=[],
+            env=[],
+        )
+
+        with patch("tools.mcp_tool.register_mcp_servers", side_effect=RuntimeError("boom")):
+            # Should not raise
+            await agent._register_session_mcp_servers(state, [server])
+
+    @pytest.mark.asyncio
+    async def test_new_session_calls_register(self, agent, mock_manager):
+        """new_session passes mcp_servers to _register_session_mcp_servers."""
+        with patch.object(agent, "_register_session_mcp_servers", new_callable=AsyncMock) as mock_reg:
+            resp = await agent.new_session(cwd="/tmp", mcp_servers=["fake"])
+            assert resp is not None
+            mock_reg.assert_called_once()
+            # Second arg should be the mcp_servers list
+            assert mock_reg.call_args[0][1] == ["fake"]
+
+    @pytest.mark.asyncio
+    async def test_load_session_calls_register(self, agent, mock_manager):
+        """load_session passes mcp_servers to _register_session_mcp_servers."""
+        # Create a session first so load can find it
+        state = mock_manager.create_session(cwd="/tmp")
+        sid = state.session_id
+
+        with patch.object(agent, "_register_session_mcp_servers", new_callable=AsyncMock) as mock_reg:
+            resp = await agent.load_session(cwd="/tmp", session_id=sid, mcp_servers=["fake"])
+            assert resp is not None
+            mock_reg.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_resume_session_calls_register(self, agent, mock_manager):
+        """resume_session passes mcp_servers to _register_session_mcp_servers."""
+        state = mock_manager.create_session(cwd="/tmp")
+        sid = state.session_id
+
+        with patch.object(agent, "_register_session_mcp_servers", new_callable=AsyncMock) as mock_reg:
+            resp = await agent.resume_session(cwd="/tmp", session_id=sid, mcp_servers=["fake"])
+            assert resp is not None
+            mock_reg.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_fork_session_calls_register(self, agent, mock_manager):
+        """fork_session passes mcp_servers to _register_session_mcp_servers."""
+        state = mock_manager.create_session(cwd="/tmp")
+        sid = state.session_id
+
+        with patch.object(agent, "_register_session_mcp_servers", new_callable=AsyncMock) as mock_reg:
+            resp = await agent.fork_session(cwd="/tmp", session_id=sid, mcp_servers=["fake"])
+            assert resp is not None
+            mock_reg.assert_called_once()
@@ -34,8 +34,8 @@ def _ensure_discord_mock():
    discord_mod.Thread = type("Thread", (), {})
    discord_mod.ForumChannel = type("ForumChannel", (), {})
    discord_mod.ui = SimpleNamespace(View=object, button=lambda *a, **k: (lambda fn: fn), Button=object)
-    discord_mod.ButtonStyle = SimpleNamespace(success=1, primary=2, danger=3, green=1, blurple=2, red=3)
-    discord_mod.Color = SimpleNamespace(orange=lambda: 1, green=lambda: 2, blue=lambda: 3, red=lambda: 4)
+    discord_mod.ButtonStyle = SimpleNamespace(success=1, primary=2, secondary=2, danger=3, green=1, grey=2, blurple=2, red=3)
+    discord_mod.Color = SimpleNamespace(orange=lambda: 1, green=lambda: 2, blue=lambda: 3, red=lambda: 4, purple=lambda: 5)
    discord_mod.Interaction = object
    discord_mod.Embed = MagicMock
    discord_mod.app_commands = SimpleNamespace(
@@ -23,8 +23,8 @@ def _ensure_discord_mock():
    discord_mod.Thread = type("Thread", (), {})
    discord_mod.ForumChannel = type("ForumChannel", (), {})
    discord_mod.ui = SimpleNamespace(View=object, button=lambda *a, **k: (lambda fn: fn), Button=object)
-    discord_mod.ButtonStyle = SimpleNamespace(success=1, primary=2, danger=3, green=1, blurple=2, red=3)
-    discord_mod.Color = SimpleNamespace(orange=lambda: 1, green=lambda: 2, blue=lambda: 3, red=lambda: 4)
+    discord_mod.ButtonStyle = SimpleNamespace(success=1, primary=2, secondary=2, danger=3, green=1, grey=2, blurple=2, red=3)
+    discord_mod.Color = SimpleNamespace(orange=lambda: 1, green=lambda: 2, blue=lambda: 3, red=lambda: 4, purple=lambda: 5)
    discord_mod.Interaction = object
    discord_mod.Embed = MagicMock
    discord_mod.app_commands = SimpleNamespace(
@@ -19,8 +19,8 @@ def _ensure_discord_mock():
    discord_mod.Thread = type("Thread", (), {})
    discord_mod.ForumChannel = type("ForumChannel", (), {})
    discord_mod.ui = SimpleNamespace(View=object, button=lambda *a, **k: (lambda fn: fn), Button=object)
-    discord_mod.ButtonStyle = SimpleNamespace(success=1, primary=2, danger=3, green=1, blurple=2, red=3)
-    discord_mod.Color = SimpleNamespace(orange=lambda: 1, green=lambda: 2, blue=lambda: 3, red=lambda: 4)
+    discord_mod.ButtonStyle = SimpleNamespace(success=1, primary=2, secondary=2, danger=3, green=1, grey=2, blurple=2, red=3)
+    discord_mod.Color = SimpleNamespace(orange=lambda: 1, green=lambda: 2, blue=lambda: 3, red=lambda: 4, purple=lambda: 5)
    discord_mod.Interaction = object
    discord_mod.Embed = MagicMock
    discord_mod.app_commands = SimpleNamespace(
@@ -0,0 +1,133 @@
+"""Tests for step_callback backward compatibility.
+
+Verifies that the gateway's step_callback normalization keeps
+``tool_names`` as a list of strings for backward-compatible hooks,
+while also providing the enriched ``tools`` list with results.
+"""
+
+import asyncio
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+
+class TestStepCallbackNormalization:
+    """The gateway's _step_callback_sync normalizes prev_tools from run_agent."""
+
+    def _extract_step_callback(self):
+        """Build a minimal _step_callback_sync using the same logic as gateway/run.py.
+
+        We replicate the closure so we can test normalisation in isolation
+        without spinning up the full gateway.
+        """
+        captured_events = []
+
+        class FakeHooks:
+            async def emit(self, event_type, data):
+                captured_events.append((event_type, data))
+
+        hooks_ref = FakeHooks()
+        loop = asyncio.new_event_loop()
+
+        def _step_callback_sync(iteration: int, prev_tools: list) -> None:
+            _names: list[str] = []
+            for _t in (prev_tools or []):
+                if isinstance(_t, dict):
+                    _names.append(_t.get("name") or "")
+                else:
+                    _names.append(str(_t))
+            asyncio.run_coroutine_threadsafe(
+                hooks_ref.emit("agent:step", {
+                    "iteration": iteration,
+                    "tool_names": _names,
+                    "tools": prev_tools,
+                }),
+                loop,
+            )
+
+        return _step_callback_sync, captured_events, loop
+
+    def test_dict_prev_tools_produce_string_tool_names(self):
+        """When prev_tools is list[dict], tool_names should be list[str]."""
+        cb, events, loop = self._extract_step_callback()
+
+        # Simulate the enriched format from run_agent.py
+        prev_tools = [
+            {"name": "terminal", "result": '{"output": "hello"}'},
+            {"name": "read_file", "result": '{"content": "..."}'},
+        ]
+
+        try:
+            loop.run_until_complete(asyncio.sleep(0))  # prime the loop
+            import threading
+            t = threading.Thread(target=cb, args=(1, prev_tools))
+            t.start()
+            t.join(timeout=2)
+            loop.run_until_complete(asyncio.sleep(0.1))
+        finally:
+            loop.close()
+
+        assert len(events) == 1
+        _, data = events[0]
+        # tool_names must be strings for backward compat
+        assert data["tool_names"] == ["terminal", "read_file"]
+        assert all(isinstance(n, str) for n in data["tool_names"])
+        # tools should be the enriched dicts
+        assert data["tools"] == prev_tools
+
+    def test_string_prev_tools_still_work(self):
+        """When prev_tools is list[str] (legacy), tool_names should pass through."""
+        cb, events, loop = self._extract_step_callback()
+
+        prev_tools = ["terminal", "read_file"]
+
+        try:
+            loop.run_until_complete(asyncio.sleep(0))
+            import threading
+            t = threading.Thread(target=cb, args=(2, prev_tools))
+            t.start()
+            t.join(timeout=2)
+            loop.run_until_complete(asyncio.sleep(0.1))
+        finally:
+            loop.close()
+
+        assert len(events) == 1
+        _, data = events[0]
+        assert data["tool_names"] == ["terminal", "read_file"]
+
+    def test_empty_prev_tools(self):
+        """Empty or None prev_tools should produce empty tool_names."""
+        cb, events, loop = self._extract_step_callback()
+
+        try:
+            loop.run_until_complete(asyncio.sleep(0))
+            import threading
+            t = threading.Thread(target=cb, args=(1, []))
+            t.start()
+            t.join(timeout=2)
+            loop.run_until_complete(asyncio.sleep(0.1))
+        finally:
+            loop.close()
+
+        assert len(events) == 1
+        _, data = events[0]
+        assert data["tool_names"] == []
+
+    def test_joinable_for_hook_example(self):
+        """The documented hook example: ', '.join(tool_names) should work."""
+        # This is the exact pattern from the docs
+        prev_tools = [
+            {"name": "terminal", "result": "ok"},
+            {"name": "web_search", "result": None},
+        ]
+
+        _names = []
+        for _t in prev_tools:
+            if isinstance(_t, dict):
+                _names.append(_t.get("name") or "")
+            else:
+                _names.append(str(_t))
+
+        # This must not raise — documented hook pattern
+        result = ", ".join(_names)
+        assert result == "terminal, web_search"
@@ -25,8 +25,8 @@ def _ensure_discord_mock():
    discord_mod.Thread = type("Thread", (), {})
    discord_mod.ForumChannel = type("ForumChannel", (), {})
    discord_mod.ui = SimpleNamespace(View=object, button=lambda *a, **k: (lambda fn: fn), Button=object)
-    discord_mod.ButtonStyle = SimpleNamespace(success=1, primary=2, danger=3, green=1, blurple=2, red=3)
-    discord_mod.Color = SimpleNamespace(orange=lambda: 1, green=lambda: 2, blue=lambda: 3, red=lambda: 4)
+    discord_mod.ButtonStyle = SimpleNamespace(success=1, primary=2, secondary=2, danger=3, green=1, grey=2, blurple=2, red=3)
+    discord_mod.Color = SimpleNamespace(orange=lambda: 1, green=lambda: 2, blue=lambda: 3, red=lambda: 4, purple=lambda: 5)
    discord_mod.Interaction = object
    discord_mod.Embed = MagicMock
    discord_mod.app_commands = SimpleNamespace(
@@ -0,0 +1,142 @@
+import json
+from unittest.mock import AsyncMock
+
+from gateway.config import Platform, PlatformConfig, load_gateway_config
+
+
+def _make_adapter(require_mention=None, mention_patterns=None, free_response_chats=None):
+    from gateway.platforms.whatsapp import WhatsAppAdapter
+
+    extra = {}
+    if require_mention is not None:
+        extra["require_mention"] = require_mention
+    if mention_patterns is not None:
+        extra["mention_patterns"] = mention_patterns
+    if free_response_chats is not None:
+        extra["free_response_chats"] = free_response_chats
+
+    adapter = object.__new__(WhatsAppAdapter)
+    adapter.platform = Platform.WHATSAPP
+    adapter.config = PlatformConfig(enabled=True, extra=extra)
+    adapter._message_handler = AsyncMock()
+    adapter._mention_patterns = adapter._compile_mention_patterns()
+    return adapter
+
+
+def _group_message(body="hello", **overrides):
+    data = {
+        "isGroup": True,
+        "body": body,
+        "chatId": "120363001234567890@g.us",
+        "mentionedIds": [],
+        "botIds": ["15551230000@s.whatsapp.net", "15551230000@lid"],
+        "quotedParticipant": "",
+    }
+    data.update(overrides)
+    return data
+
+
+def test_group_messages_can_be_opened_via_config():
+    adapter = _make_adapter(require_mention=False)
+
+    assert adapter._should_process_message(_group_message("hello everyone")) is True
+
+
+def test_group_messages_can_require_direct_trigger_via_config():
+    adapter = _make_adapter(require_mention=True)
+
+    assert adapter._should_process_message(_group_message("hello everyone")) is False
+    assert adapter._should_process_message(
+        _group_message(
+            "hi there",
+            mentionedIds=["15551230000@s.whatsapp.net"],
+        )
+    ) is True
+    assert adapter._should_process_message(
+        _group_message(
+            "replying",
+            quotedParticipant="15551230000@lid",
+        )
+    ) is True
+    assert adapter._should_process_message(_group_message("/status")) is True
+
+
+def test_regex_mention_patterns_allow_custom_wake_words():
+    adapter = _make_adapter(require_mention=True, mention_patterns=[r"^\s*chompy\b"])
+
+    assert adapter._should_process_message(_group_message("chompy status")) is True
+    assert adapter._should_process_message(_group_message("   chompy help")) is True
+    assert adapter._should_process_message(_group_message("hey chompy")) is False
+
+
+def test_invalid_regex_patterns_are_ignored():
+    adapter = _make_adapter(require_mention=True, mention_patterns=[r"(", r"^\s*chompy\b"])
+
+    assert adapter._should_process_message(_group_message("chompy status")) is True
+    assert adapter._should_process_message(_group_message("hello everyone")) is False
+
+
+def test_config_bridges_whatsapp_group_settings(monkeypatch, tmp_path):
+    hermes_home = tmp_path / ".hermes"
+    hermes_home.mkdir()
+    (hermes_home / "config.yaml").write_text(
+        "whatsapp:\n"
+        "  require_mention: true\n"
+        "  mention_patterns:\n"
+        "    - \"^\\\\s*chompy\\\\b\"\n",
+        encoding="utf-8",
+    )
+
+    monkeypatch.setenv("HERMES_HOME", str(hermes_home))
+    monkeypatch.delenv("WHATSAPP_REQUIRE_MENTION", raising=False)
+    monkeypatch.delenv("WHATSAPP_MENTION_PATTERNS", raising=False)
+
+    config = load_gateway_config()
+
+    assert config is not None
+    assert config.platforms[Platform.WHATSAPP].extra["require_mention"] is True
+    assert config.platforms[Platform.WHATSAPP].extra["mention_patterns"] == [r"^\s*chompy\b"]
+    assert __import__("os").environ["WHATSAPP_REQUIRE_MENTION"] == "true"
+    assert json.loads(__import__("os").environ["WHATSAPP_MENTION_PATTERNS"]) == [r"^\s*chompy\b"]
+
+
+def test_free_response_chats_bypass_mention_gating():
+    adapter = _make_adapter(
+        require_mention=True,
+        free_response_chats=["120363001234567890@g.us"],
+    )
+
+    assert adapter._should_process_message(_group_message("hello everyone")) is True
+
+
+def test_free_response_chats_does_not_bypass_other_groups():
+    adapter = _make_adapter(
+        require_mention=True,
+        free_response_chats=["999999999999@g.us"],
+    )
+
+    assert adapter._should_process_message(_group_message("hello everyone")) is False
+
+
+def test_dm_always_passes_even_with_require_mention():
+    adapter = _make_adapter(require_mention=True)
+
+    dm = {"isGroup": False, "body": "hello", "botIds": [], "mentionedIds": []}
+    assert adapter._should_process_message(dm) is True
+
+
+def test_mention_stripping_removes_bot_phone_from_body():
+    adapter = _make_adapter(require_mention=True)
+
+    data = _group_message("@15551230000 what is the weather?")
+    cleaned = adapter._clean_bot_mention_text(data["body"], data)
+    assert "15551230000" not in cleaned
+    assert "weather" in cleaned
+
+
+def test_mention_stripping_preserves_body_when_no_mention():
+    adapter = _make_adapter(require_mention=True)
+
+    data = _group_message("just a normal message")
+    cleaned = adapter._clean_bot_mention_text(data["body"], data)
+    assert cleaned == "just a normal message"
@@ -587,3 +587,44 @@ class TestTelegramMenuCommands:
            assert 1 <= len(name) <= _TG_NAME_LIMIT, (
                f"Command '{name}' is {len(name)} chars (limit {_TG_NAME_LIMIT})"
            )
+
+    def test_excludes_telegram_disabled_skills(self, tmp_path, monkeypatch):
+        """Skills disabled for telegram should not appear in the menu."""
+        from unittest.mock import patch, MagicMock
+
+        # Set up a config with a telegram-specific disabled list
+        config_file = tmp_path / "config.yaml"
+        config_file.write_text(
+            "skills:\n"
+            "  platform_disabled:\n"
+            "    telegram:\n"
+            "      - my-disabled-skill\n"
+        )
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+
+        # Mock get_skill_commands to return two skills
+        fake_skills_dir = str(tmp_path / "skills")
+        fake_cmds = {
+            "/my-disabled-skill": {
+                "name": "my-disabled-skill",
+                "description": "Should be hidden",
+                "skill_md_path": f"{fake_skills_dir}/my-disabled-skill/SKILL.md",
+                "skill_dir": f"{fake_skills_dir}/my-disabled-skill",
+            },
+            "/my-enabled-skill": {
+                "name": "my-enabled-skill",
+                "description": "Should be visible",
+                "skill_md_path": f"{fake_skills_dir}/my-enabled-skill/SKILL.md",
+                "skill_dir": f"{fake_skills_dir}/my-enabled-skill",
+            },
+        }
+        with (
+            patch("agent.skill_commands.get_skill_commands", return_value=fake_cmds),
+            patch("tools.skills_tool.SKILLS_DIR", tmp_path / "skills"),
+        ):
+            (tmp_path / "skills").mkdir(exist_ok=True)
+            menu, hidden = telegram_menu_commands(max_commands=100)
+
+        menu_names = {n for n, _ in menu}
+        assert "my_enabled_skill" in menu_names
+        assert "my_disabled_skill" not in menu_names
@@ -466,6 +466,51 @@ class TestGeneratedUnitIncludesLocalBin:
        assert "/.local/bin" in unit


+class TestSystemServiceIdentityRootHandling:
+    """Root user handling in _system_service_identity()."""
+
+    def test_auto_detected_root_is_rejected(self, monkeypatch):
+        """When root is auto-detected (not explicitly requested), raise."""
+        import pwd
+        import grp
+
+        monkeypatch.delenv("SUDO_USER", raising=False)
+        monkeypatch.setenv("USER", "root")
+        monkeypatch.setenv("LOGNAME", "root")
+
+        import pytest
+        with pytest.raises(ValueError, match="pass --run-as-user root to override"):
+            gateway_cli._system_service_identity(run_as_user=None)
+
+    def test_explicit_root_is_allowed(self, monkeypatch):
+        """When root is explicitly passed via --run-as-user root, allow it."""
+        import pwd
+        import grp
+
+        root_info = pwd.getpwnam("root")
+        root_group = grp.getgrgid(root_info.pw_gid).gr_name
+
+        username, group, home = gateway_cli._system_service_identity(run_as_user="root")
+        assert username == "root"
+        assert home == root_info.pw_dir
+
+    def test_non_root_user_passes_through(self, monkeypatch):
+        """Normal non-root user works as before."""
+        import pwd
+        import grp
+
+        monkeypatch.delenv("SUDO_USER", raising=False)
+        monkeypatch.setenv("USER", "nobody")
+        monkeypatch.setenv("LOGNAME", "nobody")
+
+        try:
+            username, group, home = gateway_cli._system_service_identity(run_as_user=None)
+            assert username == "nobody"
+        except ValueError as e:
+            # "nobody" might not exist on all systems
+            assert "Unknown user" in str(e)
+
+
 class TestEnsureUserSystemdEnv:
    """Tests for _ensure_user_systemd_env() D-Bus session bus auto-detection."""

@@ -141,6 +141,109 @@ class TestIsSkillDisabled:
        assert _is_skill_disabled("discord-skill") is True


+# ---------------------------------------------------------------------------
+# get_disabled_skill_names — explicit platform param & env var fallback
+# ---------------------------------------------------------------------------
+
+class TestGetDisabledSkillNames:
+    """Tests for agent.skill_utils.get_disabled_skill_names."""
+
+    def test_explicit_platform_param(self, tmp_path, monkeypatch):
+        """Explicit platform= parameter should resolve per-platform list."""
+        config = tmp_path / "config.yaml"
+        config.write_text(
+            "skills:\n"
+            "  disabled:\n"
+            "    - global-skill\n"
+            "  platform_disabled:\n"
+            "    telegram:\n"
+            "      - tg-only-skill\n"
+        )
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        monkeypatch.delenv("HERMES_PLATFORM", raising=False)
+        monkeypatch.delenv("HERMES_SESSION_PLATFORM", raising=False)
+
+        from agent.skill_utils import get_disabled_skill_names
+        result = get_disabled_skill_names(platform="telegram")
+        assert result == {"tg-only-skill"}
+
+    def test_session_platform_env_var(self, tmp_path, monkeypatch):
+        """HERMES_SESSION_PLATFORM should be used when HERMES_PLATFORM is unset."""
+        config = tmp_path / "config.yaml"
+        config.write_text(
+            "skills:\n"
+            "  disabled:\n"
+            "    - global-skill\n"
+            "  platform_disabled:\n"
+            "    discord:\n"
+            "      - discord-skill\n"
+        )
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        monkeypatch.delenv("HERMES_PLATFORM", raising=False)
+        monkeypatch.setenv("HERMES_SESSION_PLATFORM", "discord")
+
+        from agent.skill_utils import get_disabled_skill_names
+        result = get_disabled_skill_names()
+        assert result == {"discord-skill"}
+
+    def test_hermes_platform_takes_precedence(self, tmp_path, monkeypatch):
+        """HERMES_PLATFORM should win over HERMES_SESSION_PLATFORM."""
+        config = tmp_path / "config.yaml"
+        config.write_text(
+            "skills:\n"
+            "  platform_disabled:\n"
+            "    telegram:\n"
+            "      - tg-skill\n"
+            "    discord:\n"
+            "      - discord-skill\n"
+        )
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        monkeypatch.setenv("HERMES_PLATFORM", "telegram")
+        monkeypatch.setenv("HERMES_SESSION_PLATFORM", "discord")
+
+        from agent.skill_utils import get_disabled_skill_names
+        result = get_disabled_skill_names()
+        assert result == {"tg-skill"}
+
+    def test_explicit_param_overrides_env_vars(self, tmp_path, monkeypatch):
+        """Explicit platform= param should override all env vars."""
+        config = tmp_path / "config.yaml"
+        config.write_text(
+            "skills:\n"
+            "  platform_disabled:\n"
+            "    telegram:\n"
+            "      - tg-skill\n"
+            "    slack:\n"
+            "      - slack-skill\n"
+        )
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        monkeypatch.setenv("HERMES_PLATFORM", "telegram")
+        monkeypatch.setenv("HERMES_SESSION_PLATFORM", "telegram")
+
+        from agent.skill_utils import get_disabled_skill_names
+        result = get_disabled_skill_names(platform="slack")
+        assert result == {"slack-skill"}
+
+    def test_no_platform_returns_global(self, tmp_path, monkeypatch):
+        """No platform env vars or param should return global list."""
+        config = tmp_path / "config.yaml"
+        config.write_text(
+            "skills:\n"
+            "  disabled:\n"
+            "    - global-skill\n"
+            "  platform_disabled:\n"
+            "    telegram:\n"
+            "      - tg-skill\n"
+        )
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        monkeypatch.delenv("HERMES_PLATFORM", raising=False)
+        monkeypatch.delenv("HERMES_SESSION_PLATFORM", raising=False)
+
+        from agent.skill_utils import get_disabled_skill_names
+        result = get_disabled_skill_names()
+        assert result == {"global-skill"}
+
+
 # ---------------------------------------------------------------------------
 # _find_all_skills — disabled filtering
 # ---------------------------------------------------------------------------
@@ -32,6 +32,8 @@ def test_stash_local_changes_if_needed_returns_specific_stash_commit(monkeypatch
        calls.append((cmd, kwargs))
        if cmd[-2:] == ["status", "--porcelain"]:
            return SimpleNamespace(stdout=" M hermes_cli/main.py\n?? notes.txt\n", returncode=0)
+        if cmd[-2:] == ["ls-files", "--unmerged"]:
+            return SimpleNamespace(stdout="", returncode=0)
        if cmd[1:4] == ["stash", "push", "--include-untracked"]:
            return SimpleNamespace(stdout="Saved working directory\n", returncode=0)
        if cmd[-3:] == ["rev-parse", "--verify", "refs/stash"]:
@@ -43,8 +45,9 @@ def test_stash_local_changes_if_needed_returns_specific_stash_commit(monkeypatch
    stash_ref = hermes_main._stash_local_changes_if_needed(["git"], tmp_path)

    assert stash_ref == "abc123"
-    assert calls[1][0][1:4] == ["stash", "push", "--include-untracked"]
-    assert calls[2][0][-3:] == ["rev-parse", "--verify", "refs/stash"]
+    assert calls[1][0][-2:] == ["ls-files", "--unmerged"]
+    assert calls[2][0][1:4] == ["stash", "push", "--include-untracked"]
+    assert calls[3][0][-3:] == ["rev-parse", "--verify", "refs/stash"]


 def test_resolve_stash_selector_returns_matching_entry(monkeypatch, tmp_path):
@@ -296,6 +299,8 @@ def test_stash_local_changes_if_needed_raises_when_stash_ref_missing(monkeypatch
    def fake_run(cmd, **kwargs):
        if cmd[-2:] == ["status", "--porcelain"]:
            return SimpleNamespace(stdout=" M hermes_cli/main.py\n", returncode=0)
+        if cmd[-2:] == ["ls-files", "--unmerged"]:
+            return SimpleNamespace(stdout="", returncode=0)
        if cmd[1:4] == ["stash", "push", "--include-untracked"]:
            return SimpleNamespace(stdout="Saved working directory\n", returncode=0)
        if cmd[-3:] == ["rev-parse", "--verify", "refs/stash"]:
@@ -307,21 +307,14 @@ class TestCmdUpdateLaunchdRestart:

        # Mock get_running_pid to return a PID
        with patch("gateway.status.get_running_pid", return_value=12345), \
-             patch("gateway.status.remove_pid_file"):
+             patch("gateway.status.remove_pid_file"), \
+             patch.object(gateway_cli, "launchd_restart") as mock_launchd_restart:
            cmd_update(mock_args)

        captured = capsys.readouterr().out
-        assert "Gateway restarted via launchd" in captured
+        assert "Restarting gateway service" in captured
        assert "Restart it with: hermes gateway run" not in captured
-        # Verify launchctl stop + start were called (not manual SIGTERM)
-        launchctl_calls = [
-            c for c in mock_run.call_args_list
-            if len(c.args[0]) > 0 and c.args[0][0] == "launchctl"
-        ]
-        stop_calls = [c for c in launchctl_calls if "stop" in c.args[0]]
-        start_calls = [c for c in launchctl_calls if "start" in c.args[0]]
-        assert len(stop_calls) >= 1
-        assert len(start_calls) >= 1
+        mock_launchd_restart.assert_called_once_with()

    @patch("shutil.which", return_value=None)
    @patch("subprocess.run")
@@ -191,6 +191,60 @@ class TestHistoryDisplay:
        assert "A" * 250 in output
        assert "A" * 250 + "..." not in output

+    def test_history_shows_recent_sessions_when_current_chat_is_empty(self, capsys):
+        cli = _make_cli()
+        cli.session_id = "current"
+        cli._session_db = MagicMock()
+        cli._session_db.list_sessions_rich.return_value = [
+            {
+                "id": "current",
+                "title": "Current",
+                "preview": "Current preview",
+                "last_active": 0,
+            },
+            {
+                "id": "20260401_201329_d85961",
+                "title": "Checking Running Hermes Agent",
+                "preview": "check running gateways for hermes agent",
+                "last_active": 0,
+            },
+        ]
+
+        cli.show_history()
+        output = capsys.readouterr().out
+
+        assert "No messages in the current chat yet" in output
+        assert "Checking Running Hermes Agent" in output
+        assert "20260401_201329_d85961" in output
+        assert "/resume" in output
+        assert "Current preview" not in output
+
+    def test_resume_without_target_lists_recent_sessions(self, capsys):
+        cli = _make_cli()
+        cli.session_id = "current"
+        cli._session_db = MagicMock()
+        cli._session_db.list_sessions_rich.return_value = [
+            {
+                "id": "current",
+                "title": "Current",
+                "preview": "Current preview",
+                "last_active": 0,
+            },
+            {
+                "id": "20260401_201329_d85961",
+                "title": "Checking Running Hermes Agent",
+                "preview": "check running gateways for hermes agent",
+                "last_active": 0,
+            },
+        ]
+
+        cli._handle_resume_command("/resume")
+        output = capsys.readouterr().out
+
+        assert "Recent sessions" in output
+        assert "Checking Running Hermes Agent" in output
+        assert "Use /resume <session id or title> to continue" in output
+

 class TestRootLevelProviderOverride:
    """Root-level provider/base_url in config.yaml must NOT override model.provider."""
@@ -0,0 +1,209 @@
+"""Tests for Anthropic Sonnet long-context tier 429 handling.
+
+When Claude Max users without "extra usage" hit the 1M context tier
+on Sonnet, Anthropic returns HTTP 429 "Extra usage is required for long
+context requests."  This is NOT a transient rate limit — the agent should
+reduce context_length to 200k and compress instead of retrying.
+
+Only Sonnet is affected — Opus 1M is general access.
+"""
+
+import pytest
+from types import SimpleNamespace
+from unittest.mock import MagicMock, patch
+
+
+# ---------------------------------------------------------------------------
+# Detection logic
+# ---------------------------------------------------------------------------
+
+
+class TestLongContextTierDetection:
+    """Verify the detection heuristic matches the Anthropic error."""
+
+    @staticmethod
+    def _is_long_context_tier_error(status_code, error_msg, model="claude-sonnet-4.6"):
+        error_msg = error_msg.lower()
+        return (
+            status_code == 429
+            and "extra usage" in error_msg
+            and "long context" in error_msg
+            and "sonnet" in model.lower()
+        )
+
+    def test_matches_anthropic_error(self):
+        assert self._is_long_context_tier_error(
+            429,
+            "Extra usage is required for long context requests.",
+        )
+
+    def test_matches_lowercase(self):
+        assert self._is_long_context_tier_error(
+            429,
+            "extra usage is required for long context requests.",
+        )
+
+    def test_matches_openrouter_model_id(self):
+        assert self._is_long_context_tier_error(
+            429,
+            "Extra usage is required for long context requests.",
+            model="anthropic/claude-sonnet-4.6",
+        )
+
+    def test_matches_nous_model_id(self):
+        assert self._is_long_context_tier_error(
+            429,
+            "Extra usage is required for long context requests.",
+            model="claude-sonnet-4-6",
+        )
+
+    def test_rejects_opus(self):
+        """Opus 1M is general access — should NOT trigger reduction."""
+        assert not self._is_long_context_tier_error(
+            429,
+            "Extra usage is required for long context requests.",
+            model="claude-opus-4.6",
+        )
+
+    def test_rejects_opus_openrouter(self):
+        assert not self._is_long_context_tier_error(
+            429,
+            "Extra usage is required for long context requests.",
+            model="anthropic/claude-opus-4.6",
+        )
+
+    def test_rejects_normal_429(self):
+        assert not self._is_long_context_tier_error(
+            429,
+            "Rate limit exceeded. Please retry after 30 seconds.",
+        )
+
+    def test_rejects_wrong_status(self):
+        assert not self._is_long_context_tier_error(
+            400,
+            "Extra usage is required for long context requests.",
+        )
+
+    def test_rejects_partial_match(self):
+        """Both 'extra usage' AND 'long context' must be present."""
+        assert not self._is_long_context_tier_error(
+            429, "extra usage required"
+        )
+        assert not self._is_long_context_tier_error(
+            429, "long context requests not supported"
+        )
+
+
+# ---------------------------------------------------------------------------
+# Context reduction
+# ---------------------------------------------------------------------------
+
+
+class TestContextReduction:
+    """When the long-context tier error fires, context_length should
+    drop to 200k and the reduced flag should be set correctly."""
+
+    def _make_compressor(self, context_length=1_000_000, threshold_percent=0.5):
+        c = SimpleNamespace(
+            context_length=context_length,
+            threshold_percent=threshold_percent,
+            threshold_tokens=int(context_length * threshold_percent),
+            _context_probed=False,
+            _context_probe_persistable=False,
+        )
+        return c
+
+    def test_reduces_1m_to_200k(self):
+        comp = self._make_compressor(1_000_000)
+        reduced_ctx = 200_000
+
+        if comp.context_length > reduced_ctx:
+            comp.context_length = reduced_ctx
+            comp.threshold_tokens = int(reduced_ctx * comp.threshold_percent)
+            comp._context_probed = True
+            comp._context_probe_persistable = False
+
+        assert comp.context_length == 200_000
+        assert comp.threshold_tokens == 100_000
+        assert comp._context_probed is True
+        # Must NOT persist — subscription tier, not model capability
+        assert comp._context_probe_persistable is False
+
+    def test_no_reduction_when_already_200k(self):
+        comp = self._make_compressor(200_000)
+        reduced_ctx = 200_000
+
+        original = comp.context_length
+        if comp.context_length > reduced_ctx:
+            comp.context_length = reduced_ctx
+
+        assert comp.context_length == original  # unchanged
+
+    def test_no_reduction_when_below_200k(self):
+        comp = self._make_compressor(128_000)
+        reduced_ctx = 200_000
+
+        original = comp.context_length
+        if comp.context_length > reduced_ctx:
+            comp.context_length = reduced_ctx
+
+        assert comp.context_length == original  # unchanged
+
+
+# ---------------------------------------------------------------------------
+# Integration: agent error handler path
+# ---------------------------------------------------------------------------
+
+
+class TestAgentErrorPath:
+    """Verify the long-context 429 doesn't hit the generic rate-limit
+    or client-error handlers."""
+
+    def test_long_context_429_not_treated_as_rate_limit(self):
+        """The error should be intercepted before the generic
+        is_rate_limited check fires a fallback switch."""
+        error_msg = "extra usage is required for long context requests."
+        status_code = 429
+        model = "claude-sonnet-4.6"
+
+        _is_long_context_tier_error = (
+            status_code == 429
+            and "extra usage" in error_msg
+            and "long context" in error_msg
+            and "sonnet" in model.lower()
+        )
+        assert _is_long_context_tier_error
+
+    def test_opus_429_falls_through_to_rate_limit(self):
+        """Opus should NOT match — falls through to generic rate-limit."""
+        error_msg = "extra usage is required for long context requests."
+        status_code = 429
+        model = "claude-opus-4.6"
+
+        _is_long_context_tier_error = (
+            status_code == 429
+            and "extra usage" in error_msg
+            and "long context" in error_msg
+            and "sonnet" in model.lower()
+        )
+        assert not _is_long_context_tier_error
+
+    def test_normal_429_still_treated_as_rate_limit(self):
+        """A normal 429 should NOT match the long-context check."""
+        error_msg = "rate limit exceeded"
+        status_code = 429
+        model = "claude-sonnet-4.6"
+
+        _is_long_context_tier_error = (
+            status_code == 429
+            and "extra usage" in error_msg
+            and "long context" in error_msg
+            and "sonnet" in model.lower()
+        )
+        assert not _is_long_context_tier_error
+
+        is_rate_limited = (
+            status_code == 429
+            or "rate limit" in error_msg
+        )
+        assert is_rate_limited
@@ -0,0 +1,627 @@
+"""Tests for mid-chat /model switching.
+
+Covers the full model-switching stack:
+- Model aliases (sonnet, opus, gpt5, etc.)
+- Fuzzy matching and suggestions
+- CommandDef registration (commands.py)
+- Switch pipeline (model_switch.py)
+- AIAgent.switch_model() method (run_agent.py)
+- CLI handler (cli.py)
+- Gateway handler (gateway/run.py)
+- Edge cases and error paths
+"""
+
+import os
+import sys
+from pathlib import Path
+from unittest.mock import MagicMock, patch, AsyncMock
+
+import pytest
+
+# Ensure project root is importable
+PROJECT_ROOT = Path(__file__).parent.parent
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# Model aliases
+# ═══════════════════════════════════════════════════════════════════════
+
+class TestModelAliases:
+    """Verify the alias system resolves short names to full model slugs."""
+
+    def test_sonnet_alias(self):
+        from hermes_cli.model_switch import resolve_alias
+        result = resolve_alias("sonnet")
+        assert result is not None
+        provider, model, alias = result
+        assert "claude" in model.lower()
+        assert "sonnet" in model.lower()
+        assert alias == "sonnet"
+
+    def test_opus_alias(self):
+        from hermes_cli.model_switch import resolve_alias
+        result = resolve_alias("opus")
+        assert result is not None
+        _, model, _ = result
+        assert "opus" in model.lower()
+
+    def test_haiku_alias(self):
+        from hermes_cli.model_switch import resolve_alias
+        result = resolve_alias("haiku")
+        assert result is not None
+        _, model, _ = result
+        assert "haiku" in model.lower()
+
+    def test_gpt5_alias(self):
+        from hermes_cli.model_switch import resolve_alias
+        result = resolve_alias("gpt5")
+        assert result is not None
+        _, model, _ = result
+        assert "gpt-5" in model.lower()
+
+    def test_gemini_alias(self):
+        from hermes_cli.model_switch import resolve_alias
+        result = resolve_alias("gemini")
+        assert result is not None
+        _, model, _ = result
+        assert "gemini" in model.lower()
+
+    def test_deepseek_alias(self):
+        """deepseek not in static OpenRouter catalog — falls through to pipeline."""
+        from hermes_cli.model_switch import resolve_alias
+        # deepseek-chat isn't in our curated OPENROUTER_MODELS,
+        # so alias returns None and the pipeline handles it via
+        # aggregator resolution or provider detection
+        result = resolve_alias("deepseek", "openrouter")
+        # May or may not resolve depending on catalog state — just don't crash
+        if result:
+            _, model, _ = result
+            assert "deepseek" in model.lower()
+
+    def test_codex_alias(self):
+        from hermes_cli.model_switch import resolve_alias
+        result = resolve_alias("codex")
+        assert result is not None
+        _, model, _ = result
+        assert "codex" in model.lower()
+
+    def test_case_insensitive(self):
+        from hermes_cli.model_switch import resolve_alias
+        assert resolve_alias("SONNET") is not None
+        assert resolve_alias("Opus") is not None
+        assert resolve_alias("GPT5") is not None
+
+    def test_unknown_returns_none(self):
+        from hermes_cli.model_switch import resolve_alias
+        assert resolve_alias("nonexistent-model-xyz") is None
+        assert resolve_alias("") is None
+
+    def test_all_aliases_have_valid_identities(self):
+        """Every alias must have a vendor and family."""
+        from hermes_cli.model_switch import MODEL_ALIASES
+        for alias, identity in MODEL_ALIASES.items():
+            assert identity.vendor, f"Alias '{alias}' has empty vendor"
+            assert identity.family, f"Alias '{alias}' has empty family"
+
+    def test_alias_provider_aware_openrouter(self):
+        """On OpenRouter, sonnet resolves with vendor/ prefix."""
+        from hermes_cli.model_switch import resolve_alias
+        result = resolve_alias("sonnet", "openrouter")
+        assert result is not None
+        provider, model, _ = result
+        assert provider == "openrouter"
+        assert model.startswith("anthropic/")
+
+    def test_alias_provider_aware_anthropic(self):
+        """On native Anthropic, sonnet resolves with hyphens."""
+        from hermes_cli.model_switch import resolve_alias
+        result = resolve_alias("sonnet", "anthropic")
+        assert result is not None
+        provider, model, _ = result
+        assert provider == "anthropic"
+        assert "." not in model  # hyphens, not dots
+        assert "claude-sonnet" in model
+
+    def test_alias_unavailable_on_provider(self):
+        """GPT5 on native Anthropic returns None (not available)."""
+        from hermes_cli.model_switch import resolve_alias
+        result = resolve_alias("gpt5", "anthropic")
+        assert result is None
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# Fuzzy matching and suggestions
+# ═══════════════════════════════════════════════════════════════════════
+
+class TestFuzzyMatching:
+    """Verify fuzzy matching suggests alternatives for typos."""
+
+    def test_close_typo_gets_suggestion(self):
+        from hermes_cli.model_switch import suggest_models
+        suggestions = suggest_models("sonet")  # missing 'n'
+        assert len(suggestions) > 0
+        # Should suggest "sonnet" or something close
+        assert any("sonnet" in s.lower() for s in suggestions)
+
+    def test_partial_name_gets_suggestion(self):
+        from hermes_cli.model_switch import suggest_models
+        suggestions = suggest_models("claude-sonn")
+        assert len(suggestions) > 0
+
+    def test_completely_wrong_gets_empty(self):
+        from hermes_cli.model_switch import suggest_models
+        suggestions = suggest_models("zzzzzzzzzzz")
+        # May or may not return suggestions — just shouldn't crash
+        assert isinstance(suggestions, list)
+
+    def test_suggestion_limit(self):
+        from hermes_cli.model_switch import suggest_models
+        suggestions = suggest_models("gpt", limit=2)
+        assert len(suggestions) <= 2
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# CommandDef registration
+# ═══════════════════════════════════════════════════════════════════════
+
+class TestCommandRegistration:
+    """Verify /model is registered correctly in the command system."""
+
+    def test_model_command_exists(self):
+        from hermes_cli.commands import COMMAND_REGISTRY
+        names = [c.name for c in COMMAND_REGISTRY]
+        assert "model" in names
+
+    def test_model_command_properties(self):
+        from hermes_cli.commands import COMMAND_REGISTRY
+        cmd = next(c for c in COMMAND_REGISTRY if c.name == "model")
+        assert cmd.category == "Configuration"
+        assert not cmd.cli_only
+        assert not cmd.gateway_only
+        assert cmd.args_hint
+
+    def test_model_command_resolves(self):
+        from hermes_cli.commands import resolve_command
+        result = resolve_command("model")
+        assert result is not None
+        assert result.name == "model"
+
+    def test_model_in_gateway_known_commands(self):
+        from hermes_cli.commands import GATEWAY_KNOWN_COMMANDS
+        assert "model" in GATEWAY_KNOWN_COMMANDS
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# Switch pipeline
+# ═══════════════════════════════════════════════════════════════════════
+
+class TestSwitchPipeline:
+    """Test the rebuilt model switch pipeline."""
+
+    def test_empty_input_error(self):
+        from hermes_cli.model_switch import switch_model
+        result = switch_model("", current_provider="openrouter")
+        assert not result.success
+        assert "No model" in result.error_message
+
+    def test_alias_resolves_in_pipeline(self):
+        """Typing 'sonnet' should resolve through the alias table."""
+        from hermes_cli.model_switch import switch_model
+        with patch("hermes_cli.runtime_provider.resolve_runtime_provider", return_value={
+                 "api_key": "test-key", "base_url": "https://openrouter.ai/api/v1", "api_mode": "",
+             }):
+            result = switch_model("sonnet", current_provider="openrouter")
+            assert result.success
+            assert "sonnet" in result.new_model.lower()
+            assert result.resolved_via_alias == "sonnet"
+
+    def test_vendor_colon_on_aggregator(self):
+        """openai:gpt-5.4 on OpenRouter becomes openai/gpt-5.4 (stays on aggregator)."""
+        from hermes_cli.model_switch import switch_model
+        with patch("hermes_cli.runtime_provider.resolve_runtime_provider", return_value={
+                 "api_key": "key", "base_url": "https://openrouter.ai/api/v1", "api_mode": "",
+             }):
+            result = switch_model("openai:gpt-5.4", current_provider="openrouter")
+            assert result.success
+            assert result.new_model == "openai/gpt-5.4"
+            assert result.target_provider == "openrouter"
+            assert not result.provider_changed  # stays on aggregator
+
+    def test_explicit_hermes_provider_model(self):
+        """anthropic:claude-opus-4 switches to the anthropic hermes provider."""
+        from hermes_cli.model_switch import switch_model
+        with patch("hermes_cli.models.parse_model_input", return_value=("anthropic", "claude-opus-4")), \
+             patch("hermes_cli.runtime_provider.resolve_runtime_provider", return_value={
+                 "api_key": "sk-ant", "base_url": "https://api.anthropic.com", "api_mode": "anthropic_messages",
+             }):
+            result = switch_model("anthropic:claude-opus-4", current_provider="openrouter")
+            assert result.success
+            assert result.target_provider == "anthropic"
+            assert result.provider_changed
+
+    def test_missing_credentials_actionable_error(self):
+        """Error message should be actionable when creds are missing."""
+        from hermes_cli.model_switch import switch_model
+        with patch("hermes_cli.models.parse_model_input", return_value=("anthropic", "claude-opus")), \
+             patch("hermes_cli.runtime_provider.resolve_runtime_provider",
+                   side_effect=Exception("No Anthropic credentials found")):
+            result = switch_model("anthropic:claude-opus", current_provider="openrouter")
+            assert not result.success
+            assert "hermes setup" in result.error_message.lower()
+
+    def test_unrecognized_model_warning(self):
+        """Unrecognized model gets a warning but still succeeds."""
+        from hermes_cli.model_switch import switch_model
+        with patch("hermes_cli.runtime_provider.resolve_runtime_provider", return_value={
+                 "api_key": "key", "base_url": "https://openrouter.ai/api/v1", "api_mode": "",
+             }), \
+             patch("hermes_cli.models.parse_model_input", return_value=("openrouter", "weird-unknown-model")), \
+             patch("hermes_cli.models.detect_provider_for_model", return_value=None):
+            result = switch_model("weird-unknown-model", current_provider="openrouter")
+            assert result.success
+            assert result.warning_message  # should have a warning
+
+    def test_custom_provider_error_message(self):
+        """Custom endpoint error gives specific guidance."""
+        from hermes_cli.model_switch import switch_model
+        with patch("hermes_cli.models.parse_model_input", return_value=("custom", "local-model")), \
+             patch("hermes_cli.runtime_provider.resolve_runtime_provider",
+                   side_effect=Exception("no endpoint")):
+            result = switch_model("custom:local-model", current_provider="openrouter")
+            assert not result.success
+            assert "config.yaml" in result.error_message or "hermes setup" in result.error_message
+
+    def test_persist_always_true_on_success(self):
+        """Successful switches should always persist."""
+        from hermes_cli.model_switch import switch_model
+        with patch("hermes_cli.runtime_provider.resolve_runtime_provider", return_value={
+                 "api_key": "key", "base_url": "https://openrouter.ai/api/v1", "api_mode": "",
+             }):
+            result = switch_model("opus", current_provider="openrouter")
+            assert result.success
+            assert result.persist is True
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# AIAgent.switch_model()
+# ═══════════════════════════════════════════════════════════════════════
+
+class TestAgentSwitchModel:
+    """Test the AIAgent.switch_model() method."""
+
+    def _make_agent(self, model="test-model", provider="openrouter"):
+        """Create a minimal mock agent with the attributes switch_model needs."""
+        from run_agent import AIAgent
+        with patch.object(AIAgent, "__init__", lambda self: None):
+            agent = AIAgent()
+        agent.model = model
+        agent.provider = provider
+        agent.base_url = "https://openrouter.ai/api/v1"
+        agent.api_mode = "chat_completions"
+        agent.api_key = "test-key"
+        agent.client = MagicMock()
+        agent._client_kwargs = {"api_key": "test-key", "base_url": "https://openrouter.ai/api/v1"}
+        agent._use_prompt_caching = True
+        agent._cached_system_prompt = "cached prompt"
+        agent._fallback_activated = False
+        agent._fallback_index = 0
+        agent._anthropic_client = None
+        agent._anthropic_api_key = ""
+        agent._anthropic_base_url = None
+        agent._is_anthropic_oauth = False
+        agent._memory_store = None
+        cc = MagicMock()
+        cc.model = model
+        cc.base_url = "https://openrouter.ai/api/v1"
+        cc.api_key = "test-key"
+        cc.provider = provider
+        cc.context_length = 200000
+        cc.threshold_tokens = 160000
+        cc.threshold_percent = 0.8
+        agent.context_compressor = cc
+        agent._primary_runtime = {
+            "model": model, "provider": provider,
+            "base_url": "https://openrouter.ai/api/v1",
+            "api_mode": "chat_completions", "api_key": "test-key",
+            "client_kwargs": dict(agent._client_kwargs),
+            "use_prompt_caching": True,
+            "compressor_model": model, "compressor_base_url": "https://openrouter.ai/api/v1",
+            "compressor_api_key": "test-key", "compressor_provider": provider,
+            "compressor_context_length": 200000, "compressor_threshold_tokens": 160000,
+        }
+        agent._create_openai_client = MagicMock(return_value=MagicMock())
+        agent._is_direct_openai_url = MagicMock(return_value=False)
+        agent._invalidate_system_prompt = MagicMock()
+        return agent
+
+    def test_basic_switch(self):
+        agent = self._make_agent()
+        agent.switch_model(
+            new_model="claude-sonnet-4",
+            new_provider="openrouter",
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
+        )
+        assert agent.model == "claude-sonnet-4"
+
+    def test_system_prompt_invalidated(self):
+        agent = self._make_agent()
+        agent.switch_model(
+            new_model="new-model", new_provider="openrouter",
+            api_key="key", base_url="https://openrouter.ai/api/v1",
+        )
+        agent._invalidate_system_prompt.assert_called_once()
+
+    def test_primary_runtime_updated(self):
+        agent = self._make_agent()
+        agent.switch_model(
+            new_model="gpt-5", new_provider="openai",
+            api_key="sk-test", base_url="https://api.openai.com/v1",
+        )
+        assert agent._primary_runtime["model"] == "gpt-5"
+        assert agent._primary_runtime["provider"] == "openai"
+
+    def test_prompt_caching_claude_on_openrouter(self):
+        agent = self._make_agent()
+        agent._use_prompt_caching = False
+        agent.switch_model(
+            new_model="anthropic/claude-sonnet-4",
+            new_provider="openrouter",
+            api_key="key", base_url="https://openrouter.ai/api/v1",
+        )
+        assert agent._use_prompt_caching is True
+
+    def test_prompt_caching_non_claude(self):
+        agent = self._make_agent()
+        agent._use_prompt_caching = True
+        agent.switch_model(
+            new_model="openai/gpt-5",
+            new_provider="openrouter",
+            api_key="key", base_url="https://openrouter.ai/api/v1",
+        )
+        assert agent._use_prompt_caching is False
+
+    def test_cross_api_mode_to_anthropic(self):
+        agent = self._make_agent()
+        with patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()), \
+             patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant"), \
+             patch("agent.anthropic_adapter._is_oauth_token", return_value=False):
+            agent.switch_model(
+                new_model="claude-opus-4", new_provider="anthropic",
+                api_key="sk-ant",
+            )
+        assert agent.api_mode == "anthropic_messages"
+        assert agent.client is None
+
+    def test_switch_from_anthropic_clears_state(self):
+        agent = self._make_agent()
+        agent.api_mode = "anthropic_messages"
+        agent._anthropic_client = MagicMock()
+        agent.switch_model(
+            new_model="gpt-5", new_provider="openai",
+            api_key="sk-test", base_url="https://api.openai.com/v1",
+        )
+        assert agent.api_mode == "chat_completions"
+        assert agent._anthropic_client is None
+
+    def test_context_compressor_updated(self):
+        agent = self._make_agent()
+        with patch("agent.model_metadata.get_model_context_length", return_value=128000):
+            agent.switch_model(
+                new_model="gpt-4o", new_provider="openai",
+                api_key="key", base_url="https://api.openai.com/v1",
+            )
+        assert agent.context_compressor.context_length == 128000
+
+    def test_fallback_state_reset(self):
+        agent = self._make_agent()
+        agent._fallback_activated = True
+        agent._fallback_index = 2
+        agent.switch_model(
+            new_model="new", new_provider="openrouter",
+            api_key="key", base_url="https://openrouter.ai/api/v1",
+        )
+        assert agent._fallback_activated is False
+        assert agent._fallback_index == 0
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# CLI handler
+# ═══════════════════════════════════════════════════════════════════════
+
+class TestCLIHandler:
+    """Test the CLI /model handler."""
+
+    def _make_cli(self, model="test-model", provider="openrouter"):
+        cli = MagicMock()
+        cli.model = model
+        cli.provider = provider
+        cli.base_url = "https://openrouter.ai/api/v1"
+        cli.api_key = "test-key"
+        cli.api_mode = "chat_completions"
+        cli.agent = MagicMock()
+        cli.agent.switch_model = MagicMock()
+        from cli import HermesCLI
+        cli._handle_model_switch = HermesCLI._handle_model_switch.__get__(cli)
+        return cli
+
+    def test_no_args_shows_aliases(self, capsys):
+        cli = self._make_cli()
+        with patch("hermes_cli.models._PROVIDER_LABELS", {"openrouter": "OpenRouter"}):
+            cli._handle_model_switch("/model")
+        captured = capsys.readouterr()
+        assert "sonnet" in captured.out
+        assert "opus" in captured.out
+        assert "gpt5" in captured.out
+
+    def test_alias_switch(self, capsys):
+        cli = self._make_cli()
+        mock_result = MagicMock()
+        mock_result.success = True
+        mock_result.new_model = "anthropic/claude-sonnet-4.6"
+        mock_result.target_provider = "openrouter"
+        mock_result.provider_changed = False
+        mock_result.api_key = "key"
+        mock_result.base_url = "https://openrouter.ai/api/v1"
+        mock_result.api_mode = ""
+        mock_result.persist = True
+        mock_result.warning_message = ""
+        mock_result.resolved_via_alias = "sonnet"
+
+        with patch("hermes_cli.model_switch.switch_model", return_value=mock_result), \
+             patch("hermes_cli.models._PROVIDER_LABELS", {"openrouter": "OpenRouter"}), \
+             patch("cli.save_config_value"):
+            cli._handle_model_switch("/model sonnet")
+
+        captured = capsys.readouterr()
+        assert "sonnet" in captured.out
+        assert "claude-sonnet" in captured.out
+        assert cli.model == "anthropic/claude-sonnet-4.6"
+
+    def test_failed_switch_shows_suggestions(self, capsys):
+        cli = self._make_cli()
+        mock_result = MagicMock()
+        mock_result.success = False
+        mock_result.error_message = "No credentials"
+
+        with patch("hermes_cli.model_switch.switch_model", return_value=mock_result), \
+             patch("hermes_cli.model_switch.suggest_models", return_value=["sonnet", "opus"]):
+            cli._handle_model_switch("/model sonet")
+
+        captured = capsys.readouterr()
+        assert "Did you mean" in captured.out
+
+    def test_same_model_noop(self, capsys):
+        cli = self._make_cli(model="anthropic/claude-sonnet-4.6")
+        cli._handle_model_switch("/model anthropic/claude-sonnet-4.6")
+        captured = capsys.readouterr()
+        assert "Already using" in captured.out
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# Gateway handler
+# ═══════════════════════════════════════════════════════════════════════
+
+class TestGatewayHandler:
+    """Test the gateway /model handler."""
+
+    def _make_gateway_config(self, tmp_path, model="test-model", provider="openrouter"):
+        import yaml
+        config_dir = tmp_path / ".hermes"
+        config_dir.mkdir(exist_ok=True)
+        config_path = config_dir / "config.yaml"
+        config = {"model": {"default": model, "provider": provider}}
+        with open(config_path, "w") as f:
+            yaml.dump(config, f)
+        return config_path, config_dir
+
+    @pytest.mark.asyncio
+    async def test_no_args_shows_aliases(self, tmp_path, monkeypatch):
+        config_path, config_dir = self._make_gateway_config(tmp_path)
+        monkeypatch.setattr("gateway.run._hermes_home", config_dir)
+
+        from gateway.run import GatewayRunner
+        runner = MagicMock(spec=GatewayRunner)
+        runner._handle_model_command = GatewayRunner._handle_model_command.__get__(runner)
+
+        event = MagicMock()
+        event.get_command_args.return_value = ""
+
+        result = await runner._handle_model_command(event)
+        assert "sonnet" in result
+        assert "opus" in result
+        assert "gpt5" in result
+
+    @pytest.mark.asyncio
+    async def test_successful_switch_evicts_agent(self, tmp_path, monkeypatch):
+        config_path, config_dir = self._make_gateway_config(tmp_path)
+        monkeypatch.setattr("gateway.run._hermes_home", config_dir)
+
+        from gateway.run import GatewayRunner
+        runner = MagicMock(spec=GatewayRunner)
+        runner._handle_model_command = GatewayRunner._handle_model_command.__get__(runner)
+        runner._session_key_for_source = MagicMock(return_value="test-key")
+        runner._evict_cached_agent = MagicMock()
+
+        event = MagicMock()
+        event.get_command_args.return_value = "sonnet"
+        event.source = MagicMock()
+
+        mock_result = MagicMock()
+        mock_result.success = True
+        mock_result.new_model = "anthropic/claude-sonnet-4.6"
+        mock_result.target_provider = "openrouter"
+        mock_result.provider_changed = False
+        mock_result.persist = True
+        mock_result.warning_message = ""
+        mock_result.resolved_via_alias = "sonnet"
+        mock_result.is_custom_target = False
+
+        with patch("hermes_cli.model_switch.switch_model", return_value=mock_result), \
+             patch("hermes_cli.config.save_config"):
+            result = await runner._handle_model_command(event)
+
+        assert "sonnet" in result
+        runner._evict_cached_agent.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_error_with_suggestions(self, tmp_path, monkeypatch):
+        config_path, config_dir = self._make_gateway_config(tmp_path)
+        monkeypatch.setattr("gateway.run._hermes_home", config_dir)
+
+        from gateway.run import GatewayRunner
+        runner = MagicMock(spec=GatewayRunner)
+        runner._handle_model_command = GatewayRunner._handle_model_command.__get__(runner)
+
+        event = MagicMock()
+        event.get_command_args.return_value = "sonet"  # typo
+        event.source = MagicMock()
+
+        mock_result = MagicMock()
+        mock_result.success = False
+        mock_result.error_message = "No credentials"
+
+        with patch("hermes_cli.model_switch.switch_model", return_value=mock_result), \
+             patch("hermes_cli.model_switch.suggest_models", return_value=["sonnet"]):
+            result = await runner._handle_model_command(event)
+
+        assert "Did you mean" in result
+        assert "sonnet" in result
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# Edge cases
+# ═══════════════════════════════════════════════════════════════════════
+
+class TestEdgeCases:
+
+    def test_custom_auto_result(self):
+        from hermes_cli.model_switch import CustomAutoResult
+        r = CustomAutoResult(success=True, model="llama-3.3", base_url="http://localhost:11434")
+        assert r.success
+
+    def test_result_has_alias_field(self):
+        from hermes_cli.model_switch import ModelSwitchResult
+        r = ModelSwitchResult(success=True, resolved_via_alias="sonnet")
+        assert r.resolved_via_alias == "sonnet"
+
+    def test_switch_to_custom_no_endpoint(self):
+        from hermes_cli.model_switch import switch_to_custom_provider
+        with patch("hermes_cli.runtime_provider.resolve_runtime_provider",
+                   side_effect=Exception("no endpoint")):
+            result = switch_to_custom_provider()
+            assert not result.success
+            assert "config.yaml" in result.error_message
+
+    def test_opencode_api_mode_recompute(self):
+        from hermes_cli.model_switch import switch_model
+        with patch("hermes_cli.models.parse_model_input", return_value=("opencode-zen", "claude-opus")), \
+             patch("hermes_cli.runtime_provider.resolve_runtime_provider", return_value={
+                 "api_key": "key", "base_url": "https://example.com", "api_mode": "chat_completions",
+             }), \
+             patch("hermes_cli.models.opencode_model_api_mode", return_value="anthropic_messages") as mock_oc:
+            result = switch_model("opencode-zen:claude-opus", current_provider="openrouter")
+            assert result.success
+            assert result.api_mode == "anthropic_messages"
@@ -9,10 +9,13 @@ import pytest

 from tools.mcp_oauth import (
    HermesTokenStorage,
+    OAuthNonInteractiveError,
    build_oauth_auth,
    remove_oauth_tokens,
    _find_free_port,
    _can_open_browser,
+    _is_interactive,
+    _wait_for_callback,
 )


@@ -236,3 +239,99 @@ class TestRemoveOAuthTokens:
    def test_no_error_when_files_missing(self, tmp_path, monkeypatch):
        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
        remove_oauth_tokens("nonexistent")  # should not raise
+
+
+# ---------------------------------------------------------------------------
+# Non-interactive / startup-safety tests (issue #4462)
+# ---------------------------------------------------------------------------
+
+class TestIsInteractive:
+    """_is_interactive() detects headless/daemon/container environments."""
+
+    def test_false_when_stdin_not_tty(self, monkeypatch):
+        mock_stdin = MagicMock()
+        mock_stdin.isatty.return_value = False
+        monkeypatch.setattr("tools.mcp_oauth.sys.stdin", mock_stdin)
+        assert _is_interactive() is False
+
+    def test_true_when_stdin_is_tty(self, monkeypatch):
+        mock_stdin = MagicMock()
+        mock_stdin.isatty.return_value = True
+        monkeypatch.setattr("tools.mcp_oauth.sys.stdin", mock_stdin)
+        assert _is_interactive() is True
+
+    def test_false_when_stdin_has_no_isatty(self, monkeypatch):
+        """Some environments replace stdin with an object without isatty()."""
+        mock_stdin = object()  # no isatty attribute
+        monkeypatch.setattr("tools.mcp_oauth.sys.stdin", mock_stdin)
+        assert _is_interactive() is False
+
+
+class TestWaitForCallbackNoBlocking:
+    """_wait_for_callback() must never call input() — it raises instead."""
+
+    def test_raises_on_timeout_instead_of_input(self):
+        """When no auth code arrives, raises OAuthNonInteractiveError."""
+        import tools.mcp_oauth as mod
+        import asyncio
+
+        mod._oauth_port = _find_free_port()
+
+        async def instant_sleep(_seconds):
+            pass
+
+        with patch.object(mod.asyncio, "sleep", instant_sleep):
+            with patch("builtins.input", side_effect=AssertionError("input() must not be called")):
+                with pytest.raises(OAuthNonInteractiveError, match="callback timed out"):
+                    asyncio.run(_wait_for_callback())
+
+
+class TestBuildOAuthAuthNonInteractive:
+    """build_oauth_auth() in non-interactive mode."""
+
+    def test_noninteractive_without_cached_tokens_warns(self, tmp_path, monkeypatch, caplog):
+        """Without cached tokens, non-interactive mode logs a clear warning."""
+        try:
+            from mcp.client.auth import OAuthClientProvider
+        except ImportError:
+            pytest.skip("MCP SDK auth not available")
+
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        mock_stdin = MagicMock()
+        mock_stdin.isatty.return_value = False
+        monkeypatch.setattr("tools.mcp_oauth.sys.stdin", mock_stdin)
+
+        import logging
+        with caplog.at_level(logging.WARNING, logger="tools.mcp_oauth"):
+            auth = build_oauth_auth("atlassian", "https://mcp.atlassian.com/v1/mcp")
+
+        assert auth is not None
+        assert "no cached tokens found" in caplog.text.lower()
+        assert "non-interactive" in caplog.text.lower()
+
+    def test_noninteractive_with_cached_tokens_no_warning(self, tmp_path, monkeypatch, caplog):
+        """With cached tokens, non-interactive mode logs no 'no cached tokens' warning."""
+        try:
+            from mcp.client.auth import OAuthClientProvider
+        except ImportError:
+            pytest.skip("MCP SDK auth not available")
+
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        mock_stdin = MagicMock()
+        mock_stdin.isatty.return_value = False
+        monkeypatch.setattr("tools.mcp_oauth.sys.stdin", mock_stdin)
+
+        # Pre-populate cached tokens
+        d = tmp_path / "mcp-tokens"
+        d.mkdir(parents=True)
+        (d / "atlassian.json").write_text(json.dumps({
+            "access_token": "cached",
+            "token_type": "Bearer",
+        }))
+
+        import logging
+        with caplog.at_level(logging.WARNING, logger="tools.mcp_oauth"):
+            auth = build_oauth_auth("atlassian", "https://mcp.atlassian.com/v1/mcp")
+
+        assert auth is not None
+        assert "no cached tokens found" not in caplog.text.lower()
@@ -0,0 +1,143 @@
+"""Tests for MCP stability fixes — event loop handler, PID tracking, shutdown robustness."""
+
+import asyncio
+import os
+import signal
+import threading
+from unittest.mock import patch, MagicMock
+
+import pytest
+
+
+# ---------------------------------------------------------------------------
+# Fix 1: MCP event loop exception handler
+# ---------------------------------------------------------------------------
+
+class TestMCPLoopExceptionHandler:
+    """_mcp_loop_exception_handler suppresses benign 'Event loop is closed'."""
+
+    def test_suppresses_event_loop_closed(self):
+        from tools.mcp_tool import _mcp_loop_exception_handler
+        loop = MagicMock()
+        context = {"exception": RuntimeError("Event loop is closed")}
+        # Should NOT call default handler
+        _mcp_loop_exception_handler(loop, context)
+        loop.default_exception_handler.assert_not_called()
+
+    def test_forwards_other_runtime_errors(self):
+        from tools.mcp_tool import _mcp_loop_exception_handler
+        loop = MagicMock()
+        context = {"exception": RuntimeError("some other error")}
+        _mcp_loop_exception_handler(loop, context)
+        loop.default_exception_handler.assert_called_once_with(context)
+
+    def test_forwards_non_runtime_errors(self):
+        from tools.mcp_tool import _mcp_loop_exception_handler
+        loop = MagicMock()
+        context = {"exception": ValueError("bad value")}
+        _mcp_loop_exception_handler(loop, context)
+        loop.default_exception_handler.assert_called_once_with(context)
+
+    def test_forwards_contexts_without_exception(self):
+        from tools.mcp_tool import _mcp_loop_exception_handler
+        loop = MagicMock()
+        context = {"message": "just a message"}
+        _mcp_loop_exception_handler(loop, context)
+        loop.default_exception_handler.assert_called_once_with(context)
+
+    def test_handler_installed_on_mcp_loop(self):
+        """_ensure_mcp_loop installs the exception handler on the new loop."""
+        import tools.mcp_tool as mcp_mod
+        try:
+            mcp_mod._ensure_mcp_loop()
+            with mcp_mod._lock:
+                loop = mcp_mod._mcp_loop
+            assert loop is not None
+            assert loop.get_exception_handler() is mcp_mod._mcp_loop_exception_handler
+        finally:
+            mcp_mod._stop_mcp_loop()
+
+
+# ---------------------------------------------------------------------------
+# Fix 2: stdio PID tracking
+# ---------------------------------------------------------------------------
+
+class TestStdioPidTracking:
+    """_snapshot_child_pids and _stdio_pids track subprocess PIDs."""
+
+    def test_snapshot_returns_set(self):
+        from tools.mcp_tool import _snapshot_child_pids
+        result = _snapshot_child_pids()
+        assert isinstance(result, set)
+        # All elements should be ints
+        for pid in result:
+            assert isinstance(pid, int)
+
+    def test_stdio_pids_starts_empty(self):
+        from tools.mcp_tool import _stdio_pids, _lock
+        with _lock:
+            # Might have residual state from other tests, just check type
+            assert isinstance(_stdio_pids, set)
+
+    def test_kill_orphaned_noop_when_empty(self):
+        """_kill_orphaned_mcp_children does nothing when no PIDs tracked."""
+        from tools.mcp_tool import _kill_orphaned_mcp_children, _stdio_pids, _lock
+
+        with _lock:
+            _stdio_pids.clear()
+
+        # Should not raise
+        _kill_orphaned_mcp_children()
+
+    def test_kill_orphaned_handles_dead_pids(self):
+        """_kill_orphaned_mcp_children gracefully handles already-dead PIDs."""
+        from tools.mcp_tool import _kill_orphaned_mcp_children, _stdio_pids, _lock
+
+        # Use a PID that definitely doesn't exist
+        fake_pid = 999999999
+        with _lock:
+            _stdio_pids.add(fake_pid)
+
+        # Should not raise (ProcessLookupError is caught)
+        _kill_orphaned_mcp_children()
+
+        with _lock:
+            assert fake_pid not in _stdio_pids
+
+
+# ---------------------------------------------------------------------------
+# Fix 3: MCP reload timeout (cli.py)
+# ---------------------------------------------------------------------------
+
+class TestMCPReloadTimeout:
+    """_check_config_mcp_changes uses a timeout on _reload_mcp."""
+
+    def test_reload_timeout_does_not_block_forever(self, tmp_path, monkeypatch):
+        """If _reload_mcp hangs, the config watcher times out and returns."""
+        import time
+
+        # Create a mock HermesCLI-like object with the needed attributes
+        class FakeCLI:
+            _config_mtime = 0.0
+            _config_mcp_servers = {}
+            _last_config_check = 0.0
+            _command_running = False
+            config = {}
+            agent = None
+
+            def _reload_mcp(self):
+                # Simulate a hang — sleep longer than the timeout
+                time.sleep(60)
+
+            def _slow_command_status(self, cmd):
+                return cmd
+
+        # This test verifies the timeout mechanism exists in the code
+        # by checking that _check_config_mcp_changes doesn't call
+        # _reload_mcp directly (it uses a thread now)
+        import inspect
+        from cli import HermesCLI
+        source = inspect.getsource(HermesCLI._check_config_mcp_changes)
+        # The fix adds threading.Thread for _reload_mcp
+        assert "Thread" in source or "thread" in source.lower(), \
+            "_check_config_mcp_changes should use a thread for _reload_mcp"
@@ -2900,3 +2900,164 @@ class TestMCPBuiltinCollisionGuard:
        assert mock_registry.get_toolset_for_tool("mcp_srv_do_thing") == "mcp-srv"

        _servers.pop("srv", None)
+
+
+# ---------------------------------------------------------------------------
+# sanitize_mcp_name_component
+# ---------------------------------------------------------------------------
+
+
+class TestSanitizeMcpNameComponent:
+    """Verify sanitize_mcp_name_component handles all edge cases."""
+
+    def test_hyphens_replaced(self):
+        from tools.mcp_tool import sanitize_mcp_name_component
+        assert sanitize_mcp_name_component("my-server") == "my_server"
+
+    def test_dots_replaced(self):
+        from tools.mcp_tool import sanitize_mcp_name_component
+        assert sanitize_mcp_name_component("ai.exa") == "ai_exa"
+
+    def test_slashes_replaced(self):
+        from tools.mcp_tool import sanitize_mcp_name_component
+        assert sanitize_mcp_name_component("ai.exa/exa") == "ai_exa_exa"
+
+    def test_mixed_special_characters(self):
+        from tools.mcp_tool import sanitize_mcp_name_component
+        assert sanitize_mcp_name_component("@scope/my-pkg.v2") == "_scope_my_pkg_v2"
+
+    def test_alphanumeric_and_underscores_preserved(self):
+        from tools.mcp_tool import sanitize_mcp_name_component
+        assert sanitize_mcp_name_component("my_server_123") == "my_server_123"
+
+    def test_empty_string(self):
+        from tools.mcp_tool import sanitize_mcp_name_component
+        assert sanitize_mcp_name_component("") == ""
+
+    def test_none_returns_empty(self):
+        from tools.mcp_tool import sanitize_mcp_name_component
+        assert sanitize_mcp_name_component(None) == ""
+
+    def test_slash_in_convert_mcp_schema(self):
+        """Server names with slashes produce valid tool names via _convert_mcp_schema."""
+        from tools.mcp_tool import _convert_mcp_schema
+
+        mcp_tool = _make_mcp_tool(name="search")
+        schema = _convert_mcp_schema("ai.exa/exa", mcp_tool)
+        assert schema["name"] == "mcp_ai_exa_exa_search"
+        # Must match Anthropic's pattern: ^[a-zA-Z0-9_-]{1,128}$
+        import re
+        assert re.match(r"^[a-zA-Z0-9_-]{1,128}$", schema["name"])
+
+    def test_slash_in_build_utility_schemas(self):
+        """Server names with slashes produce valid utility tool names."""
+        from tools.mcp_tool import _build_utility_schemas
+
+        schemas = _build_utility_schemas("ai.exa/exa")
+        for s in schemas:
+            name = s["schema"]["name"]
+            assert "/" not in name
+            assert "." not in name
+
+    def test_slash_in_sync_mcp_toolsets(self):
+        """_sync_mcp_toolsets uses sanitize consistently with _convert_mcp_schema."""
+        from tools.mcp_tool import sanitize_mcp_name_component
+
+        # Verify the prefix generation matches what _convert_mcp_schema produces
+        server_name = "ai.exa/exa"
+        safe_prefix = f"mcp_{sanitize_mcp_name_component(server_name)}_"
+        assert safe_prefix == "mcp_ai_exa_exa_"
+
+
+# ---------------------------------------------------------------------------
+# register_mcp_servers public API
+# ---------------------------------------------------------------------------
+
+
+class TestRegisterMcpServers:
+    """Verify the new register_mcp_servers() public API."""
+
+    def test_empty_servers_returns_empty(self):
+        from tools.mcp_tool import register_mcp_servers
+
+        with patch("tools.mcp_tool._MCP_AVAILABLE", True):
+            result = register_mcp_servers({})
+        assert result == []
+
+    def test_mcp_not_available_returns_empty(self):
+        from tools.mcp_tool import register_mcp_servers
+
+        with patch("tools.mcp_tool._MCP_AVAILABLE", False):
+            result = register_mcp_servers({"srv": {"command": "test"}})
+        assert result == []
+
+    def test_skips_already_connected_servers(self):
+        from tools.mcp_tool import register_mcp_servers, _servers
+
+        mock_server = _make_mock_server("existing")
+        _servers["existing"] = mock_server
+
+        try:
+            with patch("tools.mcp_tool._MCP_AVAILABLE", True), \
+                 patch("tools.mcp_tool._existing_tool_names", return_value=["mcp_existing_tool"]):
+                result = register_mcp_servers({"existing": {"command": "test"}})
+            assert result == ["mcp_existing_tool"]
+        finally:
+            _servers.pop("existing", None)
+
+    def test_skips_disabled_servers(self):
+        from tools.mcp_tool import register_mcp_servers, _servers
+
+        try:
+            with patch("tools.mcp_tool._MCP_AVAILABLE", True), \
+                 patch("tools.mcp_tool._existing_tool_names", return_value=[]):
+                result = register_mcp_servers({"srv": {"command": "test", "enabled": False}})
+            assert result == []
+        finally:
+            _servers.pop("srv", None)
+
+    def test_connects_new_servers(self):
+        from tools.mcp_tool import register_mcp_servers, _servers, _ensure_mcp_loop
+
+        fake_config = {"my_server": {"command": "npx", "args": ["test"]}}
+
+        async def fake_register(name, cfg):
+            server = _make_mock_server(name)
+            server._registered_tool_names = ["mcp_my_server_tool1"]
+            _servers[name] = server
+            return ["mcp_my_server_tool1"]
+
+        with patch("tools.mcp_tool._MCP_AVAILABLE", True), \
+             patch("tools.mcp_tool._discover_and_register_server", side_effect=fake_register), \
+             patch("tools.mcp_tool._existing_tool_names", return_value=["mcp_my_server_tool1"]):
+            _ensure_mcp_loop()
+            result = register_mcp_servers(fake_config)
+
+        assert "mcp_my_server_tool1" in result
+        _servers.pop("my_server", None)
+
+    def test_logs_summary_on_success(self):
+        from tools.mcp_tool import register_mcp_servers, _servers, _ensure_mcp_loop
+
+        fake_config = {"srv": {"command": "npx", "args": ["test"]}}
+
+        async def fake_register(name, cfg):
+            server = _make_mock_server(name)
+            server._registered_tool_names = ["mcp_srv_t1", "mcp_srv_t2"]
+            _servers[name] = server
+            return ["mcp_srv_t1", "mcp_srv_t2"]
+
+        with patch("tools.mcp_tool._MCP_AVAILABLE", True), \
+             patch("tools.mcp_tool._discover_and_register_server", side_effect=fake_register), \
+             patch("tools.mcp_tool._existing_tool_names", return_value=["mcp_srv_t1", "mcp_srv_t2"]):
+            _ensure_mcp_loop()
+
+            with patch("tools.mcp_tool.logger") as mock_logger:
+                register_mcp_servers(fake_config)
+
+                info_calls = [str(c) for c in mock_logger.info.call_args_list]
+                assert any("2 tool(s)" in c and "1 server(s)" in c for c in info_calls), (
+                    f"Summary should report 2 tools from 1 server, got: {info_calls}"
+                )
+
+        _servers.pop("srv", None)
@@ -5,6 +5,12 @@ Wraps the MCP SDK's built-in ``OAuthClientProvider`` (which implements
 authorization.  The SDK handles all of the heavy lifting: PKCE generation,
 metadata discovery, dynamic client registration, token exchange, and refresh.

+Startup safety:
+    The callback handler never calls blocking ``input()`` on the event loop.
+    In non-interactive environments (no TTY, SSH, headless), the OAuth flow
+    raises ``OAuthNonInteractiveError`` instead of blocking, so that the
+    server degrades gracefully and other MCP servers are not affected.
+
 Usage in mcp_tool.py::

    from tools.mcp_oauth import build_oauth_auth
@@ -19,6 +25,7 @@ import json
 import logging
 import os
 import socket
+import sys
 import threading
 import webbrowser
 from http.server import BaseHTTPRequestHandler, HTTPServer
@@ -28,6 +35,11 @@ from urllib.parse import parse_qs, urlparse

 logger = logging.getLogger(__name__)

+
+class OAuthNonInteractiveError(RuntimeError):
+    """Raised when OAuth requires user interaction but the environment is non-interactive."""
+    pass
+
 _TOKEN_DIR_NAME = "mcp-tokens"


@@ -164,7 +176,13 @@ async def _redirect_to_browser(auth_url: str) -> None:


 async def _wait_for_callback() -> tuple[str, str | None]:
-    """Start a local HTTP server on the pre-registered port and wait for the OAuth redirect."""
+    """Start a local HTTP server on the pre-registered port and wait for the OAuth redirect.
+
+    If the callback times out, raises ``OAuthNonInteractiveError`` instead of
+    calling blocking ``input()`` — the old ``input()`` call would block the
+    entire MCP asyncio event loop, preventing all other MCP servers from
+    connecting and potentially hanging Hermes startup indefinitely.
+    """
    global _oauth_port
    port = _oauth_port or _find_free_port()
    HandlerClass, result = _make_callback_handler()
@@ -186,8 +204,10 @@ async def _wait_for_callback() -> tuple[str, str | None]:
    code = result["auth_code"] or ""
    state = result["state"]
    if not code:
-        print("  Browser callback timed out. Paste the authorization code manually:")
-        code = input("  Code: ").strip()
+        raise OAuthNonInteractiveError(
+            "OAuth browser callback timed out after 120 seconds. "
+            "Run 'hermes mcp auth <server-name>' to authorize interactively."
+        )
    return code, state


@@ -199,6 +219,17 @@ def _can_open_browser() -> bool:
    return True


+def _is_interactive() -> bool:
+    """Check if the current environment can support interactive OAuth flows.
+
+    Returns False in headless/daemon/container environments where no user
+    can interact with a browser or paste an auth code.
+    """
+    if not hasattr(sys.stdin, "isatty") or not sys.stdin.isatty():
+        return False
+    return True
+
+
 # ---------------------------------------------------------------------------
 # Public API
 # ---------------------------------------------------------------------------
@@ -209,6 +240,11 @@ def build_oauth_auth(server_name: str, server_url: str):
    Uses the MCP SDK's ``OAuthClientProvider`` which handles discovery,
    registration, PKCE, token exchange, and refresh automatically.

+    In non-interactive environments (no TTY), this still returns a provider
+    so that **cached tokens and refresh flows work**.  Only the interactive
+    authorization-code grant will fail fast with a clear error instead of
+    blocking the event loop.
+
    Returns an ``OAuthClientProvider`` instance (implements ``httpx.Auth``),
    or ``None`` if the MCP SDK auth module is not available.
    """
@@ -219,6 +255,25 @@ def build_oauth_auth(server_name: str, server_url: str):
        logger.warning("MCP SDK auth module not available — OAuth disabled")
        return None

+    storage = HermesTokenStorage(server_name)
+    interactive = _is_interactive()
+
+    if not interactive:
+        # Check whether cached tokens exist.  If they do, the SDK can still
+        # use them (and refresh them) without any user interaction.  If not,
+        # we still build the provider — the callback_handler will raise
+        # OAuthNonInteractiveError if a fresh authorization is actually
+        # needed, which surfaces as a clean connection failure for this
+        # server only (other MCP servers are unaffected).
+        has_cached = storage._read_json(storage._tokens_path()) is not None
+        if not has_cached:
+            logger.warning(
+                "MCP server '%s' requires OAuth but no cached tokens found "
+                "and environment is non-interactive. The server will fail to "
+                "connect. Run 'hermes mcp auth %s' to authorize interactively.",
+                server_name, server_name,
+            )
+
    global _oauth_port
    _oauth_port = _find_free_port()
    redirect_uri = f"http://127.0.0.1:{_oauth_port}/callback"
@@ -232,14 +287,36 @@ def build_oauth_auth(server_name: str, server_url: str):
        token_endpoint_auth_method="none",
    )

-    storage = HermesTokenStorage(server_name)
+    # In non-interactive mode, the redirect handler logs the URL and the
+    # callback handler raises immediately — no blocking, no input().
+    redirect_handler = _redirect_to_browser
+    callback_handler = _wait_for_callback
+
+    if not interactive:
+        async def _noninteractive_redirect(auth_url: str) -> None:
+            logger.warning(
+                "MCP server '%s' needs OAuth authorization (non-interactive, "
+                "cannot open browser). URL: %s",
+                server_name, auth_url,
+            )
+
+        async def _noninteractive_callback() -> tuple[str, str | None]:
+            raise OAuthNonInteractiveError(
+                f"MCP server '{server_name}' requires interactive OAuth "
+                f"authorization but the environment is non-interactive "
+                f"(no TTY). Run 'hermes mcp auth {server_name}' to "
+                f"authorize, then restart."
+            )
+
+        redirect_handler = _noninteractive_redirect
+        callback_handler = _noninteractive_callback

    return OAuthClientProvider(
        server_url=server_url,
        client_metadata=client_metadata,
        storage=storage,
-        redirect_handler=_redirect_to_browser,
-        callback_handler=_wait_for_callback,
+        redirect_handler=redirect_handler,
+        callback_handler=callback_handler,
        timeout=120.0,
    )

@@ -842,13 +842,25 @@ class MCPServerTask:
        sampling_kwargs = self._sampling.session_kwargs() if self._sampling else {}
        if _MCP_NOTIFICATION_TYPES and _MCP_MESSAGE_HANDLER_SUPPORTED:
            sampling_kwargs["message_handler"] = self._make_message_handler()
+
+        # Snapshot child PIDs before spawning so we can track the new one.
+        pids_before = _snapshot_child_pids()
        async with stdio_client(server_params) as (read_stream, write_stream):
+            # Capture the newly spawned subprocess PID for force-kill cleanup.
+            new_pids = _snapshot_child_pids() - pids_before
+            if new_pids:
+                with _lock:
+                    _stdio_pids.update(new_pids)
            async with ClientSession(read_stream, write_stream, **sampling_kwargs) as session:
                await session.initialize()
                self.session = session
                await self._discover_tools()
                self._ready.set()
                await self._shutdown_event.wait()
+        # Context exited cleanly — subprocess was terminated by the SDK.
+        if new_pids:
+            with _lock:
+                _stdio_pids.difference_update(new_pids)

    async def _run_http(self, config: dict):
        """Run the server using HTTP/StreamableHTTP transport."""
@@ -863,7 +875,10 @@ class MCPServerTask:
        headers = dict(config.get("headers") or {})
        connect_timeout = config.get("connect_timeout", _DEFAULT_CONNECT_TIMEOUT)

-        # OAuth 2.1 PKCE: build httpx.Auth handler using the MCP SDK
+        # OAuth 2.1 PKCE: build httpx.Auth handler using the MCP SDK.
+        # If OAuth setup fails (e.g. non-interactive environment without
+        # cached tokens), re-raise so this server is reported as failed
+        # without blocking other MCP servers from connecting.
        _oauth_auth = None
        if self._auth_type == "oauth":
            try:
@@ -871,6 +886,7 @@ class MCPServerTask:
                _oauth_auth = build_oauth_auth(self.name, url)
            except Exception as exc:
                logger.warning("MCP OAuth setup failed for '%s': %s", self.name, exc)
+                raise

        sampling_kwargs = self._sampling.session_kwargs() if self._sampling else {}
        if _MCP_NOTIFICATION_TYPES and _MCP_MESSAGE_HANDLER_SUPPORTED:
@@ -1044,9 +1060,56 @@ _servers: Dict[str, MCPServerTask] = {}
 _mcp_loop: Optional[asyncio.AbstractEventLoop] = None
 _mcp_thread: Optional[threading.Thread] = None

-# Protects _mcp_loop, _mcp_thread, and _servers from concurrent access.
+# Protects _mcp_loop, _mcp_thread, _servers, and _stdio_pids.
 _lock = threading.Lock()

+# PIDs of stdio MCP server subprocesses.  Tracked so we can force-kill
+# them on shutdown if the graceful cleanup (SDK context-manager teardown)
+# fails or times out.  PIDs are added after connection and removed on
+# normal server shutdown.
+_stdio_pids: set = set()
+
+
+def _snapshot_child_pids() -> set:
+    """Return a set of current child process PIDs.
+
+    Uses /proc on Linux, falls back to psutil, then empty set.
+    Used by _run_stdio to identify the subprocess spawned by stdio_client.
+    """
+    my_pid = os.getpid()
+
+    # Linux: read from /proc
+    try:
+        children_path = f"/proc/{my_pid}/task/{my_pid}/children"
+        with open(children_path) as f:
+            return {int(p) for p in f.read().split() if p.strip()}
+    except (FileNotFoundError, OSError, ValueError):
+        pass
+
+    # Fallback: psutil
+    try:
+        import psutil
+        return {c.pid for c in psutil.Process(my_pid).children()}
+    except Exception:
+        pass
+
+    return set()
+
+
+def _mcp_loop_exception_handler(loop, context):
+    """Suppress benign 'Event loop is closed' noise during shutdown.
+
+    When the MCP event loop is stopped and closed, httpx/httpcore async
+    transports may fire __del__ finalizers that call call_soon() on the
+    dead loop.  asyncio catches that RuntimeError and routes it here.
+    We silence it because the connection is being torn down anyway; all
+    other exceptions are forwarded to the default handler.
+    """
+    exc = context.get("exception")
+    if isinstance(exc, RuntimeError) and "Event loop is closed" in str(exc):
+        return  # benign shutdown race — suppress
+    loop.default_exception_handler(context)
+

 def _ensure_mcp_loop():
    """Start the background event loop thread if not already running."""
@@ -1055,6 +1118,7 @@ def _ensure_mcp_loop():
        if _mcp_loop is not None and _mcp_loop.is_running():
            return
        _mcp_loop = asyncio.new_event_loop()
+        _mcp_loop.set_exception_handler(_mcp_loop_exception_handler)
        _mcp_thread = threading.Thread(
            target=_mcp_loop.run_forever,
            name="mcp-event-loop",
@@ -1406,6 +1470,17 @@ def _normalize_mcp_input_schema(schema: dict | None) -> dict:
    return schema


+def sanitize_mcp_name_component(value: str) -> str:
+    """Return an MCP name component safe for tool and prefix generation.
+
+    Preserves Hermes's historical behavior of converting hyphens to
+    underscores, and also replaces any other character outside
+    ``[A-Za-z0-9_]`` with ``_`` so generated tool names are compatible with
+    provider validation rules.
+    """
+    return re.sub(r"[^A-Za-z0-9_]", "_", str(value or ""))
+
+
 def _convert_mcp_schema(server_name: str, mcp_tool) -> dict:
    """Convert an MCP tool listing to the Hermes registry schema format.

@@ -1417,9 +1492,8 @@ def _convert_mcp_schema(server_name: str, mcp_tool) -> dict:
    Returns:
        A dict suitable for ``registry.register(schema=...)``.
    """
-    # Sanitize: replace hyphens and dots with underscores for LLM API compatibility
-    safe_tool_name = mcp_tool.name.replace("-", "_").replace(".", "_")
-    safe_server_name = server_name.replace("-", "_").replace(".", "_")
+    safe_tool_name = sanitize_mcp_name_component(mcp_tool.name)
+    safe_server_name = sanitize_mcp_name_component(server_name)
    prefixed_name = f"mcp_{safe_server_name}_{safe_tool_name}"
    return {
        "name": prefixed_name,
@@ -1449,7 +1523,7 @@ def _sync_mcp_toolsets(server_names: Optional[List[str]] = None) -> None:
    all_mcp_tools: List[str] = []

    for server_name in server_names:
-        safe_prefix = f"mcp_{server_name.replace('-', '_').replace('.', '_')}_"
+        safe_prefix = f"mcp_{sanitize_mcp_name_component(server_name)}_"
        server_tools = sorted(
            t for t in existing if t.startswith(safe_prefix)
        )
@@ -1485,7 +1559,7 @@ def _build_utility_schemas(server_name: str) -> List[dict]:
    Returns a list of (schema, handler_factory_name) tuples encoded as dicts
    with keys: schema, handler_key.
    """
-    safe_name = server_name.replace("-", "_").replace(".", "_")
+    safe_name = sanitize_mcp_name_component(server_name)
    return [
        {
            "schema": {
@@ -1772,6 +1846,86 @@ async def _discover_and_register_server(name: str, config: dict) -> List[str]:
 # Public API
 # ---------------------------------------------------------------------------

+def register_mcp_servers(servers: Dict[str, dict]) -> List[str]:
+    """Connect to explicit MCP servers and register their tools.
+
+    Idempotent for already-connected server names. Servers with
+    ``enabled: false`` are skipped without disconnecting existing sessions.
+
+    Args:
+        servers: Mapping of ``{server_name: server_config}``.
+
+    Returns:
+        List of all currently registered MCP tool names.
+    """
+    if not _MCP_AVAILABLE:
+        logger.debug("MCP SDK not available -- skipping explicit MCP registration")
+        return []
+
+    if not servers:
+        logger.debug("No explicit MCP servers provided")
+        return []
+
+    # Only attempt servers that aren't already connected and are enabled
+    # (enabled: false skips the server entirely without removing its config)
+    with _lock:
+        new_servers = {
+            k: v
+            for k, v in servers.items()
+            if k not in _servers and _parse_boolish(v.get("enabled", True), default=True)
+        }
+
+    if not new_servers:
+        _sync_mcp_toolsets(list(servers.keys()))
+        return _existing_tool_names()
+
+    # Start the background event loop for MCP connections
+    _ensure_mcp_loop()
+
+    async def _discover_one(name: str, cfg: dict) -> List[str]:
+        """Connect to a single server and return its registered tool names."""
+        return await _discover_and_register_server(name, cfg)
+
+    async def _discover_all():
+        server_names = list(new_servers.keys())
+        # Connect to all servers in PARALLEL
+        results = await asyncio.gather(
+            *(_discover_one(name, cfg) for name, cfg in new_servers.items()),
+            return_exceptions=True,
+        )
+        for name, result in zip(server_names, results):
+            if isinstance(result, Exception):
+                command = new_servers.get(name, {}).get("command")
+                logger.warning(
+                    "Failed to connect to MCP server '%s'%s: %s",
+                    name,
+                    f" (command={command})" if command else "",
+                    _format_connect_error(result),
+                )
+
+    # Per-server timeouts are handled inside _discover_and_register_server.
+    # The outer timeout is generous: 120s total for parallel discovery.
+    _run_on_mcp_loop(_discover_all(), timeout=120)
+
+    _sync_mcp_toolsets(list(servers.keys()))
+
+    # Log a summary so ACP callers get visibility into what was registered.
+    with _lock:
+        connected = [n for n in new_servers if n in _servers]
+        new_tool_count = sum(
+            len(getattr(_servers[n], "_registered_tool_names", []))
+            for n in connected
+        )
+    failed = len(new_servers) - len(connected)
+    if new_tool_count or failed:
+        summary = f"MCP: registered {new_tool_count} tool(s) from {len(connected)} server(s)"
+        if failed:
+            summary += f" ({failed} failed)"
+        logger.info(summary)
+
+    return _existing_tool_names()
+
+
 def discover_mcp_tools() -> List[str]:
    """Entry point: load config, connect to MCP servers, register tools.

@@ -1793,69 +1947,32 @@ def discover_mcp_tools() -> List[str]:
        logger.debug("No MCP servers configured")
        return []

-    # Only attempt servers that aren't already connected and are enabled
-    # (enabled: false skips the server entirely without removing its config)
    with _lock:
-        new_servers = {
-            k: v
-            for k, v in servers.items()
-            if k not in _servers and _parse_boolish(v.get("enabled", True), default=True)
-        }
+        new_server_names = [
+            name
+            for name, cfg in servers.items()
+            if name not in _servers and _parse_boolish(cfg.get("enabled", True), default=True)
+        ]

-    if not new_servers:
-        _sync_mcp_toolsets(list(servers.keys()))
-        return _existing_tool_names()
+    tool_names = register_mcp_servers(servers)
+    if not new_server_names:
+        return tool_names

-    # Start the background event loop for MCP connections
-    _ensure_mcp_loop()
-
-    all_tools: List[str] = []
-    failed_count = 0
-
-    async def _discover_one(name: str, cfg: dict) -> List[str]:
-        """Connect to a single server and return its registered tool names."""
-        return await _discover_and_register_server(name, cfg)
-
-    async def _discover_all():
-        nonlocal failed_count
-        server_names = list(new_servers.keys())
-        # Connect to all servers in PARALLEL
-        results = await asyncio.gather(
-            *(_discover_one(name, cfg) for name, cfg in new_servers.items()),
-            return_exceptions=True,
+    with _lock:
+        connected_server_names = [name for name in new_server_names if name in _servers]
+        new_tool_count = sum(
+            len(getattr(_servers[name], "_registered_tool_names", []))
+            for name in connected_server_names
        )
-        for name, result in zip(server_names, results):
-            if isinstance(result, Exception):
-                failed_count += 1
-                command = new_servers.get(name, {}).get("command")
-                logger.warning(
-                    "Failed to connect to MCP server '%s'%s: %s",
-                    name,
-                    f" (command={command})" if command else "",
-                    _format_connect_error(result),
-                )
-            elif isinstance(result, list):
-                all_tools.extend(result)
-            else:
-                failed_count += 1

-    # Per-server timeouts are handled inside _discover_and_register_server.
-    # The outer timeout is generous: 120s total for parallel discovery.
-    _run_on_mcp_loop(_discover_all(), timeout=120)
-
-    _sync_mcp_toolsets(list(servers.keys()))
-
-    # Print summary
-    total_servers = len(new_servers)
-    ok_servers = total_servers - failed_count
-    if all_tools or failed_count:
-        summary = f"  MCP: {len(all_tools)} tool(s) from {ok_servers} server(s)"
+    failed_count = len(new_server_names) - len(connected_server_names)
+    if new_tool_count or failed_count:
+        summary = f"  MCP: {new_tool_count} tool(s) from {len(connected_server_names)} server(s)"
        if failed_count:
            summary += f" ({failed_count} failed)"
        logger.info(summary)

-    # Return ALL registered tools (existing + newly discovered)
-    return _existing_tool_names()
+    return tool_names


 def get_mcp_status() -> List[dict]:
@@ -2004,6 +2121,29 @@ def shutdown_mcp_servers():
    _stop_mcp_loop()


+def _kill_orphaned_mcp_children() -> None:
+    """Best-effort kill of MCP stdio subprocesses that survived loop shutdown.
+
+    After the MCP event loop is stopped, stdio server subprocesses *should*
+    have been terminated by the SDK's context-manager cleanup.  If the loop
+    was stuck or the shutdown timed out, orphaned children may remain.
+
+    Only kills PIDs tracked in ``_stdio_pids`` — never arbitrary children.
+    """
+    import signal as _signal
+
+    with _lock:
+        pids = list(_stdio_pids)
+        _stdio_pids.clear()
+
+    for pid in pids:
+        try:
+            os.kill(pid, _signal.SIGKILL)
+            logger.debug("Force-killed orphaned MCP stdio process %d", pid)
+        except (ProcessLookupError, PermissionError, OSError):
+            pass  # Already exited or inaccessible
+
+
 def _stop_mcp_loop():
    """Stop the background event loop and join its thread."""
    global _mcp_loop, _mcp_thread
@@ -2016,4 +2156,10 @@ def _stop_mcp_loop():
        loop.call_soon_threadsafe(loop.stop)
        if thread is not None:
            thread.join(timeout=5)
-        loop.close()
+        try:
+            loop.close()
+        except Exception:
+            pass
+        # After closing the loop, any stdio subprocesses that survived the
+        # graceful shutdown are now orphaned.  Force-kill them.
+        _kill_orphaned_mcp_children()
@@ -127,8 +127,12 @@ def is_stt_enabled(stt_config: Optional[dict] = None) -> bool:


 def _has_openai_audio_backend() -> bool:
-    """Return True when OpenAI audio can use direct credentials or the managed gateway."""
-    return bool(resolve_openai_audio_api_key() or resolve_managed_tool_gateway("openai-audio"))
+    """Return True when OpenAI audio can use config credentials, env credentials, or the managed gateway."""
+    try:
+        _resolve_openai_audio_client_config()
+        return True
+    except ValueError:
+        return False


 def _find_binary(binary_name: str) -> Optional[str]:
@@ -577,13 +581,20 @@ def transcribe_audio(file_path: str, model: Optional[str] = None) -> Dict[str, A

 def _resolve_openai_audio_client_config() -> tuple[str, str]:
    """Return direct OpenAI audio config or a managed gateway fallback."""
+    stt_config = _load_stt_config()
+    openai_cfg = stt_config.get("openai", {})
+    cfg_api_key = openai_cfg.get("api_key", "")
+    cfg_base_url = openai_cfg.get("base_url", "")
+    if cfg_api_key:
+        return cfg_api_key, (cfg_base_url or OPENAI_BASE_URL)
+
    direct_api_key = resolve_openai_audio_api_key()
    if direct_api_key:
        return direct_api_key, OPENAI_BASE_URL

    managed_gateway = resolve_managed_tool_gateway("openai-audio")
    if managed_gateway is None:
-        message = "Neither VOICE_TOOLS_OPENAI_KEY nor OPENAI_API_KEY is set"
+        message = "Neither stt.openai.api_key in config nor VOICE_TOOLS_OPENAI_KEY/OPENAI_API_KEY is set"
        if managed_nous_tools_enabled():
            message += ", and the managed OpenAI audio gateway is unavailable"
        raise ValueError(message)
@@ -1017,6 +1017,31 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/c6/45/e6dd0c6c740c67c07474f2eb5175bb5656598488db444c4abd2a4e948393/daytona_toolbox_api_client_async-0.155.0-py3-none-any.whl", hash = "sha256:6ecf6351a31686d8e33ff054db69e279c45b574018b6c9a1cae15a7940412951", size = 176355, upload-time = "2026-03-24T14:47:36.327Z" },
 ]

+[[package]]
+name = "debugpy"
+version = "1.8.20"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/e0/b7/cd8080344452e4874aae67c40d8940e2b4d47b01601a8fd9f44786c757c7/debugpy-1.8.20.tar.gz", hash = "sha256:55bc8701714969f1ab89a6d5f2f3d40c36f91b2cbe2f65d98bf8196f6a6a2c33", size = 1645207, upload-time = "2026-01-29T23:03:28.199Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/51/56/c3baf5cbe4dd77427fd9aef99fcdade259ad128feeb8a786c246adb838e5/debugpy-1.8.20-cp311-cp311-macosx_15_0_universal2.whl", hash = "sha256:eada6042ad88fa1571b74bd5402ee8b86eded7a8f7b827849761700aff171f1b", size = 2208318, upload-time = "2026-01-29T23:03:36.481Z" },
+    { url = "https://files.pythonhosted.org/packages/9a/7d/4fa79a57a8e69fe0d9763e98d1110320f9ecd7f1f362572e3aafd7417c9d/debugpy-1.8.20-cp311-cp311-manylinux_2_34_x86_64.whl", hash = "sha256:7de0b7dfeedc504421032afba845ae2a7bcc32ddfb07dae2c3ca5442f821c344", size = 3171493, upload-time = "2026-01-29T23:03:37.775Z" },
+    { url = "https://files.pythonhosted.org/packages/7d/f2/1e8f8affe51e12a26f3a8a8a4277d6e60aa89d0a66512f63b1e799d424a4/debugpy-1.8.20-cp311-cp311-win32.whl", hash = "sha256:773e839380cf459caf73cc533ea45ec2737a5cc184cf1b3b796cd4fd98504fec", size = 5209240, upload-time = "2026-01-29T23:03:39.109Z" },
+    { url = "https://files.pythonhosted.org/packages/d5/92/1cb532e88560cbee973396254b21bece8c5d7c2ece958a67afa08c9f10dc/debugpy-1.8.20-cp311-cp311-win_amd64.whl", hash = "sha256:1f7650546e0eded1902d0f6af28f787fa1f1dbdbc97ddabaf1cd963a405930cb", size = 5233481, upload-time = "2026-01-29T23:03:40.659Z" },
+    { url = "https://files.pythonhosted.org/packages/14/57/7f34f4736bfb6e00f2e4c96351b07805d83c9a7b33d28580ae01374430f7/debugpy-1.8.20-cp312-cp312-macosx_15_0_universal2.whl", hash = "sha256:4ae3135e2089905a916909ef31922b2d733d756f66d87345b3e5e52b7a55f13d", size = 2550686, upload-time = "2026-01-29T23:03:42.023Z" },
+    { url = "https://files.pythonhosted.org/packages/ab/78/b193a3975ca34458f6f0e24aaf5c3e3da72f5401f6054c0dfd004b41726f/debugpy-1.8.20-cp312-cp312-manylinux_2_34_x86_64.whl", hash = "sha256:88f47850a4284b88bd2bfee1f26132147d5d504e4e86c22485dfa44b97e19b4b", size = 4310588, upload-time = "2026-01-29T23:03:43.314Z" },
+    { url = "https://files.pythonhosted.org/packages/c1/55/f14deb95eaf4f30f07ef4b90a8590fc05d9e04df85ee379712f6fb6736d7/debugpy-1.8.20-cp312-cp312-win32.whl", hash = "sha256:4057ac68f892064e5f98209ab582abfee3b543fb55d2e87610ddc133a954d390", size = 5331372, upload-time = "2026-01-29T23:03:45.526Z" },
+    { url = "https://files.pythonhosted.org/packages/a1/39/2bef246368bd42f9bd7cba99844542b74b84dacbdbea0833e610f384fee8/debugpy-1.8.20-cp312-cp312-win_amd64.whl", hash = "sha256:a1a8f851e7cf171330679ef6997e9c579ef6dd33c9098458bd9986a0f4ca52e3", size = 5372835, upload-time = "2026-01-29T23:03:47.245Z" },
+    { url = "https://files.pythonhosted.org/packages/15/e2/fc500524cc6f104a9d049abc85a0a8b3f0d14c0a39b9c140511c61e5b40b/debugpy-1.8.20-cp313-cp313-macosx_15_0_universal2.whl", hash = "sha256:5dff4bb27027821fdfcc9e8f87309a28988231165147c31730128b1c983e282a", size = 2539560, upload-time = "2026-01-29T23:03:48.738Z" },
+    { url = "https://files.pythonhosted.org/packages/90/83/fb33dcea789ed6018f8da20c5a9bc9d82adc65c0c990faed43f7c955da46/debugpy-1.8.20-cp313-cp313-manylinux_2_34_x86_64.whl", hash = "sha256:84562982dd7cf5ebebfdea667ca20a064e096099997b175fe204e86817f64eaf", size = 4293272, upload-time = "2026-01-29T23:03:50.169Z" },
+    { url = "https://files.pythonhosted.org/packages/a6/25/b1e4a01bfb824d79a6af24b99ef291e24189080c93576dfd9b1a2815cd0f/debugpy-1.8.20-cp313-cp313-win32.whl", hash = "sha256:da11dea6447b2cadbf8ce2bec59ecea87cc18d2c574980f643f2d2dfe4862393", size = 5331208, upload-time = "2026-01-29T23:03:51.547Z" },
+    { url = "https://files.pythonhosted.org/packages/13/f7/a0b368ce54ffff9e9028c098bd2d28cfc5b54f9f6c186929083d4c60ba58/debugpy-1.8.20-cp313-cp313-win_amd64.whl", hash = "sha256:eb506e45943cab2efb7c6eafdd65b842f3ae779f020c82221f55aca9de135ed7", size = 5372930, upload-time = "2026-01-29T23:03:53.585Z" },
+    { url = "https://files.pythonhosted.org/packages/33/2e/f6cb9a8a13f5058f0a20fe09711a7b726232cd5a78c6a7c05b2ec726cff9/debugpy-1.8.20-cp314-cp314-macosx_15_0_universal2.whl", hash = "sha256:9c74df62fc064cd5e5eaca1353a3ef5a5d50da5eb8058fcef63106f7bebe6173", size = 2538066, upload-time = "2026-01-29T23:03:54.999Z" },
+    { url = "https://files.pythonhosted.org/packages/c5/56/6ddca50b53624e1ca3ce1d1e49ff22db46c47ea5fb4c0cc5c9b90a616364/debugpy-1.8.20-cp314-cp314-manylinux_2_34_x86_64.whl", hash = "sha256:077a7447589ee9bc1ff0cdf443566d0ecf540ac8aa7333b775ebcb8ce9f4ecad", size = 4269425, upload-time = "2026-01-29T23:03:56.518Z" },
+    { url = "https://files.pythonhosted.org/packages/c5/d9/d64199c14a0d4c476df46c82470a3ce45c8d183a6796cfb5e66533b3663c/debugpy-1.8.20-cp314-cp314-win32.whl", hash = "sha256:352036a99dd35053b37b7803f748efc456076f929c6a895556932eaf2d23b07f", size = 5331407, upload-time = "2026-01-29T23:03:58.481Z" },
+    { url = "https://files.pythonhosted.org/packages/e0/d9/1f07395b54413432624d61524dfd98c1a7c7827d2abfdb8829ac92638205/debugpy-1.8.20-cp314-cp314-win_amd64.whl", hash = "sha256:a98eec61135465b062846112e5ecf2eebb855305acc1dfbae43b72903b8ab5be", size = 5372521, upload-time = "2026-01-29T23:03:59.864Z" },
+    { url = "https://files.pythonhosted.org/packages/e0/c3/7f67dea8ccf8fdcb9c99033bbe3e90b9e7395415843accb81428c441be2d/debugpy-1.8.20-py2.py3-none-any.whl", hash = "sha256:5be9bed9ae3be00665a06acaa48f8329d2b9632f15fd09f6a9a8c8d9907e54d7", size = 5337658, upload-time = "2026-01-29T23:04:17.404Z" },
+]
+
 [[package]]
 name = "deprecated"
 version = "1.3.1"
@@ -1133,6 +1158,24 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/97/a8/c070e1340636acb38d4e6a7e45c46d168a462b48b9b3257e14ca0e5af79b/environs-14.6.0-py3-none-any.whl", hash = "sha256:f8fb3d6c6a55872b0c6db077a28f5a8c7b8984b7c32029613d44cef95cfc0812", size = 17205, upload-time = "2026-02-20T04:02:07.299Z" },
 ]

+[[package]]
+name = "exa-py"
+version = "2.10.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "httpcore" },
+    { name = "httpx" },
+    { name = "openai" },
+    { name = "pydantic" },
+    { name = "python-dotenv" },
+    { name = "requests" },
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/fe/4f/f06a6f277d668f143e330fe503b0027cc5fed753b22c3e161f8cbbccdf65/exa_py-2.10.2.tar.gz", hash = "sha256:f781f30b199f1102333384728adae64bb15a6bbcabfa97e91fd705f90acffc45", size = 53792, upload-time = "2026-03-26T20:29:35.764Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/e2/bc/7a34e904a415040ba626948d0b0a36a08cd073f12b13342578a68331be3c/exa_py-2.10.2-py3-none-any.whl", hash = "sha256:ecb2a7581f4b7a8aeb6b434acce1bbc40f92ed1d4126b2aa6029913acd904a47", size = 72248, upload-time = "2026-03-26T20:29:37.306Z" },
+]
+
 [[package]]
 name = "execnet"
 version = "2.1.2"
@@ -1600,13 +1643,13 @@ wheels = [

 [[package]]
 name = "hermes-agent"
-version = "0.5.0"
+version = "0.7.0"
 source = { editable = "." }
 dependencies = [
    { name = "anthropic" },
    { name = "edge-tts" },
+    { name = "exa-py" },
    { name = "fal-client" },
-    { name = "faster-whisper" },
    { name = "fire" },
    { name = "firecrawl-py" },
    { name = "httpx" },
@@ -1632,10 +1675,13 @@ all = [
    { name = "aiohttp" },
    { name = "croniter" },
    { name = "daytona" },
+    { name = "debugpy" },
    { name = "dingtalk-stream" },
    { name = "discord-py", extra = ["voice"] },
    { name = "elevenlabs" },
+    { name = "faster-whisper" },
    { name = "honcho-ai" },
+    { name = "lark-oapi" },
    { name = "mcp" },
    { name = "modal" },
    { name = "numpy" },
@@ -1660,6 +1706,7 @@ daytona = [
    { name = "daytona" },
 ]
 dev = [
+    { name = "debugpy" },
    { name = "mcp" },
    { name = "pytest" },
    { name = "pytest-asyncio" },
@@ -1668,6 +1715,9 @@ dev = [
 dingtalk = [
    { name = "dingtalk-stream" },
 ]
+feishu = [
+    { name = "lark-oapi" },
+]
 homeassistant = [
    { name = "aiohttp" },
 ]
@@ -1712,6 +1762,7 @@ tts-premium = [
    { name = "elevenlabs" },
 ]
 voice = [
+    { name = "faster-whisper" },
    { name = "numpy" },
    { name = "sounddevice" },
 ]
@@ -1729,13 +1780,15 @@ requires-dist = [
    { name = "atroposlib", marker = "extra == 'rl'", git = "https://github.com/NousResearch/atropos.git" },
    { name = "croniter", marker = "extra == 'cron'", specifier = ">=6.0.0,<7" },
    { name = "daytona", marker = "extra == 'daytona'", specifier = ">=0.148.0,<1" },
+    { name = "debugpy", marker = "extra == 'dev'", specifier = ">=1.8.0,<2" },
    { name = "dingtalk-stream", marker = "extra == 'dingtalk'", specifier = ">=0.1.0,<1" },
    { name = "discord-py", extras = ["voice"], marker = "extra == 'messaging'", specifier = ">=2.7.1,<3" },
    { name = "edge-tts", specifier = ">=7.2.7,<8" },
    { name = "elevenlabs", marker = "extra == 'tts-premium'", specifier = ">=1.0,<2" },
+    { name = "exa-py", specifier = ">=2.9.0,<3" },
    { name = "fal-client", specifier = ">=0.13.1,<1" },
    { name = "fastapi", marker = "extra == 'rl'", specifier = ">=0.104.0,<1" },
-    { name = "faster-whisper", specifier = ">=1.0.0,<2" },
+    { name = "faster-whisper", marker = "extra == 'voice'", specifier = ">=1.0.0,<2" },
    { name = "fire", specifier = ">=0.7.1,<1" },
    { name = "firecrawl-py", specifier = ">=4.16.0,<5" },
    { name = "hermes-agent", extras = ["acp"], marker = "extra == 'all'" },
@@ -1744,6 +1797,7 @@ requires-dist = [
    { name = "hermes-agent", extras = ["daytona"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["dev"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["dingtalk"], marker = "extra == 'all'" },
+    { name = "hermes-agent", extras = ["feishu"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["homeassistant"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["honcho"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["mcp"], marker = "extra == 'all'" },
@@ -1757,6 +1811,7 @@ requires-dist = [
    { name = "honcho-ai", marker = "extra == 'honcho'", specifier = ">=2.0.1,<3" },
    { name = "httpx", specifier = ">=0.28.1,<1" },
    { name = "jinja2", specifier = ">=3.1.5,<4" },
+    { name = "lark-oapi", marker = "extra == 'feishu'", specifier = ">=1.5.3,<2" },
    { name = "matrix-nio", extras = ["e2e"], marker = "extra == 'matrix'", specifier = ">=0.24.0,<1" },
    { name = "mcp", marker = "extra == 'dev'", specifier = ">=1.2.0,<2" },
    { name = "mcp", marker = "extra == 'mcp'", specifier = ">=1.2.0,<2" },
@@ -1789,7 +1844,7 @@ requires-dist = [
    { name = "wandb", marker = "extra == 'rl'", specifier = ">=0.15.0,<1" },
    { name = "yc-bench", marker = "python_full_version >= '3.12' and extra == 'yc-bench'", git = "https://github.com/collinear-ai/yc-bench.git" },
 ]
-provides-extras = ["modal", "daytona", "dev", "messaging", "cron", "slack", "matrix", "cli", "tts-premium", "voice", "pty", "honcho", "mcp", "homeassistant", "sms", "acp", "dingtalk", "rl", "yc-bench", "all"]
+provides-extras = ["modal", "daytona", "dev", "messaging", "cron", "slack", "matrix", "cli", "tts-premium", "voice", "pty", "honcho", "mcp", "homeassistant", "sms", "acp", "dingtalk", "feishu", "rl", "yc-bench", "all"]

 [[package]]
 name = "hf-transfer"
@@ -2267,6 +2322,21 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/0a/dd/8050c947d435c8d4bc94e3252f4d8bb8a76cfb424f043a8680be637a57f1/kiwisolver-1.5.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:59cd8683f575d96df5bb48f6add94afc055012c29e28124fcae2b63661b9efb1", size = 73558, upload-time = "2026-03-09T13:15:52.112Z" },
 ]

+[[package]]
+name = "lark-oapi"
+version = "1.5.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "httpx" },
+    { name = "pycryptodome" },
+    { name = "requests" },
+    { name = "requests-toolbelt" },
+    { name = "websockets" },
+]
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/bf/ff/2ece5d735ebfa2af600a53176f2636ae47af2bf934e08effab64f0d1e047/lark_oapi-1.5.3-py3-none-any.whl", hash = "sha256:fda6b32bb38d21b6bdaae94979c600b94c7c521e985adade63a54e4b3e20cc36", size = 6993016, upload-time = "2026-01-27T08:21:49.307Z" },
+]
+
 [[package]]
 name = "latex2sympy2-extended"
 version = "1.11.0"
@@ -4122,6 +4192,18 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/56/5d/c814546c2333ceea4ba42262d8c4d55763003e767fa169adc693bd524478/requests-2.33.0-py3-none-any.whl", hash = "sha256:3324635456fa185245e24865e810cecec7b4caf933d7eb133dcde67d48cee69b", size = 65017, upload-time = "2026-03-25T15:10:40.382Z" },
 ]

+[[package]]
+name = "requests-toolbelt"
+version = "1.0.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "requests" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/f3/61/d7545dafb7ac2230c70d38d31cbfe4cc64f7144dc41f6e4e4b78ecd9f5bb/requests-toolbelt-1.0.0.tar.gz", hash = "sha256:7681a0a3d047012b5bdc0ee37d7f8f07ebe76ab08caeccfc3921ce23c88d5bc6", size = 206888, upload-time = "2023-05-01T04:11:33.229Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/3f/51/d4db610ef29373b879047326cbf6fa98b6c1969d6f6dc423279de2b1be2c/requests_toolbelt-1.0.0-py2.py3-none-any.whl", hash = "sha256:cccfdd665f0a24fcf4726e690f65639d272bb0637b9b92dfd91a5568ccf6bd06", size = 54481, upload-time = "2023-05-01T04:11:28.427Z" },
+]
+
 [[package]]
 name = "rich"
 version = "14.3.3"
@@ -527,6 +527,187 @@ There is no hard limit. Each profile is just a directory under `~/.hermes/profil

 ---

+## Workflows & Patterns
+
+### Using different models for different tasks (multi-model workflows)
+
+**Scenario:** You use GPT-5.4 as your daily driver, but Gemini or Grok writes better social media content. Manually switching models every time is tedious.
+
+**Solution: Delegation config.** Hermes can route subagents to a different model automatically. Set this in `~/.hermes/config.yaml`:
+
+```yaml
+delegation:
+  model: "google/gemini-3-flash-preview"   # subagents use this model
+  provider: "openrouter"                    # provider for subagents
+```
+
+Now when you tell Hermes "write me a Twitter thread about X" and it spawns a `delegate_task` subagent, that subagent runs on Gemini instead of your main model. Your primary conversation stays on GPT-5.4.
+
+You can also be explicit in your prompt: *"Delegate a task to write social media posts about our product launch. Use your subagent for the actual writing."* The agent will use `delegate_task`, which automatically picks up the delegation config.
+
+For one-off model switches without delegation, use `/model` in the CLI:
+
+```bash
+/model google/gemini-3-flash-preview    # switch for this session
+# ... write your content ...
+/model openai/gpt-5.4                   # switch back
+```
+
+See [Subagent Delegation](../user-guide/features/delegation.md) for more on how delegation works.
+
+### Running multiple agents on one WhatsApp number (per-chat binding)
+
+**Scenario:** In OpenClaw, you had multiple independent agents bound to specific WhatsApp chats — one for a family shopping list group, another for your private chat. Can Hermes do this?
+
+**Current limitation:** Hermes profiles each require their own WhatsApp number/session. You cannot bind multiple profiles to different chats on the same WhatsApp number — the WhatsApp bridge (Baileys) uses one authenticated session per number.
+
+**Workarounds:**
+
+1. **Use a single profile with personality switching.** Create different `AGENTS.md` context files or use the `/personality` command to change behavior per chat. The agent sees which chat it's in and can adapt.
+
+2. **Use cron jobs for specialized tasks.** For a shopping list tracker, set up a cron job that monitors a specific chat and manages the list — no separate agent needed.
+
+3. **Use separate numbers.** If you need truly independent agents, pair each profile with its own WhatsApp number. Virtual numbers from services like Google Voice work for this.
+
+4. **Use Telegram or Discord instead.** These platforms support per-chat binding more naturally — each Telegram group or Discord channel gets its own session, and you can run multiple bot tokens (one per profile) on the same account.
+
+See [Profiles](../user-guide/profiles.md) and [WhatsApp setup](../user-guide/messaging/whatsapp.md) for more details.
+
+### Controlling what shows up in Telegram (hiding logs and reasoning)
+
+**Scenario:** You see gateway exec logs, Hermes reasoning, and tool call details in Telegram instead of just the final output.
+
+**Solution:** The `display.tool_progress` setting in `config.yaml` controls how much tool activity is shown:
+
+```yaml
+display:
+  tool_progress: "off"   # options: off, new, all, verbose
+```
+
+- **`off`** — Only the final response. No tool calls, no reasoning, no logs.
+- **`new`** — Shows new tool calls as they happen (brief one-liners).
+- **`all`** — Shows all tool activity including results.
+- **`verbose`** — Full detail including tool arguments and outputs.
+
+For messaging platforms, `off` or `new` is usually what you want. After editing `config.yaml`, restart the gateway for changes to take effect.
+
+You can also toggle this per-session with the `/verbose` command (if enabled):
+
+```yaml
+display:
+  tool_progress_command: true   # enables /verbose in the gateway
+```
+
+### Managing skills on Telegram (slash command limit)
+
+**Scenario:** Telegram has a 100 slash command limit, and your skills are pushing past it. You want to disable skills you don't need on Telegram, but `hermes skills config` settings don't seem to take effect.
+
+**Solution:** Use `hermes skills config` to disable skills per-platform. This writes to `config.yaml`:
+
+```yaml
+skills:
+  disabled: []                    # globally disabled skills
+  platform_disabled:
+    telegram: [skill-a, skill-b]  # disabled only on telegram
+```
+
+After changing this, **restart the gateway** (`hermes gateway restart` or kill and relaunch). The Telegram bot command menu rebuilds on startup.
+
+:::tip
+Skills with very long descriptions are truncated to 40 characters in the Telegram menu to stay within payload size limits. If skills aren't appearing, it may be a total payload size issue rather than the 100 command count limit — disabling unused skills helps with both.
+:::
+
+### Shared thread sessions (multiple users, one conversation)
+
+**Scenario:** You have a Telegram or Discord thread where multiple people mention the bot. You want all mentions in that thread to be part of one shared conversation, not separate per-user sessions.
+
+**Current behavior:** Hermes creates sessions keyed by user ID on most platforms, so each person gets their own conversation context. This is by design for privacy and context isolation.
+
+**Workarounds:**
+
+1. **Use Slack.** Slack sessions are keyed by thread, not by user. Multiple users in the same thread share one conversation — exactly the behavior you're describing. This is the most natural fit.
+
+2. **Use a group chat with a single user.** If one person is the designated "operator" who relays questions, the session stays unified. Others can read along.
+
+3. **Use a Discord channel.** Discord sessions are keyed by channel, so all users in the same channel share context. Use a dedicated channel for the shared conversation.
+
+### Exporting Hermes to another machine
+
+**Scenario:** You've built up skills, cron jobs, and memories on one machine and want to move everything to a new dedicated Linux box.
+
+**Solution:**
+
+1. Install Hermes Agent on the new machine:
+   ```bash
+   curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
+   ```
+
+2. Copy your entire `~/.hermes/` directory **except** the `hermes-agent` subdirectory (that's the code repo — the new install has its own):
+   ```bash
+   # On the source machine
+   rsync -av --exclude='hermes-agent' ~/.hermes/ newmachine:~/.hermes/
+   ```
+
+   Or use profile export/import:
+   ```bash
+   # On source machine
+   hermes profile export default ./hermes-backup.tar.gz
+
+   # On target machine
+   hermes profile import ./hermes-backup.tar.gz default
+   ```
+
+3. On the new machine, run `hermes setup` to verify API keys and provider config are working. Re-authenticate any messaging platforms (especially WhatsApp, which uses QR pairing).
+
+The `~/.hermes/` directory contains everything: `config.yaml`, `.env`, `SOUL.md`, `memories/`, `skills/`, `state.db` (sessions), `cron/`, and any custom plugins. The code itself lives in `~/.hermes/hermes-agent/` and is installed fresh.
+
+### Permission denied when reloading shell after install
+
+**Scenario:** After running the Hermes installer, `source ~/.zshrc` gives a permission denied error.
+
+**Cause:** This usually happens when `~/.zshrc` (or `~/.bashrc`) has incorrect file permissions, or when the installer couldn't write to it cleanly. It's not a Hermes-specific issue — it's a shell config permissions problem.
+
+**Solution:**
+```bash
+# Check permissions
+ls -la ~/.zshrc
+
+# Fix if needed (should be -rw-r--r-- or 644)
+chmod 644 ~/.zshrc
+
+# Then reload
+source ~/.zshrc
+
+# Or just open a new terminal window — it picks up PATH changes automatically
+```
+
+If the installer added the PATH line but permissions are wrong, you can add it manually:
+```bash
+echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc
+```
+
+### Error 400 on first agent run
+
+**Scenario:** Setup completes fine, but the first chat attempt fails with HTTP 400.
+
+**Cause:** Usually a model name mismatch — the configured model doesn't exist on your provider, or the API key doesn't have access to it.
+
+**Solution:**
+```bash
+# Check what model and provider are configured
+hermes config show | head -20
+
+# Re-run model selection
+hermes model
+
+# Or test with a known-good model
+hermes chat -q "hello" --model anthropic/claude-sonnet-4.6
+```
+
+If using OpenRouter, make sure your API key has credits. A 400 from OpenRouter often means the model requires a paid plan or the model ID has a typo.
+
+---
+
 ## Still Stuck?

 If your issue isn't covered here:
@@ -88,14 +88,13 @@ Example settings snippet:

 ```json
 {
-  "acp": {
-    "agents": [
-      {
-        "name": "hermes-agent",
-        "registry_dir": "/path/to/hermes-agent/acp_registry"
-      }
-    ]
-  }
+  "agent_servers": {
+    "hermes-agent": {
+      "type": "custom",
+      "command": "hermes",
+      "args": ["acp"],
+    },
+  },
 }
 ```

@@ -0,0 +1,113 @@
+---
+sidebar_position: 3
+---
+
+# Switching Models
+
+Change models mid-conversation without losing your chat history.
+
+```
+/model sonnet
+```
+
+That's it. Your conversation continues with the new model. Hermes formats the name correctly for whatever provider you're on — you don't need to think about it.
+
+## Quick Reference
+
+| You type | You get |
+|----------|---------|
+| `/model sonnet` | Claude Sonnet 4.6 |
+| `/model opus` | Claude Opus 4.6 |
+| `/model haiku` | Claude Haiku 4.5 |
+| `/model gpt5` | GPT-5.4 |
+| `/model gpt5-mini` | GPT-5.4 Mini |
+| `/model gpt5-pro` | GPT-5.4 Pro |
+| `/model codex` | GPT-5.3 Codex |
+| `/model gemini` | Gemini 3 Pro |
+| `/model gemini-flash` | Gemini 3 Flash |
+| `/model deepseek` | DeepSeek Chat |
+| `/model grok` | Grok 4.20 |
+| `/model qwen` | Qwen 3.6 Plus |
+| `/model minimax` | MiniMax M2.7 |
+
+These aliases **stay on your current provider**. If you're on OpenRouter, you stay on OpenRouter. If you're on native Anthropic, you stay on native Anthropic. The model name is formatted correctly for each — `anthropic/claude-sonnet-4.6` on OpenRouter becomes `claude-sonnet-4-6` on native Anthropic automatically.
+
+If the model isn't available on your current provider (like `/model gpt5` on native Anthropic), Hermes will switch to a provider that has it and tell you.
+
+Type `/model` with no arguments to see the full alias list and your current model.
+
+## Full Model Names
+
+Aliases cover the most popular models. For anything else, use the full name in your provider's format:
+
+```
+/model anthropic/claude-sonnet-4.5
+/model openai/gpt-5.4-nano
+/model nvidia/nemotron-3-super-120b-a12b
+```
+
+On OpenRouter these are the standard model IDs from [openrouter.ai/models](https://openrouter.ai). On other providers, use whatever model name that provider expects.
+
+If you're not sure of the exact name, type something close. Hermes will suggest corrections:
+
+```
+> /model claude-sonet
+  Note: Not in catalog — did you mean: anthropic/claude-sonnet-4.6?
+```
+
+## Switching Providers
+
+Aliases and bare model names keep you on your current provider. To explicitly switch to a different provider, use the provider prefix with a colon:
+
+```
+/model anthropic:claude-opus-4
+/model deepseek:deepseek-chat
+/model nous:anthropic/claude-opus-4.6
+```
+
+The part before the colon is the Hermes provider name (the same names from `hermes setup`). The part after is the model name as that provider knows it.
+
+To see which providers you have configured: `/provider`
+
+:::tip
+On OpenRouter, you can also use `openai:gpt-5.4` — Hermes knows "openai" is a vendor name on OpenRouter (not a separate Hermes provider) and converts it to `openai/gpt-5.4` automatically.
+:::
+
+## Custom / Local Endpoints
+
+If you've set up a local model server (Ollama, vLLM, LM Studio, etc.):
+
+```
+/model custom
+```
+
+This auto-detects the model running on your custom endpoint. If you have multiple models or want to specify one:
+
+```
+/model custom:llama-3.3-70b
+```
+
+Custom endpoints are configured in `~/.hermes/config.yaml` under `model.base_url`, or via the `OPENAI_BASE_URL` environment variable.
+
+## What Happens When You Switch
+
+- **Conversation history is preserved.** The new model picks up where the old one left off.
+- **Prompt cache resets.** The new model builds a fresh cache. This is unavoidable — different models have different cache keys.
+- **System prompt rebuilds.** Some models get tailored guidance (tool use patterns, etc.). The system prompt updates automatically.
+- **Config is saved.** The new model becomes your default for future sessions too.
+
+## Where It Works
+
+`/model` works everywhere Hermes runs:
+
+- CLI (`hermes chat`)
+- Telegram
+- Discord
+- Slack
+- Matrix
+- WhatsApp
+- Signal
+- Home Assistant
+- All other gateway platforms
+
+On messaging platforms, if the agent is currently processing a message, `/model` will ask you to wait or `/stop` first.