fix: add telegram.request mock and discovery fixture to remaining test files

The original PR missed test_dm_topics.py and test_telegram_network_reconnect.py; both need the telegram.request mock module. The reconnect test also needs the _no_auto_discovery fixture, since _handle_polling_network_error calls connect(), which now invokes discover_fallback_ips().
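The sketch below shows the test pattern this commit body describes, assuming the test files stub python-telegram-bot through `sys.modules`. Only `telegram.request`, `_handle_polling_network_error`, `connect()` and `discover_fallback_ips()` come from the commit. Everything else is hypothetical: `FakeTelegramAdapter`, the `fake_telegram_network` module, `no_auto_discovery()`, and the other `telegram` submodules in the stub list. The real `_no_auto_discovery` fixture may be written differently, for example as a pytest fixture using `monkeypatch`.

```python
import asyncio
import sys
import types
from unittest.mock import AsyncMock, MagicMock, patch

# Stub python-telegram-bot modules before any adapter import. "telegram.request"
# is the entry the two test files were missing; the other names are assumptions.
# setdefault() leaves a real, already-imported module untouched.
for _mod in ("telegram", "telegram.ext", "telegram.request"):
    sys.modules.setdefault(_mod, MagicMock())

# Hypothetical stand-in for the network helper module. Only the call chain
# (reconnect -> connect -> discover_fallback_ips) mirrors the commit message.
net = types.ModuleType("fake_telegram_network")


async def _real_discovery() -> list[str]:
    raise RuntimeError("DNS-over-HTTPS lookup attempted inside a unit test")


net.discover_fallback_ips = _real_discovery


class FakeTelegramAdapter:
    def __init__(self) -> None:
        self.fallback_ips: list[str] = []

    async def connect(self) -> bool:
        # connect() now performs fallback-IP discovery.
        self.fallback_ips = await net.discover_fallback_ips()
        return True

    async def _handle_polling_network_error(self, exc: Exception) -> bool:
        # The reconnect path re-runs connect(), so it reaches discovery too.
        return await self.connect()


def no_auto_discovery():
    """Hypothetical analogue of _no_auto_discovery: discovery returns no IPs."""
    return patch.object(net, "discover_fallback_ips", AsyncMock(return_value=[]))


if __name__ == "__main__":
    adapter = FakeTelegramAdapter()
    with no_auto_discovery():
        ok = asyncio.run(adapter._handle_polling_network_error(OSError("502")))
    print(ok, adapter.fallback_ips)  # True []
```

The sketch patches the module attribute rather than the adapter method. This keeps `connect()` itself under test and replaces only the network lookup. Outside the patch, the stand-in discovery raises, so an unpatched reconnect test fails loudly instead of reaching the network.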
fix: share transport instance and downgrade seed fallback log to info
2026-03-27 03:59:01 -07:00 · 2026-03-27 03:56:42 -07:00 · 2026-03-27 03:56:41 -07:00
156 changed files with 1070 additions and 8566 deletions
@@ -1,13 +0,0 @@
-# Git
-.git
-.gitignore
-.gitmodules
-
-# Dependencies
-node_modules
-
-# CI/CD
-.github
-
-# Environment files
-.env
@@ -59,25 +59,12 @@ OPENCODE_ZEN_API_KEY=
 # OpenCode Go provides access to open models (GLM-5, Kimi K2.5, MiniMax M2.5)
 # $10/month subscription. Get your key at: https://opencode.ai/auth
 OPENCODE_GO_API_KEY=
-
-# =============================================================================
-# LLM PROVIDER (Hugging Face Inference Providers)
-# =============================================================================
-# Hugging Face routes to 20+ open models via unified OpenAI-compatible endpoint.
-# Free tier included ($0.10/month), no markup on provider rates.
-# Get your token at: https://huggingface.co/settings/tokens
-# Required permission: "Make calls to Inference Providers"
-HF_TOKEN=
 # OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1  # Override default base URL

 # =============================================================================
 # TOOL API KEYS
 # =============================================================================

-# Exa API Key - AI-native web search and contents
-# Get at: https://exa.ai
-EXA_API_KEY=
-
 # Parallel API Key - AI-native web search and extract
 # Get at: https://parallel.ai
 PARALLEL_API_KEY=
@@ -1,61 +0,0 @@
-name: Docker Build and Publish
-
-on:
-  push:
-    branches: [main]
-  pull_request:
-    branches: [main]
-
-concurrency:
-  group: docker-${{ github.ref }}
-  cancel-in-progress: true
-
-jobs:
-  build-and-push:
-    runs-on: ubuntu-latest
-    timeout-minutes: 30
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-        with:
-          submodules: recursive
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-
-      - name: Build image
-        uses: docker/build-push-action@v6
-        with:
-          context: .
-          file: Dockerfile
-          load: true
-          tags: nousresearch/hermes-agent:test
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-
-      - name: Test image starts
-        run: |
-          docker run --rm \
-            -v /tmp/hermes-test:/opt/data \
-            --entrypoint /opt/hermes/docker/entrypoint.sh \
-            nousresearch/hermes-agent:test --help
-
-      - name: Log in to Docker Hub
-        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
-        uses: docker/login-action@v3
-        with:
-          username: ${{ secrets.DOCKERHUB_USERNAME }}
-          password: ${{ secrets.DOCKERHUB_TOKEN }}
-
-      - name: Push image
-        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
-        uses: docker/build-push-action@v6
-        with:
-          context: .
-          file: Dockerfile
-          push: true
-          tags: |
-            nousresearch/hermes-agent:latest
-            nousresearch/hermes-agent:${{ github.sha }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
@@ -1,20 +0,0 @@
-FROM debian:13.4
-
-RUN apt-get update
-RUN apt-get install -y nodejs npm python3 python3-pip ripgrep ffmpeg gcc python3-dev libffi-dev
-
-COPY . /opt/hermes
-WORKDIR /opt/hermes
-
-RUN pip install -e ".[all]" --break-system-packages
-RUN npm install
-RUN npx playwright install --with-deps chromium
-WORKDIR /opt/hermes/scripts/whatsapp-bridge
-RUN npm install
-
-WORKDIR /opt/hermes
-RUN chmod +x /opt/hermes/docker/entrypoint.sh
-
-ENV HERMES_HOME=/opt/data
-VOLUME [ "/opt/data" ]
-ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]
@@ -1,348 +0,0 @@
-# Hermes Agent v0.5.0 (v2026.3.28)
-
-**Release Date:** March 28, 2026
-
-> The hardening release — Hugging Face provider, /model command overhaul, Telegram Private Chat Topics, native Modal SDK, plugin lifecycle hooks, tool-use enforcement for GPT models, Nix flake, 50+ security and reliability fixes, and a comprehensive supply chain audit.
-
---
-
-## ✨ Highlights
-
- **Nous Portal now supports 400+ models** — The Nous Research inference portal has expanded dramatically, giving Hermes Agent users access to over 400 models through a single provider endpoint
-
- **Hugging Face as a first-class inference provider** — Full integration with HF Inference API including curated agentic model picker that maps to OpenRouter analogues, live `/models` endpoint probe, and setup wizard flow ([#3419](https://github.com/NousResearch/hermes-agent/pull/3419), [#3440](https://github.com/NousResearch/hermes-agent/pull/3440))
-
- **Telegram Private Chat Topics** — Project-based conversations with functional skill binding per topic, enabling isolated workflows within a single Telegram chat ([#3163](https://github.com/NousResearch/hermes-agent/pull/3163))
-
- **Native Modal SDK backend** — Replaced swe-rex dependency with native Modal SDK (`Sandbox.create.aio` + `exec.aio`), eliminating tunnels and simplifying the Modal terminal backend ([#3538](https://github.com/NousResearch/hermes-agent/pull/3538))
-
- **Plugin lifecycle hooks activated** — `pre_llm_call`, `post_llm_call`, `on_session_start`, and `on_session_end` hooks now fire in the agent loop and CLI/gateway, completing the plugin hook system ([#3542](https://github.com/NousResearch/hermes-agent/pull/3542))
-
- **Improved OpenAI Model Reliability** — Added `GPT_TOOL_USE_GUIDANCE` to prevent GPT models from describing intended actions instead of making tool calls, plus automatic stripping of stale budget warnings from conversation history that caused models to avoid tools across turns ([#3528](https://github.com/NousResearch/hermes-agent/pull/3528))
-
- **Nix flake** — Full uv2nix build, NixOS module with persistent container mode, auto-generated config keys from Python source, and suffix PATHs for agent-friendliness ([#20](https://github.com/NousResearch/hermes-agent/pull/20), [#3274](https://github.com/NousResearch/hermes-agent/pull/3274), [#3061](https://github.com/NousResearch/hermes-agent/pull/3061)) by @alt-glitch
-
- **Supply chain hardening** — Removed compromised `litellm` dependency, pinned all dependency version ranges, regenerated `uv.lock` with hashes, added CI workflow scanning PRs for supply chain attack patterns, and bumped deps to fix CVEs ([#2796](https://github.com/NousResearch/hermes-agent/pull/2796), [#2810](https://github.com/NousResearch/hermes-agent/pull/2810), [#2812](https://github.com/NousResearch/hermes-agent/pull/2812), [#2816](https://github.com/NousResearch/hermes-agent/pull/2816), [#3073](https://github.com/NousResearch/hermes-agent/pull/3073))
-
- **Anthropic output limits fix** — Replaced hardcoded 16K `max_tokens` with per-model native output limits (128K for Opus 4.6, 64K for Sonnet 4.6), fixing "Response truncated" and thinking-budget exhaustion on direct Anthropic API ([#3426](https://github.com/NousResearch/hermes-agent/pull/3426), [#3444](https://github.com/NousResearch/hermes-agent/pull/3444))
-
---
-
-## 🏗️ Core Agent & Architecture
-
-### New Provider: Hugging Face
- First-class Hugging Face Inference API integration with auth, setup wizard, and model picker ([#3419](https://github.com/NousResearch/hermes-agent/pull/3419))
- Curated model list mapping OpenRouter agentic defaults to HF equivalents — providers with 8+ curated models skip live `/models` probe for speed ([#3440](https://github.com/NousResearch/hermes-agent/pull/3440))
- Added glm-5-turbo to Z.AI provider model list ([#3095](https://github.com/NousResearch/hermes-agent/pull/3095))
-
-### Provider & Model Improvements
- `/model` command overhaul — extracted shared `switch_model()` pipeline for CLI and gateway, custom endpoint support, provider-aware routing ([#2795](https://github.com/NousResearch/hermes-agent/pull/2795), [#2799](https://github.com/NousResearch/hermes-agent/pull/2799))
- Removed `/model` slash command from CLI and gateway in favor of `hermes model` subcommand ([#3080](https://github.com/NousResearch/hermes-agent/pull/3080))
- Preserve `custom` provider instead of silently remapping to `openrouter` ([#2792](https://github.com/NousResearch/hermes-agent/pull/2792))
- Read root-level `provider` and `base_url` from config.yaml into model config ([#3112](https://github.com/NousResearch/hermes-agent/pull/3112))
- Align Nous Portal model slugs with OpenRouter naming ([#3253](https://github.com/NousResearch/hermes-agent/pull/3253))
- Fix Alibaba provider default endpoint and model list ([#3484](https://github.com/NousResearch/hermes-agent/pull/3484))
- Allow MiniMax users to override `/v1` → `/anthropic` auto-correction ([#3553](https://github.com/NousResearch/hermes-agent/pull/3553))
- Migrate OAuth token refresh to `platform.claude.com` with fallback ([#3246](https://github.com/NousResearch/hermes-agent/pull/3246))
-
-### Agent Loop & Conversation
- **Improved OpenAI model reliability** — `GPT_TOOL_USE_GUIDANCE` prevents GPT models from describing actions instead of calling tools + automatic budget warning stripping from history ([#3528](https://github.com/NousResearch/hermes-agent/pull/3528))
- **Surface lifecycle events** — All retry, fallback, and compression events now surface to the user as formatted messages ([#3153](https://github.com/NousResearch/hermes-agent/pull/3153))
- **Anthropic output limits** — Per-model native output limits instead of hardcoded 16K `max_tokens` ([#3426](https://github.com/NousResearch/hermes-agent/pull/3426))
- **Thinking-budget exhaustion detection** — Skip useless continuation retries when model uses all output tokens on reasoning ([#3444](https://github.com/NousResearch/hermes-agent/pull/3444))
- Always prefer streaming for API calls to prevent hung subagents ([#3120](https://github.com/NousResearch/hermes-agent/pull/3120))
- Restore safe non-streaming fallback after stream failures ([#3020](https://github.com/NousResearch/hermes-agent/pull/3020))
- Give subagents independent iteration budgets ([#3004](https://github.com/NousResearch/hermes-agent/pull/3004))
- Update `api_key` in `_try_activate_fallback` for subagent auth ([#3103](https://github.com/NousResearch/hermes-agent/pull/3103))
- Graceful return on max retries instead of crashing thread ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Count compression restarts toward retry limit ([#3070](https://github.com/NousResearch/hermes-agent/pull/3070))
- Include tool tokens in preflight estimate, guard context probe persistence ([#3164](https://github.com/NousResearch/hermes-agent/pull/3164))
- Update context compressor limits after fallback activation ([#3305](https://github.com/NousResearch/hermes-agent/pull/3305))
- Validate empty user messages to prevent Anthropic API 400 errors ([#3322](https://github.com/NousResearch/hermes-agent/pull/3322))
- GLM reasoning-only and max-length handling ([#3010](https://github.com/NousResearch/hermes-agent/pull/3010))
- Increase API timeout default from 900s to 1800s for slow-thinking models ([#3431](https://github.com/NousResearch/hermes-agent/pull/3431))
- Send `max_tokens` for Claude/OpenRouter + retry SSE connection errors ([#3497](https://github.com/NousResearch/hermes-agent/pull/3497))
- Prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode ([#2701](https://github.com/NousResearch/hermes-agent/pull/2701)) by @ctlst
-
-### Streaming & Reasoning
- **Persist reasoning across gateway session turns** with new schema v6 columns (`reasoning`, `reasoning_details`, `codex_reasoning_items`) ([#2974](https://github.com/NousResearch/hermes-agent/pull/2974))
- Detect and kill stale SSE connections ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Fix stale stream detector race causing spurious `RemoteProtocolError` ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Skip duplicate callback for `<think>`-extracted reasoning during streaming ([#3116](https://github.com/NousResearch/hermes-agent/pull/3116))
- Preserve reasoning fields in `rewrite_transcript` ([#3311](https://github.com/NousResearch/hermes-agent/pull/3311))
- Preserve Gemini thought signatures in streamed tool calls ([#2997](https://github.com/NousResearch/hermes-agent/pull/2997))
- Ensure first delta is fired during reasoning updates ([untagged commit](https://github.com/NousResearch/hermes-agent))
-
-### Session & Memory
- **Session search recent sessions mode** — Omit query to browse recent sessions with titles, previews, and timestamps ([#2533](https://github.com/NousResearch/hermes-agent/pull/2533))
- **Session config surfacing** on `/new`, `/reset`, and auto-reset ([#3321](https://github.com/NousResearch/hermes-agent/pull/3321))
- **Third-party session isolation** — `--source` flag for isolating sessions by origin ([#3255](https://github.com/NousResearch/hermes-agent/pull/3255))
- Add `/resume` CLI handler, session log truncation guard, `reopen_session` API ([#3315](https://github.com/NousResearch/hermes-agent/pull/3315))
- Clear compressor summary and turn counter on `/clear` and `/new` ([#3102](https://github.com/NousResearch/hermes-agent/pull/3102))
- Surface silent SessionDB failures that cause session data loss ([#2999](https://github.com/NousResearch/hermes-agent/pull/2999))
- Session search fallback preview on summarization failure ([#3478](https://github.com/NousResearch/hermes-agent/pull/3478))
- Prevent stale memory overwrites by flush agent ([#2687](https://github.com/NousResearch/hermes-agent/pull/2687))
-
-### Context Compression
- Replace dead `summary_target_tokens` with ratio-based scaling ([#2554](https://github.com/NousResearch/hermes-agent/pull/2554))
- Expose `compression.target_ratio`, `protect_last_n`, and `threshold` in `DEFAULT_CONFIG` ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Restore sane defaults and cap summary at 12K tokens ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Preserve transcript on `/compress` and hygiene compression ([#3556](https://github.com/NousResearch/hermes-agent/pull/3556))
- Update context pressure warnings and token estimates after compaction ([untagged commit](https://github.com/NousResearch/hermes-agent))
-
-### Architecture & Dependencies
- **Remove mini-swe-agent dependency** — Inline Docker and Modal backends directly ([#2804](https://github.com/NousResearch/hermes-agent/pull/2804))
- **Replace swe-rex with native Modal SDK** for Modal backend ([#3538](https://github.com/NousResearch/hermes-agent/pull/3538))
- **Plugin lifecycle hooks** — `pre_llm_call`, `post_llm_call`, `on_session_start`, `on_session_end` now fire in the agent loop ([#3542](https://github.com/NousResearch/hermes-agent/pull/3542))
- Fix plugin toolsets invisible in `hermes tools` and standalone processes ([#3457](https://github.com/NousResearch/hermes-agent/pull/3457))
- Consolidate `get_hermes_home()` and `parse_reasoning_effort()` ([#3062](https://github.com/NousResearch/hermes-agent/pull/3062))
- Remove unused Hermes-native PKCE OAuth flow ([#3107](https://github.com/NousResearch/hermes-agent/pull/3107))
- Remove ~100 unused imports across 55 files ([#3016](https://github.com/NousResearch/hermes-agent/pull/3016))
- Fix 154 f-strings, simplify getattr/URL patterns, remove dead code ([#3119](https://github.com/NousResearch/hermes-agent/pull/3119))
-
---
-
-## 📱 Messaging Platforms (Gateway)
-
-### Telegram
- **Private Chat Topics** — Project-based conversations with functional skill binding per topic, enabling isolated workflows within a single Telegram chat ([#3163](https://github.com/NousResearch/hermes-agent/pull/3163))
- **Auto-discover fallback IPs via DNS-over-HTTPS** when `api.telegram.org` is unreachable ([#3376](https://github.com/NousResearch/hermes-agent/pull/3376))
- **Configurable reply threading mode** ([#2907](https://github.com/NousResearch/hermes-agent/pull/2907))
- Fall back to no `thread_id` on "Message thread not found" BadRequest ([#3390](https://github.com/NousResearch/hermes-agent/pull/3390))
- Self-reschedule reconnect when `start_polling` fails after 502 ([#3268](https://github.com/NousResearch/hermes-agent/pull/3268))
-
-### Discord
- Stop phantom typing indicator after agent turn completes ([#3003](https://github.com/NousResearch/hermes-agent/pull/3003))
-
-### Slack
- Send tool call progress messages to correct Slack thread ([#3063](https://github.com/NousResearch/hermes-agent/pull/3063))
- Scope progress thread fallback to Slack only ([#3488](https://github.com/NousResearch/hermes-agent/pull/3488))
-
-### WhatsApp
- Download documents, audio, and video media from messages ([#2978](https://github.com/NousResearch/hermes-agent/pull/2978))
-
-### Matrix
- Add missing Matrix entry in `PLATFORMS` dict ([#3473](https://github.com/NousResearch/hermes-agent/pull/3473))
- Harden e2ee access-token handling ([#3562](https://github.com/NousResearch/hermes-agent/pull/3562))
- Add backoff for `SyncError` in sync loop ([#3280](https://github.com/NousResearch/hermes-agent/pull/3280))
-
-### Signal
- Track SSE keepalive comments as connection activity ([#3316](https://github.com/NousResearch/hermes-agent/pull/3316))
-
-### Email
- Prevent unbounded growth of `_seen_uids` in EmailAdapter ([#3490](https://github.com/NousResearch/hermes-agent/pull/3490))
-
-### Gateway Core
- **Config-gated `/verbose` command** for messaging platforms — toggle tool output verbosity from chat ([#3262](https://github.com/NousResearch/hermes-agent/pull/3262))
- **Background review notifications** delivered to user chat ([#3293](https://github.com/NousResearch/hermes-agent/pull/3293))
- **Retry transient send failures** and notify user on exhaustion ([#3288](https://github.com/NousResearch/hermes-agent/pull/3288))
- Recover from hung agents — `/stop` hard-kills session lock ([#3104](https://github.com/NousResearch/hermes-agent/pull/3104))
- Thread-safe `SessionStore` — protect `_entries` with `threading.Lock` ([#3052](https://github.com/NousResearch/hermes-agent/pull/3052))
- Fix gateway token double-counting with cached agents — use absolute set instead of increment ([#3306](https://github.com/NousResearch/hermes-agent/pull/3306), [#3317](https://github.com/NousResearch/hermes-agent/pull/3317))
- Fingerprint full auth token in agent cache signature ([#3247](https://github.com/NousResearch/hermes-agent/pull/3247))
- Silence background agent terminal output ([#3297](https://github.com/NousResearch/hermes-agent/pull/3297))
- Include per-platform `ALLOW_ALL` and `SIGNAL_GROUP` in startup allowlist check ([#3313](https://github.com/NousResearch/hermes-agent/pull/3313))
- Include user-local bin paths in systemd unit PATH ([#3527](https://github.com/NousResearch/hermes-agent/pull/3527))
- Track background task references in `GatewayRunner` ([#3254](https://github.com/NousResearch/hermes-agent/pull/3254))
- Add request timeouts to HA, Email, Mattermost, SMS adapters ([#3258](https://github.com/NousResearch/hermes-agent/pull/3258))
- Add media download retry to Mattermost, Slack, and base cache ([#3323](https://github.com/NousResearch/hermes-agent/pull/3323))
- Detect virtualenv path instead of hardcoding `venv/` ([#2797](https://github.com/NousResearch/hermes-agent/pull/2797))
- Use `TERMINAL_CWD` for context file discovery, not process cwd ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Stop loading hermes repo AGENTS.md into gateway sessions (~10k wasted tokens) ([#2891](https://github.com/NousResearch/hermes-agent/pull/2891))
-
---
-
-## 🖥️ CLI & User Experience
-
-### Interactive CLI
- **Configurable busy input mode** + fix `/queue` always working ([#3298](https://github.com/NousResearch/hermes-agent/pull/3298))
- **Preserve user input on multiline paste** ([#3065](https://github.com/NousResearch/hermes-agent/pull/3065))
- **Tool generation callback** — streaming "preparing terminal…" updates during tool argument generation ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Show tool progress for substantive tools, not just "preparing" ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Buffer reasoning preview chunks and fix duplicate display ([#3013](https://github.com/NousResearch/hermes-agent/pull/3013))
- Prevent reasoning box from rendering 3x during tool-calling loops ([#3405](https://github.com/NousResearch/hermes-agent/pull/3405))
- Eliminate "Event loop is closed" / "Press ENTER to continue" during idle — three-layer fix with `neuter_async_httpx_del()`, custom exception handler, and stale client cleanup ([#3398](https://github.com/NousResearch/hermes-agent/pull/3398))
- Fix status bar shows 26K instead of 260K for token counts with trailing zeros ([#3024](https://github.com/NousResearch/hermes-agent/pull/3024))
- Fix status bar duplicates and degrades during long sessions ([#3291](https://github.com/NousResearch/hermes-agent/pull/3291))
- Refresh TUI before background task output to prevent status bar overlap ([#3048](https://github.com/NousResearch/hermes-agent/pull/3048))
- Suppress KawaiiSpinner animation under `patch_stdout` ([#2994](https://github.com/NousResearch/hermes-agent/pull/2994))
- Skip KawaiiSpinner when TUI handles tool progress ([#2973](https://github.com/NousResearch/hermes-agent/pull/2973))
- Guard `isatty()` against closed streams via `_is_tty` property ([#3056](https://github.com/NousResearch/hermes-agent/pull/3056))
- Ensure single closure of streaming boxes during tool generation ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Cap context pressure percentage at 100% in display ([#3480](https://github.com/NousResearch/hermes-agent/pull/3480))
- Clean up HTML error messages in CLI display ([#3069](https://github.com/NousResearch/hermes-agent/pull/3069))
- Show HTTP status code and 400 body in API error output ([#3096](https://github.com/NousResearch/hermes-agent/pull/3096))
- Extract useful info from HTML error pages, dump debug on max retries ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Prevent TypeError on startup when `base_url` is None ([#3068](https://github.com/NousResearch/hermes-agent/pull/3068))
- Prevent update crash in non-TTY environments ([#3094](https://github.com/NousResearch/hermes-agent/pull/3094))
- Handle EOFError in sessions delete/prune confirmation prompts ([#3101](https://github.com/NousResearch/hermes-agent/pull/3101))
- Catch KeyboardInterrupt during `flush_memories` on exit and in exit cleanup handlers ([#3025](https://github.com/NousResearch/hermes-agent/pull/3025), [#3257](https://github.com/NousResearch/hermes-agent/pull/3257))
- Guard `.strip()` against None values from YAML config ([#3552](https://github.com/NousResearch/hermes-agent/pull/3552))
- Guard `config.get()` against YAML null values to prevent AttributeError ([#3377](https://github.com/NousResearch/hermes-agent/pull/3377))
- Store asyncio task references to prevent GC mid-execution ([#3267](https://github.com/NousResearch/hermes-agent/pull/3267))
-
-### Setup & Configuration
- Use explicit key mapping for returning-user menu dispatch instead of positional index ([#3083](https://github.com/NousResearch/hermes-agent/pull/3083))
- Use `sys.executable` for pip in update commands to fix PEP 668 ([#3099](https://github.com/NousResearch/hermes-agent/pull/3099))
- Harden `hermes update` against diverged history, non-main branches, and gateway edge cases ([#3492](https://github.com/NousResearch/hermes-agent/pull/3492))
- OpenClaw migration overwrites defaults and setup wizard skips imported sections — fixed ([#3282](https://github.com/NousResearch/hermes-agent/pull/3282))
- Stop recursive AGENTS.md walk, load top-level only ([#3110](https://github.com/NousResearch/hermes-agent/pull/3110))
- Add macOS Homebrew paths to browser and terminal PATH resolution ([#2713](https://github.com/NousResearch/hermes-agent/pull/2713))
- YAML boolean handling for `tool_progress` config ([#3300](https://github.com/NousResearch/hermes-agent/pull/3300))
- Reset default SOUL.md to baseline identity text ([#3159](https://github.com/NousResearch/hermes-agent/pull/3159))
- Reject relative cwd paths for container terminal backends ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Add explicit `hermes-api-server` toolset for API server platform ([#3304](https://github.com/NousResearch/hermes-agent/pull/3304))
- Reorder setup wizard providers — OpenRouter first ([untagged commit](https://github.com/NousResearch/hermes-agent))
-
---
-
-## 🔧 Tool System
-
-### API Server
- **Idempotency-Key support**, body size limit, and OpenAI error envelope ([#2903](https://github.com/NousResearch/hermes-agent/pull/2903))
- Allow Idempotency-Key in CORS headers ([#3530](https://github.com/NousResearch/hermes-agent/pull/3530))
- Cancel orphaned agent + true interrupt on SSE disconnect ([#3427](https://github.com/NousResearch/hermes-agent/pull/3427))
- Fix streaming breaks when agent makes tool calls ([#2985](https://github.com/NousResearch/hermes-agent/pull/2985))
-
-### Terminal & File Operations
- Handle addition-only hunks in V4A patch parser ([#3325](https://github.com/NousResearch/hermes-agent/pull/3325))
- Exponential backoff for persistent shell polling ([#2996](https://github.com/NousResearch/hermes-agent/pull/2996))
- Add timeout to subprocess calls in `context_references` ([#3469](https://github.com/NousResearch/hermes-agent/pull/3469))
-
-### Browser & Vision
- Handle 402 insufficient credits error in vision tool ([#2802](https://github.com/NousResearch/hermes-agent/pull/2802))
- Fix `browser_vision` ignores `auxiliary.vision.timeout` config ([#2901](https://github.com/NousResearch/hermes-agent/pull/2901))
- Make browser command timeout configurable via config.yaml ([#2801](https://github.com/NousResearch/hermes-agent/pull/2801))
-
-### MCP
- MCP toolset resolution for runtime and config ([#3252](https://github.com/NousResearch/hermes-agent/pull/3252))
- Add MCP tool name collision protection ([#3077](https://github.com/NousResearch/hermes-agent/pull/3077))
-
-### Auxiliary LLM
- Guard aux LLM calls against None content + reasoning fallback + retry ([#3449](https://github.com/NousResearch/hermes-agent/pull/3449))
- Catch ImportError from `build_anthropic_client` in vision auto-detection ([#3312](https://github.com/NousResearch/hermes-agent/pull/3312))
-
-### Other Tools
- Add request timeouts to `send_message_tool` HTTP calls ([#3162](https://github.com/NousResearch/hermes-agent/pull/3162)) by @memosr
- Auto-repair `jobs.json` with invalid control characters ([#3537](https://github.com/NousResearch/hermes-agent/pull/3537))
- Enable fine-grained tool streaming for Claude/OpenRouter ([#3497](https://github.com/NousResearch/hermes-agent/pull/3497))
-
---
-
-## 🧩 Skills Ecosystem
-
-### Skills System
- **Env var passthrough** for skills and user config — skills can declare environment variables to pass through ([#2807](https://github.com/NousResearch/hermes-agent/pull/2807))
- Cache skills prompt with shared `skill_utils` module for faster TTFT ([#3421](https://github.com/NousResearch/hermes-agent/pull/3421))
- Avoid redundant file re-read for skill conditions ([#2992](https://github.com/NousResearch/hermes-agent/pull/2992))
- Use Git Trees API to prevent silent subdirectory loss during install ([#2995](https://github.com/NousResearch/hermes-agent/pull/2995))
- Fix skills-sh install for deeply nested repo structures ([#2980](https://github.com/NousResearch/hermes-agent/pull/2980))
- Handle null metadata in skill frontmatter ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Preserve trust for skills-sh identifiers + reduce resolution churn ([#3251](https://github.com/NousResearch/hermes-agent/pull/3251))
- Agent-created skills were incorrectly treated as untrusted community content — fixed ([untagged commit](https://github.com/NousResearch/hermes-agent))
-
-### New Skills
- **G0DM0D3 godmode jailbreaking skill** + docs ([#3157](https://github.com/NousResearch/hermes-agent/pull/3157))
- **Docker management skill** added to optional-skills ([#3060](https://github.com/NousResearch/hermes-agent/pull/3060))
- **OpenClaw migration v2** — 17 new modules, terminal recap for migrating from OpenClaw to Hermes ([#2906](https://github.com/NousResearch/hermes-agent/pull/2906))
-
---
-
-## 🔒 Security & Reliability
-
-### Security Hardening
- **SSRF protection** added to `browser_navigate` ([#3058](https://github.com/NousResearch/hermes-agent/pull/3058))
- **SSRF protection** added to `vision_tools` and `web_tools` (hardened) ([#2679](https://github.com/NousResearch/hermes-agent/pull/2679))
- **Restrict subagent toolsets** to parent's enabled set ([#3269](https://github.com/NousResearch/hermes-agent/pull/3269))
- **Prevent zip-slip path traversal** in self-update ([#3250](https://github.com/NousResearch/hermes-agent/pull/3250))
- **Prevent shell injection** in `_expand_path` via `~user` path suffix ([#2685](https://github.com/NousResearch/hermes-agent/pull/2685))
- **Normalize input** before dangerous command detection ([#3260](https://github.com/NousResearch/hermes-agent/pull/3260))
- Make tirith block verdicts approvable instead of hard-blocking ([#3428](https://github.com/NousResearch/hermes-agent/pull/3428))
- Remove compromised `litellm`/`typer`/`platformdirs` from deps ([#2796](https://github.com/NousResearch/hermes-agent/pull/2796))
- Pin all dependency version ranges ([#2810](https://github.com/NousResearch/hermes-agent/pull/2810))
- Regenerate `uv.lock` with hashes, use lockfile in setup ([#2812](https://github.com/NousResearch/hermes-agent/pull/2812))
- Bump dependencies to fix CVEs + regenerate `uv.lock` ([#3073](https://github.com/NousResearch/hermes-agent/pull/3073))
- Supply chain audit CI workflow for PR scanning ([#2816](https://github.com/NousResearch/hermes-agent/pull/2816))
-
-### Reliability
- **SQLite WAL write-lock contention** causing 15-20s TUI freeze — fixed ([#3385](https://github.com/NousResearch/hermes-agent/pull/3385))
- **SQLite concurrency hardening** + session transcript integrity ([#3249](https://github.com/NousResearch/hermes-agent/pull/3249))
- Prevent recurring cron job re-fire on gateway crash/restart loop ([#3396](https://github.com/NousResearch/hermes-agent/pull/3396))
- Mark cron session as ended after job completes ([#2998](https://github.com/NousResearch/hermes-agent/pull/2998))
-
---
-
-## ⚡ Performance
-
- **TTFT startup optimizations** — salvaged easy-win startup improvements ([#3395](https://github.com/NousResearch/hermes-agent/pull/3395))
- Cache skills prompt with shared `skill_utils` module ([#3421](https://github.com/NousResearch/hermes-agent/pull/3421))
- Avoid redundant file re-read for skill conditions in prompt builder ([#2992](https://github.com/NousResearch/hermes-agent/pull/2992))
-
---
-
-## 🐛 Notable Bug Fixes
-
- Fix gateway token double-counting with cached agents ([#3306](https://github.com/NousResearch/hermes-agent/pull/3306), [#3317](https://github.com/NousResearch/hermes-agent/pull/3317))
- Fix "Event loop is closed" / "Press ENTER to continue" during idle sessions ([#3398](https://github.com/NousResearch/hermes-agent/pull/3398))
- Fix reasoning box rendering 3x during tool-calling loops ([#3405](https://github.com/NousResearch/hermes-agent/pull/3405))
- Fix status bar shows 26K instead of 260K for token counts ([#3024](https://github.com/NousResearch/hermes-agent/pull/3024))
- Fix `/queue` always working regardless of config ([#3298](https://github.com/NousResearch/hermes-agent/pull/3298))
- Fix phantom Discord typing indicator after agent turn ([#3003](https://github.com/NousResearch/hermes-agent/pull/3003))
- Fix Slack progress messages appearing in wrong thread ([#3063](https://github.com/NousResearch/hermes-agent/pull/3063))
- Fix WhatsApp media downloads (documents, audio, video) ([#2978](https://github.com/NousResearch/hermes-agent/pull/2978))
- Fix Telegram "Message thread not found" killing progress messages ([#3390](https://github.com/NousResearch/hermes-agent/pull/3390))
- Fix OpenClaw migration overwriting defaults ([#3282](https://github.com/NousResearch/hermes-agent/pull/3282))
- Fix returning-user setup menu dispatching wrong section ([#3083](https://github.com/NousResearch/hermes-agent/pull/3083))
- Fix `hermes update` PEP 668 "externally-managed-environment" error ([#3099](https://github.com/NousResearch/hermes-agent/pull/3099))
- Fix subagents hitting `max_iterations` prematurely via shared budget ([#3004](https://github.com/NousResearch/hermes-agent/pull/3004))
- Fix YAML boolean handling for `tool_progress` config ([#3300](https://github.com/NousResearch/hermes-agent/pull/3300))
- Fix `config.get()` crashes on YAML null values ([#3377](https://github.com/NousResearch/hermes-agent/pull/3377))
- Fix `.strip()` crash on None values from YAML config ([#3552](https://github.com/NousResearch/hermes-agent/pull/3552))
- Fix hung agents on gateway — `/stop` now hard-kills session lock ([#3104](https://github.com/NousResearch/hermes-agent/pull/3104))
- Fix `_custom` provider silently remapped to `openrouter` ([#2792](https://github.com/NousResearch/hermes-agent/pull/2792))
- Fix Matrix missing from `PLATFORMS` dict ([#3473](https://github.com/NousResearch/hermes-agent/pull/3473))
- Fix Email adapter unbounded `_seen_uids` growth ([#3490](https://github.com/NousResearch/hermes-agent/pull/3490))
-
---
-
-## 🧪 Testing
-
- Pin `agent-client-protocol` < 0.9 to handle breaking upstream release ([#3320](https://github.com/NousResearch/hermes-agent/pull/3320))
- Catch anthropic ImportError in vision auto-detection tests ([#3312](https://github.com/NousResearch/hermes-agent/pull/3312))
- Update retry-exhaust test for new graceful return behavior ([#3320](https://github.com/NousResearch/hermes-agent/pull/3320))
- Add regression tests for null metadata frontmatter ([untagged commit](https://github.com/NousResearch/hermes-agent))
-
---
-
-## 📚 Documentation
-
- Update all docs for `/model` command overhaul and custom provider support ([#2800](https://github.com/NousResearch/hermes-agent/pull/2800))
- Fix stale and incorrect documentation across 18 files ([#2805](https://github.com/NousResearch/hermes-agent/pull/2805))
- Document 9 previously undocumented features ([#2814](https://github.com/NousResearch/hermes-agent/pull/2814))
- Add missing skills, CLI commands, and messaging env vars to docs ([#2809](https://github.com/NousResearch/hermes-agent/pull/2809))
- Fix api-server response storage documentation — SQLite, not in-memory ([#2819](https://github.com/NousResearch/hermes-agent/pull/2819))
- Quote pip install extras to fix zsh glob errors ([#2815](https://github.com/NousResearch/hermes-agent/pull/2815))
- Unify hooks documentation — add plugin hooks to hooks page, add `session:end` event ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Clarify two-mode behavior in `session_search` schema description ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Fix Discord Public Bot setting for Discord-provided invite link ([#3519](https://github.com/NousResearch/hermes-agent/pull/3519)) by @mehmoodosman
- Revise v0.4.0 changelog — fix feature attribution, reorder sections ([untagged commit](https://github.com/NousResearch/hermes-agent))
-
---
-
-## 👥 Contributors
-
-### Core
- **@teknium1** — 157 PRs covering the full scope of this release
-
-### Community Contributors
- **@alt-glitch** (Siddharth Balyan) — 2 PRs: Nix flake with uv2nix build, NixOS module, and persistent container mode ([#20](https://github.com/NousResearch/hermes-agent/pull/20)); auto-generated config keys and suffix PATHs for Nix builds ([#3061](https://github.com/NousResearch/hermes-agent/pull/3061), [#3274](https://github.com/NousResearch/hermes-agent/pull/3274))
- **@ctlst** — 1 PR: Prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode ([#2701](https://github.com/NousResearch/hermes-agent/pull/2701))
- **@memosr** (memosr.eth) — 1 PR: Add request timeouts to `send_message_tool` HTTP calls ([#3162](https://github.com/NousResearch/hermes-agent/pull/3162))
- **@mehmoodosman** (Osman Mehmood) — 1 PR: Fix Discord docs for Public Bot setting ([#3519](https://github.com/NousResearch/hermes-agent/pull/3519))
-
-### All Contributors
-@alt-glitch, @ctlst, @mehmoodosman, @memosr, @teknium1
-
---
-
-**Full Changelog**: [v2026.3.23...v2026.3.28](https://github.com/NousResearch/hermes-agent/compare/v2026.3.23...v2026.3.28)
@@ -35,54 +35,6 @@ ADAPTIVE_EFFORT_MAP = {
    "minimal": "low",
 }

-# ── Max output token limits per Anthropic model ───────────────────────
-# Source: Anthropic docs + Cline model catalog.  Anthropic's API requires
-# max_tokens as a mandatory field.  Previously we hardcoded 16384, which
-# starves thinking-enabled models (thinking tokens count toward the limit).
-_ANTHROPIC_OUTPUT_LIMITS = {
-    # Claude 4.6
-    "claude-opus-4-6":   128_000,
-    "claude-sonnet-4-6":  64_000,
-    # Claude 4.5
-    "claude-opus-4-5":    64_000,
-    "claude-sonnet-4-5":  64_000,
-    "claude-haiku-4-5":   64_000,
-    # Claude 4
-    "claude-opus-4":      32_000,
-    "claude-sonnet-4":    64_000,
-    # Claude 3.7
-    "claude-3-7-sonnet": 128_000,
-    # Claude 3.5
-    "claude-3-5-sonnet":   8_192,
-    "claude-3-5-haiku":    8_192,
-    # Claude 3
-    "claude-3-opus":       4_096,
-    "claude-3-sonnet":     4_096,
-    "claude-3-haiku":      4_096,
-}
-
-# For any model not in the table, assume the highest current limit.
-# Future Anthropic models are unlikely to have *less* output capacity.
-_ANTHROPIC_DEFAULT_OUTPUT_LIMIT = 128_000
-
-
-def _get_anthropic_max_output(model: str) -> int:
-    """Look up the max output token limit for an Anthropic model.
-
-    Uses substring matching against _ANTHROPIC_OUTPUT_LIMITS so date-stamped
-    model IDs (claude-sonnet-4-5-20250929) and variant suffixes (:1m, :fast)
-    resolve correctly.  Longest-prefix match wins to avoid e.g. "claude-3-5"
-    matching before "claude-3-5-sonnet".
-    """
-    m = model.lower()
-    best_key = ""
-    best_val = _ANTHROPIC_DEFAULT_OUTPUT_LIMIT
-    for key, val in _ANTHROPIC_OUTPUT_LIMITS.items():
-        if key in m and len(key) > len(best_key):
-            best_key = key
-            best_val = val
-    return best_val
-

 def _supports_adaptive_thinking(model: str) -> bool:
    """Return True for Claude 4.6 models that support adaptive thinking."""
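The removed `_get_anthropic_max_output` picks a limit by choosing the longest table key that appears as a substring of the lower-cased model ID. Because of this, date stamps and suffixes such as `:1m` still match. Below is a minimal standalone sketch of that lookup. It uses a trimmed table whose values are copied from `_ANTHROPIC_OUTPUT_LIMITS` above. The names `OUTPUT_LIMITS` and `max_output_for` are illustrative.

```python
# Illustrative subset of the limits table removed in the diff above.
OUTPUT_LIMITS = {
    "claude-opus-4-6": 128_000,
    "claude-sonnet-4-5": 64_000,
    "claude-opus-4": 32_000,
    "claude-3-5-sonnet": 8_192,
}
DEFAULT_OUTPUT_LIMIT = 128_000


def max_output_for(model: str) -> int:
    """Return the limit for the longest table key contained in *model*."""
    m = model.lower()
    best_key, best_val = "", DEFAULT_OUTPUT_LIMIT
    for key, val in OUTPUT_LIMITS.items():
        # A longer matching key is more specific, so it overrides shorter ones.
        if key in m and len(key) > len(best_key):
            best_key, best_val = key, val
    return best_val
```

Consider `max_output_for("claude-opus-4-6:1m")`. It matches both `claude-opus-4` and `claude-opus-4-6`, and the longer key wins, giving 128,000 rather than 32,000. Model IDs that match no key fall through to the default.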
@@ -107,7 +59,6 @@ _OAUTH_ONLY_BETAS = [
 # The version must stay reasonably current — Anthropic rejects OAuth requests
 # when the spoofed user-agent version is too far behind the actual release.
 _CLAUDE_CODE_VERSION_FALLBACK = "2.1.74"
-_claude_code_version_cache: Optional[str] = None


 def _detect_claude_code_version() -> str:
@@ -135,18 +86,11 @@ def _detect_claude_code_version() -> str:
    return _CLAUDE_CODE_VERSION_FALLBACK


+_CLAUDE_CODE_VERSION = _detect_claude_code_version()
 _CLAUDE_CODE_SYSTEM_PREFIX = "You are Claude Code, Anthropic's official CLI for Claude."
 _MCP_TOOL_PREFIX = "mcp_"


-def _get_claude_code_version() -> str:
-    """Lazily detect the installed Claude Code version when OAuth headers need it."""
-    global _claude_code_version_cache
-    if _claude_code_version_cache is None:
-        _claude_code_version_cache = _detect_claude_code_version()
-    return _claude_code_version_cache
-
-
 def _is_oauth_token(key: str) -> bool:
    """Check if the key is an OAuth/setup token (not a regular Console API key).

@@ -188,7 +132,7 @@ def build_anthropic_client(api_key: str, base_url: str = None):
        kwargs["auth_token"] = api_key
        kwargs["default_headers"] = {
            "anthropic-beta": ",".join(all_betas),
-            "user-agent": f"claude-cli/{_get_claude_code_version()} (external, cli)",
+            "user-agent": f"claude-cli/{_CLAUDE_CODE_VERSION} (external, cli)",
            "x-app": "cli",
        }
    else:
@@ -297,7 +241,7 @@ def _refresh_oauth_token(creds: Dict[str, Any]) -> Optional[str]:

    headers = {
        "Content-Type": "application/json",
-        "User-Agent": f"claude-cli/{_get_claude_code_version()} (external, cli)",
+        "User-Agent": f"claude-cli/{_CLAUDE_CODE_VERSION} (external, cli)",
    }

    for endpoint in token_endpoints:
@@ -866,15 +810,9 @@ def build_anthropic_kwargs(
    tool_choice: Optional[str] = None,
    is_oauth: bool = False,
    preserve_dots: bool = False,
-    context_length: Optional[int] = None,
 ) -> Dict[str, Any]:
    """Build kwargs for anthropic.messages.create().

-    When *max_tokens* is None, the model's native output limit is used
-    (e.g. 128K for Opus 4.6, 64K for Sonnet 4.6).  If *context_length*
-    is provided, the effective limit is clamped so it doesn't exceed
-    the context window.
-
    When *is_oauth* is True, applies Claude Code compatibility transforms:
    system prompt prefix, tool name prefixing, and prompt sanitization.

@@ -885,12 +823,7 @@ def build_anthropic_kwargs(
    anthropic_tools = convert_tools_to_anthropic(tools) if tools else []

    model = normalize_model_name(model, preserve_dots=preserve_dots)
-    effective_max_tokens = max_tokens or _get_anthropic_max_output(model)
-
-    # Clamp to context window if the user set a lower context_length
-    # (e.g. custom endpoint with limited capacity).
-    if context_length and effective_max_tokens > context_length:
-        effective_max_tokens = max(context_length - 1, 1)
+    effective_max_tokens = max_tokens or 16384

    # ── OAuth: Claude Code identity ──────────────────────────────────
    if is_oauth:
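Before this change, when `max_tokens` was unset, `build_anthropic_kwargs` fell back to the model's native output limit. When a `context_length` was given, it also clamped that value below the context window. The replacement instead falls back to a fixed 16384. The removed clamping logic is extracted below as a standalone sketch; the function name is hypothetical.

```python
from typing import Optional


def effective_max_tokens(
    max_tokens: Optional[int],
    model_limit: int,
    context_length: Optional[int] = None,
) -> int:
    """Pick the output budget, clamped below the context window if one is set."""
    effective = max_tokens or model_limit
    if context_length and effective > context_length:
        # Keep the budget strictly below the window, but never below 1.
        effective = max(context_length - 1, 1)
    return effective
```

Note that `max_tokens or model_limit` also treats an explicit `0` as unset.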
@@ -1137,13 +1137,7 @@ def resolve_vision_provider_client(
        return "custom", client, final_model

    if requested == "auto":
-        ordered = list(_VISION_AUTO_PROVIDER_ORDER)
-        preferred = _preferred_main_vision_provider()
-        if preferred in ordered:
-            ordered.remove(preferred)
-            ordered.insert(0, preferred)
-
-        for candidate in ordered:
+        for candidate in get_available_vision_backends():
            sync_client, default_model = _resolve_strict_vision_backend(candidate)
            if sync_client is not None:
                return _finalize(candidate, sync_client, default_model)
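The removed `auto` branch copied `_VISION_AUTO_PROVIDER_ORDER` and moved the user's preferred provider to the front before probing backends. The new code iterates `get_available_vision_backends()` directly. Below is a sketch of the reordering step; the provider names in the test are illustrative.

```python
from typing import Optional, Sequence


def order_with_preferred(default_order: Sequence[str], preferred: Optional[str]) -> list[str]:
    """Return a copy of *default_order* with *preferred* moved to the front."""
    ordered = list(default_order)  # copy so the shared default order is untouched
    if preferred in ordered:
        ordered.remove(preferred)
        ordered.insert(0, preferred)
    return ordered
```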
@@ -1216,39 +1210,6 @@ _client_cache: Dict[tuple, tuple] = {}
 _client_cache_lock = threading.Lock()


-def neuter_async_httpx_del() -> None:
-    """Monkey-patch ``AsyncHttpxClientWrapper.__del__`` to be a no-op.
-
-    The OpenAI SDK's ``AsyncHttpxClientWrapper.__del__`` schedules
-    ``self.aclose()`` via ``asyncio.get_running_loop().create_task()``.
-    When an ``AsyncOpenAI`` client is garbage-collected while
-    prompt_toolkit's event loop is running (the common CLI idle state),
-    the ``aclose()`` task runs on prompt_toolkit's loop but the
-    underlying TCP transport is bound to a *different* loop (the worker
-    thread's loop that the client was originally created on).  If that
-    loop is closed or its thread is dead, the transport's
-    ``self._loop.call_soon()`` raises ``RuntimeError("Event loop is
-    closed")``, which prompt_toolkit surfaces as "Unhandled exception
-    in event loop ... Press ENTER to continue...".
-
-    Neutering ``__del__`` is safe because:
-    - Cached clients are explicitly cleaned via ``_force_close_async_httpx``
-      on stale-loop detection and ``shutdown_cached_clients`` on exit.
-    - Uncached clients' TCP connections are cleaned up by the OS when the
-      process exits.
-    - The OpenAI SDK itself marks this as a TODO (``# TODO(someday):
-      support non asyncio runtimes here``).
-
-    Call this once at CLI startup, before any ``AsyncOpenAI`` clients are
-    created.
-    """
-    try:
-        from openai._base_client import AsyncHttpxClientWrapper
-        AsyncHttpxClientWrapper.__del__ = lambda self: None  # type: ignore[assignment]
-    except (ImportError, AttributeError):
-        pass  # Graceful degradation if the SDK changes its internals
-
-
 def _force_close_async_httpx(client: Any) -> None:
    """Mark the httpx AsyncClient inside an AsyncOpenAI client as closed.

@@ -1296,25 +1257,6 @@ def shutdown_cached_clients() -> None:
        _client_cache.clear()


-def cleanup_stale_async_clients() -> None:
-    """Force-close cached async clients whose event loop is closed.
-
-    Call this after each agent turn to proactively clean up stale clients
-    before GC can trigger ``AsyncHttpxClientWrapper.__del__`` on them.
-    This is defense-in-depth — the primary fix is ``neuter_async_httpx_del``
-    which disables ``__del__`` entirely.
-    """
-    with _client_cache_lock:
-        stale_keys = []
-        for key, entry in _client_cache.items():
-            client, _default, cached_loop = entry
-            if cached_loop is not None and cached_loop.is_closed():
-                _force_close_async_httpx(client)
-                stale_keys.append(key)
-        for key in stale_keys:
-            del _client_cache[key]
-
-
 def _get_cached_client(
    provider: str,
    model: str = None,
@@ -1458,29 +1400,6 @@ def _resolve_task_provider_model(
    return "auto", resolved_model, None, None


-_DEFAULT_AUX_TIMEOUT = 30.0
-
-
-def _get_task_timeout(task: str, default: float = _DEFAULT_AUX_TIMEOUT) -> float:
-    """Read timeout from auxiliary.{task}.timeout in config, falling back to *default*."""
-    if not task:
-        return default
-    try:
-        from hermes_cli.config import load_config
-        config = load_config()
-    except ImportError:
-        return default
-    aux = config.get("auxiliary", {}) if isinstance(config, dict) else {}
-    task_config = aux.get(task, {}) if isinstance(aux, dict) else {}
-    raw = task_config.get("timeout")
-    if raw is not None:
-        try:
-            return float(raw)
-        except (ValueError, TypeError):
-            pass
-    return default
-
-
 def _build_call_kwargs(
    provider: str,
    model: str,
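The removed `_get_task_timeout` read `auxiliary.<task>.timeout` from config and fell back to 30 seconds. After this change, `call_llm` and `async_call_llm` default to a fixed `timeout=30.0`. The sketch below performs the same lookup but takes the config dict as a parameter instead of calling `load_config()`. The `isinstance` check on the task section is an addition here. It guards against a YAML task key left with no value, which loads as `None`.

```python
from typing import Any

DEFAULT_AUX_TIMEOUT = 30.0


def task_timeout(config: Any, task: str, default: float = DEFAULT_AUX_TIMEOUT) -> float:
    """Read config["auxiliary"][task]["timeout"], falling back to *default*."""
    if not task:
        return default
    aux = config.get("auxiliary", {}) if isinstance(config, dict) else {}
    task_cfg = aux.get(task, {}) if isinstance(aux, dict) else {}
    # Added guard: a null task section would otherwise fail on .get().
    raw = task_cfg.get("timeout") if isinstance(task_cfg, dict) else None
    if raw is not None:
        try:
            return float(raw)
        except (ValueError, TypeError):
            pass  # unparseable value: use the default
    return default
```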
@@ -1538,7 +1457,7 @@ def call_llm(
    temperature: float = None,
    max_tokens: int = None,
    tools: list = None,
-    timeout: float = None,
+    timeout: float = 30.0,
    extra_body: dict = None,
 ) -> Any:
    """Centralized synchronous LLM call.
@@ -1556,7 +1475,7 @@ def call_llm(
        temperature: Sampling temperature (None = provider default).
        max_tokens: Max output tokens (handles max_tokens vs max_completion_tokens).
        tools: Tool definitions (for function calling).
-        timeout: Request timeout in seconds (None = read from auxiliary.{task}.timeout config).
+        timeout: Request timeout in seconds.
        extra_body: Additional request body fields.

    Returns:
@@ -1621,12 +1540,10 @@ def call_llm(
                f"No LLM provider configured for task={task} provider={resolved_provider}. "
                f"Run: hermes setup")

-    effective_timeout = timeout if timeout is not None else _get_task_timeout(task)
-
    kwargs = _build_call_kwargs(
        resolved_provider, final_model, messages,
        temperature=temperature, max_tokens=max_tokens,
-        tools=tools, timeout=effective_timeout, extra_body=extra_body,
+        tools=tools, timeout=timeout, extra_body=extra_body,
        base_url=resolved_base_url)

    # Handle max_tokens vs max_completion_tokens retry
@@ -1641,62 +1558,6 @@ def call_llm(
        raise


-def extract_content_or_reasoning(response) -> str:
-    """Extract content from an LLM response, falling back to reasoning fields.
-
-    Mirrors the main agent loop's behavior when a reasoning model (DeepSeek-R1,
-    Qwen-QwQ, etc.) returns ``content=None`` with reasoning in structured fields.
-
-    Resolution order:
-      1. ``message.content`` — strip inline think/reasoning blocks, check for
-         remaining non-whitespace text.
-      2. ``message.reasoning`` / ``message.reasoning_content`` — direct
-         structured reasoning fields (DeepSeek, Moonshot, Novita, etc.).
-      3. ``message.reasoning_details`` — OpenRouter unified array format.
-
-    Returns the best available text, or ``""`` if nothing found.
-    """
-    import re
-
-    msg = response.choices[0].message
-    content = (msg.content or "").strip()
-
-    if content:
-        # Strip inline think/reasoning blocks (mirrors _strip_think_blocks)
-        cleaned = re.sub(
-            r"<(?:think|thinking|reasoning|REASONING_SCRATCHPAD)>"
-            r".*?"
-            r"</(?:think|thinking|reasoning|REASONING_SCRATCHPAD)>",
-            "", content, flags=re.DOTALL | re.IGNORECASE,
-        ).strip()
-        if cleaned:
-            return cleaned
-
-    # Content is empty or reasoning-only — try structured reasoning fields
-    reasoning_parts: list[str] = []
-    for field in ("reasoning", "reasoning_content"):
-        val = getattr(msg, field, None)
-        if val and isinstance(val, str) and val.strip() and val not in reasoning_parts:
-            reasoning_parts.append(val.strip())
-
-    details = getattr(msg, "reasoning_details", None)
-    if details and isinstance(details, list):
-        for detail in details:
-            if isinstance(detail, dict):
-                summary = (
-                    detail.get("summary")
-                    or detail.get("content")
-                    or detail.get("text")
-                )
-                if summary and summary not in reasoning_parts:
-                    reasoning_parts.append(summary.strip() if isinstance(summary, str) else str(summary))
-
-    if reasoning_parts:
-        return "\n\n".join(reasoning_parts)
-
-    return ""
-
-
 async def async_call_llm(
    task: str = None,
    *,
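The removed `extract_content_or_reasoning` prefers visible content with inline think blocks stripped. If nothing remains, it falls back to structured reasoning fields. Below is a reduced sketch of the first two resolution steps. It omits the OpenRouter `reasoning_details` array, and `SimpleNamespace` stands in for an SDK message object.

```python
import re
from types import SimpleNamespace

_THINK_BLOCK_RE = re.compile(
    r"<(?:think|thinking|reasoning|REASONING_SCRATCHPAD)>.*?"
    r"</(?:think|thinking|reasoning|REASONING_SCRATCHPAD)>",
    re.DOTALL | re.IGNORECASE,
)


def content_or_reasoning(msg) -> str:
    """Visible content without think blocks, else structured reasoning, else ""."""
    cleaned = _THINK_BLOCK_RE.sub("", msg.content or "").strip()
    if cleaned:
        return cleaned
    parts: list[str] = []
    for field in ("reasoning", "reasoning_content"):
        val = getattr(msg, field, None)
        # Skip empty values and providers that duplicate the same text in both fields.
        if isinstance(val, str) and val.strip() and val.strip() not in parts:
            parts.append(val.strip())
    return "\n\n".join(parts)


msg = SimpleNamespace(content="<think>plan</think>Answer: 4", reasoning=None)
print(content_or_reasoning(msg))  # → Answer: 4
```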
@@ -1708,7 +1569,7 @@ async def async_call_llm(
    temperature: float = None,
    max_tokens: int = None,
    tools: list = None,
-    timeout: float = None,
+    timeout: float = 30.0,
    extra_body: dict = None,
 ) -> Any:
    """Centralized asynchronous LLM call.
@@ -1769,12 +1630,10 @@ async def async_call_llm(
                f"No LLM provider configured for task={task} provider={resolved_provider}. "
                f"Run: hermes setup")

-    effective_timeout = timeout if timeout is not None else _get_task_timeout(task)
-
    kwargs = _build_call_kwargs(
        resolved_provider, final_model, messages,
        temperature=temperature, max_tokens=max_tokens,
-        tools=tools, timeout=effective_timeout, extra_body=extra_body,
+        tools=tools, timeout=timeout, extra_body=extra_body,
        base_url=resolved_base_url)

    try:
@@ -141,7 +141,7 @@ class ContextCompressor:
            "last_prompt_tokens": self.last_prompt_tokens,
            "threshold_tokens": self.threshold_tokens,
            "context_length": self.context_length,
-            "usage_percent": min(100, (self.last_prompt_tokens / self.context_length * 100)) if self.context_length else 0,
+            "usage_percent": (self.last_prompt_tokens / self.context_length * 100) if self.context_length else 0,
            "compression_count": self.compression_count,
        }

@@ -347,7 +347,7 @@ Write only the summary body. Do not include any preamble or prefix."""
                "messages": [{"role": "user", "content": prompt}],
                "temperature": 0.3,
                "max_tokens": summary_budget * 2,
-                # timeout resolved from auxiliary.compression.timeout config by call_llm
+                "timeout": 45.0,
            }
            if self.summary_model:
                call_kwargs["model"] = self.summary_model
@@ -286,16 +286,12 @@ def _expand_git_reference(
    args: list[str],
    label: str,
 ) -> tuple[str | None, str | None]:
-    try:
-        result = subprocess.run(
-            ["git", *args],
-            cwd=cwd,
-            capture_output=True,
-            text=True,
-            timeout=30,
-        )
-    except subprocess.TimeoutExpired:
-        return f"{ref.raw}: git command timed out (30s)", None
+    result = subprocess.run(
+        ["git", *args],
+        cwd=cwd,
+        capture_output=True,
+        text=True,
+    )
    if result.returncode != 0:
        stderr = (result.stderr or "").strip() or "git command failed"
        return f"{ref.raw}: {stderr}", None
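This hunk drops the 30-second `timeout` from the `git` call. As a result, a hung `git` process now blocks `_expand_git_reference` indefinitely. When `timeout=` is set, `subprocess.run` kills the child once the limit passes, waits for it, and raises `subprocess.TimeoutExpired`. Below is a generic sketch of the removed pattern; the helper name and return shape are illustrative.

```python
import subprocess
import sys
from typing import Optional, Tuple


def run_with_timeout(args: list, timeout: float = 30.0) -> Tuple[Optional[str], Optional[str]]:
    """Run a command and return (error, stdout); exactly one of them is None."""
    try:
        result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child has already been killed by subprocess.run at this point.
        return f"command timed out ({timeout:g}s)", None
    if result.returncode != 0:
        return (result.stderr or "").strip() or "command failed", None
    return None, result.stdout
```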
@@ -453,12 +449,9 @@ def _rg_files(path: Path, cwd: Path, limit: int) -> list[Path] | None:
            cwd=cwd,
            capture_output=True,
            text=True,
-            timeout=10,
        )
    except FileNotFoundError:
        return None
-    except subprocess.TimeoutExpired:
-        return None
    if result.returncode != 0:
        return None
    files = [Path(line.strip()) for line in result.stdout.splitlines() if line.strip()]
@@ -284,11 +284,11 @@ class KawaiiSpinner:
        The CLI already drives a TUI widget (_spinner_text) for spinner display,
        so KawaiiSpinner's \\r-based animation is redundant under StdoutProxy.
        """
-        try:
-            from prompt_toolkit.patch_stdout import StdoutProxy
-            return isinstance(self._out, StdoutProxy)
-        except ImportError:
-            return False
+        out = self._out
+        # StdoutProxy has a 'raw' attribute (bool) that plain file objects lack.
+        if hasattr(out, 'raw') and type(out).__name__ == 'StdoutProxy':
+            return True
+        return False

    def _animate(self):
        # When stdout is not a real terminal (e.g. Docker, systemd, pipe),
@@ -699,7 +699,7 @@ def format_context_pressure(
        threshold_percent: Compaction threshold as a fraction of context window.
        compression_enabled: Whether auto-compression is active.
    """
-    pct_int = min(int(compaction_progress * 100), 100)
+    pct_int = int(compaction_progress * 100)
    filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
    bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)

@@ -729,7 +729,7 @@ def format_context_pressure_gateway(
    No ANSI — just Unicode and plain text suitable for Telegram/Discord/etc.
    The percentage shows progress toward the compaction threshold.
    """
-    pct_int = min(int(compaction_progress * 100), 100)
+    pct_int = int(compaction_progress * 100)
    filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
    bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
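Both `format_context_pressure` variants now compute `pct_int` without `min(..., 100)`, while `filled` is still capped at `_BAR_WIDTH`. Past the threshold, the bar therefore stays full but the label can exceed 100%. Below is a sketch of the clamped rendering. The bar width and glyphs are assumptions, because `_BAR_WIDTH`, `_BAR_FILLED` and `_BAR_EMPTY` are not defined in this excerpt.

```python
BAR_WIDTH = 20  # assumed width
BAR_FILLED, BAR_EMPTY = "█", "░"  # assumed glyphs


def pressure_bar(progress: float) -> str:
    """Render progress toward the compaction threshold, label capped at 100%."""
    pct = min(int(progress * 100), 100)
    filled = min(int(progress * BAR_WIDTH), BAR_WIDTH)
    return f"{BAR_FILLED * filled}{BAR_EMPTY * (BAR_WIDTH - filled)} {pct}%"
```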

@@ -113,15 +113,6 @@ DEFAULT_CONTEXT_LENGTHS = {
    "glm": 202752,
    # Kimi
    "kimi": 262144,
-    # Hugging Face Inference Providers — model IDs use org/name format
-    "Qwen/Qwen3.5-397B-A17B": 131072,
-    "Qwen/Qwen3.5-35B-A3B": 131072,
-    "deepseek-ai/DeepSeek-V3.2": 65536,
-    "moonshotai/Kimi-K2.5": 262144,
-    "moonshotai/Kimi-K2-Thinking": 262144,
-    "MiniMaxAI/MiniMax-M2.5": 204800,
-    "XiaomiMiMo/MiMo-V2-Flash": 32768,
-    "zai-org/GLM-5": 202752,
 }

 _CONTEXT_LENGTH_KEYS = (
@@ -15,8 +15,6 @@ import time
 from pathlib import Path
 from typing import Any, Dict, Optional

-from utils import atomic_json_write
-
 import requests

 logger = logging.getLogger(__name__)
@@ -66,10 +64,12 @@ def _load_disk_cache() -> Dict[str, Any]:


 def _save_disk_cache(data: Dict[str, Any]) -> None:
-    """Save models.dev data to disk cache atomically."""
+    """Save models.dev data to disk cache."""
    try:
        cache_path = _get_cache_path()
-        atomic_json_write(cache_path, data, indent=None, separators=(",", ":"))
+        cache_path.parent.mkdir(parents=True, exist_ok=True)
+        with open(cache_path, "w", encoding="utf-8") as f:
+            json.dump(data, f, separators=(",", ":"))
    except Exception as e:
        logger.debug("Failed to save models.dev disk cache: %s", e)
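The cache writer switches from `atomic_json_write` to a plain `open(..., "w")`. With a plain write, a crash mid-write can leave a truncated `models.dev` cache file on disk. `utils.atomic_json_write` itself is not shown here. The sketch below follows the common temp-file-plus-`os.replace` pattern and may differ from that helper.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_json_write_sketch(path: Path, data: Any) -> None:
    """Write JSON to a temp file in the same directory, then rename it over *path*."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory keeps the rename on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, separators=(",", ":"))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX when both paths share a filesystem
    except BaseException:
        try:
            os.unlink(tmp)  # don't leave partial temp files behind
        except FileNotFoundError:
            pass
        raise
```

Readers see either the old file or the complete new one, never a half-written file.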

@@ -4,27 +4,14 @@ All functions are stateless. AIAgent._build_system_prompt() calls these to
 assemble pieces, then combines them with memory and ephemeral prompts.
 """

-import json
 import logging
 import os
 import re
-import threading
-from collections import OrderedDict
 from pathlib import Path

 from hermes_constants import get_hermes_home
 from typing import Optional

-from agent.skill_utils import (
-    extract_skill_conditions,
-    extract_skill_description,
-    get_disabled_skill_names,
-    iter_skill_index_files,
-    parse_frontmatter,
-    skill_matches_platform,
-)
-from utils import atomic_json_write
-
 logger = logging.getLogger(__name__)

 # ---------------------------------------------------------------------------
@@ -169,25 +156,6 @@ SKILLS_GUIDANCE = (
    "Skills that aren't maintained become liabilities."
 )

-TOOL_USE_ENFORCEMENT_GUIDANCE = (
-    "# Tool-use enforcement\n"
-    "You MUST use your tools to take action — do not describe what you would do "
-    "or plan to do without actually doing it. When you say you will perform an "
-    "action (e.g. 'I will run the tests', 'Let me check the file', 'I will create "
-    "the project'), you MUST immediately make the corresponding tool call in the same "
-    "response. Never end your turn with a promise of future action — execute it now.\n"
-    "Keep working until the task is actually complete. Do not stop with a summary of "
-    "what you plan to do next time. If you have tools available that can accomplish "
-    "the task, use them instead of telling the user what you would do.\n"
-    "Every response should either (a) contain tool calls that make progress, or "
-    "(b) deliver a final result to the user. Responses that only describe intentions "
-    "without acting are not acceptable."
-)
-
-# Model name substrings that trigger tool-use enforcement guidance.
-# Add new patterns here when a model family needs explicit steering.
-TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "codex")
-
 PLATFORM_HINTS = {
    "whatsapp": (
        "You are on a text messaging communication platform, WhatsApp. "
@@ -262,111 +230,6 @@ CONTEXT_TRUNCATE_HEAD_RATIO = 0.7
 CONTEXT_TRUNCATE_TAIL_RATIO = 0.2


-# =========================================================================
-# Skills prompt cache
-# =========================================================================
-
-_SKILLS_PROMPT_CACHE_MAX = 8
-_SKILLS_PROMPT_CACHE: OrderedDict[tuple, str] = OrderedDict()
-_SKILLS_PROMPT_CACHE_LOCK = threading.Lock()
-_SKILLS_SNAPSHOT_VERSION = 1
-
-
-def _skills_prompt_snapshot_path() -> Path:
-    return get_hermes_home() / ".skills_prompt_snapshot.json"
-
-
-def clear_skills_system_prompt_cache(*, clear_snapshot: bool = False) -> None:
-    """Drop the in-process skills prompt cache (and optionally the disk snapshot)."""
-    with _SKILLS_PROMPT_CACHE_LOCK:
-        _SKILLS_PROMPT_CACHE.clear()
-    if clear_snapshot:
-        try:
-            _skills_prompt_snapshot_path().unlink(missing_ok=True)
-        except OSError as e:
-            logger.debug("Could not remove skills prompt snapshot: %s", e)
-
-
-def _build_skills_manifest(skills_dir: Path) -> dict[str, list[int]]:
-    """Build an mtime/size manifest of all SKILL.md and DESCRIPTION.md files."""
-    manifest: dict[str, list[int]] = {}
-    for filename in ("SKILL.md", "DESCRIPTION.md"):
-        for path in iter_skill_index_files(skills_dir, filename):
-            try:
-                st = path.stat()
-            except OSError:
-                continue
-            manifest[str(path.relative_to(skills_dir))] = [st.st_mtime_ns, st.st_size]
-    return manifest
-
-
-def _load_skills_snapshot(skills_dir: Path) -> Optional[dict]:
-    """Load the disk snapshot if it exists and its manifest still matches."""
-    snapshot_path = _skills_prompt_snapshot_path()
-    if not snapshot_path.exists():
-        return None
-    try:
-        snapshot = json.loads(snapshot_path.read_text(encoding="utf-8"))
-    except Exception:
-        return None
-    if not isinstance(snapshot, dict):
-        return None
-    if snapshot.get("version") != _SKILLS_SNAPSHOT_VERSION:
-        return None
-    if snapshot.get("manifest") != _build_skills_manifest(skills_dir):
-        return None
-    return snapshot
-
-
-def _write_skills_snapshot(
-    skills_dir: Path,
-    manifest: dict[str, list[int]],
-    skill_entries: list[dict],
-    category_descriptions: dict[str, str],
-) -> None:
-    """Persist skill metadata to disk for fast cold-start reuse."""
-    payload = {
-        "version": _SKILLS_SNAPSHOT_VERSION,
-        "manifest": manifest,
-        "skills": skill_entries,
-        "category_descriptions": category_descriptions,
-    }
-    try:
-        atomic_json_write(_skills_prompt_snapshot_path(), payload)
-    except Exception as e:
-        logger.debug("Could not write skills prompt snapshot: %s", e)
-
-
-def _build_snapshot_entry(
-    skill_file: Path,
-    skills_dir: Path,
-    frontmatter: dict,
-    description: str,
-) -> dict:
-    """Build a serialisable metadata dict for one skill."""
-    rel_path = skill_file.relative_to(skills_dir)
-    parts = rel_path.parts
-    if len(parts) >= 2:
-        skill_name = parts[-2]
-        category = "/".join(parts[:-2]) if len(parts) > 2 else parts[0]
-    else:
-        category = "general"
-        skill_name = skill_file.parent.name
-
-    platforms = frontmatter.get("platforms") or []
-    if isinstance(platforms, str):
-        platforms = [platforms]
-
-    return {
-        "skill_name": skill_name,
-        "category": category,
-        "frontmatter_name": str(frontmatter.get("name", skill_name)),
-        "description": description,
-        "platforms": [str(p).strip() for p in platforms if str(p).strip()],
-        "conditions": extract_skill_conditions(frontmatter),
-    }
-
-
 # =========================================================================
 # Skills index
 # =========================================================================
@@ -378,13 +241,22 @@ def _parse_skill_file(skill_file: Path) -> tuple[bool, dict, str]:
    (True, {}, "") to err on the side of showing the skill.
    """
    try:
+        from tools.skills_tool import _parse_frontmatter, skill_matches_platform
+
        raw = skill_file.read_text(encoding="utf-8")[:2000]
-        frontmatter, _ = parse_frontmatter(raw)
+        frontmatter, _ = _parse_frontmatter(raw)

        if not skill_matches_platform(frontmatter):
-            return False, frontmatter, ""
+            return False, {}, ""

-        return True, frontmatter, extract_skill_description(frontmatter)
+        desc = ""
+        raw_desc = frontmatter.get("description", "")
+        if raw_desc:
+            desc = str(raw_desc).strip().strip("'\"")
+            if len(desc) > 60:
+                desc = desc[:57] + "..."
+
+        return True, frontmatter, desc
    except Exception as e:
        logger.debug("Failed to parse skill file %s: %s", skill_file, e)
        return True, {}, ""
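The description handling now inlined in `_parse_skill_file` first strips whitespace and surrounding quotes. It then truncates the result to 60 characters, including a trailing `...`. The same rules are shown below as a standalone function; the function name is hypothetical.

```python
def short_description(raw_desc: object, limit: int = 60) -> str:
    """Normalise a frontmatter description and cap it at *limit* characters."""
    if not raw_desc:
        return ""
    desc = str(raw_desc).strip().strip("'\"")
    if len(desc) > limit:
        # Reserve three characters for the ellipsis so the total stays at *limit*.
        desc = desc[: limit - 3] + "..."
    return desc
```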
@@ -393,9 +265,16 @@ def _parse_skill_file(skill_file: Path) -> tuple[bool, dict, str]:
 def _read_skill_conditions(skill_file: Path) -> dict:
    """Extract conditional activation fields from SKILL.md frontmatter."""
    try:
+        from tools.skills_tool import _parse_frontmatter
        raw = skill_file.read_text(encoding="utf-8")[:2000]
-        frontmatter, _ = parse_frontmatter(raw)
-        return extract_skill_conditions(frontmatter)
+        frontmatter, _ = _parse_frontmatter(raw)
+        hermes = frontmatter.get("metadata", {}).get("hermes", {})
+        return {
+            "fallback_for_toolsets": hermes.get("fallback_for_toolsets", []),
+            "requires_toolsets": hermes.get("requires_toolsets", []),
+            "fallback_for_tools": hermes.get("fallback_for_tools", []),
+            "requires_tools": hermes.get("requires_tools", []),
+        }
    except Exception as e:
        logger.debug("Failed to read skill conditions from %s: %s", skill_file, e)
        return {}
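Suppose the frontmatter parser maps an empty `metadata:` key to `None`, as PyYAML's `safe_load` does. In that case, `frontmatter.get("metadata", {})` returns `None`, because the default applies only when the key is absent. The chained `.get("hermes", {})` then raises `AttributeError`, and `_read_skill_conditions` falls back to `{}` through its `except` branch. The sketch below treats nulls as empty without relying on the exception. It also maps a null list value to `[]`, which differs slightly from `hermes.get(key, [])`.

```python
CONDITION_KEYS = (
    "fallback_for_toolsets",
    "requires_toolsets",
    "fallback_for_tools",
    "requires_tools",
)


def skill_conditions(frontmatter: dict) -> dict:
    """Read metadata.hermes activation lists, treating YAML nulls as empty."""
    metadata = frontmatter.get("metadata")
    hermes = metadata.get("hermes") if isinstance(metadata, dict) else None
    if not isinstance(hermes, dict):
        hermes = {}
    return {key: hermes.get(key) or [] for key in CONDITION_KEYS}
```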
@@ -438,12 +317,10 @@ def build_skills_system_prompt(
 ) -> str:
    """Build a compact skill index for the system prompt.

-    Two-layer cache:
-      1. In-process LRU dict keyed by (skills_dir, tools, toolsets)
-      2. Disk snapshot (``.skills_prompt_snapshot.json``) validated by
-         mtime/size manifest — survives process restarts
-
-    Falls back to a full filesystem scan when both layers miss.
+    Scans ~/.hermes/skills/ for SKILL.md files grouped by category.
+    Includes per-skill descriptions from frontmatter so the model can
+    match skills by meaning, not just name.
+    Filters out skills incompatible with the current OS platform.
    """
    hermes_home = get_hermes_home()
    skills_dir = hermes_home / "skills"
@@ -451,140 +328,98 @@ def build_skills_system_prompt(
    if not skills_dir.exists():
        return ""

-    # ── Layer 1: in-process LRU cache ─────────────────────────────────
-    cache_key = (
-        str(skills_dir.resolve()),
-        tuple(sorted(str(t) for t in (available_tools or set()))),
-        tuple(sorted(str(ts) for ts in (available_toolsets or set()))),
-    )
-    with _SKILLS_PROMPT_CACHE_LOCK:
-        cached = _SKILLS_PROMPT_CACHE.get(cache_key)
-        if cached is not None:
-            _SKILLS_PROMPT_CACHE.move_to_end(cache_key)
-            return cached
-
-    disabled = get_disabled_skill_names()
-
-    # ── Layer 2: disk snapshot ────────────────────────────────────────
-    snapshot = _load_skills_snapshot(skills_dir)
+    # Collect skills with descriptions, grouped by category.
+    # Each entry: (skill_name, description)
+    # Supports sub-categories: skills/mlops/training/axolotl/SKILL.md
+    # -> category "mlops/training", skill "axolotl"
+    # Load disabled skill names once for the entire scan
+    try:
+        from tools.skills_tool import _get_disabled_skill_names
+        disabled = _get_disabled_skill_names()
+    except Exception:
+        disabled = set()

    skills_by_category: dict[str, list[tuple[str, str]]] = {}
-    category_descriptions: dict[str, str] = {}
-
-    if snapshot is not None:
-        # Fast path: use pre-parsed metadata from disk
-        for entry in snapshot.get("skills", []):
-            if not isinstance(entry, dict):
-                continue
-            skill_name = entry.get("skill_name") or ""
-            category = entry.get("category") or "general"
-            frontmatter_name = entry.get("frontmatter_name") or skill_name
-            platforms = entry.get("platforms") or []
-            if not skill_matches_platform({"platforms": platforms}):
-                continue
-            if frontmatter_name in disabled or skill_name in disabled:
-                continue
-            if not _skill_should_show(
-                entry.get("conditions") or {},
-                available_tools,
-                available_toolsets,
-            ):
-                continue
-            skills_by_category.setdefault(category, []).append(
-                (skill_name, entry.get("description", ""))
-            )
-        category_descriptions = {
-            str(k): str(v)
-            for k, v in (snapshot.get("category_descriptions") or {}).items()
+    for skill_file in skills_dir.rglob("SKILL.md"):
+        is_compatible, frontmatter, desc = _parse_skill_file(skill_file)
+        if not is_compatible:
+            continue
+        rel_path = skill_file.relative_to(skills_dir)
+        parts = rel_path.parts
+        if len(parts) >= 2:
+            skill_name = parts[-2]
+            category = "/".join(parts[:-2]) if len(parts) > 2 else parts[0]
+        else:
+            category = "general"
+            skill_name = skill_file.parent.name
+        # Respect user's disabled skills config
+        fm_name = frontmatter.get("name", skill_name)
+        if fm_name in disabled or skill_name in disabled:
+            continue
+        # Extract conditions inline from already-parsed frontmatter
+        # (avoids redundant file re-read that _read_skill_conditions would do)
+        hermes_meta = (frontmatter.get("metadata") or {}).get("hermes") or {}
+        conditions = {
+            "fallback_for_toolsets": hermes_meta.get("fallback_for_toolsets", []),
+            "requires_toolsets": hermes_meta.get("requires_toolsets", []),
+            "fallback_for_tools": hermes_meta.get("fallback_for_tools", []),
+            "requires_tools": hermes_meta.get("requires_tools", []),
        }
-    else:
-        # Cold path: full filesystem scan + write snapshot for next time
-        skill_entries: list[dict] = []
-        for skill_file in iter_skill_index_files(skills_dir, "SKILL.md"):
-            is_compatible, frontmatter, desc = _parse_skill_file(skill_file)
-            entry = _build_snapshot_entry(skill_file, skills_dir, frontmatter, desc)
-            skill_entries.append(entry)
-            if not is_compatible:
-                continue
-            skill_name = entry["skill_name"]
-            if entry["frontmatter_name"] in disabled or skill_name in disabled:
-                continue
-            if not _skill_should_show(
-                extract_skill_conditions(frontmatter),
-                available_tools,
-                available_toolsets,
-            ):
-                continue
-            skills_by_category.setdefault(entry["category"], []).append(
-                (skill_name, entry["description"])
-            )
+        if not _skill_should_show(conditions, available_tools, available_toolsets):
+            continue
+        skills_by_category.setdefault(category, []).append((skill_name, desc))

-        # Read category-level DESCRIPTION.md files
-        for desc_file in iter_skill_index_files(skills_dir, "DESCRIPTION.md"):
+    if not skills_by_category:
+        return ""
+
+    # Read category-level descriptions from DESCRIPTION.md
+    # Checks both the exact category path and parent directories
+    category_descriptions = {}
+    for category in skills_by_category:
+        cat_path = Path(category)
+        desc_file = skills_dir / cat_path / "DESCRIPTION.md"
+        if desc_file.exists():
            try:
                content = desc_file.read_text(encoding="utf-8")
-                fm, _ = parse_frontmatter(content)
-                cat_desc = fm.get("description")
-                if not cat_desc:
-                    continue
-                rel = desc_file.relative_to(skills_dir)
-                cat = "/".join(rel.parts[:-1]) if len(rel.parts) > 1 else "general"
-                category_descriptions[cat] = str(cat_desc).strip().strip("'\"")
+                match = re.search(r"^---\s*\n.*?description:\s*(.+?)\s*\n.*?^---", content, re.MULTILINE | re.DOTALL)
+                if match:
+                    category_descriptions[category] = match.group(1).strip()
            except Exception as e:
                logger.debug("Could not read skill description %s: %s", desc_file, e)

-        _write_skills_snapshot(
-            skills_dir,
-            _build_skills_manifest(skills_dir),
-            skill_entries,
-            category_descriptions,
-        )
-
-    if not skills_by_category:
-        result = ""
-    else:
-        index_lines = []
-        for category in sorted(skills_by_category.keys()):
-            cat_desc = category_descriptions.get(category, "")
-            if cat_desc:
-                index_lines.append(f"  {category}: {cat_desc}")
+    index_lines = []
+    for category in sorted(skills_by_category.keys()):
+        cat_desc = category_descriptions.get(category, "")
+        if cat_desc:
+            index_lines.append(f"  {category}: {cat_desc}")
+        else:
+            index_lines.append(f"  {category}:")
+        # Deduplicate and sort skills within each category
+        seen = set()
+        for name, desc in sorted(skills_by_category[category], key=lambda x: x[0]):
+            if name in seen:
+                continue
+            seen.add(name)
+            if desc:
+                index_lines.append(f"    - {name}: {desc}")
            else:
-                index_lines.append(f"  {category}:")
-            # Deduplicate and sort skills within each category
-            seen = set()
-            for name, desc in sorted(skills_by_category[category], key=lambda x: x[0]):
-                if name in seen:
-                    continue
-                seen.add(name)
-                if desc:
-                    index_lines.append(f"    - {name}: {desc}")
-                else:
-                    index_lines.append(f"    - {name}")
+                index_lines.append(f"    - {name}")

-        result = (
-            "## Skills (mandatory)\n"
-            "Before replying, scan the skills below. If one clearly matches your task, "
-            "load it with skill_view(name) and follow its instructions. "
-            "If a skill has issues, fix it with skill_manage(action='patch').\n"
-            "After difficult/iterative tasks, offer to save as a skill. "
-            "If a skill you loaded was missing steps, had wrong commands, or needed "
-            "pitfalls you discovered, update it before finishing.\n"
-            "\n"
-            "<available_skills>\n"
-            + "\n".join(index_lines) + "\n"
-            "</available_skills>\n"
-            "\n"
-            "If none match, proceed normally without loading a skill."
-        )
-
-    # ── Store in LRU cache ────────────────────────────────────────────
-    with _SKILLS_PROMPT_CACHE_LOCK:
-        _SKILLS_PROMPT_CACHE[cache_key] = result
-        _SKILLS_PROMPT_CACHE.move_to_end(cache_key)
-        while len(_SKILLS_PROMPT_CACHE) > _SKILLS_PROMPT_CACHE_MAX:
-            _SKILLS_PROMPT_CACHE.popitem(last=False)
-
-    return result
+    return (
+        "## Skills (mandatory)\n"
+        "Before replying, scan the skills below. If one clearly matches your task, "
+        "load it with skill_view(name) and follow its instructions. "
+        "If a skill has issues, fix it with skill_manage(action='patch').\n"
+        "After difficult/iterative tasks, offer to save as a skill. "
+        "If a skill you loaded was missing steps, had wrong commands, or needed "
+        "pitfalls you discovered, update it before finishing.\n"
+        "\n"
+        "<available_skills>\n"
+        + "\n".join(index_lines) + "\n"
+        "</available_skills>\n"
+        "\n"
+        "If none match, proceed normally without loading a skill."
+    )


 # =========================================================================
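The rewritten scan derives each skill's category and name from the `SKILL.md` path relative to the skills directory. Nested directories above the skill folder form the category. A skill folder placed directly under the skills directory becomes a category named after itself. The sketch below restates that mapping on plain strings so it can be checked without a filesystem. `categorize` and its `skills_dir_name` parameter are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Tuple


def categorize(rel_path: str, skills_dir_name: str = "skills") -> Tuple[str, str]:
    """Map a SKILL.md path (relative to the skills dir) to (category, skill_name)."""
    parts = PurePosixPath(rel_path).parts
    if len(parts) >= 2:
        skill_name = parts[-2]  # the directory that contains SKILL.md
        # Directories above the skill form the category; a top-level skill
        # directory such as "axolotl/SKILL.md" uses its own name instead.
        category = "/".join(parts[:-2]) if len(parts) > 2 else parts[0]
    else:
        # SKILL.md sitting directly in the skills dir: the parent is the skills dir.
        category = "general"
        skill_name = skills_dir_name
    return category, skill_name
```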
@@ -1,203 +0,0 @@
-"""Lightweight skill metadata utilities shared by prompt_builder and skills_tool.
-
-This module intentionally avoids importing the tool registry, CLI config, or any
-heavy dependency chain.  It is safe to import at module level without triggering
-tool registration or provider resolution.
-"""
-
-import logging
-import os
-import re
-import sys
-from pathlib import Path
-from typing import Any, Dict, List, Optional, Set, Tuple
-
-from hermes_constants import get_hermes_home
-
-logger = logging.getLogger(__name__)
-
-# ── Platform mapping ──────────────────────────────────────────────────────
-
-PLATFORM_MAP = {
-    "macos": "darwin",
-    "linux": "linux",
-    "windows": "win32",
-}
-
-EXCLUDED_SKILL_DIRS = frozenset((".git", ".github", ".hub"))
-
-# ── Lazy YAML loader ─────────────────────────────────────────────────────
-
-_yaml_load_fn = None
-
-
-def yaml_load(content: str):
-    """Parse YAML with lazy import and CSafeLoader preference."""
-    global _yaml_load_fn
-    if _yaml_load_fn is None:
-        import yaml
-
-        loader = getattr(yaml, "CSafeLoader", None) or yaml.SafeLoader
-
-        def _load(value: str):
-            return yaml.load(value, Loader=loader)
-
-        _yaml_load_fn = _load
-    return _yaml_load_fn(content)
-
-
-# ── Frontmatter parsing ──────────────────────────────────────────────────
-
-
-def parse_frontmatter(content: str) -> Tuple[Dict[str, Any], str]:
-    """Parse YAML frontmatter from a markdown string.
-
-    Uses yaml with CSafeLoader for full YAML support (nested metadata, lists)
-    with a fallback to simple key:value splitting for robustness.
-
-    Returns:
-        (frontmatter_dict, remaining_body)
-    """
-    frontmatter: Dict[str, Any] = {}
-    body = content
-
-    if not content.startswith("---"):
-        return frontmatter, body
-
-    end_match = re.search(r"\n---\s*\n", content[3:])
-    if not end_match:
-        return frontmatter, body
-
-    yaml_content = content[3 : end_match.start() + 3]
-    body = content[end_match.end() + 3 :]
-
-    try:
-        parsed = yaml_load(yaml_content)
-        if isinstance(parsed, dict):
-            frontmatter = parsed
-    except Exception:
-        # Fallback: simple key:value parsing for malformed YAML
-        for line in yaml_content.strip().split("\n"):
-            if ":" not in line:
-                continue
-            key, value = line.split(":", 1)
-            frontmatter[key.strip()] = value.strip()
-
-    return frontmatter, body
-
-
-# ── Platform matching ─────────────────────────────────────────────────────
-
-
-def skill_matches_platform(frontmatter: Dict[str, Any]) -> bool:
-    """Return True when the skill is compatible with the current OS.
-
-    Skills declare platform requirements via a top-level ``platforms`` list
-    in their YAML frontmatter::
-
-        platforms: [macos]          # macOS only
-        platforms: [macos, linux]   # macOS and Linux
-
-    If the field is absent or empty the skill is compatible with **all**
-    platforms (backward-compatible default).
-    """
-    platforms = frontmatter.get("platforms")
-    if not platforms:
-        return True
-    if not isinstance(platforms, list):
-        platforms = [platforms]
-    current = sys.platform
-    for platform in platforms:
-        normalized = str(platform).lower().strip()
-        mapped = PLATFORM_MAP.get(normalized, normalized)
-        if current.startswith(mapped):
-            return True
-    return False
-
-
-# ── Disabled skills ───────────────────────────────────────────────────────
-
-
-def get_disabled_skill_names() -> Set[str]:
-    """Read disabled skill names from config.yaml.
-
-    Resolves platform from ``HERMES_PLATFORM`` env var, falls back to
-    the global disabled list.  Reads the config file directly (no CLI
-    config imports) to stay lightweight.
-    """
-    config_path = get_hermes_home() / "config.yaml"
-    if not config_path.exists():
-        return set()
-    try:
-        parsed = yaml_load(config_path.read_text(encoding="utf-8"))
-    except Exception as e:
-        logger.debug("Could not read skill config %s: %s", config_path, e)
-        return set()
-    if not isinstance(parsed, dict):
-        return set()
-
-    skills_cfg = parsed.get("skills")
-    if not isinstance(skills_cfg, dict):
-        return set()
-
-    resolved_platform = os.getenv("HERMES_PLATFORM")
-    if resolved_platform:
-        platform_disabled = (skills_cfg.get("platform_disabled") or {}).get(
-            resolved_platform
-        )
-        if platform_disabled is not None:
-            return _normalize_string_set(platform_disabled)
-    return _normalize_string_set(skills_cfg.get("disabled"))
-
-
-def _normalize_string_set(values) -> Set[str]:
-    if values is None:
-        return set()
-    if isinstance(values, str):
-        values = [values]
-    return {str(v).strip() for v in values if str(v).strip()}
-
-
-# ── Condition extraction ──────────────────────────────────────────────────
-
-
-def extract_skill_conditions(frontmatter: Dict[str, Any]) -> Dict[str, List]:
-    """Extract conditional activation fields from parsed frontmatter."""
-    hermes = (frontmatter.get("metadata") or {}).get("hermes") or {}
-    return {
-        "fallback_for_toolsets": hermes.get("fallback_for_toolsets", []),
-        "requires_toolsets": hermes.get("requires_toolsets", []),
-        "fallback_for_tools": hermes.get("fallback_for_tools", []),
-        "requires_tools": hermes.get("requires_tools", []),
-    }
-
-
-# ── Description extraction ────────────────────────────────────────────────
-
-
-def extract_skill_description(frontmatter: Dict[str, Any]) -> str:
-    """Extract a truncated description from parsed frontmatter."""
-    raw_desc = frontmatter.get("description", "")
-    if not raw_desc:
-        return ""
-    desc = str(raw_desc).strip().strip("'\"")
-    if len(desc) > 60:
-        return desc[:57] + "..."
-    return desc
-
-
-# ── File iteration ────────────────────────────────────────────────────────
-
-
-def iter_skill_index_files(skills_dir: Path, filename: str):
-    """Walk skills_dir yielding sorted paths matching *filename*.
-
-    Excludes ``.git``, ``.github``, ``.hub`` directories.
-    """
-    matches = []
-    for root, dirs, files in os.walk(skills_dir):
-        dirs[:] = [d for d in dirs if d not in EXCLUDED_SKILL_DIRS]
-        if filename in files:
-            matches.append(Path(root) / filename)
-    for path in sorted(matches, key=lambda p: str(p.relative_to(skills_dir))):
-        yield path
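The deleted `skill_matches_platform` maps friendly platform names to `sys.platform` prefixes. The added code imports a function of the same name from `tools.skills_tool`, so that module presumably keeps equivalent behavior. The sketch reproduces the deleted version. The `current` parameter is an addition that makes the platform injectable for testing.

```python
import sys
from typing import Any, Dict, Optional

# Friendly names in frontmatter -> prefixes of sys.platform values.
PLATFORM_MAP = {"macos": "darwin", "linux": "linux", "windows": "win32"}


def matches_platform(frontmatter: Dict[str, Any], current: Optional[str] = None) -> bool:
    current = current or sys.platform
    platforms = frontmatter.get("platforms")
    if not platforms:
        return True  # no restriction declared: compatible everywhere
    if not isinstance(platforms, list):
        platforms = [platforms]  # tolerate a scalar such as `platforms: macos`
    for platform in platforms:
        normalized = str(platform).lower().strip()
        if current.startswith(PLATFORM_MAP.get(normalized, normalized)):
            return True
    return False
```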
@@ -19,7 +19,7 @@ _TITLE_PROMPT = (
 )


-def generate_title(user_message: str, assistant_response: str, timeout: float = 30.0) -> Optional[str]:
+def generate_title(user_message: str, assistant_response: str, timeout: float = 15.0) -> Optional[str]:
    """Generate a session title from the first exchange.

    Uses the auxiliary LLM client (cheapest/fastest available model).
@@ -7,7 +7,6 @@
 # =============================================================================
 model:
  # Default model to use (can be overridden with --model flag)
-  # Both "default" and "model" work as the key name here.
  default: "anthropic/claude-opus-4.6"
  
  # Inference provider selection:
@@ -449,17 +449,6 @@ try:
 except Exception:
    pass  # Skin engine is optional — default skin used if unavailable

-# Neuter AsyncHttpxClientWrapper.__del__ before any AsyncOpenAI clients are
-# created.  The SDK's __del__ schedules aclose() on asyncio.get_running_loop()
-# which, during CLI idle time, finds prompt_toolkit's event loop and tries to
-# close TCP transports bound to dead worker loops — producing
-# "Event loop is closed" / "Press ENTER to continue..." errors.
-try:
-    from agent.auxiliary_client import neuter_async_httpx_del
-    neuter_async_httpx_del()
-except Exception:
-    pass
-
 from rich import box as rich_box
 from rich.console import Console
 from rich.markup import escape as _escape
@@ -1078,12 +1067,12 @@ class HermesCLI:
        # authoritative.  This avoids conflicts in multi-agent setups where
        # env vars would stomp each other.
        _model_config = CLI_CONFIG.get("model", {})
-        _config_model = (_model_config.get("default") or _model_config.get("model") or "") if isinstance(_model_config, dict) else (_model_config or "")
+        _config_model = _model_config.get("default", "") if isinstance(_model_config, dict) else (_model_config or "")
        _FALLBACK_MODEL = "anthropic/claude-opus-4.6"
        self.model = model or _config_model or _FALLBACK_MODEL
        # Auto-detect model from local server if still on fallback
        if self.model == _FALLBACK_MODEL:
-            _base_url = (_model_config.get("base_url") or "") if isinstance(_model_config, dict) else ""
+            _base_url = _model_config.get("base_url", "") if isinstance(_model_config, dict) else ""
            if "localhost" in _base_url or "127.0.0.1" in _base_url:
                from hermes_cli.runtime_provider import _auto_detect_local_model
                _detected = _auto_detect_local_model(_base_url)
@@ -1625,7 +1614,6 @@ class HermesCLI:
        if not text:
            return
        self._reasoning_stream_started = True
-        self._reasoning_shown_this_turn = True
        if getattr(self, "_stream_box_opened", False):
            return

@@ -4034,17 +4022,6 @@ class HermesCLI:
                    provider_data_collection=self._provider_data_collection,
                    fallback_model=self._fallback_model,
                )
-                # Silence raw spinner; route thinking through TUI widget when no foreground agent is active.
-                bg_agent._print_fn = lambda *_a, **_kw: None
-
-                def _bg_thinking(text: str) -> None:
-                    # Concurrent bg tasks may race on _spinner_text; acceptable for best-effort UI.
-                    if not self._agent_running:
-                        self._spinner_text = text
-                        if self._app:
-                            self._app.invalidate()
-
-                bg_agent.thinking_callback = _bg_thinking

                result = bg_agent.run_conversation(
                    user_message=prompt,
@@ -4107,9 +4084,6 @@ class HermesCLI:
                _cprint(f"  ❌ Background task #{task_num} failed: {e}")
            finally:
                self._background_tasks.pop(task_id, None)
-                # Clear spinner only if no foreground agent owns it
-                if not self._agent_running:
-                    self._spinner_text = ""
                if self._app:
                    self._invalidate(min_interval=0)

@@ -4520,7 +4494,7 @@ class HermesCLI:
        compressor = agent.context_compressor
        last_prompt = compressor.last_prompt_tokens
        ctx_len = compressor.context_length
-        pct = min(100, (last_prompt / ctx_len * 100)) if ctx_len else 0
+        pct = (last_prompt / ctx_len * 100) if ctx_len else 0
        compressions = compressor.compression_count

        msg_count = len(self.conversation_history)
@@ -5548,13 +5522,6 @@ class HermesCLI:
            except Exception as e:
                logging.debug("@ context reference expansion failed: %s", e)

-        # Sanitize surrogate characters that can arrive via clipboard paste from
-        # rich-text editors (Google Docs, Word, etc.).  Lone surrogates are invalid
-        # UTF-8 and crash JSON serialization in the OpenAI SDK.
-        if isinstance(message, str):
-            from run_agent import _sanitize_surrogates
-            message = _sanitize_surrogates(message)
-
        # Add user message to history
        self.conversation_history.append({"role": "user", "content": message})

@@ -5567,10 +5534,6 @@ class HermesCLI:

            # Reset streaming display state for this turn
            self._reset_stream_state()
-            # Separate from _reset_stream_state because this must persist
-            # across intermediate turn boundaries (tool-calling loops) — only
-            # reset at the start of each user turn.
-            self._reasoning_shown_this_turn = False

            # --- Streaming TTS setup ---
            # When ElevenLabs is the TTS provider and sounddevice is available,
@@ -5715,16 +5678,6 @@ class HermesCLI:

            agent_thread.join()  # Ensure agent thread completes

-            # Proactively clean up async clients whose event loop is dead.
-            # The agent thread may have created AsyncOpenAI clients bound
-            # to a per-thread event loop; if that loop is now closed, those
-            # clients' __del__ would crash prompt_toolkit's loop on GC.
-            try:
-                from agent.auxiliary_client import cleanup_stale_async_clients
-                cleanup_stale_async_clients()
-            except Exception:
-                pass
-
            # Flush any remaining streamed text and close the box
            self._flush_stream()

@@ -5785,13 +5738,8 @@ class HermesCLI:
            response_previewed = result.get("response_previewed", False) if result else False

            # Display reasoning (thinking) box if enabled and available.
-            # Skip when streaming already showed reasoning live.  Use the
-            # turn-persistent flag (_reasoning_shown_this_turn) instead of
-            # _reasoning_stream_started — the latter gets reset during
-            # intermediate turn boundaries (tool-calling loops), which caused
-            # the reasoning box to re-render after the final response.
-            _reasoning_already_shown = getattr(self, '_reasoning_shown_this_turn', False)
-            if self.show_reasoning and result and not _reasoning_already_shown:
+            # Skip when streaming already showed reasoning live.
+            if self.show_reasoning and result and not self._reasoning_stream_started:
                reasoning = result.get("last_reasoning")
                if reasoning:
                    w = shutil.get_terminal_size().columns
@@ -5912,22 +5860,10 @@ class HermesCLI:
            else:
                duration_str = f"{seconds}s"
            
-            # Look up session title for resume-by-name hint
-            session_title = None
-            if self._session_db:
-                try:
-                    session_title = self._session_db.get_session_title(self.session_id)
-                except Exception:
-                    pass
-
            print("Resume this session with:")
            print(f"  hermes --resume {self.session_id}")
-            if session_title:
-                print(f"  hermes -c \"{session_title}\"")
            print()
            print(f"Session:        {self.session_id}")
-            if session_title:
-                print(f"Title:          {session_title}")
            print(f"Duration:       {duration_str}")
            print(f"Messages:       {msg_count} ({user_msgs} user, {tool_calls} tool calls)")
        else:
@@ -6103,7 +6039,7 @@ class HermesCLI:
            from honcho_integration.client import HonchoClientConfig
            from agent.display import honcho_session_line, write_tty
            hcfg = HonchoClientConfig.from_global_config()
-            if hcfg.enabled and (hcfg.api_key or hcfg.base_url) and hcfg.explicitly_configured:
+            if hcfg.enabled and hcfg.api_key and hcfg.explicitly_configured:
                sname = hcfg.resolve_session_name(session_id=self.session_id)
                if sname:
                    write_tty(honcho_session_line(hcfg.workspace_id, sname) + "\n")
@@ -6190,18 +6126,10 @@ class HermesCLI:
        set_approval_callback(self._approval_callback)
        set_secret_capture_callback(self._secret_capture_callback)

-        # Ensure tirith security scanner is available (downloads if needed).
-        # Warn the user if tirith is enabled in config but not available,
-        # so they know command security scanning is degraded.
+        # Ensure tirith security scanner is available (downloads if needed)
        try:
            from tools.tirith_security import ensure_installed
-            tirith_path = ensure_installed(log_failures=False)
-            if tirith_path is None:
-                security_cfg = self.config.get("security", {}) or {}
-                tirith_enabled = security_cfg.get("tirith_enabled", True)
-                if tirith_enabled:
-                    _cprint(f"  {_DIM}⚠ tirith security scanner enabled but not available "
-                            f"— command scanning will use pattern matching only{_RST}")
+            ensure_installed(log_failures=False)
        except Exception:
            pass  # Non-fatal — fail-open at scan time if unavailable
        
@@ -6677,7 +6605,6 @@ class HermesCLI:
        # Paste collapsing: detect large pastes and save to temp file
        _paste_counter = [0]
        _prev_text_len = [0]
-        _prev_newline_count = [0]
        _paste_just_collapsed = [False]

        def _on_text_changed(buf):
@@ -6686,27 +6613,18 @@ class HermesCLI:
            When bracketed paste is available, handle_paste collapses
            large pastes directly.  This handler is a fallback for
            terminals without bracketed paste support.
-
-            Two heuristics (either triggers collapse):
-            1. Many characters added at once (chars_added > 1) — works
-               when the terminal delivers the paste in one event-loop tick.
-            2. Newline count jumped by 4+ in a single text-change event —
-               catches terminals that feed characters individually but
-               still batch newlines.  Alt+Enter only adds 1 newline per
-               event so it never triggers this.
            """
            text = buf.text
            chars_added = len(text) - _prev_text_len[0]
            _prev_text_len[0] = len(text)
            if _paste_just_collapsed[0]:
                _paste_just_collapsed[0] = False
-                _prev_newline_count[0] = text.count('\n')
                return
            line_count = text.count('\n')
-            newlines_added = line_count - _prev_newline_count[0]
-            _prev_newline_count[0] = line_count
-            is_paste = chars_added > 1 or newlines_added >= 4
-            if line_count >= 5 and is_paste and not text.startswith('/'):
+            # Heuristic: a real paste adds many characters at once (not just a
+            # single newline from Alt+Enter) AND the result has 5+ lines.
+            # Fallback for terminals without bracketed paste support.
+            if line_count >= 5 and chars_added > 1 and not text.startswith('/'):
                _paste_counter[0] += 1
                # Save to temp file
                paste_dir = _hermes_home / "pastes"
@@ -6714,7 +6632,6 @@ class HermesCLI:
                paste_file = paste_dir / f"paste_{_paste_counter[0]}_{datetime.now().strftime('%H%M%S')}.txt"
                paste_file.write_text(text, encoding="utf-8")
                # Replace buffer with compact reference
-                _paste_just_collapsed[0] = True
                buf.text = f"[Pasted text #{_paste_counter[0]}: {line_count + 1} lines \u2192 {paste_file}]"
                buf.cursor_position = len(buf.text)
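After this change, the fallback heuristic in `_on_text_changed` treats a text change as a paste only when all three conditions hold:

- The buffer holds at least five newlines.
- More than one character arrived in that single event.
- The text is not a slash command.

The function below restates the rule as a pure check so the cases are easy to see. `looks_like_paste` is a hypothetical name, and `prev_len` stands in for `_prev_text_len[0]`.

```python
def looks_like_paste(text: str, prev_len: int) -> bool:
    """Fallback paste detection matching the added `_on_text_changed` condition."""
    chars_added = len(text) - prev_len
    line_count = text.count("\n")
    # One Alt+Enter adds a single character, so chars_added > 1 rules it out.
    return line_count >= 5 and chars_added > 1 and not text.startswith("/")
```

Terminals that feed pasted characters one event at a time never satisfy `chars_added > 1` under this rule. The removed newline-jump check was aimed at that case.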

@@ -7324,28 +7241,9 @@ class HermesCLI:
        # Register atexit cleanup so resources are freed even on unexpected exit
        atexit.register(_run_cleanup)
        
-        # Install a custom asyncio exception handler that suppresses the
-        # "Event loop is closed" RuntimeError from httpx transport cleanup.
-        # This is defense-in-depth — the primary fix is neuter_async_httpx_del
-        # which disables __del__ entirely, but older clients or SDK upgrades
-        # could bypass it.
-        def _suppress_closed_loop_errors(loop, context):
-            exc = context.get("exception")
-            if isinstance(exc, RuntimeError) and "Event loop is closed" in str(exc):
-                return  # silently suppress
-            # Fall back to default handler for everything else
-            loop.default_exception_handler(context)
-
        # Run the application with patch_stdout for proper output handling
        try:
            with patch_stdout():
-                # Set the custom handler on prompt_toolkit's event loop
-                try:
-                    import asyncio as _aio
-                    _loop = _aio.get_event_loop()
-                    _loop.set_exception_handler(_suppress_closed_loop_errors)
-                except Exception:
-                    pass
                app.run()
        except (EOFError, KeyboardInterrupt):
            pass
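The removed defense-in-depth handler was built on `loop.set_exception_handler`, a standard asyncio API. The sketch applies the same filtering pattern to a fresh event loop rather than prompt_toolkit's. The `forwarded` list exists only to make the behavior observable.

```python
import asyncio
from typing import Any, Dict, List


def make_closed_loop_filter(forwarded: List[Dict[str, Any]]):
    """Build an exception handler that drops 'Event loop is closed' RuntimeErrors."""
    def handler(loop: asyncio.AbstractEventLoop, context: Dict[str, Any]) -> None:
        exc = context.get("exception")
        if isinstance(exc, RuntimeError) and "Event loop is closed" in str(exc):
            return  # suppressed: transport cleanup racing a dead loop
        forwarded.append(context)
        loop.default_exception_handler(context)  # log everything else as usual
    return handler


forwarded: List[Dict[str, Any]] = []
loop = asyncio.new_event_loop()
loop.set_exception_handler(make_closed_loop_filter(forwarded))
loop.call_exception_handler(
    {"message": "transport cleanup", "exception": RuntimeError("Event loop is closed")}
)
loop.call_exception_handler({"message": "real bug", "exception": ValueError("boom")})
loop.close()
```

Only the `ValueError` context reaches the default handler, which logs it. The closed-loop error is dropped silently.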
@@ -327,20 +327,7 @@ def load_jobs() -> List[Dict[str, Any]]:
        with open(JOBS_FILE, 'r', encoding='utf-8') as f:
            data = json.load(f)
            return data.get("jobs", [])
-    except json.JSONDecodeError:
-        # Retry with strict=False to handle bare control chars in string values
-        try:
-            with open(JOBS_FILE, 'r', encoding='utf-8') as f:
-                data = json.loads(f.read(), strict=False)
-                jobs = data.get("jobs", [])
-                if jobs:
-                    # Auto-repair: rewrite with proper escaping
-                    save_jobs(jobs)
-                    logger.warning("Auto-repaired jobs.json (had invalid control characters)")
-                return jobs
-        except Exception:
-            return []
-    except IOError:
+    except (json.JSONDecodeError, IOError):
        return []
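The removed branch retried with `json.loads(..., strict=False)`. That mode accepts literal control characters, such as a raw newline or tab inside a string value, which the default strict parser rejects. With the added code, such a `jobs.json` yields an empty job list. The sketch below shows the retry pattern on an in-memory string. `parse_jobs` is a hypothetical helper, and it omits the auto-repair write.

```python
import json
from typing import Any, Dict, List


def parse_jobs(text: str) -> List[Dict[str, Any]]:
    try:
        return json.loads(text).get("jobs", [])
    except json.JSONDecodeError:
        # strict=False permits raw control characters inside JSON strings.
        try:
            return json.loads(text, strict=False).get("jobs", [])
        except json.JSONDecodeError:
            return []
```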


@@ -611,34 +598,6 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
    save_jobs(jobs)


-def advance_next_run(job_id: str) -> bool:
-    """Preemptively advance next_run_at for a recurring job before execution.
-
-    Call this BEFORE run_job() so that if the process crashes mid-execution,
-    the job won't re-fire on the next gateway restart.  This converts the
-    scheduler from at-least-once to at-most-once for recurring jobs — missing
-    one run is far better than firing dozens of times in a crash loop.
-
-    One-shot jobs are left unchanged so they can still retry on restart.
-
-    Returns True if next_run_at was advanced, False otherwise.
-    """
-    jobs = load_jobs()
-    for job in jobs:
-        if job["id"] == job_id:
-            kind = job.get("schedule", {}).get("kind")
-            if kind not in ("cron", "interval"):
-                return False
-            now = _hermes_now().isoformat()
-            new_next = compute_next_run(job["schedule"], now)
-            if new_next and new_next != job.get("next_run_at"):
-                job["next_run_at"] = new_next
-                save_jobs(jobs)
-                return True
-            return False
-    return False
-
-
 def get_due_jobs() -> List[Dict[str, Any]]:
    """Get all jobs that are due to run now.

@@ -35,7 +35,7 @@ logger = logging.getLogger(__name__)
 # Add parent directory to path for imports
 sys.path.insert(0, str(Path(__file__).parent.parent))

-from cron.jobs import get_due_jobs, mark_job_run, save_job_output, advance_next_run
+from cron.jobs import get_due_jobs, mark_job_run, save_job_output

 # Sentinel: when a cron agent has nothing new to report, it can start its
 # response with this marker to suppress delivery.  Output is still saved
@@ -524,12 +524,6 @@ def tick(verbose: bool = True) -> int:
        executed = 0
        for job in due_jobs:
            try:
-                # For recurring jobs (cron/interval), advance next_run_at to the
-                # next future occurrence BEFORE execution.  This way, if the
-                # process crashes mid-run, the job won't re-fire on restart.
-                # One-shot jobs are left alone so they can retry on restart.
-                advance_next_run(job["id"])
-
                success, output, final_response, error = run_job(job)

                output_file = save_job_output(job["id"], output)
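The `advance_next_run()` call removed above moved a recurring job's `next_run_at` forward before `run_job()`. That traded at-least-once for at-most-once execution, so a crash mid-run could not cause repeated firing on restart. The sketch below is a simplified, in-memory version with some assumptions. The job dict shape mirrors the removed code. `compute_next_run` is a hypothetical stand-in that only understands an `{"kind": "interval", "minutes": N}` schedule, and `"once"` is an assumed one-shot kind.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def compute_next_run(schedule: dict, now: datetime) -> Optional[str]:
    """Hypothetical helper: only handles {"kind": "interval", "minutes": N}."""
    if schedule.get("kind") != "interval":
        return None
    return (now + timedelta(minutes=schedule["minutes"])).isoformat()


def advance_next_run(job: dict, now: datetime) -> bool:
    """Move next_run_at forward before execution (at-most-once for recurring jobs)."""
    kind = job.get("schedule", {}).get("kind")
    if kind not in ("cron", "interval"):
        return False  # one-shot jobs keep next_run_at so they can retry on restart
    # "cron" counts as recurring, but this sketch's compute_next_run returns
    # None for it, so cron jobs are left unchanged here.
    new_next = compute_next_run(job["schedule"], now)
    if new_next and new_next != job.get("next_run_at"):
        job["next_run_at"] = new_next
        return True
    return False


now = datetime(2026, 3, 27, 12, 0, tzinfo=timezone.utc)
recurring = {"id": "r", "schedule": {"kind": "interval", "minutes": 30},
             "next_run_at": now.isoformat()}
one_shot = {"id": "o", "schedule": {"kind": "once"}, "next_run_at": now.isoformat()}
```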
@@ -1,15 +0,0 @@
-# Hermes Agent Persona
-
-<!--
-This file defines the agent's personality and tone.
-The agent will embody whatever you write here.
-Edit this to customize how Hermes communicates with you.
-
-Examples:
-  - "You are a warm, playful assistant who uses kaomoji occasionally."
-  - "You are a concise technical expert. No fluff, just facts."
-  - "You speak like a friendly coworker who happens to know everything."
-
-This file is loaded fresh each message -- no restart needed.
-Delete the contents (or this file) to use the default personality.
-->
@@ -1,31 +0,0 @@
-#!/bin/bash
-# Docker entrypoint: bootstrap config files into the mounted volume, then run hermes.
-set -e
-
-HERMES_HOME="/opt/data"
-INSTALL_DIR="/opt/hermes"
-
-# Create directory structure
-mkdir -p "$HERMES_HOME"/{cron,sessions,logs,pairing,hooks,image_cache,audio_cache,memories,skills,whatsapp/session}
-
-# .env
-if [ ! -f "$HERMES_HOME/.env" ]; then
-    cp "$INSTALL_DIR/.env.example" "$HERMES_HOME/.env"
-fi
-
-# config.yaml
-if [ ! -f "$HERMES_HOME/config.yaml" ]; then
-    cp "$INSTALL_DIR/cli-config.yaml.example" "$HERMES_HOME/config.yaml"
-fi
-
-# SOUL.md
-if [ ! -f "$HERMES_HOME/SOUL.md" ]; then
-    cp "$INSTALL_DIR/docker/SOUL.md" "$HERMES_HOME/SOUL.md"
-fi
-
-# Sync bundled skills (manifest-based so user edits are preserved)
-if [ -d "$INSTALL_DIR/skills" ]; then
-    python3 "$INSTALL_DIR/tools/skills_sync.py"
-fi
-
-exec hermes "$@"
@@ -1,56 +0,0 @@
-# Hermes Agent — Docker
-
-Want to run Hermes Agent, but without installing packages on your host? This'll sort you out.
-
-This will let you run the agent in a container, with the most relevant modes outlined below.
-
-The container stores all user data (config, API keys, sessions, skills, memories) in a single directory mounted from the host at `/opt/data`. The image itself is stateless and can be upgraded by pulling a new version without losing any configuration.
-
-## Quick start
-
-If this is your first time running Hermes Agent, create a data directory on the host and start the container interactively to run the setup wizard:
-
-```sh
-mkdir -p ~/.hermes
-docker run -it --rm \
-  -v ~/.hermes:/opt/data \
-  nousresearch/hermes-agent
-```
-
-This drops you into the setup wizard, which will prompt you for your API keys and write them to `~/.hermes/.env`. You only need to do this once. It is highly recommended to set up a chat system for the gateway to work with at this point.
-
-## Running in gateway mode
-
-Once configured, run the container in the background as a persistent gateway (Telegram, Discord, Slack, WhatsApp, etc.):
-
-```sh
-docker run -d \
-  --name hermes \
-  --restart unless-stopped \
-  -v ~/.hermes:/opt/data \
-  nousresearch/hermes-agent gateway run
-```
-
-## Running interactively (CLI chat)
-
-To open an interactive chat session against a running data directory:
-
-```sh
-docker run -it --rm \
-  -v ~/.hermes:/opt/data \
-  nousresearch/hermes-agent
-```
-
-## Upgrading
-
-Pull the latest image and recreate the container. Your data directory is untouched.
-
-```sh
-docker pull nousresearch/hermes-agent:latest
-docker rm -f hermes
-docker run -d \
-  --name hermes \
-  --restart unless-stopped \
-  -v ~/.hermes:/opt/data \
-  nousresearch/hermes-agent
-```
@@ -101,11 +101,21 @@ Available methods:

 ### Patches (`patches.py`)

-**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., the Modal backend). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.
+**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., the Modal backend via SWE-ReX). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.

-**Solution**: `ModalEnvironment` uses a dedicated `_AsyncWorker` background thread with its own event loop. The calling code sees a sync interface, but internally all async Modal SDK calls happen on the worker thread so they don't conflict with Atropos's loop. This is built directly into `tools/environments/modal.py` — no monkey-patching required.
+**Solution**: `patches.py` monkey-patches `SwerexModalEnvironment` to use a dedicated background thread (`_AsyncWorker`) with its own event loop. The calling code sees the same sync interface, but internally the async work happens on a separate thread that doesn't conflict with Atropos's loop.

-`patches.py` is now a no-op (kept for backward compatibility with imports).
+What gets patched:
+- `SwerexModalEnvironment.__init__` -- creates Modal deployment on a background thread
+- `SwerexModalEnvironment.execute` -- runs commands on the same background thread
+- `SwerexModalEnvironment.stop` -- stops deployment on the background thread
+
+The patches are:
+- **Idempotent** -- calling `apply_patches()` multiple times is safe
+- **Transparent** -- same interface and behavior, only the internal async execution changes
+- **Universal** -- works identically in normal CLI use (no running event loop)
+
+Applied automatically at import time by `hermes_base_env.py`.

 ### Tool Call Parsers (`tool_call_parsers/`)

@@ -25,7 +25,7 @@ import time
 from pathlib import Path
 from typing import Optional

-from hermes_constants import get_hermes_dir
+from hermes_cli.config import get_hermes_home


 # Unambiguous alphabet -- excludes 0/O, 1/I to prevent confusion
@@ -41,7 +41,7 @@ LOCKOUT_SECONDS = 3600              # Lockout duration after too many failures
 MAX_PENDING_PER_PLATFORM = 3        # Max pending codes per platform
 MAX_FAILED_ATTEMPTS = 5             # Failed approvals before lockout

-PAIRING_DIR = get_hermes_dir("platforms/pairing", "pairing")
+PAIRING_DIR = get_hermes_home() / "pairing"


 def _secure_write(path: Path, data: str) -> None:
@@ -166,7 +166,7 @@ class ResponseStore:

 _CORS_HEADERS = {
    "Access-Control-Allow-Methods": "GET, POST, DELETE, OPTIONS",
-    "Access-Control-Allow-Headers": "Authorization, Content-Type, Idempotency-Key",
+    "Access-Control-Allow-Headers": "Authorization, Content-Type",
 }


@@ -223,23 +223,6 @@ if AIOHTTP_AVAILABLE:
 else:
    body_limit_middleware = None  # type: ignore[assignment]

-_SECURITY_HEADERS = {
-    "X-Content-Type-Options": "nosniff",
-    "Referrer-Policy": "no-referrer",
-}
-
-
-if AIOHTTP_AVAILABLE:
-    @web.middleware
-    async def security_headers_middleware(request, handler):
-        """Add security headers to all responses (including errors)."""
-        response = await handler(request)
-        for k, v in _SECURITY_HEADERS.items():
-            response.headers.setdefault(k, v)
-        return response
-else:
-    security_headers_middleware = None  # type: ignore[assignment]
-

 class _IdempotencyCache:
    """In-memory idempotency cache with TTL and basic LRU semantics."""
@@ -324,7 +307,6 @@ class APIServerAdapter(BasePlatformAdapter):
        if "*" in self._cors_origins:
            headers = dict(_CORS_HEADERS)
            headers["Access-Control-Allow-Origin"] = "*"
-            headers["Access-Control-Max-Age"] = "600"
            return headers

        if origin not in self._cors_origins:
@@ -333,7 +315,6 @@ class APIServerAdapter(BasePlatformAdapter):
        headers = dict(_CORS_HEADERS)
        headers["Access-Control-Allow-Origin"] = origin
        headers["Vary"] = "Origin"
-        headers["Access-Control-Max-Age"] = "600"
        return headers

    def _origin_allowed(self, origin: str) -> bool:
@@ -514,21 +495,17 @@ class APIServerAdapter(BasePlatformAdapter):
                if delta is not None:
                    _stream_q.put(delta)

-            # Start agent in background.  agent_ref is a mutable container
-            # so the SSE writer can interrupt the agent on client disconnect.
-            agent_ref = [None]
+            # Start agent in background
            agent_task = asyncio.ensure_future(self._run_agent(
                user_message=user_message,
                conversation_history=history,
                ephemeral_system_prompt=system_prompt,
                session_id=session_id,
                stream_delta_callback=_on_delta,
-                agent_ref=agent_ref,
            ))

            return await self._write_sse_chat_completion(
-                request, completion_id, model_name, created, _stream_q,
-                agent_task, agent_ref,
+                request, completion_id, model_name, created, _stream_q, agent_task
            )

        # Non-streaming: run the agent (with optional Idempotency-Key)
@@ -591,107 +568,80 @@ class APIServerAdapter(BasePlatformAdapter):

    async def _write_sse_chat_completion(
        self, request: "web.Request", completion_id: str, model: str,
-        created: int, stream_q, agent_task, agent_ref=None,
+        created: int, stream_q, agent_task,
    ) -> "web.StreamResponse":
-        """Write real streaming SSE from agent's stream_delta_callback queue.
-
-        If the client disconnects mid-stream (network drop, browser tab close),
-        the agent is interrupted via ``agent.interrupt()`` so it stops making
-        LLM API calls, and the asyncio task wrapper is cancelled.
-        """
+        """Write real streaming SSE from agent's stream_delta_callback queue."""
        import queue as _q

-        sse_headers = {"Content-Type": "text/event-stream", "Cache-Control": "no-cache"}
-        # CORS middleware can't inject headers into StreamResponse after
-        # prepare() flushes them, so resolve CORS headers up front.
-        origin = request.headers.get("Origin", "")
-        cors = self._cors_headers_for_origin(origin) if origin else None
-        if cors:
-            sse_headers.update(cors)
-        response = web.StreamResponse(status=200, headers=sse_headers)
+        response = web.StreamResponse(
+            status=200,
+            headers={"Content-Type": "text/event-stream", "Cache-Control": "no-cache"},
+        )
        await response.prepare(request)

-        try:
-            # Role chunk
-            role_chunk = {
-                "id": completion_id, "object": "chat.completion.chunk",
-                "created": created, "model": model,
-                "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}],
-            }
-            await response.write(f"data: {json.dumps(role_chunk)}\n\n".encode())
+        # Role chunk
+        role_chunk = {
+            "id": completion_id, "object": "chat.completion.chunk",
+            "created": created, "model": model,
+            "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}],
+        }
+        await response.write(f"data: {json.dumps(role_chunk)}\n\n".encode())

-            # Stream content chunks as they arrive from the agent
-            loop = asyncio.get_event_loop()
-            while True:
-                try:
-                    delta = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
-                except _q.Empty:
-                    if agent_task.done():
-                        # Drain any remaining items
-                        while True:
-                            try:
-                                delta = stream_q.get_nowait()
-                                if delta is None:
-                                    break
-                                content_chunk = {
-                                    "id": completion_id, "object": "chat.completion.chunk",
-                                    "created": created, "model": model,
-                                    "choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
-                                }
-                                await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
-                            except _q.Empty:
-                                break
-                        break
-                    continue
-
-                if delta is None:  # End of stream sentinel
-                    break
-
-                content_chunk = {
-                    "id": completion_id, "object": "chat.completion.chunk",
-                    "created": created, "model": model,
-                    "choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
-                }
-                await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
-
-            # Get usage from completed agent
-            usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
+        # Stream content chunks as they arrive from the agent
+        loop = asyncio.get_event_loop()
+        while True:
            try:
-                result, agent_usage = await agent_task
-                usage = agent_usage or usage
-            except Exception:
-                pass
+                delta = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
+            except _q.Empty:
+                if agent_task.done():
+                    # Drain any remaining items
+                    while True:
+                        try:
+                            delta = stream_q.get_nowait()
+                            if delta is None:
+                                break
+                            content_chunk = {
+                                "id": completion_id, "object": "chat.completion.chunk",
+                                "created": created, "model": model,
+                                "choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
+                            }
+                            await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
+                        except _q.Empty:
+                            break
+                    break
+                continue

-            # Finish chunk
-            finish_chunk = {
+            if delta is None:  # End of stream sentinel
+                break
+
+            content_chunk = {
                "id": completion_id, "object": "chat.completion.chunk",
                "created": created, "model": model,
-                "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
-                "usage": {
-                    "prompt_tokens": usage.get("input_tokens", 0),
-                    "completion_tokens": usage.get("output_tokens", 0),
-                    "total_tokens": usage.get("total_tokens", 0),
-                },
+                "choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
            }
-            await response.write(f"data: {json.dumps(finish_chunk)}\n\n".encode())
-            await response.write(b"data: [DONE]\n\n")
-        except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
-            # Client disconnected mid-stream.  Interrupt the agent so it
-            # stops making LLM API calls at the next loop iteration, then
-            # cancel the asyncio task wrapper.
-            agent = agent_ref[0] if agent_ref else None
-            if agent is not None:
-                try:
-                    agent.interrupt("SSE client disconnected")
-                except Exception:
-                    pass
-            if not agent_task.done():
-                agent_task.cancel()
-                try:
-                    await agent_task
-                except (asyncio.CancelledError, Exception):
-                    pass
-            logger.info("SSE client disconnected; interrupted agent task %s", completion_id)
+            await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
+
+        # Get usage from completed agent
+        usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
+        try:
+            result, agent_usage = await agent_task
+            usage = agent_usage or usage
+        except Exception:
+            pass
+
+        # Finish chunk
+        finish_chunk = {
+            "id": completion_id, "object": "chat.completion.chunk",
+            "created": created, "model": model,
+            "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
+            "usage": {
+                "prompt_tokens": usage.get("input_tokens", 0),
+                "completion_tokens": usage.get("output_tokens", 0),
+                "total_tokens": usage.get("total_tokens", 0),
+            },
+        }
+        await response.write(f"data: {json.dumps(finish_chunk)}\n\n".encode())
+        await response.write(b"data: [DONE]\n\n")

        return response
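The SSE writer emits OpenAI-style `chat.completion.chunk` objects. Each one is framed as a `data: <json>` line followed by a blank line, and the stream ends with `data: [DONE]`. Below is a minimal synchronous sketch of that framing. The helper names are hypothetical, and the sketch omits the `usage` block that the real finish chunk carries.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(payload: dict) -> bytes:
    """Frame one JSON payload as a Server-Sent Events 'data:' event."""
    return f"data: {json.dumps(payload)}\n\n".encode()


def chunk(completion_id: str, model: str, created: int,
          delta: dict, finish_reason: Optional[str] = None) -> dict:
    """Build one chat.completion.chunk object with a single choice."""
    return {
        "id": completion_id, "object": "chat.completion.chunk",
        "created": created, "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }


def stream_events(completion_id: str, model: str, created: int,
                  deltas: Iterable[str]) -> Iterator[bytes]:
    """Yield the role chunk, one chunk per text delta, the finish chunk, then [DONE]."""
    yield sse_event(chunk(completion_id, model, created, {"role": "assistant"}))
    for text in deltas:
        yield sse_event(chunk(completion_id, model, created, {"content": text}))
    yield sse_event(chunk(completion_id, model, created, {}, "stop"))
    yield b"data: [DONE]\n\n"
```

A client reassembles the reply by concatenating every `delta.content` it receives until the `[DONE]` sentinel.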

@@ -1194,18 +1144,12 @@ class APIServerAdapter(BasePlatformAdapter):
        ephemeral_system_prompt: Optional[str] = None,
        session_id: Optional[str] = None,
        stream_delta_callback=None,
-        agent_ref: Optional[list] = None,
    ) -> tuple:
        """
        Create an agent and run a conversation in a thread executor.

        Returns ``(result_dict, usage_dict)`` where *usage_dict* contains
        ``input_tokens``, ``output_tokens`` and ``total_tokens``.
-
-        If *agent_ref* is a one-element list, the AIAgent instance is stored
-        at ``agent_ref[0]`` before ``run_conversation`` begins.  This allows
-        callers (e.g. the SSE writer) to call ``agent.interrupt()`` from
-        another thread to stop in-progress LLM calls.
        """
        loop = asyncio.get_event_loop()

@@ -1215,8 +1159,6 @@ class APIServerAdapter(BasePlatformAdapter):
                session_id=session_id,
                stream_delta_callback=stream_delta_callback,
            )
-            if agent_ref is not None:
-                agent_ref[0] = agent
            result = agent.run_conversation(
                user_message=user_message,
                conversation_history=conversation_history,
@@ -1241,11 +1183,10 @@ class APIServerAdapter(BasePlatformAdapter):
            return False

        try:
-            mws = [mw for mw in (cors_middleware, body_limit_middleware, security_headers_middleware) if mw is not None]
+            mws = [mw for mw in (cors_middleware, body_limit_middleware) if mw is not None]
            self._app = web.Application(middlewares=mws)
            self._app["api_server_adapter"] = self
            self._app.router.add_get("/health", self._handle_health)
-            self._app.router.add_get("/v1/health", self._handle_health)
            self._app.router.add_get("/v1/models", self._handle_models)
            self._app.router.add_post("/v1/chat/completions", self._handle_chat_completions)
            self._app.router.add_post("/v1/responses", self._handle_responses)
@@ -27,7 +27,6 @@ sys.path.insert(0, str(_Path(__file__).resolve().parents[2]))
 from gateway.config import Platform, PlatformConfig
 from gateway.session import SessionSource, build_session_key
 from hermes_cli.config import get_hermes_home
-from hermes_constants import get_hermes_dir


 GATEWAY_SECRET_CAPTURE_UNSUPPORTED_MESSAGE = (
@@ -45,8 +44,8 @@ GATEWAY_SECRET_CAPTURE_UNSUPPORTED_MESSAGE = (
 # (e.g. Telegram file URLs expire after ~1 hour).
 # ---------------------------------------------------------------------------

-# Default location: {HERMES_HOME}/cache/images/ (legacy: image_cache/)
-IMAGE_CACHE_DIR = get_hermes_dir("cache/images", "image_cache")
+# Default location: {HERMES_HOME}/image_cache/
+IMAGE_CACHE_DIR = get_hermes_home() / "image_cache"


 def get_image_cache_dir() -> Path:
@@ -148,7 +147,7 @@ def cleanup_image_cache(max_age_hours: int = 24) -> int:
 # here so the STT tool (OpenAI Whisper) can transcribe them from local files.
 # ---------------------------------------------------------------------------

-AUDIO_CACHE_DIR = get_hermes_dir("cache/audio", "audio_cache")
+AUDIO_CACHE_DIR = get_hermes_home() / "audio_cache"


 def get_audio_cache_dir() -> Path:
@@ -175,51 +174,29 @@ def cache_audio_from_bytes(data: bytes, ext: str = ".ogg") -> str:
    return str(filepath)


-async def cache_audio_from_url(url: str, ext: str = ".ogg", retries: int = 2) -> str:
+async def cache_audio_from_url(url: str, ext: str = ".ogg") -> str:
    """
    Download an audio file from a URL and save it to the local cache.

-    Retries on transient failures (timeouts, 429, 5xx) with exponential
-    backoff so a single slow CDN response doesn't lose the media.
-
    Args:
        url: The HTTP/HTTPS URL to download from.
        ext: File extension including the dot (e.g. ".ogg", ".mp3").
-        retries: Number of retry attempts on transient failures.

    Returns:
        Absolute path to the cached audio file as a string.
    """
-    import asyncio
    import httpx
-    import logging as _logging
-    _log = _logging.getLogger(__name__)

-    last_exc = None
    async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
-        for attempt in range(retries + 1):
-            try:
-                response = await client.get(
-                    url,
-                    headers={
-                        "User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
-                        "Accept": "audio/*,*/*;q=0.8",
-                    },
-                )
-                response.raise_for_status()
-                return cache_audio_from_bytes(response.content, ext)
-            except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
-                last_exc = exc
-                if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
-                    raise
-                if attempt < retries:
-                    wait = 1.5 * (attempt + 1)
-                    _log.debug("Audio cache retry %d/%d for %s (%.1fs): %s",
-                               attempt + 1, retries, url[:80], wait, exc)
-                    await asyncio.sleep(wait)
-                    continue
-                raise
-    raise last_exc
+        response = await client.get(
+            url,
+            headers={
+                "User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
+                "Accept": "audio/*,*/*;q=0.8",
+            },
+        )
+        response.raise_for_status()
+        return cache_audio_from_bytes(response.content, ext)
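The removed `cache_audio_from_url()` loop retried timeouts and HTTP status codes of 429 and above, and re-raised lower 4xx errors immediately. Its delay of `1.5 * (attempt + 1)` grows linearly (1.5 s, then 3.0 s), even though the removed docstring calls it exponential backoff. Below is a library-agnostic sketch of the same policy. It assumes a hypothetical `TransientError` in place of the `httpx` exception checks.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a timeout, HTTP 429, or 5xx response."""


async def with_retries(fetch: Callable[[], Awaitable[T]], retries: int = 2,
                       base_delay: float = 1.5) -> T:
    """Call fetch(), retrying transient failures with linearly growing delays."""
    for attempt in range(retries + 1):
        try:
            return await fetch()
        except TransientError:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * (attempt + 1))
    raise AssertionError("unreachable")
```

Swapping the delay for `base_delay * 2 ** attempt` would make it genuinely exponential.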


 # ---------------------------------------------------------------------------
@@ -229,7 +206,7 @@ async def cache_audio_from_url(url: str, ext: str = ".ogg", retries: int = 2) ->
 # here so the agent can reference them by local file path.
 # ---------------------------------------------------------------------------

-DOCUMENT_CACHE_DIR = get_hermes_dir("cache/documents", "document_cache")
+DOCUMENT_CACHE_DIR = get_hermes_home() / "document_cache"

 SUPPORTED_DOCUMENT_TYPES = {
    ".pdf": "application/pdf",
@@ -356,10 +333,7 @@ class MessageEvent:
            return None
        # Split on space and get first word, strip the /
        parts = self.text.split(maxsplit=1)
-        raw = parts[0][1:].lower() if parts else None
-        if raw and "@" in raw:
-            raw = raw.split("@", 1)[0]
-        return raw
+        return parts[0][1:].lower() if parts else None
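The removed lines in `get_command()` dropped an `@botname` suffix, which is the form Telegram uses when a command in a group chat is addressed to a specific bot (for example `/status@MyBot`). Below is a standalone sketch of that parsing. The leading `/` check is an assumption standing in for the guard above this hunk, which is not shown.

```python
from typing import Optional


def get_command(text: str) -> Optional[str]:
    """Return the lowercase command name, dropping any '@botname' suffix."""
    if not text.startswith("/"):
        return None  # assumed guard: not a command
    parts = text.split(maxsplit=1)
    raw = parts[0][1:].lower() if parts else None
    if raw and "@" in raw:
        raw = raw.split("@", 1)[0]  # "/status@MyBot" -> "status"
    return raw
```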
    
    def get_command_args(self) -> str:
        """Get the arguments after a command."""
@@ -550,22 +550,6 @@ class DiscordAdapter(BasePlatformAdapter):
                            return
                    # "all" falls through to handle_message
                
-                # If the message @mentions other users but NOT the bot, the
-                # sender is talking to someone else — stay silent.  Only
-                # applies in server channels; in DMs the user is always
-                # talking to the bot (mentions are just references).
-                # Controlled by DISCORD_IGNORE_NO_MENTION (default: true).
-                _ignore_no_mention = os.getenv(
-                    "DISCORD_IGNORE_NO_MENTION", "true"
-                ).lower() in ("true", "1", "yes")
-                if _ignore_no_mention and message.mentions and not isinstance(message.channel, discord.DMChannel):
-                    _bot_mentioned = (
-                        self._client.user is not None
-                        and self._client.user in message.mentions
-                    )
-                    if not _bot_mentioned:
-                        return  # Talking to someone else, don't interrupt
-
                await self._handle_message(message)
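The removed Discord check stayed silent when a server-channel message @mentioned other users but not the bot. It was gated by `DISCORD_IGNORE_NO_MENTION`, which defaults to `true`. Below is the same decision rule as a pure function. Users are modeled as plain strings instead of `discord.py` objects, and `should_stay_silent` is a hypothetical name.

```python
import os
from typing import Mapping, Optional, Sequence


def should_stay_silent(mentions: Sequence[str], bot_user: Optional[str],
                       is_dm: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the message mentions others but not the bot in a server channel."""
    env = os.environ if env is None else env
    enabled = env.get("DISCORD_IGNORE_NO_MENTION", "true").lower() in ("true", "1", "yes")
    if not enabled or not mentions or is_dm:
        return False  # DMs and mention-free messages are always handled
    # Mirrors the removed check: an unknown bot user counts as "not mentioned".
    return bot_user is None or bot_user not in mentions
```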

            @self._client.event
@@ -43,20 +43,6 @@ from gateway.platforms.base import (
 from gateway.config import Platform, PlatformConfig

 logger = logging.getLogger(__name__)
-# Automated sender patterns — emails from these are silently ignored
-_NOREPLY_PATTERNS = (
-    "noreply", "no-reply", "no_reply", "donotreply", "do-not-reply",
-    "mailer-daemon", "postmaster", "bounce", "notifications@",
-    "automated@", "auto-confirm", "auto-reply", "automailer",
-)
-
-# RFC headers that indicate bulk/automated mail
-_AUTOMATED_HEADERS = {
-    "Auto-Submitted": lambda v: v.lower() != "no",
-    "Precedence": lambda v: v.lower() in ("bulk", "list", "junk"),
-    "X-Auto-Response-Suppress": lambda v: bool(v),
-    "List-Unsubscribe": lambda v: bool(v),
-}

 # Gmail-safe max length per email body
 MAX_MESSAGE_LENGTH = 50_000
@@ -64,17 +50,7 @@ MAX_MESSAGE_LENGTH = 50_000
 # Supported image extensions for inline detection
 _IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

-def _is_automated_sender(address: str, headers: dict) -> bool:
-    """Return True if this email is from an automated/noreply source."""
-    addr = address.lower()
-    if any(pattern in addr for pattern in _NOREPLY_PATTERNS):
-        return True
-    for header, check in _AUTOMATED_HEADERS.items():
-        value = headers.get(header, "")
-        if value and check(value):
-            return True
-    return False
-    
+
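The removed `_is_automated_sender()` matched the sender address against noreply-style substrings. It also checked a few headers that mark bulk or auto-generated mail (`Auto-Submitted` is defined in RFC 3834). Below is a trimmed standalone sketch with a shortened pattern list. `is_automated_sender` is a hypothetical name.

```python
from email import message_from_string
from typing import Mapping

# Substrings that commonly appear in automated sender addresses (shortened list).
_NOREPLY_PATTERNS = ("noreply", "no-reply", "donotreply", "mailer-daemon",
                     "postmaster", "bounce")

# Header name -> predicate on its value; any match marks the mail as automated.
_AUTOMATED_HEADERS = {
    "Auto-Submitted": lambda v: v.lower() != "no",
    "Precedence": lambda v: v.lower() in ("bulk", "list", "junk"),
    "List-Unsubscribe": lambda v: bool(v),
}


def is_automated_sender(address: str, headers: Mapping[str, str]) -> bool:
    """Return True if the address or headers indicate automated/bulk mail."""
    addr = address.lower()
    if any(pattern in addr for pattern in _NOREPLY_PATTERNS):
        return True
    for name, check in _AUTOMATED_HEADERS.items():
        value = headers.get(name, "")
        if value and check(value):
            return True
    return False


newsletter = message_from_string(
    "From: news@example.com\nPrecedence: bulk\nSubject: Weekly digest\n\nHello"
)
```

The removed code used `dict(msg.items())`, which yields a plain, case-sensitive dict, so a header spelled differently (for example `precedence`) would not match. Passing the `email.message.Message` itself would also satisfy this sketch's `headers.get()` call, and `Message.get()` looks names up case-insensitively.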
 def check_email_requirements() -> bool:
    """Check if email platform dependencies are available."""
    addr = os.getenv("EMAIL_ADDRESS")
@@ -237,7 +213,6 @@ class EmailAdapter(BasePlatformAdapter):

        # Track message IDs we've already processed to avoid duplicates
        self._seen_uids: set = set()
-        self._seen_uids_max: int = 2000   # cap to prevent unbounded memory growth
        self._poll_task: Optional[asyncio.Task] = None

        # Map chat_id (sender email) -> last subject + message-id for threading
@@ -245,26 +220,6 @@ class EmailAdapter(BasePlatformAdapter):

        logger.info("[Email] Adapter initialized for %s", self._address)

-    def _trim_seen_uids(self) -> None:
-        """Keep only the most recent UIDs to prevent unbounded memory growth.
-
-        IMAP UIDs are monotonically increasing integers. When the set grows
-        beyond the cap, we keep only the highest half — old UIDs are safe to
-        drop because new messages always have higher UIDs and IMAP's UNSEEN
-        flag prevents re-delivery regardless.
-        """
-        if len(self._seen_uids) <= self._seen_uids_max:
-            return
-        try:
-            # UIDs are bytes like b'1234' — sort numerically and keep top half
-            sorted_uids = sorted(self._seen_uids, key=lambda u: int(u))
-            keep = self._seen_uids_max // 2
-            self._seen_uids = set(sorted_uids[-keep:])
-            logger.debug("[Email] Trimmed seen UIDs to %d entries", len(self._seen_uids))
-        except (ValueError, TypeError):
-            # Fallback: just clear old entries if sort fails
-            self._seen_uids = set(list(self._seen_uids)[-self._seen_uids_max // 2:])
-
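The removed `_trim_seen_uids()` capped the in-memory set of processed IMAP UIDs. Once the cap was exceeded, it kept only the numerically highest half. Sorting with `key=int` matters because the UIDs arrive as byte strings, and a plain bytes sort would put `b"10"` before `b"9"`. Below is a standalone sketch. `trim_seen_uids` is a hypothetical name, and it returns a new set rather than mutating adapter state.

```python
def trim_seen_uids(seen: set, max_size: int = 2000) -> set:
    """Return seen unchanged if within the cap, else keep the highest half by value."""
    if len(seen) <= max_size:
        return seen
    keep = max_size // 2
    # int() accepts ASCII digit bytes such as b"1234", giving a numeric sort.
    return set(sorted(seen, key=int)[-keep:])
```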
    async def connect(self) -> bool:
        """Connect to the IMAP server and start polling for new messages."""
        try:
@@ -277,8 +232,6 @@ class EmailAdapter(BasePlatformAdapter):
            if status == "OK" and data and data[0]:
                for uid in data[0].split():
                    self._seen_uids.add(uid)
-            # Keep only the most recent UIDs to prevent unbounded growth
-            self._trim_seen_uids()
            imap.logout()
            logger.info("[Email] IMAP connection test passed. %d existing messages skipped.", len(self._seen_uids))
        except Exception as e:
@@ -349,9 +302,6 @@ class EmailAdapter(BasePlatformAdapter):
                if uid in self._seen_uids:
                    continue
                self._seen_uids.add(uid)
-                # Trim periodically to prevent unbounded memory growth
-                if len(self._seen_uids) > self._seen_uids_max:
-                    self._trim_seen_uids()

                status, msg_data = imap.uid("fetch", uid, "(RFC822)")
                if status != "OK":
@@ -370,11 +320,6 @@ class EmailAdapter(BasePlatformAdapter):
                subject = _decode_header_value(msg.get("Subject", "(no subject)"))
                message_id = msg.get("Message-ID", "")
                in_reply_to = msg.get("In-Reply-To", "")
-                # Skip automated/noreply senders before any processing
-                msg_headers = dict(msg.items())
-                if _is_automated_sender(sender_addr, msg_headers):
-                    logger.debug("[Email] Skipping automated sender: %s", sender_addr)
-                    continue
                body = _extract_text_body(msg)
                attachments = _extract_attachments(msg, skip_attachments=self._skip_attachments)

@@ -403,11 +348,6 @@ class EmailAdapter(BasePlatformAdapter):
        if sender_addr == self._address.lower():
            return

-        # Never reply to automated senders
-        if _is_automated_sender(sender_addr, {}):
-            logger.debug("[Email] Dropping automated sender at dispatch: %s", sender_addr)
-            return
-
        subject = msg_data["subject"]
        body = msg_data["body"].strip()
        attachments = msg_data["attachments"]
@@ -40,9 +40,7 @@ logger = logging.getLogger(__name__)
 MAX_MESSAGE_LENGTH = 4000

 # Store directory for E2EE keys and sync state.
-# Uses get_hermes_home() so each profile gets its own Matrix store.
-from hermes_constants import get_hermes_dir as _get_hermes_dir
-_STORE_DIR = _get_hermes_dir("platforms/matrix/store", "matrix/store")
+_STORE_DIR = Path.home() / ".hermes" / "matrix" / "store"

 # Grace period: ignore messages older than this many seconds before startup.
 _STARTUP_GRACE_SECONDS = 5
@@ -163,49 +161,22 @@ class MatrixAdapter(BasePlatformAdapter):
        # Authenticate.
        if self._access_token:
            client.access_token = self._access_token
-
-            # With access-token auth, always resolve whoami so we validate the
-            # token and learn the device_id. The device_id matters for E2EE:
-            # without it, matrix-nio can send plain messages but may fail to
-            # decrypt inbound encrypted events or encrypt outbound room sends.
-            resp = await client.whoami()
-            if isinstance(resp, nio.WhoamiResponse):
-                resolved_user_id = getattr(resp, "user_id", "") or self._user_id
-                resolved_device_id = getattr(resp, "device_id", "")
-                if resolved_user_id:
-                    self._user_id = resolved_user_id
-
-                # restore_login() is the matrix-nio path that binds the access
-                # token to a specific device and loads the crypto store.
-                if resolved_device_id and hasattr(client, "restore_login"):
-                    client.restore_login(
-                        self._user_id or resolved_user_id,
-                        resolved_device_id,
-                        self._access_token,
-                    )
+            # Resolve user_id if not set.
+            if not self._user_id:
+                resp = await client.whoami()
+                if isinstance(resp, nio.WhoamiResponse):
+                    self._user_id = resp.user_id
+                    client.user_id = resp.user_id
+                    logger.info("Matrix: authenticated as %s", self._user_id)
                else:
-                    if self._user_id:
-                        client.user_id = self._user_id
-                    if resolved_device_id:
-                        client.device_id = resolved_device_id
-                    client.access_token = self._access_token
-                    if self._encryption:
-                        logger.warning(
-                            "Matrix: access-token login did not restore E2EE state; "
-                            "encrypted rooms may fail until a device_id is available"
-                        )
-
-                logger.info(
-                    "Matrix: using access token for %s%s",
-                    self._user_id or "(unknown user)",
-                    f" (device {resolved_device_id})" if resolved_device_id else "",
-                )
+                    logger.error(
+                        "Matrix: whoami failed — check MATRIX_ACCESS_TOKEN and MATRIX_HOMESERVER"
+                    )
+                    await client.close()
+                    return False
            else:
-                logger.error(
-                    "Matrix: whoami failed — check MATRIX_ACCESS_TOKEN and MATRIX_HOMESERVER"
-                )
-                await client.close()
-                return False
+                client.user_id = self._user_id
+                logger.info("Matrix: using access token for %s", self._user_id)
        elif self._password and self._user_id:
            resp = await client.login(
                self._password,
@@ -223,18 +194,13 @@ class MatrixAdapter(BasePlatformAdapter):
            return False

        # If E2EE is enabled, load the crypto store.
-        if self._encryption and getattr(client, "olm", None):
+        if self._encryption and hasattr(client, "olm"):
            try:
                if client.should_upload_keys:
                    await client.keys_upload()
                logger.info("Matrix: E2EE crypto initialized")
            except Exception as exc:
                logger.warning("Matrix: crypto init issue: %s", exc)
-        elif self._encryption:
-            logger.warning(
-                "Matrix: E2EE requested but crypto store is not loaded; "
-                "encrypted rooms may fail"
-            )

        # Register event callbacks.
        client.add_event_callback(self._on_room_message, nio.RoomMessageText)
@@ -264,7 +230,6 @@ class MatrixAdapter(BasePlatformAdapter):
            )
            # Build DM room cache from m.direct account data.
            await self._refresh_dm_cache()
-            await self._run_e2ee_maintenance()
        else:
            logger.warning("Matrix: initial sync returned %s", type(resp).__name__)

@@ -336,48 +301,13 @@ class MatrixAdapter(BasePlatformAdapter):
                    relates_to["m.in_reply_to"] = {"event_id": reply_to}
                msg_content["m.relates_to"] = relates_to

-            async def _room_send_once(*, ignore_unverified_devices: bool = False):
-                return await asyncio.wait_for(
-                    self._client.room_send(
-                        chat_id,
-                        "m.room.message",
-                        msg_content,
-                        ignore_unverified_devices=ignore_unverified_devices,
-                    ),
-                    timeout=45,
-                )
-
-            try:
-                resp = await _room_send_once(ignore_unverified_devices=False)
-            except Exception as exc:
-                retryable = isinstance(exc, asyncio.TimeoutError)
-                olm_unverified = getattr(nio, "OlmUnverifiedDeviceError", None)
-                send_retry = getattr(nio, "SendRetryError", None)
-                if isinstance(olm_unverified, type) and isinstance(exc, olm_unverified):
-                    retryable = True
-                if isinstance(send_retry, type) and isinstance(exc, send_retry):
-                    retryable = True
-
-                if not retryable:
-                    logger.error("Matrix: failed to send to %s: %s", chat_id, exc)
-                    return SendResult(success=False, error=str(exc))
-
-                logger.warning(
-                    "Matrix: initial encrypted send to %s failed (%s); "
-                    "retrying after E2EE maintenance with ignored unverified devices",
-                    chat_id,
-                    exc,
-                )
-                await self._run_e2ee_maintenance()
-                try:
-                    resp = await _room_send_once(ignore_unverified_devices=True)
-                except Exception as retry_exc:
-                    logger.error("Matrix: failed to send to %s after retry: %s", chat_id, retry_exc)
-                    return SendResult(success=False, error=str(retry_exc))
-
+            resp = await self._client.room_send(
+                chat_id,
+                "m.room.message",
+                msg_content,
+            )
            if isinstance(resp, nio.RoomSendResponse):
                last_event_id = resp.event_id
-                logger.info("Matrix: sent event %s to %s", last_event_id, chat_id)
            else:
                err = getattr(resp, "message", str(resp))
                logger.error("Matrix: failed to send to %s: %s", chat_id, err)
@@ -635,9 +565,6 @@ class MatrixAdapter(BasePlatformAdapter):
                        getattr(resp, "message", resp),
                    )
                    await asyncio.sleep(5)
-                    continue
-
-                await self._run_e2ee_maintenance()
            except asyncio.CancelledError:
                return
            except Exception as exc:
@@ -646,38 +573,6 @@ class MatrixAdapter(BasePlatformAdapter):
                logger.warning("Matrix: sync error: %s — retrying in 5s", exc)
                await asyncio.sleep(5)

-    async def _run_e2ee_maintenance(self) -> None:
-        """Run matrix-nio E2EE housekeeping between syncs.
-
-        Hermes uses a custom sync loop instead of matrix-nio's sync_forever(),
-        so we need to explicitly drive the key management work that sync_forever()
-        normally handles for encrypted rooms.
-        """
-        client = self._client
-        if not client or not self._encryption or not getattr(client, "olm", None):
-            return
-
-        tasks = [asyncio.create_task(client.send_to_device_messages())]
-
-        if client.should_upload_keys:
-            tasks.append(asyncio.create_task(client.keys_upload()))
-
-        if client.should_query_keys:
-            tasks.append(asyncio.create_task(client.keys_query()))
-
-        if client.should_claim_keys:
-            users = client.get_users_for_key_claiming()
-            if users:
-                tasks.append(asyncio.create_task(client.keys_claim(users)))
-
-        for task in asyncio.as_completed(tasks):
-            try:
-                await task
-            except asyncio.CancelledError:
-                raise
-            except Exception as exc:
-                logger.warning("Matrix: E2EE maintenance task failed: %s", exc)
-
    # ------------------------------------------------------------------
    # Event callbacks
    # ------------------------------------------------------------------
@@ -345,8 +345,7 @@ class TelegramAdapter(BasePlatformAdapter):
    def _persist_dm_topic_thread_id(self, chat_id: int, topic_name: str, thread_id: int) -> None:
        """Save a newly created thread_id back into config.yaml so it persists across restarts."""
        try:
-            from hermes_constants import get_hermes_home
-            config_path = get_hermes_home() / "config.yaml"
+            config_path = _Path.home() / ".hermes" / "config.yaml"
            if not config_path.exists():
                logger.warning("[%s] Config file not found at %s, cannot persist thread_id", self.name, config_path)
                return
@@ -708,15 +707,9 @@ class TelegramAdapter(BasePlatformAdapter):
            except ImportError:
                _NetErr = OSError  # type: ignore[misc,assignment]

-            try:
-                from telegram.error import BadRequest as _BadReq
-            except ImportError:
-                _BadReq = None  # type: ignore[assignment,misc]
-
            for i, chunk in enumerate(chunks):
                should_thread = self._should_thread_reply(reply_to, i)
                reply_to_id = int(reply_to) if should_thread else None
-                effective_thread_id = int(thread_id) if thread_id else None

                msg = None
                for _send_attempt in range(3):
@@ -728,7 +721,7 @@ class TelegramAdapter(BasePlatformAdapter):
                                text=chunk,
                                parse_mode=ParseMode.MARKDOWN_V2,
                                reply_to_message_id=reply_to_id,
-                                message_thread_id=effective_thread_id,
+                                message_thread_id=int(thread_id) if thread_id else None,
                            )
                        except Exception as md_error:
                            # Markdown parsing failed, try plain text
@@ -740,30 +733,12 @@ class TelegramAdapter(BasePlatformAdapter):
                                    text=plain_chunk,
                                    parse_mode=None,
                                    reply_to_message_id=reply_to_id,
-                                    message_thread_id=effective_thread_id,
+                                    message_thread_id=int(thread_id) if thread_id else None,
                                )
                            else:
                                raise
                        break  # success
                    except _NetErr as send_err:
-                        # BadRequest is a subclass of NetworkError in
-                        # python-telegram-bot but represents permanent errors
-                        # (not transient network issues). Detect and handle
-                        # specific cases instead of blindly retrying.
-                        if _BadReq and isinstance(send_err, _BadReq):
-                            err_lower = str(send_err).lower()
-                            if "thread not found" in err_lower and effective_thread_id is not None:
-                                # Thread doesn't exist — retry without
-                                # message_thread_id so the message still
-                                # reaches the chat.
-                                logger.warning(
-                                    "[%s] Thread %s not found, retrying without message_thread_id",
-                                    self.name, effective_thread_id,
-                                )
-                                effective_thread_id = None
-                                continue
-                            # Other BadRequest errors are permanent — don't retry
-                            raise
                        if _send_attempt < 2:
                            wait = 2 ** _send_attempt
                            logger.warning("[%s] Network error on send (attempt %d/3), retrying in %ds: %s",
@@ -1758,8 +1733,7 @@ class TelegramAdapter(BasePlatformAdapter):
        recognized without a gateway restart.
        """
        try:
-            from hermes_constants import get_hermes_home
-            config_path = get_hermes_home() / "config.yaml"
+            config_path = _Path.home() / ".hermes" / "config.yaml"
            if not config_path.exists():
                return

@@ -12,7 +12,6 @@ from __future__ import annotations
 import asyncio
 import ipaddress
 import logging
-import os
 import socket
 from typing import Iterable, Optional

@@ -44,14 +43,6 @@ _DOH_PROVIDERS: list[dict] = [
 _SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]


-def _resolve_proxy_url() -> str | None:
-    for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY", "https_proxy", "http_proxy", "all_proxy"):
-        value = (os.environ.get(key) or "").strip()
-        if value:
-            return value
-    return None
-
-
 class TelegramFallbackTransport(httpx.AsyncBaseTransport):
    """Retry Telegram Bot API requests via fallback IPs while preserving TLS/SNI.

@@ -63,9 +54,6 @@ class TelegramFallbackTransport(httpx.AsyncBaseTransport):

    def __init__(self, fallback_ips: Iterable[str], **transport_kwargs):
        self._fallback_ips = [ip for ip in dict.fromkeys(_normalize_fallback_ips(fallback_ips))]
-        proxy_url = _resolve_proxy_url()
-        if proxy_url and "proxy" not in transport_kwargs:
-            transport_kwargs["proxy"] = proxy_url
        self._primary = httpx.AsyncHTTPTransport(**transport_kwargs)
        self._fallbacks = {
            ip: httpx.AsyncHTTPTransport(**transport_kwargs) for ip in self._fallback_ips
@@ -27,7 +27,6 @@ import hashlib
 import hmac
 import json
 import logging
-import os
 import re
 import subprocess
 import time
@@ -54,7 +53,6 @@ logger = logging.getLogger(__name__)
 DEFAULT_HOST = "0.0.0.0"
 DEFAULT_PORT = 8644
 _INSECURE_NO_AUTH = "INSECURE_NO_AUTH"
-_DYNAMIC_ROUTES_FILENAME = "webhook_subscriptions.json"


 def check_webhook_requirements() -> bool:
@@ -70,10 +68,7 @@ class WebhookAdapter(BasePlatformAdapter):
        self._host: str = config.extra.get("host", DEFAULT_HOST)
        self._port: int = int(config.extra.get("port", DEFAULT_PORT))
        self._global_secret: str = config.extra.get("secret", "")
-        self._static_routes: Dict[str, dict] = config.extra.get("routes", {})
-        self._dynamic_routes: Dict[str, dict] = {}
-        self._dynamic_routes_mtime: float = 0.0
-        self._routes: Dict[str, dict] = dict(self._static_routes)
+        self._routes: Dict[str, dict] = config.extra.get("routes", {})
        self._runner = None

        # Delivery info keyed by session chat_id — consumed by send()
@@ -101,9 +96,6 @@ class WebhookAdapter(BasePlatformAdapter):
    # ------------------------------------------------------------------

    async def connect(self) -> bool:
-        # Load agent-created subscriptions before validating
-        self._reload_dynamic_routes()
-
        # Validate routes at startup — secret is required per route
        for name, route in self._routes.items():
            secret = route.get("secret", self._global_secret)
@@ -190,46 +182,8 @@ class WebhookAdapter(BasePlatformAdapter):
        """GET /health — simple health check."""
        return web.json_response({"status": "ok", "platform": "webhook"})

-    def _reload_dynamic_routes(self) -> None:
-        """Reload agent-created subscriptions from disk if the file changed."""
-        from pathlib import Path as _Path
-        hermes_home = _Path(
-            os.getenv("HERMES_HOME", str(_Path.home() / ".hermes"))
-        ).expanduser()
-        subs_path = hermes_home / _DYNAMIC_ROUTES_FILENAME
-        if not subs_path.exists():
-            if self._dynamic_routes:
-                self._dynamic_routes = {}
-                self._routes = dict(self._static_routes)
-                logger.debug("[webhook] Dynamic subscriptions file removed, cleared dynamic routes")
-            return
-        try:
-            mtime = subs_path.stat().st_mtime
-            if mtime <= self._dynamic_routes_mtime:
-                return  # No change
-            data = json.loads(subs_path.read_text(encoding="utf-8"))
-            if not isinstance(data, dict):
-                return
-            # Merge: static routes take precedence over dynamic ones
-            self._dynamic_routes = {
-                k: v for k, v in data.items()
-                if k not in self._static_routes
-            }
-            self._routes = {**self._dynamic_routes, **self._static_routes}
-            self._dynamic_routes_mtime = mtime
-            logger.info(
-                "[webhook] Reloaded %d dynamic route(s): %s",
-                len(self._dynamic_routes),
-                ", ".join(self._dynamic_routes.keys()) or "(none)",
-            )
-        except Exception as e:
-            logger.warning("[webhook] Failed to reload dynamic routes: %s", e)
-
    async def _handle_webhook(self, request: "web.Request") -> "web.Response":
        """POST /webhooks/{route_name} — receive and process a webhook event."""
-        # Hot-reload dynamic subscriptions on each request (mtime-gated, cheap)
-        self._reload_dynamic_routes()
-
        route_name = request.match_info.get("route_name", "")
        route_config = self._routes.get(route_name)

@@ -26,7 +26,6 @@ from pathlib import Path
 from typing import Dict, Optional, Any

 from hermes_cli.config import get_hermes_home
-from hermes_constants import get_hermes_dir

 logger = logging.getLogger(__name__)

@@ -135,7 +134,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
        )
        self._session_path: Path = Path(config.extra.get(
            "session_path",
-            get_hermes_dir("platforms/whatsapp/session", "whatsapp/session")
+            get_hermes_home() / "whatsapp" / "session"
        ))
        self._reply_prefix: Optional[str] = config.extra.get("reply_prefix")
        self._message_queue: asyncio.Queue = asyncio.Queue()
@@ -527,7 +526,6 @@ class WhatsAppAdapter(BasePlatformAdapter):
        image_path: str,
        caption: Optional[str] = None,
        reply_to: Optional[str] = None,
-        **kwargs,
    ) -> SendResult:
        """Send a local image file natively via bridge."""
        return await self._send_media_to_bridge(chat_id, image_path, "image", caption)
@@ -538,7 +536,6 @@ class WhatsAppAdapter(BasePlatformAdapter):
        video_path: str,
        caption: Optional[str] = None,
        reply_to: Optional[str] = None,
-        **kwargs,
    ) -> SendResult:
        """Send a video natively via bridge — plays inline in WhatsApp."""
        return await self._send_media_to_bridge(chat_id, video_path, "video", caption)
@@ -550,7 +547,6 @@ class WhatsAppAdapter(BasePlatformAdapter):
        caption: Optional[str] = None,
        file_name: Optional[str] = None,
        reply_to: Optional[str] = None,
-        **kwargs,
    ) -> SendResult:
        """Send a document/file as a downloadable attachment via bridge."""
        return await self._send_media_to_bridge(
@@ -288,7 +288,7 @@ def _resolve_gateway_model(config: dict | None = None) -> str:
    if isinstance(model_cfg, str):
        model = model_cfg
    elif isinstance(model_cfg, dict):
-        model = model_cfg.get("default") or model_cfg.get("model") or model
+        model = model_cfg.get("default", model)
    return model


@@ -432,7 +432,7 @@ class GatewayRunner:
            from honcho_integration.session import HonchoSessionManager

            hcfg = HonchoClientConfig.from_global_config()
-            if not hcfg.enabled or not (hcfg.api_key or hcfg.base_url):
+            if not hcfg.enabled or not hcfg.api_key:
                return None, hcfg

            client = get_honcho_client(hcfg)
@@ -745,22 +745,10 @@ class GatewayRunner:
                logger.error("No connected messaging platforms remain. Shutting down gateway cleanly.")
            await self.stop()
        elif not self.adapters and self._failed_platforms:
-            # All platforms are down and queued for background reconnection.
-            # If the error is retryable, exit with failure so systemd Restart=on-failure
-            # can restart the process. Otherwise stay alive and keep retrying in background.
-            if adapter.fatal_error_retryable:
-                self._exit_reason = adapter.fatal_error_message or "All messaging platforms failed with retryable errors"
-                self._exit_with_failure = True
-                logger.error(
-                    "All messaging platforms failed with retryable errors. "
-                    "Shutting down gateway for service restart (systemd will retry)."
-                )
-                await self.stop()
-            else:
-                logger.warning(
-                    "No connected messaging platforms remain, but %d platform(s) queued for reconnection",
-                    len(self._failed_platforms),
-                )
+            logger.warning(
+                "No connected messaging platforms remain, but %d platform(s) queued for reconnection",
+                len(self._failed_platforms),
+            )

    def _request_clean_exit(self, reason: str) -> None:
        self._exit_cleanly = True
@@ -2093,7 +2081,7 @@ class GatewayRunner:
                    if isinstance(_model_cfg, str):
                        _hyg_model = _model_cfg
                    elif isinstance(_model_cfg, dict):
-                        _hyg_model = _model_cfg.get("default") or _model_cfg.get("model") or _hyg_model
+                        _hyg_model = _model_cfg.get("default", _hyg_model)
                        # Read explicit context_length override from model config
                        # (same as run_agent.py lines 995-1005)
                        _raw_ctx = _model_cfg.get("context_length")
@@ -2216,15 +2204,6 @@ class GatewayRunner:
                                    ),
                                )

-                                # _compress_context ends the old session and creates
-                                # a new session_id.  Write compressed messages into
-                                # the NEW session so the old transcript stays intact
-                                # and searchable via session_search.
-                                _hyg_new_sid = _hyg_agent.session_id
-                                if _hyg_new_sid != session_entry.session_id:
-                                    session_entry.session_id = _hyg_new_sid
-                                    self.session_store._save()
-
                                self.session_store.rewrite_transcript(
                                    session_entry.session_id, _compressed
                                )
@@ -4019,22 +3998,13 @@ class GatewayRunner:
            loop = asyncio.get_event_loop()
            compressed, _ = await loop.run_in_executor(
                None,
-                lambda: tmp_agent._compress_context(msgs, "", approx_tokens=approx_tokens)
+                lambda: tmp_agent._compress_context(msgs, "", approx_tokens=approx_tokens),
            )

-            # _compress_context already calls end_session() on the old session
-            # (preserving its full transcript in SQLite) and creates a new
-            # session_id for the continuation.  Write the compressed messages
-            # into the NEW session so the original history stays searchable.
-            new_session_id = tmp_agent.session_id
-            if new_session_id != session_entry.session_id:
-                session_entry.session_id = new_session_id
-                self.session_store._save()
-
-            self.session_store.rewrite_transcript(new_session_id, compressed)
+            self.session_store.rewrite_transcript(session_entry.session_id, compressed)
            # Reset stored token count — transcript changed, old value is stale
            self.session_store.update_session(
-                session_entry.session_key, last_prompt_tokens=0
+                session_entry.session_key, last_prompt_tokens=0,
            )
            new_count = len(compressed)
            new_tokens = estimate_messages_tokens_rough(compressed)
@@ -4190,7 +4160,7 @@ class GatewayRunner:
            ]
            ctx = agent.context_compressor
            if ctx.last_prompt_tokens:
-                pct = min(100, ctx.last_prompt_tokens / ctx.context_length * 100) if ctx.context_length else 0
+                pct = ctx.last_prompt_tokens / ctx.context_length * 100 if ctx.context_length else 0
                lines.append(f"Context: {ctx.last_prompt_tokens:,} / {ctx.context_length:,} ({pct:.0f}%)")
            if ctx.compression_count:
                lines.append(f"Compressions: {ctx.compression_count}")
@@ -5004,17 +4974,12 @@ class GatewayRunner:
            progress_queue.put(msg)
        
        # Background task to send progress messages
-        # Accumulates tool lines into a single message that gets edited.
-        #
-        # Threading metadata is platform-specific:
-        # - Slack DM threading needs event_message_id fallback (reply thread)
-        # - Telegram uses message_thread_id only for forum topics; passing a
-        #   normal DM/group message id as thread_id causes send failures
-        # - Other platforms should use explicit source.thread_id only
-        if source.platform == Platform.SLACK:
-            _progress_thread_id = source.thread_id or event_message_id
-        else:
-            _progress_thread_id = source.thread_id
+        # Accumulates tool lines into a single message that gets edited
+        # For DM top-level Slack messages, source.thread_id is None but the
+        # final reply will be threaded under the original message via reply_to.
+        # Use event_message_id as fallback so progress messages land in the
+        # same thread as the final response instead of going to the DM root.
+        _progress_thread_id = source.thread_id or event_message_id
        _progress_metadata = {"thread_id": _progress_thread_id} if _progress_thread_id else None

        async def send_progress_messages():
@@ -11,5 +11,5 @@ Provides subcommands for:
 - hermes cron          - Manage cron jobs
 """

-__version__ = "0.5.0"
-__release_date__ = "2026.3.28"
+__version__ = "0.4.0"
+__release_date__ = "2026.3.23"
@@ -160,7 +160,7 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        id="alibaba",
        name="Alibaba Cloud (DashScope)",
        auth_type="api_key",
-        inference_base_url="https://coding-intl.dashscope.aliyuncs.com/v1",
+        inference_base_url="https://dashscope-intl.aliyuncs.com/apps/anthropic",
        api_key_env_vars=("DASHSCOPE_API_KEY",),
        base_url_env_var="DASHSCOPE_BASE_URL",
    ),
@@ -212,14 +212,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        api_key_env_vars=("KILOCODE_API_KEY",),
        base_url_env_var="KILOCODE_BASE_URL",
    ),
-    "huggingface": ProviderConfig(
-        id="huggingface",
-        name="Hugging Face",
-        auth_type="api_key",
-        inference_base_url="https://router.huggingface.co/v1",
-        api_key_env_vars=("HF_TOKEN",),
-        base_url_env_var="HF_BASE_URL",
-    ),
 }


@@ -693,7 +685,6 @@ def resolve_provider(
        "github-copilot-acp": "copilot-acp", "copilot-acp-agent": "copilot-acp",
        "aigateway": "ai-gateway", "vercel": "ai-gateway", "vercel-ai-gateway": "ai-gateway",
        "opencode": "opencode-zen", "zen": "opencode-zen",
-        "hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
        "go": "opencode-go", "opencode-go-sub": "opencode-go",
        "kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
    }
@@ -138,12 +138,6 @@ DEFAULT_CONFIG = {
    "toolsets": ["hermes-cli"],
    "agent": {
        "max_turns": 90,
-        # Tool-use enforcement: injects system prompt guidance that tells the
-        # model to actually call tools instead of describing intended actions.
-        # Values: "auto" (default — applies to gpt/codex models), true/false
-        # (force on/off for all models), or a list of model-name substrings
-        # to match (e.g. ["gpt", "codex", "gemini", "qwen"]).
-        "tool_use_enforcement": "auto",
    },
    
    "terminal": {
@@ -227,49 +221,42 @@ DEFAULT_CONFIG = {
            "model": "",
            "base_url": "",
            "api_key": "",
-            "timeout": 30,         # seconds — increase for slow local models
        },
        "compression": {
            "provider": "auto",
            "model": "",
            "base_url": "",
            "api_key": "",
-            "timeout": 120,        # seconds — compression summarises large contexts; increase for local models
        },
        "session_search": {
            "provider": "auto",
            "model": "",
            "base_url": "",
            "api_key": "",
-            "timeout": 30,
        },
        "skills_hub": {
            "provider": "auto",
            "model": "",
            "base_url": "",
            "api_key": "",
-            "timeout": 30,
        },
        "approval": {
            "provider": "auto",
            "model": "",           # fast/cheap model recommended (e.g. gemini-flash, haiku)
            "base_url": "",
            "api_key": "",
-            "timeout": 30,
        },
        "mcp": {
            "provider": "auto",
            "model": "",
            "base_url": "",
            "api_key": "",
-            "timeout": 30,
        },
        "flush_memories": {
            "provider": "auto",
            "model": "",
            "base_url": "",
            "api_key": "",
-            "timeout": 30,
        },
    },
    
@@ -560,14 +547,14 @@ OPTIONAL_ENV_VARS = {
        "category": "provider",
    },
    "DASHSCOPE_API_KEY": {
-        "description": "Alibaba Cloud DashScope API key (Qwen + multi-provider models)",
+        "description": "Alibaba Cloud DashScope API key for Qwen models",
        "prompt": "DashScope API Key",
        "url": "https://modelstudio.console.alibabacloud.com/",
        "password": True,
        "category": "provider",
    },
    "DASHSCOPE_BASE_URL": {
-        "description": "Custom DashScope base URL (default: coding-intl OpenAI-compat endpoint)",
+        "description": "Custom DashScope base URL (default: international endpoint)",
        "prompt": "DashScope Base URL",
        "url": "",
        "password": False,
@@ -606,31 +593,8 @@ OPTIONAL_ENV_VARS = {
        "category": "provider",
        "advanced": True,
    },
-    "HF_TOKEN": {
-        "description": "Hugging Face token for Inference Providers (20+ open models via router.huggingface.co)",
-        "prompt": "Hugging Face Token",
-        "url": "https://huggingface.co/settings/tokens",
-        "password": True,
-        "category": "provider",
-    },
-    "HF_BASE_URL": {
-        "description": "Hugging Face Inference Providers base URL override",
-        "prompt": "HF base URL (leave empty for default)",
-        "url": None,
-        "password": False,
-        "category": "provider",
-        "advanced": True,
-    },

    # ── Tool API keys ──
-    "EXA_API_KEY": {
-        "description": "Exa API key for AI-native web search and contents",
-        "prompt": "Exa API key",
-        "url": "https://exa.ai/",
-        "tools": ["web_search", "web_extract"],
-        "password": True,
-        "category": "tool",
-    },
    "PARALLEL_API_KEY": {
        "description": "Parallel API key for AI-native web search and extract",
        "prompt": "Parallel API key",
@@ -1687,7 +1651,6 @@ def show_config():
    keys = [
        ("OPENROUTER_API_KEY", "OpenRouter"),
        ("VOICE_TOOLS_OPENAI_KEY", "OpenAI (STT/TTS)"),
-        ("EXA_API_KEY", "Exa"),
        ("PARALLEL_API_KEY", "Parallel"),
        ("FIRECRAWL_API_KEY", "Firecrawl"),
        ("TAVILY_API_KEY", "Tavily"),
@@ -1847,7 +1810,7 @@ def set_config_value(key: str, value: str):
    # Check if it's an API key (goes to .env)
    api_keys = [
        'OPENROUTER_API_KEY', 'OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'VOICE_TOOLS_OPENAI_KEY',
-        'EXA_API_KEY', 'PARALLEL_API_KEY', 'FIRECRAWL_API_KEY', 'FIRECRAWL_API_URL', 'TAVILY_API_KEY',
+        'PARALLEL_API_KEY', 'FIRECRAWL_API_KEY', 'FIRECRAWL_API_URL', 'TAVILY_API_KEY',
        'BROWSERBASE_API_KEY', 'BROWSERBASE_PROJECT_ID', 'BROWSER_USE_API_KEY',
        'FAL_KEY', 'TELEGRAM_BOT_TOKEN', 'DISCORD_BOT_TOKEN',
        'TERMINAL_SSH_HOST', 'TERMINAL_SSH_USER', 'TERMINAL_SSH_KEY',
@@ -56,7 +56,7 @@ def _honcho_is_configured_for_doctor() -> bool:
        from honcho_integration.client import HonchoClientConfig

        cfg = HonchoClientConfig.from_global_config()
-        return bool(cfg.enabled and (cfg.api_key or cfg.base_url))
+        return bool(cfg.enabled and cfg.api_key)
    except Exception:
        return False

@@ -708,8 +708,8 @@ def run_doctor(args):
            check_warn("Honcho config not found", "run: hermes honcho setup")
        elif not hcfg.enabled:
            check_info(f"Honcho disabled (set enabled: true in {_honcho_cfg_path} to activate)")
-        elif not (hcfg.api_key or hcfg.base_url):
-            check_fail("Honcho API key or base URL not set", "run: hermes honcho setup")
+        elif not hcfg.api_key:
+            check_fail("Honcho API key not set", "run: hermes honcho setup")
            issues.append("No Honcho API key — run 'hermes honcho setup'")
        else:
            from honcho_integration.client import get_honcho_client, reset_honcho_client
@@ -125,43 +125,20 @@ _SERVICE_BASE = "hermes-gateway"
 SERVICE_DESCRIPTION = "Hermes Agent Gateway - Messaging Platform Integration"


-def _profile_suffix() -> str:
-    """Derive a service-name suffix from the current HERMES_HOME.
-
-    Returns ``""`` for the default ``~/.hermes``, the profile name for
-    ``~/.hermes/profiles/<name>``, or a short hash for any other custom
-    HERMES_HOME path.
-    """
-    import hashlib
-    import re
-    from pathlib import Path as _Path
-    home = get_hermes_home().resolve()
-    default = (_Path.home() / ".hermes").resolve()
-    if home == default:
-        return ""
-    # Detect ~/.hermes/profiles/<name> pattern → use the profile name
-    profiles_root = (default / "profiles").resolve()
-    try:
-        rel = home.relative_to(profiles_root)
-        parts = rel.parts
-        if len(parts) == 1 and re.match(r"^[a-z0-9][a-z0-9_-]{0,63}$", parts[0]):
-            return parts[0]
-    except ValueError:
-        pass
-    # Fallback: short hash for arbitrary HERMES_HOME paths
-    return hashlib.sha256(str(home).encode()).hexdigest()[:8]
-
-
 def get_service_name() -> str:
    """Derive a systemd service name scoped to this HERMES_HOME.

    Default ``~/.hermes`` returns ``hermes-gateway`` (backward compatible).
-    Profile ``~/.hermes/profiles/coder`` returns ``hermes-gateway-coder``.
-    Any other HERMES_HOME appends a short hash for uniqueness.
+    Any other HERMES_HOME appends a short hash so multiple installations
+    can each have their own systemd service without conflicting.
    """
-    suffix = _profile_suffix()
-    if not suffix:
+    import hashlib
+    from pathlib import Path as _Path  # local import to avoid monkeypatch interference
+    home = get_hermes_home().resolve()
+    default = (_Path.home() / ".hermes").resolve()
+    if home == default:
        return _SERVICE_BASE
+    suffix = hashlib.sha256(str(home).encode()).hexdigest()[:8]
    return f"{_SERVICE_BASE}-{suffix}"


@@ -392,14 +369,7 @@ def print_systemd_linger_guidance() -> None:
        print("  sudo loginctl enable-linger $USER")

 def get_launchd_plist_path() -> Path:
-    """Return the launchd plist path, scoped per profile.
-
-    Default ``~/.hermes`` → ``ai.hermes.gateway.plist`` (backward compatible).
-    Profile ``~/.hermes/profiles/coder`` → ``ai.hermes.gateway-coder.plist``.
-    """
-    suffix = _profile_suffix()
-    name = f"ai.hermes.gateway-{suffix}" if suffix else "ai.hermes.gateway"
-    return Path.home() / "Library" / "LaunchAgents" / f"{name}.plist"
+    return Path.home() / "Library" / "LaunchAgents" / "ai.hermes.gateway.plist"

 def _detect_venv_dir() -> Path | None:
    """Detect the active virtualenv directory.
@@ -450,17 +420,6 @@ def get_hermes_cli_path() -> str:
 # Systemd (Linux)
 # =============================================================================

-def _build_user_local_paths(home: Path, path_entries: list[str]) -> list[str]:
-    """Return user-local bin dirs that exist and aren't already in *path_entries*."""
-    candidates = [
-        str(home / ".local" / "bin"),       # uv, uvx, pip-installed CLIs
-        str(home / ".cargo" / "bin"),        # Rust/cargo tools
-        str(home / "go" / "bin"),            # Go tools
-        str(home / ".npm-global" / "bin"),   # npm global packages
-    ]
-    return [p for p in candidates if p not in path_entries and Path(p).exists()]
-
-
 def generate_systemd_unit(system: bool = False, run_as_user: str | None = None) -> str:
    python_path = get_python_path()
    working_dir = str(PROJECT_ROOT)
@@ -475,16 +434,13 @@ def generate_systemd_unit(system: bool = False, run_as_user: str | None = None)
        resolved_node_dir = str(Path(resolved_node).resolve().parent)
        if resolved_node_dir not in path_entries:
            path_entries.append(resolved_node_dir)
+    path_entries.extend(["/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"])
+    sane_path = ":".join(path_entries)

    hermes_home = str(get_hermes_home().resolve())

-    common_bin_paths = ["/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"]
-
    if system:
        username, group_name, home_dir = _system_service_identity(run_as_user)
-        path_entries.extend(_build_user_local_paths(Path(home_dir), path_entries))
-        path_entries.extend(common_bin_paths)
-        sane_path = ":".join(path_entries)
        return f"""[Unit]
 Description={SERVICE_DESCRIPTION}
 After=network-online.target
@@ -516,9 +472,6 @@ StandardError=journal
 WantedBy=multi-user.target
 """

-    path_entries.extend(_build_user_local_paths(Path.home(), path_entries))
-    path_entries.extend(common_bin_paths)
-    sane_path = ":".join(path_entries)
    return f"""[Unit]
 Description={SERVICE_DESCRIPTION}
 After=network.target
@@ -799,46 +752,18 @@ def systemd_status(deep: bool = False, system: bool = False):
 # Launchd (macOS)
 # =============================================================================

-def get_launchd_label() -> str:
-    """Return the launchd service label, scoped per profile."""
-    suffix = _profile_suffix()
-    return f"ai.hermes.gateway-{suffix}" if suffix else "ai.hermes.gateway"
-
-
 def generate_launchd_plist() -> str:
    python_path = get_python_path()
    working_dir = str(PROJECT_ROOT)
-    hermes_home = str(get_hermes_home().resolve())
    log_dir = get_hermes_home() / "logs"
    log_dir.mkdir(parents=True, exist_ok=True)
-    label = get_launchd_label()
-    # Build a sane PATH for the launchd plist.  launchd provides only a
-    # minimal default (/usr/bin:/bin:/usr/sbin:/sbin) which misses Homebrew,
-    # nvm, cargo, etc.  We prepend venv/bin and node_modules/.bin (matching
-    # the systemd unit), then capture the user's full shell PATH so every
-    # user-installed tool (node, ffmpeg, …) is reachable.
-    detected_venv = _detect_venv_dir()
-    venv_bin = str(detected_venv / "bin") if detected_venv else str(PROJECT_ROOT / "venv" / "bin")
-    venv_dir = str(detected_venv) if detected_venv else str(PROJECT_ROOT / "venv")
-    node_bin = str(PROJECT_ROOT / "node_modules" / ".bin")
-    # Resolve the directory containing the node binary (e.g. Homebrew, nvm)
-    # so it's explicitly in PATH even if the user's shell PATH changes later.
-    priority_dirs = [venv_bin, node_bin]
-    resolved_node = shutil.which("node")
-    if resolved_node:
-        resolved_node_dir = str(Path(resolved_node).resolve().parent)
-        if resolved_node_dir not in priority_dirs:
-            priority_dirs.append(resolved_node_dir)
-    sane_path = ":".join(
-        dict.fromkeys(priority_dirs + [p for p in os.environ.get("PATH", "").split(":") if p])
-    )
-
+    
    return f"""<?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
 <plist version="1.0">
 <dict>
    <key>Label</key>
-    <string>{label}</string>
+    <string>ai.hermes.gateway</string>
    
    <key>ProgramArguments</key>
    <array>
@@ -853,16 +778,6 @@ def generate_launchd_plist() -> str:
    <key>WorkingDirectory</key>
    <string>{working_dir}</string>
    
-    <key>EnvironmentVariables</key>
-    <dict>
-        <key>PATH</key>
-        <string>{sane_path}</string>
-        <key>VIRTUAL_ENV</key>
-        <string>{venv_dir}</string>
-        <key>HERMES_HOME</key>
-        <string>{hermes_home}</string>
-    </dict>
-    
    <key>RunAtLoad</key>
    <true/>
    
@@ -948,33 +863,20 @@ def launchd_uninstall():
    print("✓ Service uninstalled")

 def launchd_start():
-    plist_path = get_launchd_plist_path()
-    label = get_launchd_label()
-
-    # Self-heal if the plist is missing entirely (e.g., manual cleanup, failed upgrade)
-    if not plist_path.exists():
-        print("↻ launchd plist missing; regenerating service definition")
-        plist_path.parent.mkdir(parents=True, exist_ok=True)
-        plist_path.write_text(generate_launchd_plist(), encoding="utf-8")
-        subprocess.run(["launchctl", "load", str(plist_path)], check=True)
-        subprocess.run(["launchctl", "start", label], check=True)
-        print("✓ Service started")
-        return
-
    refresh_launchd_plist_if_needed()
+    plist_path = get_launchd_plist_path()
    try:
-        subprocess.run(["launchctl", "start", label], check=True)
+        subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
    except subprocess.CalledProcessError as e:
-        if e.returncode != 3:
+        if e.returncode != 3 or not plist_path.exists():
            raise
        print("↻ launchd job was unloaded; reloading service definition")
        subprocess.run(["launchctl", "load", str(plist_path)], check=True)
-        subprocess.run(["launchctl", "start", label], check=True)
+        subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
    print("✓ Service started")

 def launchd_stop():
-    label = get_launchd_label()
-    subprocess.run(["launchctl", "stop", label], check=True)
+    subprocess.run(["launchctl", "stop", "ai.hermes.gateway"], check=True)
    print("✓ Service stopped")

 def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float = 5.0):
@@ -1029,9 +931,8 @@ def launchd_restart():

 def launchd_status(deep: bool = False):
    plist_path = get_launchd_plist_path()
-    label = get_launchd_label()
    result = subprocess.run(
-        ["launchctl", "list", label],
+        ["launchctl", "list", "ai.hermes.gateway"],
        capture_output=True,
        text=True
    )
@@ -1536,7 +1437,7 @@ def _is_service_running() -> bool:
        return False
    elif is_macos() and get_launchd_plist_path().exists():
        result = subprocess.run(
-            ["launchctl", "list", get_launchd_label()],
+            ["launchctl", "list", "ai.hermes.gateway"],
            capture_output=True, text=True
        )
        return result.returncode == 0
@@ -795,7 +795,6 @@ def cmd_model(args):
        "ai-gateway": "AI Gateway",
        "kilocode": "Kilo Code",
        "alibaba": "Alibaba Cloud (DashScope)",
-        "huggingface": "Hugging Face",
        "custom": "Custom endpoint",
    }
    active_label = provider_labels.get(active, active)
@@ -821,8 +820,7 @@ def cmd_model(args):
        ("opencode-zen", "OpenCode Zen (35+ curated models, pay-as-you-go)"),
        ("opencode-go", "OpenCode Go (open models, $10/month subscription)"),
        ("ai-gateway", "AI Gateway (Vercel — 200+ models, pay-per-use)"),
-        ("alibaba", "Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
-        ("huggingface", "Hugging Face Inference Providers (20+ open models)"),
+        ("alibaba", "Alibaba Cloud / DashScope (Qwen models, Anthropic-compatible)"),
    ]

    # Add user-defined custom providers from config.yaml
@@ -832,8 +830,8 @@ def cmd_model(args):
        for entry in custom_providers_cfg:
            if not isinstance(entry, dict):
                continue
-            name = (entry.get("name") or "").strip()
-            base_url = (entry.get("base_url") or "").strip()
+            name = entry.get("name", "").strip()
+            base_url = entry.get("base_url", "").strip()
            if not name or not base_url:
                continue
            # Generate a stable key from the name
@@ -895,7 +893,7 @@ def cmd_model(args):
        _model_flow_anthropic(config, current_model)
    elif selected_provider == "kimi-coding":
        _model_flow_kimi(config, current_model)
-    elif selected_provider in ("zai", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba", "huggingface"):
+    elif selected_provider in ("zai", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba"):
        _model_flow_api_key_provider(config, selected_provider, current_model)


@@ -1504,18 +1502,6 @@ _PROVIDER_MODELS = {
        "google/gemini-3-pro-preview",
        "google/gemini-3-flash-preview",
    ],
-    # Curated HF model list — only agentic models that map to OpenRouter defaults.
-    # Format: HF model ID → OpenRouter equivalent noted in comment
-    "huggingface": [
-        "Qwen/Qwen3.5-397B-A17B",                  # ↔ qwen/qwen3.5-plus
-        "Qwen/Qwen3.5-35B-A3B",                     # ↔ qwen/qwen3.5-35b-a3b
-        "deepseek-ai/DeepSeek-V3.2",                # ↔ deepseek/deepseek-chat
-        "moonshotai/Kimi-K2.5",                      # ↔ moonshotai/kimi-k2.5
-        "MiniMaxAI/MiniMax-M2.5",                    # ↔ minimax/minimax-m2.5
-        "zai-org/GLM-5",                             # ↔ z-ai/glm-5
-        "XiaomiMiMo/MiMo-V2-Flash",                 # ↔ xiaomi/mimo-v2-pro
-        "moonshotai/Kimi-K2-Thinking",               # ↔ moonshotai/kimi-k2-thinking
-    ],
 }


@@ -2045,25 +2031,19 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
        save_env_value(base_url_env, override)
        effective_base = override

-    # Model selection — try live /models endpoint first, fall back to defaults.
-    # Providers with large live catalogs (100+ models) use a curated list instead
-    # so users see familiar model names rather than an overwhelming dump.
-    curated = _PROVIDER_MODELS.get(provider_id, [])
-    if curated and len(curated) >= 8:
-        # Curated list is substantial — use it directly, skip live probe
-        live_models = None
-    else:
-        from hermes_cli.models import fetch_api_models
-        api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
-        live_models = fetch_api_models(api_key_for_probe, effective_base)
+    # Model selection — try live /models endpoint first, fall back to defaults
+    from hermes_cli.models import fetch_api_models
+    api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
+    live_models = fetch_api_models(api_key_for_probe, effective_base)

    if live_models:
        model_list = live_models
        print(f"  Found {len(model_list)} model(s) from {pconfig.name} API")
    else:
-        model_list = curated
+        model_list = _PROVIDER_MODELS.get(provider_id, [])
        if model_list:
-            print(f"  Showing {len(model_list)} curated models — use \"Enter custom model name\" for others.")
+            print("  ⚠ Could not auto-detect models from API — showing defaults.")
+            print("    Use \"Enter custom model name\" if you don't see your model.")
        # else: no defaults either, will fall through to raw input

    if model_list:
@@ -2339,12 +2319,6 @@ def cmd_cron(args):
    cron_command(args)


-def cmd_webhook(args):
-    """Webhook subscription management."""
-    from hermes_cli.webhook import webhook_command
-    webhook_command(args)
-
-
 def cmd_doctor(args):
    """Check configuration and dependencies."""
    from hermes_cli.doctor import run_doctor
@@ -2476,18 +2450,8 @@ def _update_via_zip(args):
            )
    else:
        # Use sys.executable to explicitly call the venv's pip module,
-        # avoiding PEP 668 'externally-managed-environment' errors on Debian/Ubuntu.
-        # Some environments lose pip inside the venv; bootstrap it back with
-        # ensurepip before trying the editable install.
+        # avoiding PEP 668 'externally-managed-environment' errors on Debian/Ubuntu
        pip_cmd = [sys.executable, "-m", "pip"]
-        try:
-            subprocess.run(pip_cmd + ["--version"], cwd=PROJECT_ROOT, check=True, capture_output=True)
-        except subprocess.CalledProcessError:
-            subprocess.run(
-                [sys.executable, "-m", "ensurepip", "--upgrade", "--default-pip"],
-                cwd=PROJECT_ROOT,
-                check=True,
-            )
        try:
            subprocess.run(pip_cmd + ["install", "-e", ".[all]", "--quiet"], cwd=PROJECT_ROOT, check=True)
        except subprocess.CalledProcessError:
@@ -2648,12 +2612,7 @@ def _restore_stashed_changes(
            print("Resolve conflicts manually, then run: git stash drop")

        print(f"Restore your changes with: git stash apply {stash_ref}")
-        # In non-interactive mode (gateway /update), don't abort — the code
-        # update itself succeeded, only the stash restore had conflicts.
-        # Aborting would report the entire update as failed.
-        if prompt_user:
-            sys.exit(1)
-        return False
+        sys.exit(1)

    stash_selector = _resolve_stash_selector(git_cmd, cwd, stash_ref)
    if stash_selector is None:
@@ -2727,60 +2686,30 @@ def cmd_update(args):

    # Fetch and pull
    try:
+        print("→ Fetching updates...")
        git_cmd = ["git"]
        if sys.platform == "win32":
            git_cmd = ["git", "-c", "windows.appendAtomically=false"]
-
-        print("→ Fetching updates...")
-        fetch_result = subprocess.run(
-            git_cmd + ["fetch", "origin"],
-            cwd=PROJECT_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        if fetch_result.returncode != 0:
-            stderr = fetch_result.stderr.strip()
-            if "Could not resolve host" in stderr or "unable to access" in stderr:
-                print("✗ Network error — cannot reach the remote repository.")
-                print(f"  {stderr.splitlines()[0]}" if stderr else "")
-            elif "Authentication failed" in stderr or "could not read Username" in stderr:
-                print("✗ Authentication failed — check your git credentials or SSH key.")
-            else:
-                print(f"✗ Failed to fetch updates from origin.")
-                if stderr:
-                    print(f"  {stderr.splitlines()[0]}")
-            sys.exit(1)
-
-        # Get current branch (returns literal "HEAD" when detached)
+        
+        subprocess.run(git_cmd + ["fetch", "origin"], cwd=PROJECT_ROOT, check=True)
+        
+        # Get current branch
        result = subprocess.run(
            git_cmd + ["rev-parse", "--abbrev-ref", "HEAD"],
            cwd=PROJECT_ROOT,
            capture_output=True,
            text=True,
-            check=True,
+            check=True
        )
-        current_branch = result.stdout.strip()
+        branch = result.stdout.strip()

-        # Always update against main
-        branch = "main"
-
-        # If user is on a non-main branch or detached HEAD, switch to main
-        if current_branch != "main":
-            label = "detached HEAD" if current_branch == "HEAD" else f"branch '{current_branch}'"
-            print(f"  ⚠ Currently on {label} — switching to main for update...")
-            # Stash before checkout so uncommitted work isn't lost
-            auto_stash_ref = _stash_local_changes_if_needed(git_cmd, PROJECT_ROOT)
-            subprocess.run(
-                git_cmd + ["checkout", "main"],
-                cwd=PROJECT_ROOT,
-                capture_output=True,
-                text=True,
-                check=True,
-            )
-        else:
-            auto_stash_ref = _stash_local_changes_if_needed(git_cmd, PROJECT_ROOT)
-
-        prompt_for_restore = auto_stash_ref is not None and sys.stdin.isatty() and sys.stdout.isatty()
+        # Fall back to main if the current branch doesn't exist on the remote
+        verify = subprocess.run(
+            git_cmd + ["rev-parse", "--verify", f"origin/{branch}"],
+            cwd=PROJECT_ROOT, capture_output=True, text=True,
+        )
+        if verify.returncode != 0:
+            branch = "main"

        # Check if there are updates
        result = subprocess.run(
@@ -2788,69 +2717,31 @@ def cmd_update(args):
            cwd=PROJECT_ROOT,
            capture_output=True,
            text=True,
-            check=True,
+            check=True
        )
        commit_count = int(result.stdout.strip())
-
+        
        if commit_count == 0:
            _invalidate_update_cache()
-            # Restore stash and switch back to original branch if we moved
-            if auto_stash_ref is not None:
-                _restore_stashed_changes(
-                    git_cmd, PROJECT_ROOT, auto_stash_ref,
-                    prompt_user=prompt_for_restore,
-                )
-            if current_branch not in ("main", "HEAD"):
-                subprocess.run(
-                    git_cmd + ["checkout", current_branch],
-                    cwd=PROJECT_ROOT, capture_output=True, text=True, check=False,
-                )
            print("✓ Already up to date!")
            return
-
+        
        print(f"→ Found {commit_count} new commit(s)")

+        auto_stash_ref = _stash_local_changes_if_needed(git_cmd, PROJECT_ROOT)
+        prompt_for_restore = auto_stash_ref is not None and sys.stdin.isatty() and sys.stdout.isatty()
+
        print("→ Pulling updates...")
-        update_succeeded = False
        try:
-            pull_result = subprocess.run(
-                git_cmd + ["pull", "--ff-only", "origin", branch],
-                cwd=PROJECT_ROOT,
-                capture_output=True,
-                text=True,
-            )
-            if pull_result.returncode != 0:
-                # ff-only failed — local and remote have diverged (e.g. upstream
-                # force-pushed or rebase).  Since local changes are already
-                # stashed, reset to match the remote exactly.
-                print("  ⚠ Fast-forward not possible (history diverged), resetting to match remote...")
-                reset_result = subprocess.run(
-                    git_cmd + ["reset", "--hard", f"origin/{branch}"],
-                    cwd=PROJECT_ROOT,
-                    capture_output=True,
-                    text=True,
-                )
-                if reset_result.returncode != 0:
-                    print(f"✗ Failed to reset to origin/{branch}.")
-                    if reset_result.stderr.strip():
-                        print(f"  {reset_result.stderr.strip()}")
-                    print("  Try manually: git fetch origin && git reset --hard origin/main")
-                    sys.exit(1)
-            update_succeeded = True
+            subprocess.run(git_cmd + ["pull", "--ff-only", "origin", branch], cwd=PROJECT_ROOT, check=True)
        finally:
            if auto_stash_ref is not None:
-                # Don't attempt stash restore if the code update itself failed —
-                # working tree is in an unknown state.
-                if not update_succeeded:
-                    print(f"  ℹ️  Local changes preserved in stash (ref: {auto_stash_ref})")
-                    print(f"  Restore manually with: git stash apply")
-                else:
-                    _restore_stashed_changes(
-                        git_cmd,
-                        PROJECT_ROOT,
-                        auto_stash_ref,
-                        prompt_user=prompt_for_restore,
-                    )
+                _restore_stashed_changes(
+                    git_cmd,
+                    PROJECT_ROOT,
+                    auto_stash_ref,
+                    prompt_user=prompt_for_restore,
+                )
        
        _invalidate_update_cache()
        
@@ -2873,18 +2764,8 @@ def cmd_update(args):
                )
        else:
            # Use sys.executable to explicitly call the venv's pip module,
-            # avoiding PEP 668 'externally-managed-environment' errors on Debian/Ubuntu.
-            # Some environments lose pip inside the venv; bootstrap it back with
-            # ensurepip before trying the editable install.
+            # avoiding PEP 668 'externally-managed-environment' errors on Debian/Ubuntu
            pip_cmd = [sys.executable, "-m", "pip"]
-            try:
-                subprocess.run(pip_cmd + ["--version"], cwd=PROJECT_ROOT, check=True, capture_output=True)
-            except subprocess.CalledProcessError:
-                subprocess.run(
-                    [sys.executable, "-m", "ensurepip", "--upgrade", "--default-pip"],
-                    cwd=PROJECT_ROOT,
-                    check=True,
-                )
            try:
                subprocess.run(pip_cmd + ["install", "-e", ".[all]", "--quiet"], cwd=PROJECT_ROOT, check=True)
            except subprocess.CalledProcessError:
@@ -2943,15 +2824,10 @@ def cmd_update(args):
                print(f"  ℹ️  {len(missing_config)} new config option(s) available")
            
            print()
-            if not (sys.stdin.isatty() and sys.stdout.isatty()):
-                print("  ℹ Non-interactive session — skipping config migration prompt.")
-                print("    Run 'hermes config migrate' later to apply any new config/env options.")
-                response = "n"
+            if sys.stdin.isatty():
+                response = input("Would you like to configure them now? [Y/n]: ").strip().lower()
            else:
-                try:
-                    response = input("Would you like to configure them now? [Y/n]: ").strip().lower()
-                except EOFError:
-                    response = "n"
+                response = "n"
            
            if response in ('', 'y', 'yes'):
                print()
@@ -2999,11 +2875,10 @@ def cmd_update(args):
            # Check for macOS launchd service
            if is_macos():
                try:
-                    from hermes_cli.gateway import get_launchd_label
                    plist_path = get_launchd_plist_path()
                    if plist_path.exists():
                        check = subprocess.run(
-                            ["launchctl", "list", get_launchd_label()],
+                            ["launchctl", "list", "ai.hermes.gateway"],
                            capture_output=True, text=True, timeout=5,
                        )
                        has_launchd_service = check.returncode == 0
@@ -3059,13 +2934,12 @@ def cmd_update(args):
                    # after a manual SIGTERM, which would race with the
                    # PID file cleanup.
                    print("→ Restarting gateway service...")
-                    _launchd_label = get_launchd_label()
                    stop = subprocess.run(
-                        ["launchctl", "stop", _launchd_label],
+                        ["launchctl", "stop", "ai.hermes.gateway"],
                        capture_output=True, text=True, timeout=10,
                    )
                    start = subprocess.run(
-                        ["launchctl", "start", _launchd_label],
+                        ["launchctl", "start", "ai.hermes.gateway"],
                        capture_output=True, text=True, timeout=10,
                    )
                    if start.returncode == 0:
@@ -3248,7 +3122,7 @@ For more help on a command:
    )
    chat_parser.add_argument(
        "--provider",
-        choices=["auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot", "anthropic", "huggingface", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode"],
+        choices=["auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot", "anthropic", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode"],
        default=None,
        help="Inference provider (default: auto)"
    )
@@ -3549,38 +3423,7 @@ For more help on a command:
    cron_subparsers.add_parser("tick", help="Run due jobs once and exit")

    cron_parser.set_defaults(func=cmd_cron)
-
-    # =========================================================================
-    # webhook command
-    # =========================================================================
-    webhook_parser = subparsers.add_parser(
-        "webhook",
-        help="Manage dynamic webhook subscriptions",
-        description="Create, list, and remove webhook subscriptions for event-driven agent activation",
-    )
-    webhook_subparsers = webhook_parser.add_subparsers(dest="webhook_action")
-
-    wh_sub = webhook_subparsers.add_parser("subscribe", aliases=["add"], help="Create a webhook subscription")
-    wh_sub.add_argument("name", help="Route name (used in URL: /webhooks/<name>)")
-    wh_sub.add_argument("--prompt", default="", help="Prompt template with {dot.notation} payload refs")
-    wh_sub.add_argument("--events", default="", help="Comma-separated event types to accept")
-    wh_sub.add_argument("--description", default="", help="What this subscription does")
-    wh_sub.add_argument("--skills", default="", help="Comma-separated skill names to load")
-    wh_sub.add_argument("--deliver", default="log", help="Delivery target: log, telegram, discord, slack, etc.")
-    wh_sub.add_argument("--deliver-chat-id", default="", help="Target chat ID for cross-platform delivery")
-    wh_sub.add_argument("--secret", default="", help="HMAC secret (auto-generated if omitted)")
-
-    webhook_subparsers.add_parser("list", aliases=["ls"], help="List all dynamic subscriptions")
-
-    wh_rm = webhook_subparsers.add_parser("remove", aliases=["rm"], help="Remove a subscription")
-    wh_rm.add_argument("name", help="Subscription name to remove")
-
-    wh_test = webhook_subparsers.add_parser("test", help="Send a test POST to a webhook route")
-    wh_test.add_argument("name", help="Subscription name to test")
-    wh_test.add_argument("--payload", default="", help="JSON payload to send (default: test payload)")
-
-    webhook_parser.set_defaults(func=cmd_webhook)
-
+    
    # =========================================================================
    # doctor command
    # =========================================================================
@@ -3713,7 +3556,7 @@ For more help on a command:
    skills_snapshot = skills_subparsers.add_parser("snapshot", help="Export/import skill configurations")
    snapshot_subparsers = skills_snapshot.add_subparsers(dest="snapshot_action")
    snap_export = snapshot_subparsers.add_parser("export", help="Export installed skills to a file")
-    snap_export.add_argument("output", help="Output JSON file path (use - for stdout)")
+    snap_export.add_argument("output", help="Output JSON file path")
    snap_import = snapshot_subparsers.add_parser("import", help="Import and install skills from a file")
    snap_import.add_argument("input", help="Input JSON file path")
    snap_import.add_argument("--force", action="store_true", help="Force install despite caution verdict")
@@ -3990,7 +3833,7 @@ For more help on a command:
    sessions_list.add_argument("--limit", type=int, default=20, help="Max sessions to show")

    sessions_export = sessions_subparsers.add_parser("export", help="Export sessions to a JSONL file")
-    sessions_export.add_argument("output", help="Output JSONL file path (use - for stdout)")
+    sessions_export.add_argument("output", help="Output JSONL file path")
    sessions_export.add_argument("--source", help="Filter by source")
    sessions_export.add_argument("--session-id", help="Export a specific session")

@@ -4071,25 +3914,15 @@ For more help on a command:
                if not data:
                    print(f"Session '{args.session_id}' not found.")
                    return
-                line = _json.dumps(data, ensure_ascii=False) + "\n"
-                if args.output == "-":
-                    import sys
-                    sys.stdout.write(line)
-                else:
-                    with open(args.output, "w", encoding="utf-8") as f:
-                        f.write(line)
-                    print(f"Exported 1 session to {args.output}")
+                with open(args.output, "w", encoding="utf-8") as f:
+                    f.write(_json.dumps(data, ensure_ascii=False) + "\n")
+                print(f"Exported 1 session to {args.output}")
            else:
                sessions = db.export_all(source=args.source)
-                if args.output == "-":
-                    import sys
+                with open(args.output, "w", encoding="utf-8") as f:
                    for s in sessions:
-                        sys.stdout.write(_json.dumps(s, ensure_ascii=False) + "\n")
-                else:
-                    with open(args.output, "w", encoding="utf-8") as f:
-                        for s in sessions:
-                            f.write(_json.dumps(s, ensure_ascii=False) + "\n")
-                    print(f"Exported {len(sessions)} sessions to {args.output}")
+                        f.write(_json.dumps(s, ensure_ascii=False) + "\n")
+                print(f"Exported {len(sessions)} sessions to {args.output}")

        elif action == "delete":
            resolved_session_id = db.resolve_session_id(args.session_id)
@@ -208,31 +208,14 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "google/gemini-3-pro-preview",
        "google/gemini-3-flash-preview",
    ],
-    # Alibaba DashScope Coding platform (coding-intl) — default endpoint.
-    # Supports Qwen models + third-party providers (GLM, Kimi, MiniMax).
-    # Users with classic DashScope keys should override DASHSCOPE_BASE_URL
-    # to https://dashscope-intl.aliyuncs.com/compatible-mode/v1 (OpenAI-compat)
-    # or https://dashscope-intl.aliyuncs.com/apps/anthropic (Anthropic-compat).
    "alibaba": [
        "qwen3.5-plus",
+        "qwen3-max",
        "qwen3-coder-plus",
        "qwen3-coder-next",
-        # Third-party models available on coding-intl
-        "glm-5",
-        "glm-4.7",
-        "kimi-k2.5",
-        "MiniMax-M2.5",
-    ],
-    # Curated HF model list — only agentic models that map to OpenRouter defaults.
-    "huggingface": [
-        "Qwen/Qwen3.5-397B-A17B",
-        "Qwen/Qwen3.5-35B-A3B",
-        "deepseek-ai/DeepSeek-V3.2",
-        "moonshotai/Kimi-K2.5",
-        "MiniMaxAI/MiniMax-M2.5",
-        "zai-org/GLM-5",
-        "XiaomiMiMo/MiMo-V2-Flash",
-        "moonshotai/Kimi-K2-Thinking",
+        "qwen-plus-latest",
+        "qwen3.5-flash",
+        "qwen-vl-max",
    ],
 }

@@ -253,7 +236,6 @@ _PROVIDER_LABELS = {
    "ai-gateway": "AI Gateway",
    "kilocode": "Kilo Code",
    "alibaba": "Alibaba Cloud (DashScope)",
-    "huggingface": "Hugging Face",
    "custom": "Custom endpoint",
 }

@@ -289,9 +271,6 @@ _PROVIDER_ALIASES = {
    "aliyun": "alibaba",
    "qwen": "alibaba",
    "alibaba-cloud": "alibaba",
-    "hf": "huggingface",
-    "hugging-face": "huggingface",
-    "huggingface-hub": "huggingface",
 }


@@ -325,7 +304,7 @@ def list_available_providers() -> list[dict[str, str]]:
    # Canonical providers in display order
    _PROVIDER_ORDER = [
        "openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
-        "huggingface", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic", "alibaba",
+        "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic", "alibaba",
        "opencode-zen", "opencode-go",
        "ai-gateway", "deepseek", "custom",
    ]
@@ -385,23 +385,16 @@ class PluginManager:
    # Hook invocation
    # -----------------------------------------------------------------------

-    def invoke_hook(self, hook_name: str, **kwargs: Any) -> List[Any]:
+    def invoke_hook(self, hook_name: str, **kwargs: Any) -> None:
        """Call all registered callbacks for *hook_name*.

        Each callback is wrapped in its own try/except so a misbehaving
        plugin cannot break the core agent loop.
-
-        Returns a list of non-``None`` return values from callbacks.
-        This allows hooks like ``pre_llm_call`` to contribute context
-        that the agent core can collect and inject.
        """
        callbacks = self._hooks.get(hook_name, [])
-        results: List[Any] = []
        for cb in callbacks:
            try:
-                ret = cb(**kwargs)
-                if ret is not None:
-                    results.append(ret)
+                cb(**kwargs)
            except Exception as exc:
                logger.warning(
                    "Hook '%s' callback %s raised: %s",
@@ -409,7 +402,6 @@ class PluginManager:
                    getattr(cb, "__name__", repr(cb)),
                    exc,
                )
-        return results

    # -----------------------------------------------------------------------
    # Introspection
@@ -454,12 +446,9 @@ def discover_plugins() -> None:
    get_plugin_manager().discover_and_load()


-def invoke_hook(hook_name: str, **kwargs: Any) -> List[Any]:
-    """Invoke a lifecycle hook on all loaded plugins.
-
-    Returns a list of non-``None`` return values from plugin callbacks.
-    """
-    return get_plugin_manager().invoke_hook(hook_name, **kwargs)
+def invoke_hook(hook_name: str, **kwargs: Any) -> None:
+    """Invoke a lifecycle hook on all loaded plugins."""
+    get_plugin_manager().invoke_hook(hook_name, **kwargs)


 def get_plugin_tool_names() -> Set[str]:
@@ -63,11 +63,8 @@ def _get_model_config() -> Dict[str, Any]:
    model_cfg = config.get("model")
    if isinstance(model_cfg, dict):
        cfg = dict(model_cfg)
-        # Accept "model" as alias for "default" (users intuitively write model.model)
-        if not cfg.get("default") and cfg.get("model"):
-            cfg["default"] = cfg["model"]
-        default = (cfg.get("default") or "").strip()
-        base_url = (cfg.get("base_url") or "").strip()
+        default = cfg.get("default", "").strip()
+        base_url = cfg.get("base_url", "").strip()
        is_local = "localhost" in base_url or "127.0.0.1" in base_url
        is_fallback = not default or default == "anthropic/claude-opus-4.6"
        if is_local and is_fallback and base_url:
@@ -206,7 +203,7 @@ def _resolve_named_custom_runtime(
        or _detect_api_mode_for_url(base_url)
        or "chat_completions",
        "base_url": base_url,
-        "api_key": api_key or "no-key-required",
+        "api_key": api_key,
        "source": f"custom_provider:{custom_provider.get('name', requested_provider)}",
    }

@@ -410,6 +407,12 @@ def resolve_runtime_provider(
            # (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)
            elif base_url.rstrip("/").endswith("/anthropic"):
                api_mode = "anthropic_messages"
+            # MiniMax providers always use Anthropic Messages API.
+            # Auto-correct stale /v1 URLs (from old .env or config) to /anthropic.
+            elif provider in ("minimax", "minimax-cn"):
+                api_mode = "anthropic_messages"
+                if base_url.rstrip("/").endswith("/v1"):
+                    base_url = base_url.rstrip("/")[:-3] + "/anthropic"
        return {
            "provider": provider,
            "api_mode": api_mode,
@@ -80,11 +80,6 @@ _DEFAULT_PROVIDER_MODELS = {
    "minimax-cn": ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
    "ai-gateway": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5", "google/gemini-3-flash"],
    "kilocode": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5.4", "google/gemini-3-pro-preview", "google/gemini-3-flash-preview"],
-    "huggingface": [
-        "Qwen/Qwen3.5-397B-A17B", "Qwen/Qwen3-235B-A22B-Thinking-2507",
-        "Qwen/Qwen3-Coder-480B-A35B-Instruct", "deepseek-ai/DeepSeek-R1-0528",
-        "deepseek-ai/DeepSeek-V3.2", "moonshotai/Kimi-K2.5",
-    ],
 }


@@ -585,11 +580,11 @@ def _print_setup_summary(config: dict, hermes_home):
    else:
        tool_status.append(("Mixture of Agents", False, "OPENROUTER_API_KEY"))

-    # Web tools (Exa, Parallel, Firecrawl, or Tavily)
-    if get_env_value("EXA_API_KEY") or get_env_value("PARALLEL_API_KEY") or get_env_value("FIRECRAWL_API_KEY") or get_env_value("FIRECRAWL_API_URL") or get_env_value("TAVILY_API_KEY"):
+    # Web tools (Parallel, Firecrawl, or Tavily)
+    if get_env_value("PARALLEL_API_KEY") or get_env_value("FIRECRAWL_API_KEY") or get_env_value("FIRECRAWL_API_URL") or get_env_value("TAVILY_API_KEY"):
        tool_status.append(("Web Search & Extract", True, None))
    else:
-        tool_status.append(("Web Search & Extract", False, "EXA_API_KEY, PARALLEL_API_KEY, FIRECRAWL_API_KEY, or TAVILY_API_KEY"))
+        tool_status.append(("Web Search & Extract", False, "PARALLEL_API_KEY, FIRECRAWL_API_KEY, or TAVILY_API_KEY"))

    # Browser tools (local Chromium or Browserbase cloud)
    import shutil
@@ -889,7 +884,6 @@ def setup_model_provider(config: dict):
        "OpenCode Go (open models, $10/month subscription)",
        "GitHub Copilot (uses GITHUB_TOKEN or gh auth token)",
        "GitHub Copilot ACP (spawns `copilot --acp --stdio`)",
-        "Hugging Face Inference Providers (20+ open models)",
    ]
    if keep_label:
        provider_choices.append(keep_label)
@@ -1534,26 +1528,7 @@ def setup_model_provider(config: dict):
        _set_model_provider(config, "copilot-acp", pconfig.inference_base_url)
        selected_base_url = pconfig.inference_base_url

-    elif provider_idx == 16:  # Hugging Face Inference Providers
-        selected_provider = "huggingface"
-        print()
-        print_header("Hugging Face API Token")
-        pconfig = PROVIDER_REGISTRY["huggingface"]
-        print_info(f"Provider: {pconfig.name}")
-        print_info("Get your token at: https://huggingface.co/settings/tokens")
-        print_info("Required permission: 'Make calls to Inference Providers'")
-        print()
-
-        api_key = prompt("  HF Token", password=True)
-        if api_key:
-            save_env_value("HF_TOKEN", api_key)
-            # Clear OpenRouter env vars to prevent routing confusion
-            save_env_value("OPENAI_BASE_URL", "")
-            save_env_value("OPENAI_API_KEY", "")
-        _set_model_provider(config, "huggingface", pconfig.inference_base_url)
-        selected_base_url = pconfig.inference_base_url
-
-    # else: provider_idx == 17 (Keep current) — only shown when a provider already exists
+    # else: provider_idx == 16 (Keep current) — only shown when a provider already exists
    # Normalize "keep current" to an explicit provider so downstream logic
    # doesn't fall back to the generic OpenRouter/static-model path.
    if selected_provider is None:
@@ -2092,11 +2067,11 @@ def setup_terminal_backend(config: dict):
        print_info("Serverless cloud sandboxes. Each session gets its own container.")
        print_info("Requires a Modal account: https://modal.com")

-        # Check if modal SDK is installed
+        # Check if swe-rex[modal] is installed
        try:
-            __import__("modal")
+            __import__("swe_rex")
        except ImportError:
-            print_info("Installing modal SDK...")
+            print_info("Installing swe-rex[modal]...")
            import subprocess

            uv_bin = shutil.which("uv")
@@ -2108,22 +2083,22 @@ def setup_terminal_backend(config: dict):
                        "install",
                        "--python",
                        sys.executable,
-                        "modal",
+                        "swe-rex[modal]",
                    ],
                    capture_output=True,
                    text=True,
                )
            else:
                result = subprocess.run(
-                    [sys.executable, "-m", "pip", "install", "modal"],
+                    [sys.executable, "-m", "pip", "install", "swe-rex[modal]"],
                    capture_output=True,
                    text=True,
                )
            if result.returncode == 0:
-                print_success("modal SDK installed")
+                print_success("swe-rex[modal] installed")
            else:
                print_warning(
-                    "Install failed — run manually: pip install modal"
+                    "Install failed — run manually: pip install 'swe-rex[modal]'"
                )

        # Modal token
@@ -24,10 +24,6 @@ PLATFORMS = {
    "whatsapp": "📱 WhatsApp",
    "signal":   "📡 Signal",
    "email":    "📧 Email",
-    "homeassistant": "🏠 Home Assistant",
-    "mattermost": "💬 Mattermost",
-    "matrix":   "💬 Matrix",
-    "dingtalk": "💬 DingTalk",
 }

 # ─── Config Helpers ───────────────────────────────────────────────────────────
@@ -304,8 +304,7 @@ def do_browse(page: int = 1, page_size: int = 20, source: str = "all",


 def do_install(identifier: str, category: str = "", force: bool = False,
-               console: Optional[Console] = None, skip_confirm: bool = False,
-               invalidate_cache: bool = True) -> None:
+               console: Optional[Console] = None, skip_confirm: bool = False) -> None:
    """Fetch, quarantine, scan, confirm, and install a skill."""
    from tools.skills_hub import (
        GitHubAuth, create_source_router, ensure_hub_dirs,
@@ -418,17 +417,6 @@ def do_install(identifier: str, category: str = "", force: bool = False,
    c.print(f"[bold green]Installed:[/] {install_dir.relative_to(SKILLS_DIR)}")
    c.print(f"[dim]Files: {', '.join(bundle.files.keys())}[/]\n")

-    if invalidate_cache:
-        # Invalidate the skills prompt cache so the new skill appears immediately
-        try:
-            from agent.prompt_builder import clear_skills_system_prompt_cache
-            clear_skills_system_prompt_cache(clear_snapshot=True)
-        except Exception:
-            pass
-    else:
-        c.print("[dim]Skill will be available in your next session.[/]")
-        c.print("[dim]Use /reset to start a new session now, or --now to activate immediately (invalidates prompt cache).[/]\n")
-

 def do_inspect(identifier: str, console: Optional[Console] = None) -> None:
    """Preview a skill's SKILL.md content without installing."""
@@ -615,8 +603,7 @@ def do_audit(name: Optional[str] = None, console: Optional[Console] = None) -> N


 def do_uninstall(name: str, console: Optional[Console] = None,
-                 skip_confirm: bool = False,
-                 invalidate_cache: bool = True) -> None:
+                 skip_confirm: bool = False) -> None:
    """Remove a hub-installed skill with confirmation."""
    from tools.skills_hub import uninstall_skill

@@ -636,15 +623,6 @@ def do_uninstall(name: str, console: Optional[Console] = None,
    success, msg = uninstall_skill(name)
    if success:
        c.print(f"[bold green]{msg}[/]\n")
-        if invalidate_cache:
-            try:
-                from agent.prompt_builder import clear_skills_system_prompt_cache
-                clear_skills_system_prompt_cache(clear_snapshot=True)
-            except Exception:
-                pass
-        else:
-            c.print("[dim]Change will take effect in your next session.[/]")
-            c.print("[dim]Use /reset to start a new session now, or --now to apply immediately (invalidates prompt cache).[/]\n")
    else:
        c.print(f"[bold red]Error:[/] {msg}\n")

@@ -887,15 +865,10 @@ def do_snapshot_export(output_path: str, console: Optional[Console] = None) -> N
        "taps": tap_list,
    }

-    payload = json.dumps(snapshot, indent=2, ensure_ascii=False) + "\n"
-    if output_path == "-":
-        import sys
-        sys.stdout.write(payload)
-    else:
-        out = Path(output_path)
-        out.write_text(payload)
-        c.print(f"[bold green]Snapshot exported:[/] {out}")
-        c.print(f"[dim]{len(installed)} skill(s), {len(tap_list)} tap(s)[/]\n")
+    out = Path(output_path)
+    out.write_text(json.dumps(snapshot, indent=2, ensure_ascii=False) + "\n")
+    c.print(f"[bold green]Snapshot exported:[/] {out}")
+    c.print(f"[dim]{len(installed)} skill(s), {len(tap_list)} tap(s)[/]\n")


 def do_snapshot_import(input_path: str, force: bool = False,
@@ -1086,23 +1059,19 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:

    elif action == "install":
        if not args:
-            c.print("[bold red]Usage:[/] /skills install <identifier> [--category <cat>] [--force] [--now]\n")
+            c.print("[bold red]Usage:[/] /skills install <identifier> [--category <cat>] [--force|--yes]\n")
            return
        identifier = args[0]
        category = ""
-        # Slash commands run inside prompt_toolkit where input() hangs.
-        # Always skip confirmation — the user typing the command is implicit consent.
-        skip_confirm = True
+        # --yes / -y bypasses confirmation prompt (needed in TUI mode)
+        # --force handles reinstall override
+        skip_confirm = any(flag in args for flag in ("--yes", "-y"))
        force = "--force" in args
-        # --now invalidates prompt cache immediately (costs more money).
-        # Default: defer to next session to preserve cache.
-        invalidate_cache = "--now" in args
        for i, a in enumerate(args):
            if a == "--category" and i + 1 < len(args):
                category = args[i + 1]
        do_install(identifier, category=category, force=force,
-                   skip_confirm=skip_confirm, invalidate_cache=invalidate_cache,
-                   console=c)
+                   skip_confirm=skip_confirm, console=c)

    elif action == "inspect":
        if not args:
@@ -1132,13 +1101,10 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:

    elif action == "uninstall":
        if not args:
-            c.print("[bold red]Usage:[/] /skills uninstall <name> [--now]\n")
+            c.print("[bold red]Usage:[/] /skills uninstall <name> [--yes]\n")
            return
-        # Slash commands run inside prompt_toolkit where input() hangs.
-        skip_confirm = True
-        invalidate_cache = "--now" in args
-        do_uninstall(args[0], console=c, skip_confirm=skip_confirm,
-                     invalidate_cache=invalidate_cache)
+        skip_confirm = any(flag in args for flag in ("--yes", "-y"))
+        do_uninstall(args[0], console=c, skip_confirm=skip_confirm)

    elif action == "publish":
        if not args:
@@ -292,9 +292,8 @@ def show_status(args):
        print("  Manager:      systemd (user)")
        
    elif sys.platform == 'darwin':
-        from hermes_cli.gateway import get_launchd_label
        result = subprocess.run(
-            ["launchctl", "list", get_launchd_label()],
+            ["launchctl", "list", "ai.hermes.gateway"],
            capture_output=True,
            text=True
        )
@@ -108,8 +108,7 @@ def _get_effective_configurable_toolsets():
    """
    result = list(CONFIGURABLE_TOOLSETS)
    try:
-        from hermes_cli.plugins import discover_plugins, get_plugin_toolsets
-        discover_plugins()  # idempotent — ensures plugins are loaded
+        from hermes_cli.plugins import get_plugin_toolsets
        result.extend(get_plugin_toolsets())
    except Exception:
        pass
@@ -119,8 +118,7 @@ def _get_effective_configurable_toolsets():
 def _get_plugin_toolset_keys() -> set:
    """Return the set of toolset keys provided by plugins."""
    try:
-        from hermes_cli.plugins import discover_plugins, get_plugin_toolsets
-        discover_plugins()  # idempotent — ensures plugins are loaded
+        from hermes_cli.plugins import get_plugin_toolsets
        return {ts_key for ts_key, _, _ in get_plugin_toolsets()}
    except Exception:
        return set()
@@ -135,10 +133,8 @@ PLATFORMS = {
    "signal":   {"label": "📡 Signal",     "default_toolset": "hermes-signal"},
    "homeassistant": {"label": "🏠 Home Assistant", "default_toolset": "hermes-homeassistant"},
    "email":    {"label": "📧 Email",      "default_toolset": "hermes-email"},
-    "matrix":   {"label": "💬 Matrix",     "default_toolset": "hermes-matrix"},
    "dingtalk": {"label": "💬 DingTalk",   "default_toolset": "hermes-dingtalk"},
    "api_server": {"label": "🌐 API Server", "default_toolset": "hermes-api-server"},
-    "mattermost": {"label": "💬 Mattermost", "default_toolset": "hermes-mattermost"},
 }


@@ -190,14 +186,6 @@ TOOL_CATEGORIES = {
                    {"key": "FIRECRAWL_API_KEY", "prompt": "Firecrawl API key", "url": "https://firecrawl.dev"},
                ],
            },
-            {
-                "name": "Exa",
-                "tag": "AI-native search and contents",
-                "web_backend": "exa",
-                "env_vars": [
-                    {"key": "EXA_API_KEY", "prompt": "Exa API key", "url": "https://exa.ai"},
-                ],
-            },
            {
                "name": "Parallel",
                "tag": "AI-native search and extract",
@@ -1,256 +0,0 @@
-"""hermes webhook — manage dynamic webhook subscriptions from the CLI.
-
-Usage:
-    hermes webhook subscribe <name> [options]
-    hermes webhook list
-    hermes webhook remove <name>
-    hermes webhook test <name> [--payload '{"key": "value"}']
-
-Subscriptions persist to ~/.hermes/webhook_subscriptions.json and are
-hot-reloaded by the webhook adapter without a gateway restart.
-"""
-
-import json
-import os
-import re
-import secrets
-import time
-from pathlib import Path
-from typing import Dict, Optional
-
-
-_SUBSCRIPTIONS_FILENAME = "webhook_subscriptions.json"
-
-
-def _hermes_home() -> Path:
-    return Path(
-        os.getenv("HERMES_HOME", str(Path.home() / ".hermes"))
-    ).expanduser()
-
-
-def _subscriptions_path() -> Path:
-    return _hermes_home() / _SUBSCRIPTIONS_FILENAME
-
-
-def _load_subscriptions() -> Dict[str, dict]:
-    path = _subscriptions_path()
-    if not path.exists():
-        return {}
-    try:
-        data = json.loads(path.read_text(encoding="utf-8"))
-        return data if isinstance(data, dict) else {}
-    except Exception:
-        return {}
-
-
-def _save_subscriptions(subs: Dict[str, dict]) -> None:
-    path = _subscriptions_path()
-    path.parent.mkdir(parents=True, exist_ok=True)
-    tmp_path = path.with_suffix(".tmp")
-    tmp_path.write_text(
-        json.dumps(subs, indent=2, ensure_ascii=False),
-        encoding="utf-8",
-    )
-    os.replace(str(tmp_path), str(path))
-
-
-def _get_webhook_config() -> dict:
-    """Load webhook platform config. Returns {} if not configured."""
-    try:
-        from hermes_cli.config import load_config
-        cfg = load_config()
-        return cfg.get("platforms", {}).get("webhook", {})
-    except Exception:
-        return {}
-
-
-def _is_webhook_enabled() -> bool:
-    return bool(_get_webhook_config().get("enabled"))
-
-
-def _get_webhook_base_url() -> str:
-    wh = _get_webhook_config().get("extra", {})
-    host = wh.get("host", "0.0.0.0")
-    port = wh.get("port", 8644)
-    display_host = "localhost" if host == "0.0.0.0" else host
-    return f"http://{display_host}:{port}"
-
-
-_SETUP_HINT = """
-  Webhook platform is not enabled. To set it up:
-
-  1. Run the gateway setup wizard:
-     hermes gateway setup
-
-  2. Or manually add to ~/.hermes/config.yaml:
-     platforms:
-       webhook:
-         enabled: true
-         extra:
-           host: "0.0.0.0"
-           port: 8644
-           secret: "your-global-hmac-secret"
-
-  3. Or set environment variables in ~/.hermes/.env:
-     WEBHOOK_ENABLED=true
-     WEBHOOK_PORT=8644
-     WEBHOOK_SECRET=your-global-secret
-
-  Then start the gateway: hermes gateway run
-"""
-
-
-def _require_webhook_enabled() -> bool:
-    """Check webhook is enabled. Print setup guide and return False if not."""
-    if _is_webhook_enabled():
-        return True
-    print(_SETUP_HINT)
-    return False
-
-
-def webhook_command(args):
-    """Entry point for 'hermes webhook' subcommand."""
-    sub = getattr(args, "webhook_action", None)
-
-    if not sub:
-        print("Usage: hermes webhook {subscribe|list|remove|test}")
-        print("Run 'hermes webhook --help' for details.")
-        return
-
-    if not _require_webhook_enabled():
-        return
-
-    if sub in ("subscribe", "add"):
-        _cmd_subscribe(args)
-    elif sub in ("list", "ls"):
-        _cmd_list(args)
-    elif sub in ("remove", "rm"):
-        _cmd_remove(args)
-    elif sub == "test":
-        _cmd_test(args)
-
-
-def _cmd_subscribe(args):
-    name = args.name.strip().lower().replace(" ", "-")
-    if not re.match(r'^[a-z0-9][a-z0-9_-]*$', name):
-        print(f"Error: Invalid name '{name}'. Use lowercase alphanumeric with hyphens/underscores.")
-        return
-
-    subs = _load_subscriptions()
-    is_update = name in subs
-
-    secret = args.secret or secrets.token_urlsafe(32)
-    events = [e.strip() for e in args.events.split(",")] if args.events else []
-
-    route = {
-        "description": args.description or f"Agent-created subscription: {name}",
-        "events": events,
-        "secret": secret,
-        "prompt": args.prompt or "",
-        "skills": [s.strip() for s in args.skills.split(",")] if args.skills else [],
-        "deliver": args.deliver or "log",
-        "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
-    }
-
-    if args.deliver_chat_id:
-        route["deliver_extra"] = {"chat_id": args.deliver_chat_id}
-
-    subs[name] = route
-    _save_subscriptions(subs)
-
-    base_url = _get_webhook_base_url()
-    status = "Updated" if is_update else "Created"
-
-    print(f"\n  {status} webhook subscription: {name}")
-    print(f"  URL:    {base_url}/webhooks/{name}")
-    print(f"  Secret: {secret}")
-    if events:
-        print(f"  Events: {', '.join(events)}")
-    else:
-        print("  Events: (all)")
-    print(f"  Deliver: {route['deliver']}")
-    if route.get("prompt"):
-        prompt_preview = route["prompt"][:80] + ("..." if len(route["prompt"]) > 80 else "")
-        print(f"  Prompt: {prompt_preview}")
-    print(f"\n  Configure your service to POST to the URL above.")
-    print(f"  Use the secret for HMAC-SHA256 signature validation.")
-    print(f"  The gateway must be running to receive events (hermes gateway run).\n")
-
-
-def _cmd_list(args):
-    subs = _load_subscriptions()
-    if not subs:
-        print("  No dynamic webhook subscriptions.")
-        print("  Create one with: hermes webhook subscribe <name>")
-        return
-
-    base_url = _get_webhook_base_url()
-    print(f"\n  {len(subs)} webhook subscription(s):\n")
-    for name, route in subs.items():
-        events = ", ".join(route.get("events", [])) or "(all)"
-        deliver = route.get("deliver", "log")
-        desc = route.get("description", "")
-        print(f"  ◆ {name}")
-        if desc:
-            print(f"    {desc}")
-        print(f"    URL:     {base_url}/webhooks/{name}")
-        print(f"    Events:  {events}")
-        print(f"    Deliver: {deliver}")
-        print()
-
-
-def _cmd_remove(args):
-    name = args.name.strip().lower()
-    subs = _load_subscriptions()
-
-    if name not in subs:
-        print(f"  No subscription named '{name}'.")
-        print("  Note: Static routes from config.yaml cannot be removed here.")
-        return
-
-    del subs[name]
-    _save_subscriptions(subs)
-    print(f"  Removed webhook subscription: {name}")
-
-
-def _cmd_test(args):
-    """Send a test POST to a webhook route."""
-    name = args.name.strip().lower()
-    subs = _load_subscriptions()
-
-    if name not in subs:
-        print(f"  No subscription named '{name}'.")
-        return
-
-    route = subs[name]
-    secret = route.get("secret", "")
-    base_url = _get_webhook_base_url()
-    url = f"{base_url}/webhooks/{name}"
-
-    payload = args.payload or '{"test": true, "event_type": "test", "message": "Hello from hermes webhook test"}'
-
-    import hmac
-    import hashlib
-    sig = "sha256=" + hmac.new(
-        secret.encode(), payload.encode(), hashlib.sha256
-    ).hexdigest()
-
-    print(f"  Sending test POST to {url}")
-    try:
-        import urllib.request
-        req = urllib.request.Request(
-            url,
-            data=payload.encode(),
-            headers={
-                "Content-Type": "application/json",
-                "X-Hub-Signature-256": sig,
-                "X-GitHub-Event": "test",
-            },
-            method="POST",
-        )
-        with urllib.request.urlopen(req, timeout=10) as resp:
-            body = resp.read().decode()
-            print(f"  Response ({resp.status}): {body}")
-    except Exception as e:
-        print(f"  Error: {e}")
-        print("  Is the gateway running? (hermes gateway run)")
@@ -17,27 +17,6 @@ def get_hermes_home() -> Path:
    return Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))


-def get_hermes_dir(new_subpath: str, old_name: str) -> Path:
-    """Resolve a Hermes subdirectory with backward compatibility.
-
-    New installs get the consolidated layout (e.g. ``cache/images``).
-    Existing installs that already have the old path (e.g. ``image_cache``)
-    keep using it — no migration required.
-
-    Args:
-        new_subpath: Preferred path relative to HERMES_HOME (e.g. ``"cache/images"``).
-        old_name: Legacy path relative to HERMES_HOME (e.g. ``"image_cache"``).
-
-    Returns:
-        Absolute ``Path`` — old location if it exists on disk, otherwise the new one.
-    """
-    home = get_hermes_home()
-    old_path = home / old_name
-    if old_path.exists():
-        return old_path
-    return home / new_subpath
-
-
 VALID_REASONING_EFFORTS = ("xhigh", "high", "medium", "low", "minimal")


@@ -15,20 +15,15 @@ Key design decisions:
 """

 import json
-import logging
 import os
-import random
 import re
 import sqlite3
 import threading
 import time
 from pathlib import Path
 from hermes_constants import get_hermes_home
-from typing import Any, Callable, Dict, List, Optional, TypeVar
+from typing import Dict, Any, List, Optional

-logger = logging.getLogger(__name__)
-
-T = TypeVar("T")

 DEFAULT_DB_PATH = get_hermes_home() / "state.db"

@@ -121,38 +116,18 @@ class SessionDB:
    single writer via WAL mode). Each method opens its own cursor.
    """

-    # ── Write-contention tuning ──
-    # With multiple hermes processes (gateway + CLI sessions + worktree agents)
-    # all sharing one state.db, WAL write-lock contention causes visible TUI
-    # freezes.  SQLite's built-in busy handler uses a deterministic sleep
-    # schedule that causes convoy effects under high concurrency.
-    #
-    # Instead, we keep the SQLite timeout short (1s) and handle retries at the
-    # application level with random jitter, which naturally staggers competing
-    # writers and avoids the convoy.
-    _WRITE_MAX_RETRIES = 15
-    _WRITE_RETRY_MIN_S = 0.020   # 20ms
-    _WRITE_RETRY_MAX_S = 0.150   # 150ms
-    # Attempt a PASSIVE WAL checkpoint every N successful writes.
-    _CHECKPOINT_EVERY_N_WRITES = 50
-
    def __init__(self, db_path: Path = None):
        self.db_path = db_path or DEFAULT_DB_PATH
        self.db_path.parent.mkdir(parents=True, exist_ok=True)

        self._lock = threading.Lock()
-        self._write_count = 0
        self._conn = sqlite3.connect(
            str(self.db_path),
            check_same_thread=False,
-            # Short timeout — application-level retry with random jitter
-            # handles contention instead of sitting in SQLite's internal
-            # busy handler for up to 30s.
-            timeout=1.0,
-            # Autocommit mode: Python's default isolation_level="" auto-starts
-            # transactions on DML, which conflicts with our explicit
-            # BEGIN IMMEDIATE.  None = we manage transactions ourselves.
-            isolation_level=None,
+            # 30s gives the WAL writer (CLI or gateway) time to finish a batch
+            # flush before the concurrent reader/writer gives up.  10s was too
+            # short when the CLI is doing frequent memory flushes.
+            timeout=30.0,
        )
        self._conn.row_factory = sqlite3.Row
        self._conn.execute("PRAGMA journal_mode=WAL")
@@ -160,96 +135,6 @@ class SessionDB:

        self._init_schema()

-    # ── Core write helper ──
-
-    def _execute_write(self, fn: Callable[[sqlite3.Connection], T]) -> T:
-        """Execute a write transaction with BEGIN IMMEDIATE and jitter retry.
-
-        *fn* receives the connection and should perform INSERT/UPDATE/DELETE
-        statements.  The caller must NOT call ``commit()`` — that's handled
-        here after *fn* returns.
-
-        BEGIN IMMEDIATE acquires the WAL write lock at transaction start
-        (not at commit time), so lock contention surfaces immediately.
-        On ``database is locked``, we release the Python lock, sleep a
-        random 20-150ms, and retry — breaking the convoy pattern that
-        SQLite's built-in deterministic backoff creates.
-
-        Returns whatever *fn* returns.
-        """
-        last_err: Optional[Exception] = None
-        for attempt in range(self._WRITE_MAX_RETRIES):
-            try:
-                with self._lock:
-                    self._conn.execute("BEGIN IMMEDIATE")
-                    try:
-                        result = fn(self._conn)
-                        self._conn.commit()
-                    except BaseException:
-                        try:
-                            self._conn.rollback()
-                        except Exception:
-                            pass
-                        raise
-                # Success — periodic best-effort checkpoint.
-                self._write_count += 1
-                if self._write_count % self._CHECKPOINT_EVERY_N_WRITES == 0:
-                    self._try_wal_checkpoint()
-                return result
-            except sqlite3.OperationalError as exc:
-                err_msg = str(exc).lower()
-                if "locked" in err_msg or "busy" in err_msg:
-                    last_err = exc
-                    if attempt < self._WRITE_MAX_RETRIES - 1:
-                        jitter = random.uniform(
-                            self._WRITE_RETRY_MIN_S,
-                            self._WRITE_RETRY_MAX_S,
-                        )
-                        time.sleep(jitter)
-                        continue
-                # Non-lock error or retries exhausted — propagate.
-                raise
-        # Retries exhausted (shouldn't normally reach here).
-        raise last_err or sqlite3.OperationalError(
-            "database is locked after max retries"
-        )
-
-    def _try_wal_checkpoint(self) -> None:
-        """Best-effort PASSIVE WAL checkpoint.  Never blocks, never raises.
-
-        Flushes committed WAL frames back into the main DB file for any
-        frames that no other connection currently needs.  Keeps the WAL
-        from growing unbounded when many processes hold persistent
-        connections.
-        """
-        try:
-            with self._lock:
-                result = self._conn.execute(
-                    "PRAGMA wal_checkpoint(PASSIVE)"
-                ).fetchone()
-                if result and result[1] > 0:
-                    logger.debug(
-                        "WAL checkpoint: %d/%d pages checkpointed",
-                        result[2], result[1],
-                    )
-        except Exception:
-            pass  # Best effort — never fatal.
-
-    def close(self):
-        """Close the database connection.
-
-        Attempts a PASSIVE WAL checkpoint first so that exiting processes
-        help keep the WAL file from growing unbounded.
-        """
-        with self._lock:
-            if self._conn:
-                try:
-                    self._conn.execute("PRAGMA wal_checkpoint(PASSIVE)")
-                except Exception:
-                    pass
-                self._conn.close()
-                self._conn = None
-
    def _init_schema(self):
        """Create tables and FTS if they don't exist, run migrations."""
        cursor = self._conn.cursor()
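The removed `_execute_write()` helper pairs a short SQLite `timeout` with `BEGIN IMMEDIATE` and an application-level retry. On `database is locked`, it sleeps a random 20 to 150 ms and tries again. Below is a condensed, standalone sketch of that pattern. `JitterWriter` is a hypothetical name, the sketch omits the periodic WAL checkpoint, and it assumes the connection was opened with `isolation_level=None` so Python does not start implicit transactions around the explicit `BEGIN IMMEDIATE`.

```python
import random
import sqlite3
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class JitterWriter:
    """Serialize writes through BEGIN IMMEDIATE and retry lock errors with random jitter."""

    def __init__(self, conn: sqlite3.Connection, max_retries: int = 15,
                 min_sleep: float = 0.020, max_sleep: float = 0.150) -> None:
        self.conn = conn
        self.lock = threading.Lock()
        self.max_retries = max_retries
        self.min_sleep = min_sleep
        self.max_sleep = max_sleep

    def execute_write(self, fn: Callable[[sqlite3.Connection], T]) -> T:
        for attempt in range(self.max_retries):
            try:
                with self.lock:
                    # Take the write lock at transaction start so contention shows up here.
                    self.conn.execute("BEGIN IMMEDIATE")
                    try:
                        result = fn(self.conn)
                        self.conn.commit()
                    except BaseException:
                        self.conn.rollback()
                        raise
                return result
            except sqlite3.OperationalError as exc:
                msg = str(exc).lower()
                retryable = "locked" in msg or "busy" in msg
                if retryable and attempt < self.max_retries - 1:
                    # Sleep with the Python lock released; random delays stagger writers.
                    time.sleep(random.uniform(self.min_sleep, self.max_sleep))
                    continue
                raise
        raise sqlite3.OperationalError("database is locked after max retries")


conn = sqlite3.connect(":memory:", isolation_level=None, check_same_thread=False, timeout=1.0)
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, title TEXT)")
writer = JitterWriter(conn)
writer.execute_write(lambda c: c.execute("INSERT INTO sessions VALUES (?, ?)", ("s1", "demo")))
print(conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0])  # 1
```

The `+` side of this diff drops this pattern. It goes back to `with self._lock: ... commit()` per method and relies on a 30 s busy `timeout` instead.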
@@ -371,8 +256,8 @@ class SessionDB:
        parent_session_id: str = None,
    ) -> str:
        """Create a new session record. Returns the session_id."""
-        def _do(conn):
-            conn.execute(
+        with self._lock:
+            self._conn.execute(
                """INSERT OR IGNORE INTO sessions (id, source, user_id, model, model_config,
                   system_prompt, parent_session_id, started_at)
                   VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
@@ -387,35 +272,35 @@ class SessionDB:
                    time.time(),
                ),
            )
-        self._execute_write(_do)
+            self._conn.commit()
        return session_id

    def end_session(self, session_id: str, end_reason: str) -> None:
        """Mark a session as ended."""
-        def _do(conn):
-            conn.execute(
+        with self._lock:
+            self._conn.execute(
                "UPDATE sessions SET ended_at = ?, end_reason = ? WHERE id = ?",
                (time.time(), end_reason, session_id),
            )
-        self._execute_write(_do)
+            self._conn.commit()

    def reopen_session(self, session_id: str) -> None:
        """Clear ended_at/end_reason so a session can be resumed."""
-        def _do(conn):
-            conn.execute(
+        with self._lock:
+            self._conn.execute(
                "UPDATE sessions SET ended_at = NULL, end_reason = NULL WHERE id = ?",
                (session_id,),
            )
-        self._execute_write(_do)
+            self._conn.commit()

    def update_system_prompt(self, session_id: str, system_prompt: str) -> None:
        """Store the full assembled system prompt snapshot."""
-        def _do(conn):
-            conn.execute(
+        with self._lock:
+            self._conn.execute(
                "UPDATE sessions SET system_prompt = ? WHERE id = ?",
                (system_prompt, session_id),
            )
-        self._execute_write(_do)
+            self._conn.commit()

    def update_token_counts(
        self,
@@ -485,27 +370,29 @@ class SessionDB:
                   billing_mode = COALESCE(billing_mode, ?),
                   model = COALESCE(model, ?)
                   WHERE id = ?"""
-        params = (
-            input_tokens,
-            output_tokens,
-            cache_read_tokens,
-            cache_write_tokens,
-            reasoning_tokens,
-            estimated_cost_usd,
-            actual_cost_usd,
-            actual_cost_usd,
-            cost_status,
-            cost_source,
-            pricing_version,
-            billing_provider,
-            billing_base_url,
-            billing_mode,
-            model,
-            session_id,
-        )
-        def _do(conn):
-            conn.execute(sql, params)
-        self._execute_write(_do)
+        with self._lock:
+            self._conn.execute(
+                sql,
+                (
+                    input_tokens,
+                    output_tokens,
+                    cache_read_tokens,
+                    cache_write_tokens,
+                    reasoning_tokens,
+                    estimated_cost_usd,
+                    actual_cost_usd,
+                    actual_cost_usd,
+                    cost_status,
+                    cost_source,
+                    pricing_version,
+                    billing_provider,
+                    billing_base_url,
+                    billing_mode,
+                    model,
+                    session_id,
+                ),
+            )
+            self._conn.commit()

    def ensure_session(
        self,
@@ -519,14 +406,14 @@ class SessionDB:
        create_session() call (e.g. transient SQLite lock at agent startup).
        INSERT OR IGNORE is safe to call even when the row already exists.
        """
-        def _do(conn):
-            conn.execute(
+        with self._lock:
+            self._conn.execute(
                """INSERT OR IGNORE INTO sessions
                   (id, source, model, started_at)
                   VALUES (?, ?, ?, ?)""",
                (session_id, source, model, time.time()),
            )
-        self._execute_write(_do)
+            self._conn.commit()

    def set_token_counts(
        self,
@@ -552,8 +439,8 @@ class SessionDB:
        conversation run (e.g. the gateway, where the cached agent's
        session_prompt_tokens already reflects the running total).
        """
-        def _do(conn):
-            conn.execute(
+        with self._lock:
+            self._conn.execute(
                """UPDATE sessions SET
                   input_tokens = ?,
                   output_tokens = ?,
@@ -592,7 +479,7 @@ class SessionDB:
                    session_id,
                ),
            )
-        self._execute_write(_do)
+            self._conn.commit()

    def get_session(self, session_id: str) -> Optional[Dict[str, Any]]:
        """Get a session by ID."""
@@ -686,10 +573,10 @@ class SessionDB:
        Empty/whitespace-only strings are normalized to None (clearing the title).
        """
        title = self.sanitize_title(title)
-        def _do(conn):
+        with self._lock:
            if title:
                # Check uniqueness (allow the same session to keep its own title)
-                cursor = conn.execute(
+                cursor = self._conn.execute(
                    "SELECT id FROM sessions WHERE title = ? AND id != ?",
                    (title, session_id),
                )
@@ -698,12 +585,12 @@ class SessionDB:
                    raise ValueError(
                        f"Title '{title}' is already in use by session {conflict['id']}"
                    )
-            cursor = conn.execute(
+            cursor = self._conn.execute(
                "UPDATE sessions SET title = ? WHERE id = ?",
                (title, session_id),
            )
-            return cursor.rowcount
-        rowcount = self._execute_write(_do)
+            self._conn.commit()
+            rowcount = cursor.rowcount
        return rowcount > 0

    def get_session_title(self, session_id: str) -> Optional[str]:
@@ -875,24 +762,17 @@ class SessionDB:
        Also increments the session's message_count (and tool_call_count
        if role is 'tool' or tool_calls is present).
        """
-        # Serialize structured fields to JSON before entering the write txn
-        reasoning_details_json = (
-            json.dumps(reasoning_details)
-            if reasoning_details else None
-        )
-        codex_items_json = (
-            json.dumps(codex_reasoning_items)
-            if codex_reasoning_items else None
-        )
-        tool_calls_json = json.dumps(tool_calls) if tool_calls else None
-
-        # Pre-compute tool call count
-        num_tool_calls = 0
-        if tool_calls is not None:
-            num_tool_calls = len(tool_calls) if isinstance(tool_calls, list) else 1
-
-        def _do(conn):
-            cursor = conn.execute(
+        with self._lock:
+            # Serialize structured fields to JSON for storage
+            reasoning_details_json = (
+                json.dumps(reasoning_details)
+                if reasoning_details else None
+            )
+            codex_items_json = (
+                json.dumps(codex_reasoning_items)
+                if codex_reasoning_items else None
+            )
+            cursor = self._conn.execute(
                """INSERT INTO messages (session_id, role, content, tool_call_id,
                   tool_calls, tool_name, timestamp, token_count, finish_reason,
                   reasoning, reasoning_details, codex_reasoning_items)
@@ -902,7 +782,7 @@ class SessionDB:
                    role,
                    content,
                    tool_call_id,
-                    tool_calls_json,
+                    json.dumps(tool_calls) if tool_calls else None,
                    tool_name,
                    time.time(),
                    token_count,
@@ -915,20 +795,25 @@ class SessionDB:
            msg_id = cursor.lastrowid

            # Update counters
+            # Count actual tool calls from the tool_calls list (not from tool responses).
+            # A single assistant message can contain multiple parallel tool calls.
+            num_tool_calls = 0
+            if tool_calls is not None:
+                num_tool_calls = len(tool_calls) if isinstance(tool_calls, list) else 1
            if num_tool_calls > 0:
-                conn.execute(
+                self._conn.execute(
                    """UPDATE sessions SET message_count = message_count + 1,
                       tool_call_count = tool_call_count + ? WHERE id = ?""",
                    (num_tool_calls, session_id),
                )
            else:
-                conn.execute(
+                self._conn.execute(
                    "UPDATE sessions SET message_count = message_count + 1 WHERE id = ?",
                    (session_id,),
                )
-            return msg_id

-        return self._execute_write(_do)
+            self._conn.commit()
+        return msg_id

    def get_messages(self, session_id: str) -> List[Dict[str, Any]]:
        """Load all messages for a session, ordered by timestamp."""
@@ -1222,53 +1107,54 @@ class SessionDB:

    def clear_messages(self, session_id: str) -> None:
        """Delete all messages for a session and reset its counters."""
-        def _do(conn):
-            conn.execute(
+        with self._lock:
+            self._conn.execute(
                "DELETE FROM messages WHERE session_id = ?", (session_id,)
            )
-            conn.execute(
+            self._conn.execute(
                "UPDATE sessions SET message_count = 0, tool_call_count = 0 WHERE id = ?",
                (session_id,),
            )
-        self._execute_write(_do)
+            self._conn.commit()

    def delete_session(self, session_id: str) -> bool:
        """Delete a session and all its messages. Returns True if found."""
-        def _do(conn):
-            cursor = conn.execute(
+        with self._lock:
+            cursor = self._conn.execute(
                "SELECT COUNT(*) FROM sessions WHERE id = ?", (session_id,)
            )
            if cursor.fetchone()[0] == 0:
                return False
-            conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
-            conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
+            self._conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
+            self._conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
+            self._conn.commit()
            return True
-        return self._execute_write(_do)

    def prune_sessions(self, older_than_days: int = 90, source: str = None) -> int:
        """
        Delete sessions older than N days. Returns count of deleted sessions.
        Only prunes ended sessions (not active ones).
        """
-        cutoff = time.time() - (older_than_days * 86400)
+        import time as _time
+        cutoff = _time.time() - (older_than_days * 86400)

-        def _do(conn):
+        with self._lock:
            if source:
-                cursor = conn.execute(
+                cursor = self._conn.execute(
                    """SELECT id FROM sessions
                       WHERE started_at < ? AND ended_at IS NOT NULL AND source = ?""",
                    (cutoff, source),
                )
            else:
-                cursor = conn.execute(
+                cursor = self._conn.execute(
                    "SELECT id FROM sessions WHERE started_at < ? AND ended_at IS NOT NULL",
                    (cutoff,),
                )
            session_ids = [row["id"] for row in cursor.fetchall()]

            for sid in session_ids:
-                conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
-                conn.execute("DELETE FROM sessions WHERE id = ?", (sid,))
-            return len(session_ids)
+                self._conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
+                self._conn.execute("DELETE FROM sessions WHERE id = ?", (sid,))

-        return self._execute_write(_do)
+            self._conn.commit()
+        return len(session_ids)
@@ -270,7 +270,7 @@ def cmd_status(args) -> None:
            print(f"    {peer}: {mode}")
    print(f"  Write freq:     {hcfg.write_frequency}")

-    if hcfg.enabled and (hcfg.api_key or hcfg.base_url):
+    if hcfg.enabled and hcfg.api_key:
        print("\n  Connection... ", end="", flush=True)
        try:
            get_honcho_client(hcfg)
@@ -278,7 +278,7 @@ def cmd_status(args) -> None:
        except Exception as e:
            print(f"FAILED ({e})\n")
    else:
-        reason = "disabled" if not hcfg.enabled else "no API key or base URL"
+        reason = "disabled" if not hcfg.enabled else "no API key"
        print(f"\n  Not connected ({reason})\n")


@@ -417,18 +417,9 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
    else:
        logger.info("Initializing Honcho client (host: %s, workspace: %s)", config.host, config.workspace_id)

-    # Local Honcho instances don't require an API key, but the SDK
-    # expects a non-empty string.  Use a placeholder for local URLs.
-    _is_local = resolved_base_url and (
-        "localhost" in resolved_base_url
-        or "127.0.0.1" in resolved_base_url
-        or "::1" in resolved_base_url
-    )
-    effective_api_key = config.api_key or ("local" if _is_local else None)
-
    kwargs: dict = {
        "workspace_id": config.workspace_id,
-        "api_key": effective_api_key,
+        "api_key": config.api_key,
        "environment": config.environment,
    }
    if resolved_base_url:
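The removed lines let a self-hosted Honcho run without an API key. When the base URL points at a loopback host, they substitute the placeholder `"local"`, because the SDK expects a non-empty string. Here is the same rule as a pure function; `effective_api_key` is a hypothetical name.

```python
from typing import Optional

_LOCAL_MARKERS = ("localhost", "127.0.0.1", "::1")


def effective_api_key(api_key: Optional[str], base_url: Optional[str]) -> Optional[str]:
    """Return the configured key, or the placeholder "local" for loopback base URLs."""
    is_local = bool(base_url) and any(m in base_url for m in _LOCAL_MARKERS)
    return api_key or ("local" if is_local else None)


print(effective_api_key(None, "http://localhost:8000"))      # local
print(effective_api_key("hc-key", "http://127.0.0.1:8000"))  # hc-key
print(effective_api_key(None, None))                         # None
```

Like the original, this is a substring check, so a host such as `localhost.example.com` would also count as local. Parsing with `urllib.parse.urlsplit(url).hostname` would be stricter.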
@@ -10,12 +10,6 @@
 # container recreation. Environment variables are written to $HERMES_HOME/.env
 # and read by hermes at startup — no container recreation needed for env changes.
 #
-# Tool resolution: the hermes wrapper uses --suffix PATH for nix store tools,
-# so apt/uv-installed versions take priority. The container entrypoint provisions
-# extensible tools on first boot: nodejs/npm via apt, uv via curl, and a Python
-# 3.11 venv (bootstrapped entirely by uv) at ~/.venv with pip seeded. Agents get
-# writable tool prefixes for npm i -g, pip install, uv tool install, etc.
-#
 # Usage:
 #   services.hermes-agent = {
 #     enable = true;
@@ -111,52 +105,22 @@
      fi
      mkdir -p "$TARGET_HOME"
      chown "$HERMES_UID:$HERMES_GID" "$TARGET_HOME"
-      chmod 0750 "$TARGET_HOME"

      # Ensure HERMES_HOME is owned by the target user
      if [ -n "''${HERMES_HOME:-}" ] && [ -d "$HERMES_HOME" ]; then
        chown -R "$HERMES_UID:$HERMES_GID" "$HERMES_HOME"
      fi

-      # ── Provision apt packages (first boot only, cached in writable layer) ──
-      # sudo: agent self-modification
-      # nodejs/npm: writable node so npm i -g works (nix store copies are read-only)
-      # curl: needed for uv installer
-      if [ ! -f /var/lib/hermes-tools-provisioned ] && command -v apt-get >/dev/null 2>&1; then
-        echo "First boot: provisioning agent tools..."
-        apt-get update -qq
-        apt-get install -y -qq sudo nodejs npm curl
-        touch /var/lib/hermes-tools-provisioned
+      # Install sudo on Debian/Ubuntu if missing (first boot only, cached in writable layer)
+      if command -v apt-get >/dev/null 2>&1 && ! command -v sudo >/dev/null 2>&1; then
+        apt-get update -qq >/dev/null 2>&1 && apt-get install -y -qq sudo >/dev/null 2>&1 || true
      fi
-
      if command -v sudo >/dev/null 2>&1 && [ ! -f /etc/sudoers.d/hermes ]; then
        mkdir -p /etc/sudoers.d
        echo "$TARGET_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/hermes
        chmod 0440 /etc/sudoers.d/hermes
      fi

-      # uv (Python manager) — not in Ubuntu repos, retry-safe outside the sentinel
-      if ! command -v uv >/dev/null 2>&1 && [ ! -x "$TARGET_HOME/.local/bin/uv" ] && command -v curl >/dev/null 2>&1; then
-        su -s /bin/sh "$TARGET_USER" -c 'curl -LsSf https://astral.sh/uv/install.sh | sh' || true
-      fi
-
-      # Python 3.11 venv — gives the agent a writable Python with pip.
-      # Uses uv to install Python 3.11 (Ubuntu 24.04 ships 3.12).
-      # --seed includes pip/setuptools so bare `pip install` works.
-      _UV_BIN="$TARGET_HOME/.local/bin/uv"
-      if [ ! -d "$TARGET_HOME/.venv" ] && [ -x "$_UV_BIN" ]; then
-        su -s /bin/sh "$TARGET_USER" -c "
-          export PATH=\"\$HOME/.local/bin:\$PATH\"
-          uv python install 3.11
-          uv venv --python 3.11 --seed \"\$HOME/.venv\"
-        " || true
-      fi
-
-      # Put the agent venv first on PATH so python/pip resolve to writable copies
-      if [ -d "$TARGET_HOME/.venv/bin" ]; then
-        export PATH="$TARGET_HOME/.venv/bin:$PATH"
-      fi
-
      if command -v setpriv >/dev/null 2>&1; then
        exec setpriv --reuid="$HERMES_UID" --regid="$HERMES_GID" --init-groups "$@"
      elif command -v su >/dev/null 2>&1; then
@@ -552,8 +516,8 @@
      # ── Directories ───────────────────────────────────────────────────
      {
        systemd.tmpfiles.rules = [
-          "d ${cfg.stateDir}                0750 ${cfg.user} ${cfg.group} - -"
-          "d ${cfg.stateDir}/.hermes        0750 ${cfg.user} ${cfg.group} - -"
+          "d ${cfg.stateDir}                0755 ${cfg.user} ${cfg.group} - -"
+          "d ${cfg.stateDir}/.hermes        0755 ${cfg.user} ${cfg.group} - -"
          "d ${cfg.stateDir}/home           0750 ${cfg.user} ${cfg.group} - -"
          "d ${cfg.workingDirectory}         0750 ${cfg.user} ${cfg.group} - -"
        ];
@@ -567,23 +531,21 @@
          mkdir -p ${cfg.stateDir}/home
          mkdir -p ${cfg.workingDirectory}
          chown ${cfg.user}:${cfg.group} ${cfg.stateDir} ${cfg.stateDir}/.hermes ${cfg.stateDir}/home ${cfg.workingDirectory}
-          chmod 0750 ${cfg.stateDir} ${cfg.stateDir}/.hermes ${cfg.stateDir}/home ${cfg.workingDirectory}

          # Merge Nix settings into existing config.yaml.
          # Preserves user-added keys (skills, streaming, etc.); Nix keys win.
          # If configFile is user-provided (not generated), overwrite instead of merge.
          ${if cfg.configFile != null then ''
-            install -o ${cfg.user} -g ${cfg.group} -m 0640 -D ${configFile} ${cfg.stateDir}/.hermes/config.yaml
+            install -o ${cfg.user} -g ${cfg.group} -m 0644 -D ${configFile} ${cfg.stateDir}/.hermes/config.yaml
          '' else ''
            ${configMergeScript} ${generatedConfigFile} ${cfg.stateDir}/.hermes/config.yaml
            chown ${cfg.user}:${cfg.group} ${cfg.stateDir}/.hermes/config.yaml
-            chmod 0640 ${cfg.stateDir}/.hermes/config.yaml
+            chmod 0644 ${cfg.stateDir}/.hermes/config.yaml
          ''}

          # Managed mode marker (so interactive shells also detect NixOS management)
          touch ${cfg.stateDir}/.hermes/.managed
          chown ${cfg.user}:${cfg.group} ${cfg.stateDir}/.hermes/.managed
-          chmod 0644 ${cfg.stateDir}/.hermes/.managed

          # Seed auth file if provided
          ${lib.optionalString (cfg.authFile != null) ''
@@ -615,7 +577,7 @@ HERMES_NIX_ENV_EOF

          # Link documents into workspace
          ${lib.concatStringsSep "\n" (lib.mapAttrsToList (name: _value: ''
-            install -o ${cfg.user} -g ${cfg.group} -m 0640 ${documentDerivation}/${name} ${cfg.workingDirectory}/${name}
+            install -o ${cfg.user} -g ${cfg.group} -m 0644 ${documentDerivation}/${name} ${cfg.workingDirectory}/${name}
          '') cfg.documents)}
        '';
      }
@@ -35,7 +35,7 @@

          ${pkgs.lib.concatMapStringsSep "\n" (name: ''
            makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
-              --suffix PATH : "${runtimePath}" \
+              --prefix PATH : "${runtimePath}" \
              --set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills
          '') [ "hermes" "hermes-agent" "hermes-acp" ]}

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "hermes-agent"
-version = "0.5.0"
+version = "0.4.0"
 description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -26,7 +26,6 @@ dependencies = [
  # Interactive CLI (prompt_toolkit is used directly by cli.py)
  "prompt_toolkit>=3.0.52,<4",
  # Tools
-  "exa-py>=2.9.0,<3",
  "firecrawl-py>=4.16.0,<5",
  "parallel-web>=0.4.2,<1",
  "fal-client>=0.13.1,<1",
@@ -38,7 +37,7 @@ dependencies = [
 ]

 [project.optional-dependencies]
-modal = ["modal>=1.0.0,<2"]
+modal = ["swe-rex[modal]>=1.4.0,<2"]
 daytona = ["daytona>=0.148.0,<1"]
 dev = ["pytest>=9.0.2,<10", "pytest-asyncio>=1.3.0,<2", "pytest-xdist>=3.0,<4", "mcp>=1.2.0,<2"]
 messaging = ["python-telegram-bot>=22.6,<23", "discord.py[voice]>=2.7.1,<3", "aiohttp>=3.13.3,<4", "slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
@@ -62,12 +62,7 @@ else:


 # Import our tool system
-from model_tools import (
-    get_tool_definitions,
-    get_toolset_for_tool,
-    handle_function_call,
-    check_toolset_requirements,
-)
+from model_tools import get_tool_definitions, handle_function_call, check_toolset_requirements
 from tools.terminal_tool import cleanup_vm
 from tools.interrupt import set_interrupt as _set_interrupt
 from tools.browser_tool import cleanup_browser
@@ -88,7 +83,7 @@ from agent.model_metadata import (
 )
 from agent.context_compressor import ContextCompressor
 from agent.prompt_caching import apply_anthropic_cache_control
-from agent.prompt_builder import build_skills_system_prompt, build_context_files_prompt, load_soul_md, TOOL_USE_ENFORCEMENT_GUIDANCE, TOOL_USE_ENFORCEMENT_MODELS
+from agent.prompt_builder import build_skills_system_prompt, build_context_files_prompt, load_soul_md
 from agent.usage_pricing import estimate_usage_cost, normalize_usage
 from agent.display import (
    KawaiiSpinner, build_tool_preview as _build_tool_preview,
@@ -361,85 +356,6 @@ def _inject_honcho_turn_context(content, turn_context: str):
    return f"{text}\n\n{note}"


-# Budget warning text patterns injected by _get_budget_warning().
-_BUDGET_WARNING_RE = re.compile(
-    r"\[BUDGET(?:\s+WARNING)?:\s+Iteration\s+\d+/\d+\..*?\]",
-    re.DOTALL,
-)
-
-
-# Regex to match lone surrogate code points (U+D800..U+DFFF).
-# These are invalid in UTF-8 and cause UnicodeEncodeError when the OpenAI SDK
-# serialises messages to JSON.  Common source: clipboard paste from Google Docs
-# or other rich-text editors on some platforms.
-_SURROGATE_RE = re.compile(r'[\ud800-\udfff]')
-
-
-def _sanitize_surrogates(text: str) -> str:
-    """Replace lone surrogate code points with U+FFFD (replacement character).
-
-    Surrogates are invalid in UTF-8 and will crash ``json.dumps()`` inside the
-    OpenAI SDK.  This is a fast no-op when the text contains no surrogates.
-    """
-    if _SURROGATE_RE.search(text):
-        return _SURROGATE_RE.sub('\ufffd', text)
-    return text
-
-
-def _sanitize_messages_surrogates(messages: list) -> bool:
-    """Sanitize surrogate characters from all string content in a messages list.
-
-    Walks message dicts in-place.  Returns True if any surrogates were found
-    and replaced, False otherwise.
-    """
-    found = False
-    for msg in messages:
-        if not isinstance(msg, dict):
-            continue
-        content = msg.get("content")
-        if isinstance(content, str) and _SURROGATE_RE.search(content):
-            msg["content"] = _SURROGATE_RE.sub('\ufffd', content)
-            found = True
-        elif isinstance(content, list):
-            for part in content:
-                if isinstance(part, dict):
-                    text = part.get("text")
-                    if isinstance(text, str) and _SURROGATE_RE.search(text):
-                        part["text"] = _SURROGATE_RE.sub('\ufffd', text)
-                        found = True
-    return found
-
-
-def _strip_budget_warnings_from_history(messages: list) -> None:
-    """Remove budget pressure warnings from tool-result messages in-place.
-
-    Budget warnings are turn-scoped signals that must not leak into replayed
-    history.  They live in tool-result ``content`` either as a JSON key
-    (``_budget_warning``) or appended plain text.
-    """
-    for msg in messages:
-        if not isinstance(msg, dict) or msg.get("role") != "tool":
-            continue
-        content = msg.get("content")
-        if not isinstance(content, str) or "_budget_warning" not in content and "[BUDGET" not in content:
-            continue
-
-        # Try JSON first (the common case: _budget_warning key in a dict)
-        try:
-            parsed = json.loads(content)
-            if isinstance(parsed, dict) and "_budget_warning" in parsed:
-                del parsed["_budget_warning"]
-                msg["content"] = json.dumps(parsed, ensure_ascii=False)
-                continue
-        except (json.JSONDecodeError, TypeError):
-            pass
-
-        # Fallback: strip the text pattern from plain-text tool results
-        cleaned = _BUDGET_WARNING_RE.sub("", content).strip()
-        if cleaned != content:
-            msg["content"] = cleaned
-
-
 class AIAgent:
    """
    AI Agent with tool calling capabilities.
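The removed `_sanitize_surrogates()` helpers replace lone UTF-16 surrogate code points (U+D800 to U+DFFF) with U+FFFD, because such strings cannot be encoded as UTF-8 when the request is serialized. Below is a standalone sketch of the same in-place walk over message dicts. The function names are hypothetical. The sketch covers only the two content shapes the removed code handled: a plain string, or a list of parts with a `text` key.

```python
import re

# Lone surrogates are valid in a Python str but have no UTF-8 encoding.
_SURROGATE_RE = re.compile(r"[\ud800-\udfff]")


def sanitize_text(text: str) -> str:
    """Replace each lone surrogate with U+FFFD (REPLACEMENT CHARACTER)."""
    return _SURROGATE_RE.sub("\ufffd", text)


def sanitize_messages(messages: list) -> bool:
    """Sanitize message content in place; return True if anything changed."""
    found = False
    for msg in messages:
        if not isinstance(msg, dict):
            continue
        content = msg.get("content")
        if isinstance(content, str) and _SURROGATE_RE.search(content):
            msg["content"] = sanitize_text(content)
            found = True
        elif isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and isinstance(part.get("text"), str):
                    if _SURROGATE_RE.search(part["text"]):
                        part["text"] = sanitize_text(part["text"])
                        found = True
    return found


messages = [
    {"role": "user", "content": "pasted \ud83d text"},
    {"role": "user", "content": [{"type": "text", "text": "ok \U0001F600"}]},
]
changed = sanitize_messages(messages)
print(changed)                                 # True
print(messages[0]["content"].encode("utf-8"))  # encodes without UnicodeEncodeError
```

Properly paired characters such as the emoji above are single code points in a Python `str`, so the regex leaves them untouched.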
@@ -618,7 +534,6 @@ class AIAgent:
        self.tool_progress_callback = tool_progress_callback
        self.thinking_callback = thinking_callback
        self.reasoning_callback = reasoning_callback
-        self._reasoning_deltas_fired = False  # Set by _fire_reasoning_delta, reset per API call
        self.clarify_callback = clarify_callback
        self.step_callback = step_callback
        self.stream_delta_callback = stream_delta_callback
@@ -861,25 +776,6 @@ class AIAgent:
                    }
            
            self._client_kwargs = client_kwargs  # stored for rebuilding after interrupt
-
-            # Enable fine-grained tool streaming for Claude on OpenRouter.
-            # Without this, Anthropic buffers the entire tool call and goes
-            # silent for minutes while thinking — OpenRouter's upstream proxy
-            # times out during the silence.  The beta header makes Anthropic
-            # stream tool call arguments token-by-token, keeping the
-            # connection alive.
-            _effective_base = str(client_kwargs.get("base_url", "")).lower()
-            if "openrouter" in _effective_base and "claude" in (self.model or "").lower():
-                headers = client_kwargs.get("default_headers") or {}
-                existing_beta = headers.get("x-anthropic-beta", "")
-                _FINE_GRAINED = "fine-grained-tool-streaming-2025-05-14"
-                if _FINE_GRAINED not in existing_beta:
-                    if existing_beta:
-                        headers["x-anthropic-beta"] = f"{existing_beta},{_FINE_GRAINED}"
-                    else:
-                        headers["x-anthropic-beta"] = _FINE_GRAINED
-                    client_kwargs["default_headers"] = headers
-
            self.api_key = client_kwargs.get("api_key", "")
            try:
                self.client = self._create_openai_client(client_kwargs, reason="agent_init", shared=True)
@@ -1084,8 +980,8 @@ class AIAgent:
                    else:
                        if not hcfg.enabled:
                            logger.debug("Honcho disabled in global config")
-                        elif not (hcfg.api_key or hcfg.base_url):
-                            logger.debug("Honcho enabled but no API key or base URL configured")
+                        elif not hcfg.api_key:
+                            logger.debug("Honcho enabled but no API key configured")
                        else:
                            logger.debug("Honcho enabled but missing API key or disabled in config")
            except Exception as e:
@@ -1122,13 +1018,6 @@ class AIAgent:
        except Exception:
            pass

-        # Tool-use enforcement config: "auto" (default — matches hardcoded
-        # model list), true (always), false (never), or list of substrings.
-        _agent_section = _agent_cfg.get("agent", {})
-        if not isinstance(_agent_section, dict):
-            _agent_section = {}
-        self._tool_use_enforcement = _agent_section.get("tool_use_enforcement", "auto")
-
        # Initialize context compressor for automatic context management
        # Compresses conversation when approaching model's context limit
        # Configuration via config.yaml (compression section)
@@ -2292,14 +2181,8 @@ class AIAgent:
    # ── Honcho integration helpers ──

    def _honcho_should_activate(self, hcfg) -> bool:
-        """Return True when Honcho should be active.
-
-        Self-hosted Honcho may be configured with a base_url and no API key,
-        so activation should accept either credential style.
-        """
-        if not hcfg or not hcfg.enabled:
-            return False
-        if not (hcfg.api_key or hcfg.base_url):
+        """Return True when remote Honcho should be active."""
+        if not hcfg or not hcfg.enabled or not hcfg.api_key:
            return False
        return True

@@ -2565,30 +2448,6 @@ class AIAgent:
        if tool_guidance:
            prompt_parts.append(" ".join(tool_guidance))

-        # Tool-use enforcement: tells the model to actually call tools instead
-        # of describing intended actions.  Controlled by config.yaml
-        # agent.tool_use_enforcement:
-        #   "auto" (default) — matches TOOL_USE_ENFORCEMENT_MODELS
-        #   true  — always inject (all models)
-        #   false — never inject
-        #   list  — custom model-name substrings to match
-        if self.valid_tool_names:
-            _enforce = self._tool_use_enforcement
-            _inject = False
-            if _enforce is True or (isinstance(_enforce, str) and _enforce.lower() in ("true", "always", "yes", "on")):
-                _inject = True
-            elif _enforce is False or (isinstance(_enforce, str) and _enforce.lower() in ("false", "never", "no", "off")):
-                _inject = False
-            elif isinstance(_enforce, list):
-                model_lower = (self.model or "").lower()
-                _inject = any(p.lower() in model_lower for p in _enforce if isinstance(p, str))
-            else:
-                # "auto" or any unrecognised value — use hardcoded defaults
-                model_lower = (self.model or "").lower()
-                _inject = any(p in model_lower for p in TOOL_USE_ENFORCEMENT_MODELS)
-            if _inject:
-                prompt_parts.append(TOOL_USE_ENFORCEMENT_GUIDANCE)
-
        # Honcho CLI awareness: tell Hermes about its own management commands
        # so it can refer the user to them rather than reinventing answers.
        if self._honcho and self._honcho_session_key:
@@ -2661,13 +2520,7 @@ class AIAgent:

        has_skills_tools = any(name in self.valid_tool_names for name in ['skills_list', 'skill_view', 'skill_manage'])
        if has_skills_tools:
-            avail_toolsets = {
-                toolset
-                for toolset in (
-                    get_toolset_for_tool(tool_name) for tool_name in self.valid_tool_names
-                )
-                if toolset
-            }
+            avail_toolsets = {ts for ts, avail in check_toolset_requirements().items() if avail}
            skills_prompt = build_skills_system_prompt(
                available_tools=self.valid_tool_names,
                available_toolsets=avail_toolsets,
@@ -3551,7 +3404,6 @@ class AIAgent:
        max_stream_retries = 1
        has_tool_calls = False
        first_delta_fired = False
-        self._reasoning_deltas_fired = False
        for attempt in range(max_stream_retries + 1):
            try:
                with active_client.responses.stream(**api_kwargs) as stream:
@@ -3828,7 +3680,6 @@ class AIAgent:

    def _fire_reasoning_delta(self, text: str) -> None:
        """Fire reasoning callback if registered."""
-        self._reasoning_deltas_fired = True
        cb = self.reasoning_callback
        if cb is not None:
            try:
@@ -3907,7 +3758,7 @@ class AIAgent:
        def _call_chat_completions():
            """Stream a chat completions response."""
            import httpx as _httpx
-            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
+            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 900.0))
            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 60.0))
            stream_kwargs = {
                **api_kwargs,
@@ -3923,28 +3774,16 @@ class AIAgent:
            request_client_holder["client"] = self._create_request_openai_client(
                reason="chat_completion_stream_request"
            )
-            # Reset stale-stream timer so the detector measures from this
-            # attempt's start, not a previous attempt's last chunk.
-            last_chunk_time["t"] = time.time()
            stream = request_client_holder["client"].chat.completions.create(**stream_kwargs)

            content_parts: list = []
            tool_calls_acc: dict = {}
            tool_gen_notified: set = set()
-            # Ollama-compatible endpoints reuse index 0 for every tool call
-            # in a parallel batch, distinguishing them only by id.  Track
-            # the last seen id per raw index so we can detect a new tool
-            # call starting at the same index and redirect it to a fresh slot.
-            _last_id_at_idx: dict = {}      # raw_index -> last seen non-empty id
-            _active_slot_by_idx: dict = {}  # raw_index -> current slot in tool_calls_acc
            finish_reason = None
            model_name = None
            role = "assistant"
            reasoning_parts: list = []
            usage_obj = None
-            # Reset per-call reasoning tracking so _build_assistant_message
-            # knows whether reasoning was already displayed during streaming.
-            self._reasoning_deltas_fired = False

            for chunk in stream:
                last_chunk_time["t"] = time.time()
@@ -3978,45 +3817,11 @@ class AIAgent:
                        _fire_first_delta()
                        self._fire_stream_delta(delta.content)
                        deltas_were_sent["yes"] = True
-                    else:
-                        # Tool calls suppress regular content streaming (avoids
-                        # displaying chatty "I'll use the tool..." text alongside
-                        # tool calls).  But reasoning tags embedded in suppressed
-                        # content should still reach the display — otherwise the
-                        # reasoning box only appears as a post-response fallback,
-                        # rendering it confusingly after the already-streamed
-                        # response.  Route suppressed content through the stream
-                        # delta callback so its tag extraction can fire the
-                        # reasoning display.  Non-reasoning text is harmlessly
-                        # suppressed by the CLI's _stream_delta when the stream
-                        # box is already closed (tool boundary flush).
-                        if self.stream_delta_callback:
-                            try:
-                                self.stream_delta_callback(delta.content)
-                            except Exception:
-                                pass

                # Accumulate tool call deltas — notify display on first name
                if delta and delta.tool_calls:
                    for tc_delta in delta.tool_calls:
-                        raw_idx = tc_delta.index if tc_delta.index is not None else 0
-                        delta_id = tc_delta.id or ""
-
-                        # Ollama fix: detect a new tool call reusing the same
-                        # raw index (different id) and redirect to a fresh slot.
-                        if raw_idx not in _active_slot_by_idx:
-                            _active_slot_by_idx[raw_idx] = raw_idx
-                        if (
-                            delta_id
-                            and raw_idx in _last_id_at_idx
-                            and delta_id != _last_id_at_idx[raw_idx]
-                        ):
-                            new_slot = max(tool_calls_acc, default=-1) + 1
-                            _active_slot_by_idx[raw_idx] = new_slot
-                        if delta_id:
-                            _last_id_at_idx[raw_idx] = delta_id
-                        idx = _active_slot_by_idx[raw_idx]
-
+                        idx = tc_delta.index if tc_delta.index is not None else 0
                        if idx not in tool_calls_acc:
                            tool_calls_acc[idx] = {
                                "id": tc_delta.id or "",
@@ -4098,10 +3903,7 @@ class AIAgent:
            works unchanged.
            """
            has_tool_use = False
-            self._reasoning_deltas_fired = False

-            # Reset stale-stream timer for this attempt
-            last_chunk_time["t"] = time.time()
            # Use the Anthropic SDK's streaming context manager
            with self._anthropic_client.messages.stream(**api_kwargs) as stream:
                for event in stream:
@@ -4169,37 +3971,7 @@ class AIAgent:
                            e, (_httpx.ConnectError, _httpx.RemoteProtocolError, ConnectionError)
                        )

-                        # SSE error events from proxies (e.g. OpenRouter sends
-                        # {"error":{"message":"Network connection lost."}}) are
-                        # raised as APIError by the OpenAI SDK.  These are
-                        # semantically identical to httpx connection drops —
-                        # the upstream stream died — and should be retried with
-                        # a fresh connection.  Distinguish from HTTP errors:
-                        # APIError from SSE has no status_code, while
-                        # APIStatusError (4xx/5xx) always has one.
-                        _is_sse_conn_err = False
-                        if not _is_timeout and not _is_conn_err:
-                            from openai import APIError as _APIError
-                            if isinstance(e, _APIError) and not getattr(e, "status_code", None):
-                                _err_lower_sse = str(e).lower()
-                                _SSE_CONN_PHRASES = (
-                                    "connection lost",
-                                    "connection reset",
-                                    "connection closed",
-                                    "connection terminated",
-                                    "network error",
-                                    "network connection",
-                                    "terminated",
-                                    "peer closed",
-                                    "broken pipe",
-                                    "upstream connect error",
-                                )
-                                _is_sse_conn_err = any(
-                                    phrase in _err_lower_sse
-                                    for phrase in _SSE_CONN_PHRASES
-                                )
-
-                        if _is_timeout or _is_conn_err or _is_sse_conn_err:
+                        if _is_timeout or _is_conn_err:
                            # Transient network / timeout error. Retry the
                            # streaming request with a fresh connection first.
                            if _stream_attempt < _max_stream_retries:
@@ -4244,10 +4016,6 @@ class AIAgent:
                            )

                        try:
-                            # Reset stale timer — the non-streaming fallback
-                            # uses its own client; prevent the stale detector
-                            # from firing on stale timestamps from failed streams.
-                            last_chunk_time["t"] = time.time()
                            result["response"] = self._interruptible_api_call(api_kwargs)
                        except Exception as fallback_err:
                            result["error"] = fallback_err
@@ -4257,19 +4025,7 @@ class AIAgent:
                if request_client is not None:
                    self._close_request_openai_client(request_client, reason="stream_request_complete")

-        _stream_stale_timeout_base = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 180.0))
-        # Scale the stale timeout for large contexts: slow models (like Opus)
-        # can legitimately think for minutes before producing the first token
-        # when the context is large.  Without this, the stale detector kills
-        # healthy connections during the model's thinking phase, producing
-        # spurious RemoteProtocolError ("peer closed connection").
-        _est_tokens = sum(len(str(v)) for v in api_kwargs.get("messages", [])) // 4
-        if _est_tokens > 100_000:
-            _stream_stale_timeout = max(_stream_stale_timeout_base, 300.0)
-        elif _est_tokens > 50_000:
-            _stream_stale_timeout = max(_stream_stale_timeout_base, 240.0)
-        else:
-            _stream_stale_timeout = _stream_stale_timeout_base
+        _stream_stale_timeout = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 90.0))

        t = threading.Thread(target=_call, daemon=True)
        t.start()
@@ -4583,10 +4339,6 @@ class AIAgent:
        if self.api_mode == "anthropic_messages":
            from agent.anthropic_adapter import build_anthropic_kwargs
            anthropic_messages = self._prepare_anthropic_messages_for_api(api_messages)
-            # Pass context_length so the adapter can clamp max_tokens if the
-            # user configured a smaller context window than the model's output limit.
-            ctx_len = getattr(self, "context_compressor", None)
-            ctx_len = ctx_len.context_length if ctx_len else None
            return build_anthropic_kwargs(
                model=self.model,
                messages=anthropic_messages,
@@ -4595,7 +4347,6 @@ class AIAgent:
                reasoning_config=self.reasoning_config,
                is_oauth=self._is_anthropic_oauth,
                preserve_dots=self._anthropic_preserve_dots(),
-                context_length=ctx_len,
            )

        if self.api_mode == "codex_responses":
@@ -4707,25 +4458,11 @@ class AIAgent:
            "model": self.model,
            "messages": sanitized_messages,
            "tools": self.tools if self.tools else None,
-            "timeout": float(os.getenv("HERMES_API_TIMEOUT", 1800.0)),
+            "timeout": float(os.getenv("HERMES_API_TIMEOUT", 900.0)),
        }

        if self.max_tokens is not None:
            api_kwargs.update(self._max_tokens_param(self.max_tokens))
-        elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
-            # OpenRouter translates requests to Anthropic's Messages API,
-            # which requires max_tokens as a mandatory field.  When we omit
-            # it, OpenRouter picks a default that can be too low — the model
-            # spends its output budget on thinking and has almost nothing
-            # left for the actual response (especially large tool calls like
-            # write_file).  Sending the model's real output limit ensures
-            # full capacity.  Other providers handle the default fine.
-            try:
-                from agent.anthropic_adapter import _get_anthropic_max_output
-                _model_output_limit = _get_anthropic_max_output(self.model)
-                api_kwargs["max_tokens"] = _model_output_limit
-            except Exception:
-                pass  # fail open — let OpenRouter pick its default

        extra_body = {}

@@ -4861,15 +4598,11 @@ class AIAgent:
            logging.debug(f"Captured reasoning ({len(reasoning_text)} chars): {reasoning_text}")

        if reasoning_text and self.reasoning_callback:
-            # Skip callback when streaming is active — reasoning was already
-            # displayed during the stream via one of two paths:
-            #   (a) _fire_reasoning_delta (structured reasoning_content deltas)
-            #   (b) _stream_delta tag extraction (<think>/<REASONING_SCRATCHPAD>)
-            # When streaming is NOT active, always fire so non-streaming modes
-            # (gateway, batch, quiet) still get reasoning.
-            # Any reasoning that wasn't shown during streaming is caught by the
-            # CLI post-response display fallback (cli.py _reasoning_shown_this_turn).
-            if not self.stream_delta_callback:
+            # Skip callback for <think>-extracted reasoning when streaming is active.
+            # _stream_delta() already displayed <think> blocks during streaming;
+            # firing the callback again would cause duplicate display.
+            # Structured reasoning (from reasoning_content field) always fires.
+            if _from_structured or not self.stream_delta_callback:
                try:
                    self.reasoning_callback(reasoning_text)
                except Exception:
@@ -6007,14 +5740,6 @@ class AIAgent:
        # Installed once, transparent when streams are healthy, prevents crash on write.
        _install_safe_stdio()

-        # Sanitize surrogate characters from user input.  Clipboard paste from
-        # rich-text editors (Google Docs, Word, etc.) can inject lone surrogates
-        # that are invalid UTF-8 and crash JSON serialization in the OpenAI SDK.
-        if isinstance(user_message, str):
-            user_message = _sanitize_surrogates(user_message)
-        if isinstance(persist_user_message, str):
-            persist_user_message = _sanitize_surrogates(persist_user_message)
-
        # Store stream callback for _interruptible_api_call to pick up
        self._stream_callback = stream_callback
        self._persist_user_message_idx = None
@@ -6031,7 +5756,6 @@ class AIAgent:
        self._codex_incomplete_retries = 0
        self._last_content_with_tools = None
        self._mute_post_response = False
-        self._surrogate_sanitized = False
        # NOTE: _turns_since_memory and _iters_since_skill are NOT reset here.
        # They are initialized in __init__ and must persist across run_conversation
        # calls so that nudge logic accumulates correctly in CLI mode.
@@ -6039,14 +5763,6 @@ class AIAgent:
        
        # Initialize conversation (copy to avoid mutating the caller's list)
        messages = list(conversation_history) if conversation_history else []
-
-        # Strip budget pressure warnings from previous turns.  These are
-        # turn-scoped signals injected by _get_budget_warning() into tool
-        # result content.  If left in the replayed history, models (especially
-        # GPT-family) interpret them as still-active instructions and avoid
-        # making tool calls in ALL subsequent turns.
-        if messages:
-            _strip_budget_warnings_from_history(messages)
        
        # Hydrate todo store from conversation history (gateway creates a fresh
        # AIAgent per message, so the in-memory store is empty -- we need to
@@ -6142,22 +5858,6 @@ class AIAgent:
                    self._cached_system_prompt = (
                        self._cached_system_prompt + "\n\n" + self._honcho_context
                    ).strip()
-
-                # Plugin hook: on_session_start
-                # Fired once when a brand-new session is created (not on
-                # continuation).  Plugins can use this to initialise
-                # session-scoped state (e.g. warm a memory cache).
-                try:
-                    from hermes_cli.plugins import invoke_hook as _invoke_hook
-                    _invoke_hook(
-                        "on_session_start",
-                        session_id=self.session_id,
-                        model=self.model,
-                        platform=getattr(self, "platform", None) or "",
-                    )
-                except Exception as exc:
-                    logger.warning("on_session_start hook failed: %s", exc)
-
                # Store the system prompt snapshot in SQLite
                if self._session_db:
                    try:
@@ -6219,34 +5919,6 @@ class AIAgent:
                    if _preflight_tokens < self.context_compressor.threshold_tokens:
                        break  # Under threshold

-        # Plugin hook: pre_llm_call
-        # Fired once per turn before the tool-calling loop.  Plugins can
-        # return a dict with a ``context`` key whose value is a string
-        # that will be appended to the ephemeral system prompt for every
-        # API call in this turn (not persisted to session DB or cache).
-        _plugin_turn_context = ""
-        try:
-            from hermes_cli.plugins import invoke_hook as _invoke_hook
-            _pre_results = _invoke_hook(
-                "pre_llm_call",
-                session_id=self.session_id,
-                user_message=original_user_message,
-                conversation_history=list(messages),
-                is_first_turn=(not bool(conversation_history)),
-                model=self.model,
-                platform=getattr(self, "platform", None) or "",
-            )
-            _ctx_parts = []
-            for r in _pre_results:
-                if isinstance(r, dict) and r.get("context"):
-                    _ctx_parts.append(str(r["context"]))
-                elif isinstance(r, str) and r.strip():
-                    _ctx_parts.append(r)
-            if _ctx_parts:
-                _plugin_turn_context = "\n\n".join(_ctx_parts)
-        except Exception as exc:
-            logger.warning("pre_llm_call hook failed: %s", exc)
-
        # Main conversation loop
        api_call_count = 0
        final_response = None
@@ -6344,9 +6016,6 @@ class AIAgent:
            effective_system = active_system_prompt or ""
            if self.ephemeral_system_prompt:
                effective_system = (effective_system + "\n\n" + self.ephemeral_system_prompt).strip()
-            # Plugin context from pre_llm_call hooks — ephemeral, not cached.
-            if _plugin_turn_context:
-                effective_system = (effective_system + "\n\n" + _plugin_turn_context).strip()
            if effective_system:
                api_messages = [{"role": "system", "content": effective_system}] + api_messages

@@ -6623,62 +6292,6 @@ class AIAgent:
                    if finish_reason == "length":
                        self._vprint(f"{self.log_prefix}⚠️  Response truncated (finish_reason='length') - model hit max output tokens", force=True)

-                        # ── Detect thinking-budget exhaustion ──────────────
-                        # When the model spends ALL output tokens on reasoning
-                        # and has none left for the response, continuation
-                        # retries are pointless.  Detect this early and give a
-                        # targeted error instead of wasting 3 API calls.
-                        _trunc_content = None
-                        if self.api_mode == "chat_completions":
-                            _trunc_msg = response.choices[0].message if (hasattr(response, "choices") and response.choices) else None
-                            _trunc_content = getattr(_trunc_msg, "content", None) if _trunc_msg else None
-                        elif self.api_mode == "anthropic_messages":
-                            # Anthropic response.content is a list of blocks
-                            _text_parts = []
-                            for _blk in getattr(response, "content", []):
-                                if getattr(_blk, "type", None) == "text":
-                                    _text_parts.append(getattr(_blk, "text", ""))
-                            _trunc_content = "\n".join(_text_parts) if _text_parts else None
-
-                        _thinking_exhausted = (
-                            _trunc_content is not None
-                            and not self._has_content_after_think_block(_trunc_content)
-                        ) or _trunc_content is None
-
-                        if _thinking_exhausted:
-                            _exhaust_error = (
-                                "Model used all output tokens on reasoning with none left "
-                                "for the response. Try lowering reasoning effort or "
-                                "increasing max_tokens."
-                            )
-                            self._vprint(
-                                f"{self.log_prefix}💭 Reasoning exhausted the output token budget — "
-                                f"no visible response was produced.",
-                                force=True,
-                            )
-                            # Return a user-friendly message as the response so
-                            # CLI (response box) and gateway (chat message) both
-                            # display it naturally instead of a suppressed error.
-                            _exhaust_response = (
-                                "⚠️ **Thinking Budget Exhausted**\n\n"
-                                "The model used all its output tokens on reasoning "
-                                "and had none left for the actual response.\n\n"
-                                "To fix this:\n"
-                                "→ Lower reasoning effort: `/thinkon low` or `/thinkon minimal`\n"
-                                "→ Increase the output token limit: "
-                                "set `model.max_tokens` in config.yaml"
-                            )
-                            self._cleanup_task_resources(effective_task_id)
-                            self._persist_session(messages, conversation_history)
-                            return {
-                                "final_response": _exhaust_response,
-                                "messages": messages,
-                                "api_calls": api_call_count,
-                                "completed": False,
-                                "partial": True,
-                                "error": _exhaust_error,
-                            }
-
                        if self.api_mode == "chat_completions":
                            assistant_message = response.choices[0].message
                            if not assistant_message.tool_calls:
@@ -6867,24 +6480,6 @@ class AIAgent:
                    if self.thinking_callback:
                        self.thinking_callback("")

-                    # -----------------------------------------------------------
-                    # Surrogate character recovery.  UnicodeEncodeError happens
-                    # when the messages contain lone surrogates (U+D800..U+DFFF)
-                    # that are invalid UTF-8.  Common source: clipboard paste
-                    # from Google Docs or similar rich-text editors.  We sanitize
-                    # the entire messages list in-place and retry once.
-                    # -----------------------------------------------------------
-                    if isinstance(api_error, UnicodeEncodeError) and not getattr(self, '_surrogate_sanitized', False):
-                        self._surrogate_sanitized = True
-                        if _sanitize_messages_surrogates(messages):
-                            self._vprint(
-                                f"{self.log_prefix}⚠️  Stripped invalid surrogate characters from messages. Retrying...",
-                                force=True,
-                            )
-                            continue
-                        # Surrogates weren't in messages — might be in system
-                        # prompt or prefill.  Fall through to normal error path.
-
                    status_code = getattr(api_error, "status_code", None)
                    if (
                        self.api_mode == "codex_responses"
@@ -7153,13 +6748,8 @@ class AIAgent:
                    # 529 (Anthropic overloaded) is also transient.
                    # Also catch local validation errors (ValueError, TypeError) — these
                    # are programming bugs, not transient failures.
-                    # Exclude UnicodeEncodeError — it's a ValueError subclass but is
-                    # handled separately by the surrogate sanitization path above.
                    _RETRYABLE_STATUS_CODES = {413, 429, 529}
-                    is_local_validation_error = (
-                        isinstance(api_error, (ValueError, TypeError))
-                        and not isinstance(api_error, UnicodeEncodeError)
-                    )
+                    is_local_validation_error = isinstance(api_error, (ValueError, TypeError))
                    # Detect generic 400s from Anthropic OAuth (transient server-side failures).
                    # Real invalid_request_error responses include a descriptive message;
                    # transient ones contain only "Error" or are empty. (ref: issue #1608)
@@ -7229,36 +6819,6 @@ class AIAgent:
                        _final_summary = self._summarize_api_error(api_error)
                        self._vprint(f"{self.log_prefix}❌ Max retries ({max_retries}) exceeded. Giving up.", force=True)
                        self._vprint(f"{self.log_prefix}   💀 Final error: {_final_summary}", force=True)
-
-                        # Detect SSE stream-drop pattern (e.g. "Network
-                        # connection lost") and surface actionable guidance.
-                        # This typically happens when the model generates a
-                        # very large tool call (write_file with huge content)
-                        # and the proxy/CDN drops the stream mid-response.
-                        _is_stream_drop = (
-                            not getattr(api_error, "status_code", None)
-                            and any(p in error_msg for p in (
-                                "connection lost", "connection reset",
-                                "connection closed", "network connection",
-                                "network error", "terminated",
-                            ))
-                        )
-                        if _is_stream_drop:
-                            self._vprint(
-                                f"{self.log_prefix}   💡 The provider's stream "
-                                f"connection keeps dropping. This often happens "
-                                f"when the model tries to write a very large "
-                                f"file in a single tool call.",
-                                force=True,
-                            )
-                            self._vprint(
-                                f"{self.log_prefix}      Try asking the model "
-                                f"to use execute_code with Python's open() for "
-                                f"large files, or to write the file in smaller "
-                                f"sections.",
-                                force=True,
-                            )
-
                        logging.error(
                            "%sAPI call failed after %s retries. %s | provider=%s model=%s msgs=%s tokens=~%s",
                            self.log_prefix, max_retries, _final_summary,
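The removed `_is_stream_drop` check flags an error as a dropped SSE stream when it has no HTTP status code and its message contains one of a few substrings. Here is a standalone sketch of that predicate. It assumes the message is compared in lowercase; the diff does not show how `error_msg` is built.

```python
from typing import Optional

# Substrings copied from the removed `_is_stream_drop` check.
STREAM_DROP_PATTERNS = (
    "connection lost", "connection reset",
    "connection closed", "network connection",
    "network error", "terminated",
)


def is_stream_drop(status_code: Optional[int], error_msg: str) -> bool:
    """True when there is no HTTP status and the message looks like a dropped stream.

    Assumption: case-insensitive matching; the original normalization of
    error_msg is not visible in this diff.
    """
    msg = error_msg.lower()
    return not status_code and any(p in msg for p in STREAM_DROP_PATTERNS)


print(is_stream_drop(None, "Network connection lost"))   # True
print(is_stream_drop(429, "connection reset by peer"))   # False: has a status code
print(is_stream_drop(None, "invalid api key"))           # False
```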
@@ -7268,18 +6828,8 @@ class AIAgent:
                            api_kwargs, reason="max_retries_exhausted", error=api_error,
                        )
                        self._persist_session(messages, conversation_history)
-                        _final_response = f"API call failed after {max_retries} retries: {_final_summary}"
-                        if _is_stream_drop:
-                            _final_response += (
-                                "\n\nThe provider's stream connection keeps "
-                                "dropping — this often happens when generating "
-                                "very large tool call responses (e.g. write_file "
-                                "with long content). Try asking me to use "
-                                "execute_code with Python's open() for large "
-                                "files, or to write in smaller sections."
-                            )
                        return {
-                            "final_response": _final_response,
+                            "final_response": f"API call failed after {max_retries} retries: {_final_summary}",
                            "messages": messages,
                            "api_calls": api_call_count,
                            "completed": False,
@@ -7657,6 +7207,7 @@ class AIAgent:
                        except Exception:
                            pass

+                    _msg_count_before_tools = len(messages)
                    self._execute_tool_calls(assistant_message, messages, effective_task_id, api_call_count)

                    # Signal that a paragraph break is needed before the next
@@ -7674,18 +7225,18 @@ class AIAgent:
                    if _tc_names == {"execute_code"}:
                        self.iteration_budget.refund()
                    
-                    # Use real token counts from the API response to decide
-                    # compression.  prompt_tokens + completion_tokens is the
-                    # actual context size the provider reported plus the
-                    # assistant turn — a tight lower bound for the next prompt.
-                    # Tool results appended above aren't counted yet, but the
-                    # threshold (default 50%) leaves ample headroom; if tool
-                    # results push past it, the next API call will report the
-                    # real total and trigger compression then.
+                    # Estimate next prompt size using real token counts from the
+                    # last API response + rough estimate of newly appended tool
+                    # results.  This catches cases where tool results push the
+                    # context past the limit that last_prompt_tokens alone misses
+                    # (e.g. large file reads, web extractions).
                    _compressor = self.context_compressor
-                    _real_tokens = (
+                    _new_tool_msgs = messages[_msg_count_before_tools:]
+                    _new_chars = sum(len(str(m.get("content", "") or "")) for m in _new_tool_msgs)
+                    _estimated_next_prompt = (
                        _compressor.last_prompt_tokens
                        + _compressor.last_completion_tokens
+                        + _new_chars // 3  # conservative: JSON-heavy tool results ≈ 3 chars/token
                    )

                    # ── Context pressure warnings (user-facing only) ──────────
@@ -7695,12 +7246,12 @@ class AIAgent:
                    # Does not inject into messages — just prints to CLI output
                    # and fires status_callback for gateway platforms.
                    if _compressor.threshold_tokens > 0:
-                        _compaction_progress = _real_tokens / _compressor.threshold_tokens
+                        _compaction_progress = _estimated_next_prompt / _compressor.threshold_tokens
                        if _compaction_progress >= 0.85 and not self._context_pressure_warned:
                            self._context_pressure_warned = True
                            self._emit_context_pressure(_compaction_progress, _compressor)

-                    if self.compression_enabled and _compressor.should_compress(_real_tokens):
+                    if self.compression_enabled and _compressor.should_compress(_estimated_next_prompt):
                        messages, active_system_prompt = self._compress_context(
                            messages, system_message,
                            approx_tokens=self.context_compressor.last_prompt_tokens,
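On the `+` side, compression is triggered by an estimate of the next prompt size: the last reported prompt and completion tokens, plus the characters of newly appended tool messages divided by 3. The `-` side used only the first two terms. The sketch below reproduces that arithmetic and the 0.85 warning ratio with hypothetical helper names. The numbers in the usage lines are invented for illustration.

```python
from typing import Any, Dict, List


def estimate_next_prompt(
    last_prompt_tokens: int,
    last_completion_tokens: int,
    new_tool_msgs: List[Dict[str, Any]],
) -> int:
    # Same arithmetic as the `+` lines: about 3 chars per token for tool output.
    new_chars = sum(len(str(m.get("content", "") or "")) for m in new_tool_msgs)
    return last_prompt_tokens + last_completion_tokens + new_chars // 3


def pressure(estimated: int, threshold_tokens: int) -> float:
    # Fraction of the compression threshold reached; the agent warns at >= 0.85.
    return estimated / threshold_tokens if threshold_tokens > 0 else 0.0


msgs = [{"role": "tool", "content": "x" * 30_000}, {"role": "tool", "content": None}]
est = estimate_next_prompt(60_000, 2_000, msgs)
print(est)                    # 72000
print(pressure(est, 80_000))  # 0.9
```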
@@ -7947,25 +7498,6 @@ class AIAgent:
            self._honcho_sync(original_user_message, final_response)
            self._queue_honcho_prefetch(original_user_message)

-        # Plugin hook: post_llm_call
-        # Fired once per turn after the tool-calling loop completes.
-        # Plugins can use this to persist conversation data (e.g. sync
-        # to an external memory system).
-        if final_response and not interrupted:
-            try:
-                from hermes_cli.plugins import invoke_hook as _invoke_hook
-                _invoke_hook(
-                    "post_llm_call",
-                    session_id=self.session_id,
-                    user_message=original_user_message,
-                    assistant_response=final_response,
-                    conversation_history=list(messages),
-                    model=self.model,
-                    platform=getattr(self, "platform", None) or "",
-                )
-            except Exception as exc:
-                logger.warning("post_llm_call hook failed: %s", exc)
-
        # Extract reasoning from the last assistant message (if any)
        last_reasoning = None
        for msg in reversed(messages):
@@ -8031,22 +7563,6 @@ class AIAgent:
            except Exception:
                pass  # Background review is best-effort

-        # Plugin hook: on_session_end
-        # Fired at the very end of every run_conversation call.
-        # Plugins can use this for cleanup, flushing buffers, etc.
-        try:
-            from hermes_cli.plugins import invoke_hook as _invoke_hook
-            _invoke_hook(
-                "on_session_end",
-                session_id=self.session_id,
-                completed=completed,
-                interrupted=interrupted,
-                model=self.model,
-                platform=getattr(self, "platform", None) or "",
-            )
-        except Exception as exc:
-            logger.warning("on_session_end hook failed: %s", exc)
-
        return result

    def chat(self, message: str, stream_callback: Optional[callable] = None) -> str:
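The removed `post_llm_call` and `on_session_end` call sites pass keyword arguments to `hermes_cli.plugins.invoke_hook` and log a warning if it raises. That module is not in this diff, so the registry below is a hypothetical sketch of the dispatch pattern, not its implementation. One design choice differs from the call sites: the sketch isolates failures per callback, so one broken plugin does not stop the others.

```python
import logging
from collections import defaultdict
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical registry; the real hermes_cli.plugins module is not shown here.
_HOOKS: Dict[str, List[Callable[..., Any]]] = defaultdict(list)


def register_hook(name: str, fn: Callable[..., Any]) -> None:
    _HOOKS[name].append(fn)


def invoke_hook(name: str, **kwargs: Any) -> None:
    """Call every callback registered for `name` with the given keyword arguments.

    A failing callback is logged and skipped so it cannot break the agent turn.
    """
    for fn in list(_HOOKS.get(name, [])):
        try:
            fn(**kwargs)
        except Exception as exc:
            logger.warning("%s hook failed: %s", name, exc)


seen = []
register_hook("on_session_end", lambda **kw: seen.append(kw["completed"]))
register_hook("on_session_end", lambda **kw: 1 / 0)  # a faulty plugin
invoke_hook("on_session_end", session_id="s1", completed=True, interrupted=False)
print(seen)  # [True]
```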
@@ -2,7 +2,7 @@
 # Kill all running Modal apps (sandboxes, deployments, etc.)
 #
 # Usage:
-#   bash scripts/kill_modal.sh          # Stop hermes-agent sandboxes
+#   bash scripts/kill_modal.sh          # Stop swe-rex (the sandbox app)
 #   bash scripts/kill_modal.sh --all    # Stop ALL Modal apps

 set -uo pipefail
@@ -17,10 +17,10 @@ if [[ "${1:-}" == "--all" ]]; then
        modal app stop "$app_id" 2>/dev/null || true
    done
 else
-    echo "Stopping hermes-agent sandboxes..."
-    APPS=$(echo "$APP_LIST" | grep 'hermes-agent' | grep -oE 'ap-[A-Za-z0-9]+' || true)
+    echo "Stopping swe-rex sandboxes..."
+    APPS=$(echo "$APP_LIST" | grep 'swe-rex' | grep -oE 'ap-[A-Za-z0-9]+' || true)
    if [[ -z "$APPS" ]]; then
-        echo "  No hermes-agent apps found."
+        echo "  No swe-rex apps found."
    else
        echo "$APPS" | while read app_id; do
            echo "  Stopping $app_id"
@@ -30,5 +30,5 @@ else
 fi

 echo ""
-echo "Current hermes-agent status:"
-modal app list 2>/dev/null | grep -E 'State|hermes-agent' || echo "  (none)"
+echo "Current swe-rex status:"
+modal app list 2>/dev/null | grep -E 'State|swe-rex' || echo "  (none)"
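The script keeps the lines of `modal app list` that mention the app name, then extracts IDs matching `ap-[A-Za-z0-9]+`. The sketch below does the same filtering in Python. The sample table is invented, and the real output layout of `modal app list` may differ.

```python
import re
from typing import List

APP_ID = re.compile(r"ap-[A-Za-z0-9]+")


def app_ids_for(listing: str, name: str) -> List[str]:
    """Mirror `grep NAME | grep -oE 'ap-[A-Za-z0-9]+'`: IDs on lines mentioning NAME."""
    ids: List[str] = []
    for line in listing.splitlines():
        if name in line:
            ids.extend(APP_ID.findall(line))
    return ids


# Invented sample; not real `modal app list` output.
sample = """\
| App ID        | Description | State    |
| ap-AbC123     | swe-rex     | deployed |
| ap-Xyz789     | other-app   | stopped  |
"""
print(app_ids_for(sample, "swe-rex"))       # ['ap-AbC123']
print(app_ids_for(sample, "hermes-agent"))  # []
```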
@@ -1,180 +0,0 @@
---
-name: webhook-subscriptions
-description: Create and manage webhook subscriptions for event-driven agent activation. Use when the user wants external services to trigger agent runs automatically.
-version: 1.0.0
-metadata:
-  hermes:
-    tags: [webhook, events, automation, integrations]
---
-
-# Webhook Subscriptions
-
-Create dynamic webhook subscriptions so external services (GitHub, GitLab, Stripe, CI/CD, IoT sensors, monitoring tools) can trigger Hermes agent runs by POSTing events to a URL.
-
-## Setup (Required First)
-
-The webhook platform must be enabled before subscriptions can be created. Check with:
-```bash
-hermes webhook list
-```
-
-If it says "Webhook platform is not enabled", set it up:
-
-### Option 1: Setup wizard
-```bash
-hermes gateway setup
-```
-Follow the prompts to enable webhooks, set the port, and set a global HMAC secret.
-
-### Option 2: Manual config
-Add to `~/.hermes/config.yaml`:
-```yaml
-platforms:
-  webhook:
-    enabled: true
-    extra:
-      host: "0.0.0.0"
-      port: 8644
-      secret: "generate-a-strong-secret-here"
-```
-
-### Option 3: Environment variables
-Add to `~/.hermes/.env`:
-```bash
-WEBHOOK_ENABLED=true
-WEBHOOK_PORT=8644
-WEBHOOK_SECRET=generate-a-strong-secret-here
-```
-
-After configuration, start (or restart) the gateway:
-```bash
-hermes gateway run
-# Or if using systemd:
-systemctl --user restart hermes-gateway
-```
-
-Verify it's running:
-```bash
-curl http://localhost:8644/health
-```
-
-## Commands
-
-All management is via the `hermes webhook` CLI command:
-
-### Create a subscription
-```bash
-hermes webhook subscribe <name> \
-  --prompt "Prompt template with {payload.fields}" \
-  --events "event1,event2" \
-  --description "What this does" \
-  --skills "skill1,skill2" \
-  --deliver telegram \
-  --deliver-chat-id "12345" \
-  --secret "optional-custom-secret"
-```
-
-Returns the webhook URL and HMAC secret. The user configures their service to POST to that URL.
-
-### List subscriptions
-```bash
-hermes webhook list
-```
-
-### Remove a subscription
-```bash
-hermes webhook remove <name>
-```
-
-### Test a subscription
-```bash
-hermes webhook test <name>
-hermes webhook test <name> --payload '{"key": "value"}'
-```
-
-## Prompt Templates
-
-Prompts support `{dot.notation}` for accessing nested payload fields:
-
-- `{issue.title}` — GitHub issue title
-- `{pull_request.user.login}` — PR author
-- `{data.object.amount}` — Stripe payment amount
-- `{sensor.temperature}` — IoT sensor reading
-
-If no prompt is specified, the full JSON payload is dumped into the agent prompt.
-
-## Common Patterns
-
-### GitHub: new issues
-```bash
-hermes webhook subscribe github-issues \
-  --events "issues" \
-  --prompt "New GitHub issue #{issue.number}: {issue.title}\n\nAction: {action}\nAuthor: {issue.user.login}\nBody:\n{issue.body}\n\nPlease triage this issue." \
-  --deliver telegram \
-  --deliver-chat-id "-100123456789"
-```
-
-Then in GitHub repo Settings → Webhooks → Add webhook:
-- Payload URL: the returned webhook_url
-- Content type: application/json
-- Secret: the returned secret
-- Events: "Issues"
-
-### GitHub: PR reviews
-```bash
-hermes webhook subscribe github-prs \
-  --events "pull_request" \
-  --prompt "PR #{pull_request.number} {action}: {pull_request.title}\nBy: {pull_request.user.login}\nBranch: {pull_request.head.ref}\n\n{pull_request.body}" \
-  --skills "github-code-review" \
-  --deliver github_comment
-```
-
-### Stripe: payment events
-```bash
-hermes webhook subscribe stripe-payments \
-  --events "payment_intent.succeeded,payment_intent.payment_failed" \
-  --prompt "Payment {data.object.status}: {data.object.amount} cents from {data.object.receipt_email}" \
-  --deliver telegram \
-  --deliver-chat-id "-100123456789"
-```
-
-### CI/CD: build notifications
-```bash
-hermes webhook subscribe ci-builds \
-  --events "pipeline" \
-  --prompt "Build {object_attributes.status} on {project.name} branch {object_attributes.ref}\nCommit: {commit.message}" \
-  --deliver discord \
-  --deliver-chat-id "1234567890"
-```
-
-### Generic monitoring alert
-```bash
-hermes webhook subscribe alerts \
-  --prompt "Alert: {alert.name}\nSeverity: {alert.severity}\nMessage: {alert.message}\n\nPlease investigate and suggest remediation." \
-  --deliver origin
-```
-
-## Security
-
-- Each subscription gets an auto-generated HMAC-SHA256 secret (or provide your own with `--secret`)
-- The webhook adapter validates signatures on every incoming POST
-- Static routes from config.yaml cannot be overwritten by dynamic subscriptions
-- Subscriptions persist to `~/.hermes/webhook_subscriptions.json`
-
-## How It Works
-
-1. `hermes webhook subscribe` writes to `~/.hermes/webhook_subscriptions.json`
-2. The webhook adapter hot-reloads this file on each incoming request (mtime-gated, negligible overhead)
-3. When a POST arrives matching a route, the adapter formats the prompt and triggers an agent run
-4. The agent's response is delivered to the configured target (Telegram, Discord, GitHub comment, etc.)
-
-## Troubleshooting
-
-If webhooks aren't working:
-
-1. **Is the gateway running?** Check with `systemctl --user status hermes-gateway` or `ps aux | grep gateway`
-2. **Is the webhook server listening?** `curl http://localhost:8644/health` should return `{"status": "ok"}`
-3. **Check gateway logs:** `grep webhook ~/.hermes/logs/gateway.log | tail -20`
-4. **Signature mismatch?** Verify the secret in your service matches the one from `hermes webhook list`. GitHub sends `X-Hub-Signature-256`, GitLab sends `X-Gitlab-Token`.
-5. **Firewall/NAT?** The webhook URL must be reachable from the service. For local development, use a tunnel (ngrok, cloudflared).
-6. **Wrong event type?** Check `--events` filter matches what the service sends. Use `hermes webhook test <name>` to verify the route works.
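The deleted skill describes two per-request steps: validate the HMAC-SHA256 signature, then fill `{dot.notation}` placeholders from the JSON payload. The sketch below covers both for a GitHub-style `X-Hub-Signature-256: sha256=<hex>` header. The helper names are hypothetical. Leaving unknown placeholders untouched is also an assumption, since the skill only says the full payload is used when no prompt is given.

```python
import hashlib
import hmac
import json
import re
from typing import Any

_PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*)\}")


def verify_github_signature(secret: str, body: bytes, header: str) -> bool:
    """Compare `sha256=<hex HMAC of raw body>` against the header in constant time."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), header.encode())


def lookup(payload: Any, dotted: str) -> Any:
    """Walk nested dicts by dot-separated keys; raise KeyError if a key is missing."""
    node = payload
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(dotted)
        node = node[key]
    return node


def render_prompt(template: str, payload: dict) -> str:
    """Replace {a.b.c} with payload values; unknown paths stay as written (assumption)."""
    def sub(m: re.Match) -> str:
        try:
            return str(lookup(payload, m.group(1)))
        except KeyError:
            return m.group(0)
    return _PLACEHOLDER.sub(sub, template)


secret = "s3cret"
body = json.dumps({"action": "opened", "issue": {"number": 7, "title": "Crash on start"}}).encode()
sig = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
template = "Issue #{issue.number} {action}: {issue.title} ({issue.missing})"
if verify_github_signature(secret, body, sig):
    print(render_prompt(template, json.loads(body)))
    # Issue #7 opened: Crash on start ({issue.missing})
```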
@@ -219,9 +219,6 @@ if command -v gh &>/dev/null && gh auth status &>/dev/null; then
  echo "AUTH_METHOD=gh"
 elif [ -n "$GITHUB_TOKEN" ]; then
  echo "AUTH_METHOD=curl"
-elif [ -f ~/.hermes/.env ] && grep -q "^GITHUB_TOKEN=" ~/.hermes/.env; then
-  export GITHUB_TOKEN=$(grep "^GITHUB_TOKEN=" ~/.hermes/.env | head -1 | cut -d= -f2 | tr -d '\n\r')
-  echo "AUTH_METHOD=curl"
 elif grep -q "github.com" ~/.git-credentials 2>/dev/null; then
  export GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
  echo "AUTH_METHOD=curl"
@@ -23,11 +23,6 @@ if command -v gh &>/dev/null && gh auth status &>/dev/null 2>&1; then
    GH_USER=$(gh api user --jq '.login' 2>/dev/null)
 elif [ -n "$GITHUB_TOKEN" ]; then
    GH_AUTH_METHOD="curl"
-elif [ -f "$HOME/.hermes/.env" ] && grep -q "^GITHUB_TOKEN=" "$HOME/.hermes/.env" 2>/dev/null; then
-    GITHUB_TOKEN=$(grep "^GITHUB_TOKEN=" "$HOME/.hermes/.env" | head -1 | cut -d= -f2 | tr -d '\n\r')
-    if [ -n "$GITHUB_TOKEN" ]; then
-        GH_AUTH_METHOD="curl"
-    fi
 elif [ -f "$HOME/.git-credentials" ] && grep -q "github.com" "$HOME/.git-credentials" 2>/dev/null; then
    GITHUB_TOKEN=$(grep "github.com" "$HOME/.git-credentials" | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
    if [ -n "$GITHUB_TOKEN" ]; then
@@ -27,11 +27,7 @@ if command -v gh &>/dev/null && gh auth status &>/dev/null; then
 else
  AUTH="git"
  if [ -z "$GITHUB_TOKEN" ]; then
-    if [ -f ~/.hermes/.env ] && grep -q "^GITHUB_TOKEN=" ~/.hermes/.env; then
-      GITHUB_TOKEN=$(grep "^GITHUB_TOKEN=" ~/.hermes/.env | head -1 | cut -d= -f2 | tr -d '\n\r')
-    elif grep -q "github.com" ~/.git-credentials 2>/dev/null; then
-      GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
-    fi
+    GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
  fi
 fi

@@ -27,11 +27,7 @@ if command -v gh &>/dev/null && gh auth status &>/dev/null; then
 else
  AUTH="git"
  if [ -z "$GITHUB_TOKEN" ]; then
-    if [ -f ~/.hermes/.env ] && grep -q "^GITHUB_TOKEN=" ~/.hermes/.env; then
-      GITHUB_TOKEN=$(grep "^GITHUB_TOKEN=" ~/.hermes/.env | head -1 | cut -d= -f2 | tr -d '\n\r')
-    elif grep -q "github.com" ~/.git-credentials 2>/dev/null; then
-      GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
-    fi
+    GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
  fi
 fi

@@ -29,11 +29,7 @@ else
  AUTH="git"
  # Ensure we have a token for API calls
  if [ -z "$GITHUB_TOKEN" ]; then
-    if [ -f ~/.hermes/.env ] && grep -q "^GITHUB_TOKEN=" ~/.hermes/.env; then
-      GITHUB_TOKEN=$(grep "^GITHUB_TOKEN=" ~/.hermes/.env | head -1 | cut -d= -f2 | tr -d '\n\r')
-    elif grep -q "github.com" ~/.git-credentials 2>/dev/null; then
-      GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
-    fi
+    GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
  fi
 fi
 echo "Using: $AUTH"
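After the `gh` CLI check, the `-` side of these scripts looks for a token in this order: `$GITHUB_TOKEN`, then a `GITHUB_TOKEN=` line in `~/.hermes/.env`, then the password field of the first `github.com` entry in `~/.git-credentials`. The `+` side removes the `.env` step. The sketch below mirrors that order with hypothetical helpers. To keep it testable, it takes file contents as strings rather than reading the files.

```python
import re
from typing import Dict, Optional

# Same pattern as the scripts' sed: https://USER:TOKEN@host... -> TOKEN
_CRED_RE = re.compile(r"https://[^:]*:([^@]*)@")


def token_from_env_file(text: str) -> Optional[str]:
    """First GITHUB_TOKEN= line, second '='-field (like `cut -d= -f2`)."""
    for line in text.splitlines():  # splitlines() already drops CR/LF, like `tr -d`
        if line.startswith("GITHUB_TOKEN="):
            return line.split("=")[1]
    return None


def token_from_git_credentials(text: str) -> Optional[str]:
    """First line mentioning github.com, reduced to its password field."""
    for line in text.splitlines():
        if "github.com" in line:
            m = _CRED_RE.search(line)
            return m.group(1) if m else line  # sed leaves non-matching lines as-is
    return None


def resolve_github_token(env: Dict[str, str], hermes_env: str, git_credentials: str) -> Optional[str]:
    """Lookup order from the `-` side; an empty value falls through to the next source."""
    return (
        env.get("GITHUB_TOKEN")
        or token_from_env_file(hermes_env)
        or token_from_git_credentials(git_credentials)
    )


print(resolve_github_token({}, "FOO=1\nGITHUB_TOKEN=ghp_fromenv\n", ""))      # ghp_fromenv
print(resolve_github_token({}, "", "https://alice:ghp_cred@github.com\n"))  # ghp_cred
```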
@@ -26,11 +26,7 @@ if command -v gh &>/dev/null && gh auth status &>/dev/null; then
 else
  AUTH="git"
  if [ -z "$GITHUB_TOKEN" ]; then
-    if [ -f ~/.hermes/.env ] && grep -q "^GITHUB_TOKEN=" ~/.hermes/.env; then
-      GITHUB_TOKEN=$(grep "^GITHUB_TOKEN=" ~/.hermes/.env | head -1 | cut -d= -f2 | tr -d '\n\r')
-    elif grep -q "github.com" ~/.git-credentials 2>/dev/null; then
-      GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
-    fi
+    GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
  fi
 fi

@@ -11,7 +11,6 @@ from agent.auxiliary_client import (
    get_text_auxiliary_client,
    get_vision_auxiliary_client,
    get_available_vision_backends,
-    resolve_vision_provider_client,
    resolve_provider_client,
    auxiliary_max_tokens_param,
    _read_codex_access_token,
@@ -639,30 +638,6 @@ class TestVisionClientFallback:
        assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
        assert model == "claude-haiku-4-5-20251001"

-    def test_selected_codex_provider_short_circuits_vision_auto(self, monkeypatch):
-        def fake_load_config():
-            return {"model": {"provider": "openai-codex", "default": "gpt-5.2-codex"}}
-
-        codex_client = MagicMock()
-        with (
-            patch("hermes_cli.config.load_config", fake_load_config),
-            patch("agent.auxiliary_client._try_codex", return_value=(codex_client, "gpt-5.2-codex")) as mock_codex,
-            patch("agent.auxiliary_client._try_openrouter") as mock_openrouter,
-            patch("agent.auxiliary_client._try_nous") as mock_nous,
-            patch("agent.auxiliary_client._try_anthropic") as mock_anthropic,
-            patch("agent.auxiliary_client._try_custom_endpoint") as mock_custom,
-        ):
-            provider, client, model = resolve_vision_provider_client()
-
-        assert provider == "openai-codex"
-        assert client is codex_client
-        assert model == "gpt-5.2-codex"
-        mock_codex.assert_called_once()
-        mock_openrouter.assert_not_called()
-        mock_nous.assert_not_called()
-        mock_anthropic.assert_not_called()
-        mock_custom.assert_not_called()
-
    def test_vision_auto_includes_codex(self, codex_auth_dir):
        """Codex supports vision (gpt-5.3-codex), so auto mode should use it."""
        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
@@ -18,8 +18,6 @@ from agent.prompt_builder import (
    build_context_files_prompt,
    CONTEXT_FILE_MAX_CHARS,
    DEFAULT_AGENT_IDENTITY,
-    TOOL_USE_ENFORCEMENT_GUIDANCE,
-    TOOL_USE_ENFORCEMENT_MODELS,
    MEMORY_GUIDANCE,
    SESSION_SEARCH_GUIDANCE,
    PLATFORM_HINTS,
@@ -234,18 +232,7 @@ class TestPromptBuilderImports:
 # =========================================================================


-import pytest
-
-
 class TestBuildSkillsSystemPrompt:
-    @pytest.fixture(autouse=True)
-    def _clear_skills_cache(self):
-        """Ensure the in-process skills prompt cache doesn't leak between tests."""
-        from agent.prompt_builder import clear_skills_system_prompt_cache
-        clear_skills_system_prompt_cache(clear_snapshot=True)
-        yield
-        clear_skills_system_prompt_cache(clear_snapshot=True)
-
    def test_empty_when_no_skills_dir(self, monkeypatch, tmp_path):
        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
        result = build_skills_system_prompt()
@@ -315,7 +302,7 @@ class TestBuildSkillsSystemPrompt:

        from unittest.mock import patch

-        with patch("agent.skill_utils.sys") as mock_sys:
+        with patch("tools.skills_tool.sys") as mock_sys:
            mock_sys.platform = "darwin"
            result = build_skills_system_prompt()

@@ -343,7 +330,7 @@ class TestBuildSkillsSystemPrompt:
        from unittest.mock import patch

        with patch(
-            "agent.prompt_builder.get_disabled_skill_names",
+            "tools.skills_tool._get_disabled_skill_names",
            return_value={"old-tool"},
        ):
            result = build_skills_system_prompt()
@@ -817,13 +804,6 @@ class TestSkillShouldShow:


 class TestBuildSkillsSystemPromptConditional:
-    @pytest.fixture(autouse=True)
-    def _clear_skills_cache(self):
-        from agent.prompt_builder import clear_skills_system_prompt_cache
-        clear_skills_system_prompt_cache(clear_snapshot=True)
-        yield
-        clear_skills_system_prompt_cache(clear_snapshot=True)
-
    def test_fallback_skill_hidden_when_primary_available(self, monkeypatch, tmp_path):
        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
        skill_dir = tmp_path / "skills" / "search" / "duckduckgo"
@@ -928,98 +908,3 @@ class TestBuildSkillsSystemPromptConditional:
            available_toolsets=set(),
        )
        assert "nested-null" in result
-
-
-# =========================================================================
-# Tool-use enforcement guidance
-# =========================================================================
-
-
-class TestToolUseEnforcementGuidance:
-    def test_guidance_mentions_tool_calls(self):
-        assert "tool call" in TOOL_USE_ENFORCEMENT_GUIDANCE.lower()
-
-    def test_guidance_forbids_description_only(self):
-        assert "describe" in TOOL_USE_ENFORCEMENT_GUIDANCE.lower()
-        assert "promise" in TOOL_USE_ENFORCEMENT_GUIDANCE.lower()
-
-    def test_guidance_requires_action(self):
-        assert "MUST" in TOOL_USE_ENFORCEMENT_GUIDANCE
-
-    def test_enforcement_models_includes_gpt(self):
-        assert "gpt" in TOOL_USE_ENFORCEMENT_MODELS
-
-    def test_enforcement_models_includes_codex(self):
-        assert "codex" in TOOL_USE_ENFORCEMENT_MODELS
-
-    def test_enforcement_models_is_tuple(self):
-        assert isinstance(TOOL_USE_ENFORCEMENT_MODELS, tuple)
-
-
-# =========================================================================
-# Budget warning history stripping
-# =========================================================================
-
-
-class TestStripBudgetWarningsFromHistory:
-    def test_strips_json_budget_warning_key(self):
-        import json
-        from run_agent import _strip_budget_warnings_from_history
-
-        messages = [
-            {"role": "tool", "tool_call_id": "c1", "content": json.dumps({
-                "output": "hello",
-                "exit_code": 0,
-                "_budget_warning": "[BUDGET: Iteration 55/60. 5 iterations left. Start consolidating your work.]",
-            })},
-        ]
-        _strip_budget_warnings_from_history(messages)
-        parsed = json.loads(messages[0]["content"])
-        assert "_budget_warning" not in parsed
-        assert parsed["output"] == "hello"
-        assert parsed["exit_code"] == 0
-
-    def test_strips_text_budget_warning(self):
-        from run_agent import _strip_budget_warnings_from_history
-
-        messages = [
-            {"role": "tool", "tool_call_id": "c1",
-             "content": "some result\n\n[BUDGET WARNING: Iteration 58/60. Only 2 iteration(s) left. Provide your final response NOW. No more tool calls unless absolutely critical.]"},
-        ]
-        _strip_budget_warnings_from_history(messages)
-        assert messages[0]["content"] == "some result"
-
-    def test_leaves_non_tool_messages_unchanged(self):
-        from run_agent import _strip_budget_warnings_from_history
-
-        messages = [
-            {"role": "assistant", "content": "[BUDGET WARNING: Iteration 58/60. Only 2 iteration(s) left. Provide your final response NOW. No more tool calls unless absolutely critical.]"},
-            {"role": "user", "content": "hello"},
-        ]
-        original_contents = [m["content"] for m in messages]
-        _strip_budget_warnings_from_history(messages)
-        assert [m["content"] for m in messages] == original_contents
-
-    def test_handles_empty_and_missing_content(self):
-        from run_agent import _strip_budget_warnings_from_history
-
-        messages = [
-            {"role": "tool", "tool_call_id": "c1", "content": ""},
-            {"role": "tool", "tool_call_id": "c2"},
-        ]
-        _strip_budget_warnings_from_history(messages)
-        assert messages[0]["content"] == ""
-
-    def test_strips_caution_variant(self):
-        import json
-        from run_agent import _strip_budget_warnings_from_history
-
-        messages = [
-            {"role": "tool", "tool_call_id": "c1", "content": json.dumps({
-                "output": "ok",
-                "_budget_warning": "[BUDGET: Iteration 42/60. 18 iterations left. Start consolidating your work.]",
-            })},
-        ]
-        _strip_budget_warnings_from_history(messages)
-        parsed = json.loads(messages[0]["content"])
-        assert "_budget_warning" not in parsed
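The removed `TestStripBudgetWarningsFromHistory` cases specify the behaviour of `_strip_budget_warnings_from_history`, whose body is not part of this diff:

- In JSON tool results, the `_budget_warning` key is dropped.
- In plain-text tool results, a trailing `[BUDGET WARNING: ...]` block is removed.
- Non-tool messages and empty or missing content are left alone.

The sketch below satisfies those cases. Its function name is hypothetical, and the regex for the text form is an assumption.

```python
import json
import re
from typing import Any, Dict, List

# Assumed shape of the plain-text warning: a trailing "[BUDGET WARNING: ...]" block.
_TEXT_WARNING = re.compile(r"\s*\[BUDGET WARNING:[^\]]*\]\s*$")


def strip_budget_warnings(messages: List[Dict[str, Any]]) -> None:
    """Hypothetical stand-in: remove budget warnings from tool results in place."""
    for msg in messages:
        if msg.get("role") != "tool":
            continue  # assistant/user text is never rewritten
        content = msg.get("content")
        if not isinstance(content, str) or not content:
            continue  # empty or missing content stays as it is
        try:
            parsed = json.loads(content)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict) and "_budget_warning" in parsed:
            del parsed["_budget_warning"]
            msg["content"] = json.dumps(parsed)
        else:
            msg["content"] = _TEXT_WARNING.sub("", content)


msgs = [
    {"role": "tool", "tool_call_id": "c1", "content": json.dumps(
        {"output": "hello", "exit_code": 0, "_budget_warning": "[BUDGET: Iteration 55/60.]"})},
    {"role": "tool", "tool_call_id": "c2",
     "content": "some result\n\n[BUDGET WARNING: Iteration 58/60. Only 2 iteration(s) left.]"},
    {"role": "assistant", "content": "[BUDGET WARNING: keep me]"},
    {"role": "tool", "tool_call_id": "c3"},
]
strip_budget_warnings(msgs)
print(json.loads(msgs[0]["content"]))  # {'output': 'hello', 'exit_code': 0}
print(msgs[1]["content"])              # some result
```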
@@ -54,7 +54,7 @@ class TestScanSkillCommands:
        """macOS-only skills should not register slash commands on Linux."""
        with (
            patch("tools.skills_tool.SKILLS_DIR", tmp_path),
-            patch("agent.skill_utils.sys") as mock_sys,
+            patch("tools.skills_tool.sys") as mock_sys,
        ):
            mock_sys.platform = "linux"
            _make_skill(tmp_path, "imessage", frontmatter_extra="platforms: [macos]\n")
@@ -67,7 +67,7 @@ class TestScanSkillCommands:
        """macOS-only skills should register slash commands on macOS."""
        with (
            patch("tools.skills_tool.SKILLS_DIR", tmp_path),
-            patch("agent.skill_utils.sys") as mock_sys,
+            patch("tools.skills_tool.sys") as mock_sys,
        ):
            mock_sys.platform = "darwin"
            _make_skill(tmp_path, "imessage", frontmatter_extra="platforms: [macos]\n")
@@ -78,7 +78,7 @@ class TestScanSkillCommands:
        """Skills without platforms field should register on any platform."""
        with (
            patch("tools.skills_tool.SKILLS_DIR", tmp_path),
-            patch("agent.skill_utils.sys") as mock_sys,
+            patch("tools.skills_tool.sys") as mock_sys,
        ):
            mock_sys.platform = "win32"
            _make_skill(tmp_path, "generic-tool")
@@ -20,7 +20,6 @@ from cron.jobs import (
    resume_job,
    remove_job,
    mark_job_run,
-    advance_next_run,
    get_due_jobs,
    save_job_output,
 )
@@ -340,90 +339,6 @@ class TestMarkJobRun:
        assert updated["last_error"] == "timeout"


-class TestAdvanceNextRun:
-    """Tests for advance_next_run() — crash-safety for recurring jobs."""
-
-    def test_advances_interval_job(self, tmp_cron_dir):
-        """Interval jobs should have next_run_at bumped to the next future occurrence."""
-        job = create_job(prompt="Recurring check", schedule="every 1h")
-        # Force next_run_at to 5 minutes ago (i.e. the job is due)
-        jobs = load_jobs()
-        old_next = (datetime.now() - timedelta(minutes=5)).isoformat()
-        jobs[0]["next_run_at"] = old_next
-        save_jobs(jobs)
-
-        result = advance_next_run(job["id"])
-        assert result is True
-
-        updated = get_job(job["id"])
-        from cron.jobs import _ensure_aware, _hermes_now
-        new_next_dt = _ensure_aware(datetime.fromisoformat(updated["next_run_at"]))
-        assert new_next_dt > _hermes_now(), "next_run_at should be in the future after advance"
-
-    def test_advances_cron_job(self, tmp_cron_dir):
-        """Cron-expression jobs should have next_run_at bumped to the next occurrence."""
-        pytest.importorskip("croniter")
-        job = create_job(prompt="Daily wakeup", schedule="15 6 * * *")
-        # Force next_run_at to 30 minutes ago
-        jobs = load_jobs()
-        old_next = (datetime.now() - timedelta(minutes=30)).isoformat()
-        jobs[0]["next_run_at"] = old_next
-        save_jobs(jobs)
-
-        result = advance_next_run(job["id"])
-        assert result is True
-
-        updated = get_job(job["id"])
-        from cron.jobs import _ensure_aware, _hermes_now
-        new_next_dt = _ensure_aware(datetime.fromisoformat(updated["next_run_at"]))
-        assert new_next_dt > _hermes_now(), "next_run_at should be in the future after advance"
-
-    def test_skips_oneshot_job(self, tmp_cron_dir):
-        """One-shot jobs should NOT be advanced — they need to retry on restart."""
-        job = create_job(prompt="Run once", schedule="30m")
-        original_next = get_job(job["id"])["next_run_at"]
-
-        result = advance_next_run(job["id"])
-        assert result is False
-
-        updated = get_job(job["id"])
-        assert updated["next_run_at"] == original_next, "one-shot next_run_at should be unchanged"
-
-    def test_nonexistent_job_returns_false(self, tmp_cron_dir):
-        result = advance_next_run("nonexistent-id")
-        assert result is False
-
-    def test_already_future_stays_future(self, tmp_cron_dir):
-        """If next_run_at is already in the future, advance keeps it in the future (no harm)."""
-        job = create_job(prompt="Future job", schedule="every 1h")
-        # next_run_at is already set to ~1h from now by create_job
-        advance_next_run(job["id"])
-        # Regardless of return value, the job should still be in the future
-        updated = get_job(job["id"])
-        from cron.jobs import _ensure_aware, _hermes_now
-        new_next_dt = _ensure_aware(datetime.fromisoformat(updated["next_run_at"]))
-        assert new_next_dt > _hermes_now(), "next_run_at should remain in the future"
-
-    def test_crash_safety_scenario(self, tmp_cron_dir):
-        """Simulate the crash-loop scenario: after advance, the job should NOT be due."""
-        job = create_job(prompt="Crash test", schedule="every 1h")
-        # Force next_run_at to 5 minutes ago (job is due)
-        jobs = load_jobs()
-        jobs[0]["next_run_at"] = (datetime.now() - timedelta(minutes=5)).isoformat()
-        save_jobs(jobs)
-
-        # Job should be due before advance
-        due_before = get_due_jobs()
-        assert len(due_before) == 1
-
-        # Advance (simulating what tick() does before run_job)
-        advance_next_run(job["id"])
-
-        # Now the job should NOT be due (simulates restart after crash)
-        due_after = get_due_jobs()
-        assert len(due_after) == 0, "Job should not be due after advance_next_run"
-
-
 class TestGetDueJobs:
    def test_past_due_within_window_returned(self, tmp_cron_dir):
        """Jobs within the dynamic grace window are still considered due (not stale).
@@ -687,41 +687,3 @@ class TestBuildJobPromptMissingSkill:
            result = _build_job_prompt({"skills": ["ghost-skill", "real-skill"], "prompt": "go"})
        assert "Real skill content." in result
        assert "go" in result
-
-
-class TestTickAdvanceBeforeRun:
-    """Verify that tick() calls advance_next_run before run_job for crash safety."""
-
-    def test_advance_called_before_run_job(self, tmp_path):
-        """advance_next_run must be called before run_job to prevent crash-loop re-fires."""
-        call_order = []
-
-        def fake_advance(job_id):
-            call_order.append(("advance", job_id))
-            return True
-
-        def fake_run_job(job):
-            call_order.append(("run", job["id"]))
-            return True, "output", "response", None
-
-        fake_job = {
-            "id": "test-advance",
-            "name": "test",
-            "prompt": "hello",
-            "enabled": True,
-            "schedule": {"kind": "cron", "expr": "15 6 * * *"},
-        }
-
-        with patch("cron.scheduler.get_due_jobs", return_value=[fake_job]), \
-             patch("cron.scheduler.advance_next_run", side_effect=fake_advance) as adv_mock, \
-             patch("cron.scheduler.run_job", side_effect=fake_run_job), \
-             patch("cron.scheduler.save_job_output", return_value=tmp_path / "out.md"), \
-             patch("cron.scheduler.mark_job_run"), \
-             patch("cron.scheduler._deliver_result"):
-            from cron.scheduler import tick
-            executed = tick(verbose=False)
-
-        assert executed == 1
-        adv_mock.assert_called_once_with("test-advance")
-        # advance must happen before run
-        assert call_order == [("advance", "test-advance"), ("run", "test-advance")]
@@ -28,7 +28,6 @@ from gateway.platforms.api_server import (
    _CORS_HEADERS,
    check_api_server_requirements,
    cors_middleware,
-    security_headers_middleware,
 )


@@ -215,11 +214,9 @@ def _make_adapter(api_key: str = "", cors_origins=None) -> APIServerAdapter:

 def _create_app(adapter: APIServerAdapter) -> web.Application:
    """Create the aiohttp app from the adapter (without starting the full server)."""
-    mws = [mw for mw in (cors_middleware, security_headers_middleware) if mw is not None]
-    app = web.Application(middlewares=mws)
+    app = web.Application(middlewares=[cors_middleware])
    app["api_server_adapter"] = adapter
    app.router.add_get("/health", adapter._handle_health)
-    app.router.add_get("/v1/health", adapter._handle_health)
    app.router.add_get("/v1/models", adapter._handle_models)
    app.router.add_post("/v1/chat/completions", adapter._handle_chat_completions)
    app.router.add_post("/v1/responses", adapter._handle_responses)
@@ -244,16 +241,6 @@ def auth_adapter():


 class TestHealthEndpoint:
-    @pytest.mark.asyncio
-    async def test_security_headers_present(self, adapter):
-        """Responses should include basic security headers."""
-        app = _create_app(adapter)
-        async with TestClient(TestServer(app)) as cli:
-            resp = await cli.get("/health")
-            assert resp.status == 200
-            assert resp.headers.get("X-Content-Type-Options") == "nosniff"
-            assert resp.headers.get("Referrer-Policy") == "no-referrer"
-
    @pytest.mark.asyncio
    async def test_health_returns_ok(self, adapter):
        app = _create_app(adapter)
@@ -264,17 +251,6 @@ class TestHealthEndpoint:
            assert data["status"] == "ok"
            assert data["platform"] == "hermes-agent"

-    @pytest.mark.asyncio
-    async def test_v1_health_alias_returns_ok(self, adapter):
-        """GET /v1/health should return the same response as /health."""
-        app = _create_app(adapter)
-        async with TestClient(TestServer(app)) as cli:
-            resp = await cli.get("/v1/health")
-            assert resp.status == 200
-            data = await resp.json()
-            assert data["status"] == "ok"
-            assert data["platform"] == "hermes-agent"
-

 # ---------------------------------------------------------------------------
 # /v1/models endpoint
@@ -1324,31 +1300,6 @@ class TestCORS:
            assert "POST" in resp.headers.get("Access-Control-Allow-Methods", "")
            assert "DELETE" in resp.headers.get("Access-Control-Allow-Methods", "")

-    @pytest.mark.asyncio
-    async def test_cors_allows_idempotency_key_header(self):
-        adapter = _make_adapter(cors_origins=["http://localhost:3000"])
-        app = _create_app(adapter)
-        async with TestClient(TestServer(app)) as cli:
-            resp = await cli.options(
-                "/v1/chat/completions",
-                headers={
-                    "Origin": "http://localhost:3000",
-                    "Access-Control-Request-Method": "POST",
-                    "Access-Control-Request-Headers": "Idempotency-Key",
-                },
-            )
-            assert resp.status == 200
-            assert "Idempotency-Key" in resp.headers.get("Access-Control-Allow-Headers", "")
-
-    @pytest.mark.asyncio
-    async def test_cors_sets_vary_origin_header(self):
-        adapter = _make_adapter(cors_origins=["http://localhost:3000"])
-        app = _create_app(adapter)
-        async with TestClient(TestServer(app)) as cli:
-            resp = await cli.get("/health", headers={"Origin": "http://localhost:3000"})
-            assert resp.status == 200
-            assert resp.headers.get("Vary") == "Origin"
-
    @pytest.mark.asyncio
    async def test_cors_options_preflight_allowed_for_configured_origin(self):
        """Configured origins can complete browser preflight."""
@@ -1368,21 +1319,6 @@ class TestCORS:
            assert "Authorization" in resp.headers.get("Access-Control-Allow-Headers", "")


-    @pytest.mark.asyncio
-    async def test_cors_preflight_sets_max_age(self):
-        adapter = _make_adapter(cors_origins=["http://localhost:3000"])
-        app = _create_app(adapter)
-        async with TestClient(TestServer(app)) as cli:
-            resp = await cli.options(
-                "/v1/chat/completions",
-                headers={
-                    "Origin": "http://localhost:3000",
-                    "Access-Control-Request-Method": "POST",
-                    "Access-Control-Request-Headers": "Authorization, Content-Type",
-                },
-            )
-            assert resp.status == 200
-            assert resp.headers.get("Access-Control-Max-Age") == "600"
 # ---------------------------------------------------------------------------
 # Conversation parameter
 # ---------------------------------------------------------------------------
@@ -10,7 +10,6 @@ Covers:
 """

 import asyncio
-import os
 import sys
 from pathlib import Path
 from types import SimpleNamespace
@@ -228,8 +227,7 @@ def test_persist_dm_topic_thread_id_writes_config(tmp_path):

    adapter = _make_adapter()

-    with patch.object(Path, "home", return_value=tmp_path), \
-         patch.dict(os.environ, {"HERMES_HOME": str(tmp_path / ".hermes")}):
+    with patch.object(Path, "home", return_value=tmp_path):
        adapter._persist_dm_topic_thread_id(111, "General", 999)

    with open(config_file) as f:
@@ -368,8 +366,7 @@ def test_get_dm_topic_info_hot_reloads_from_config(tmp_path):
    with open(config_file, "w") as f:
        yaml.dump(config_data, f)

-    with patch.object(Path, "home", return_value=tmp_path), \
-         patch.dict(os.environ, {"HERMES_HOME": str(tmp_path / ".hermes")}):
+    with patch.object(Path, "home", return_value=tmp_path):
        result = adapter._get_dm_topic_info("111", "555")

    assert result is not None
@@ -1,5 +1,4 @@
 """Tests for Matrix platform adapter."""
-import asyncio
 import json
 import re
 import pytest
@@ -447,199 +446,3 @@ class TestMatrixRequirements:
        monkeypatch.delenv("MATRIX_HOMESERVER", raising=False)
        from gateway.platforms.matrix import check_matrix_requirements
        assert check_matrix_requirements() is False
-
-
-# ---------------------------------------------------------------------------
-# Access-token auth / E2EE bootstrap
-# ---------------------------------------------------------------------------
-
-class TestMatrixAccessTokenAuth:
-    @pytest.mark.asyncio
-    async def test_connect_fetches_device_id_from_whoami_for_access_token(self):
-        from gateway.platforms.matrix import MatrixAdapter
-
-        config = PlatformConfig(
-            enabled=True,
-            token="syt_test_access_token",
-            extra={
-                "homeserver": "https://matrix.example.org",
-                "user_id": "@bot:example.org",
-                "encryption": True,
-            },
-        )
-        adapter = MatrixAdapter(config)
-
-        class FakeWhoamiResponse:
-            def __init__(self, user_id, device_id):
-                self.user_id = user_id
-                self.device_id = device_id
-
-        class FakeSyncResponse:
-            def __init__(self):
-                self.rooms = MagicMock(join={})
-
-        fake_client = MagicMock()
-        fake_client.whoami = AsyncMock(return_value=FakeWhoamiResponse("@bot:example.org", "DEV123"))
-        fake_client.sync = AsyncMock(return_value=FakeSyncResponse())
-        fake_client.keys_upload = AsyncMock()
-        fake_client.keys_query = AsyncMock()
-        fake_client.keys_claim = AsyncMock()
-        fake_client.send_to_device_messages = AsyncMock(return_value=[])
-        fake_client.get_users_for_key_claiming = MagicMock(return_value={})
-        fake_client.close = AsyncMock()
-        fake_client.add_event_callback = MagicMock()
-        fake_client.rooms = {}
-        fake_client.account_data = {}
-        fake_client.olm = object()
-        fake_client.should_upload_keys = False
-        fake_client.should_query_keys = False
-        fake_client.should_claim_keys = False
-
-        def _restore_login(user_id, device_id, access_token):
-            fake_client.user_id = user_id
-            fake_client.device_id = device_id
-            fake_client.access_token = access_token
-            fake_client.olm = object()
-
-        fake_client.restore_login = MagicMock(side_effect=_restore_login)
-
-        fake_nio = MagicMock()
-        fake_nio.AsyncClient = MagicMock(return_value=fake_client)
-        fake_nio.WhoamiResponse = FakeWhoamiResponse
-        fake_nio.SyncResponse = FakeSyncResponse
-        fake_nio.LoginResponse = type("LoginResponse", (), {})
-        fake_nio.RoomMessageText = type("RoomMessageText", (), {})
-        fake_nio.RoomMessageImage = type("RoomMessageImage", (), {})
-        fake_nio.RoomMessageAudio = type("RoomMessageAudio", (), {})
-        fake_nio.RoomMessageVideo = type("RoomMessageVideo", (), {})
-        fake_nio.RoomMessageFile = type("RoomMessageFile", (), {})
-        fake_nio.InviteMemberEvent = type("InviteMemberEvent", (), {})
-        fake_nio.MegolmEvent = type("MegolmEvent", (), {})
-
-        with patch.dict("sys.modules", {"nio": fake_nio}):
-            with patch.object(adapter, "_refresh_dm_cache", AsyncMock()):
-                with patch.object(adapter, "_sync_loop", AsyncMock(return_value=None)):
-                    assert await adapter.connect() is True
-
-        fake_client.restore_login.assert_called_once_with(
-            "@bot:example.org", "DEV123", "syt_test_access_token"
-        )
-        assert fake_client.access_token == "syt_test_access_token"
-        assert fake_client.user_id == "@bot:example.org"
-        assert fake_client.device_id == "DEV123"
-        fake_client.whoami.assert_awaited_once()
-
-        await adapter.disconnect()
-
-
-class TestMatrixE2EEMaintenance:
-    @pytest.mark.asyncio
-    async def test_sync_loop_runs_e2ee_maintenance_requests(self):
-        adapter = _make_adapter()
-        adapter._encryption = True
-        adapter._closing = False
-
-        class FakeSyncError:
-            pass
-
-        async def _sync_once(timeout=30000):
-            adapter._closing = True
-            return MagicMock()
-
-        fake_client = MagicMock()
-        fake_client.sync = AsyncMock(side_effect=_sync_once)
-        fake_client.send_to_device_messages = AsyncMock(return_value=[])
-        fake_client.keys_upload = AsyncMock()
-        fake_client.keys_query = AsyncMock()
-        fake_client.get_users_for_key_claiming = MagicMock(
-            return_value={"@alice:example.org": ["DEVICE1"]}
-        )
-        fake_client.keys_claim = AsyncMock()
-        fake_client.olm = object()
-        fake_client.should_upload_keys = True
-        fake_client.should_query_keys = True
-        fake_client.should_claim_keys = True
-
-        adapter._client = fake_client
-
-        fake_nio = MagicMock()
-        fake_nio.SyncError = FakeSyncError
-
-        with patch.dict("sys.modules", {"nio": fake_nio}):
-            await adapter._sync_loop()
-
-        fake_client.sync.assert_awaited_once_with(timeout=30000)
-        fake_client.send_to_device_messages.assert_awaited_once()
-        fake_client.keys_upload.assert_awaited_once()
-        fake_client.keys_query.assert_awaited_once()
-        fake_client.keys_claim.assert_awaited_once_with(
-            {"@alice:example.org": ["DEVICE1"]}
-        )
-
-
-class TestMatrixEncryptedSendFallback:
-    @pytest.mark.asyncio
-    async def test_send_retries_with_ignored_unverified_devices(self):
-        adapter = _make_adapter()
-        adapter._encryption = True
-
-        class FakeRoomSendResponse:
-            def __init__(self, event_id):
-                self.event_id = event_id
-
-        class FakeOlmUnverifiedDeviceError(Exception):
-            pass
-
-        fake_client = MagicMock()
-        fake_client.room_send = AsyncMock(side_effect=[
-            FakeOlmUnverifiedDeviceError("unverified"),
-            FakeRoomSendResponse("$event123"),
-        ])
-        adapter._client = fake_client
-        adapter._run_e2ee_maintenance = AsyncMock()
-
-        fake_nio = MagicMock()
-        fake_nio.RoomSendResponse = FakeRoomSendResponse
-        fake_nio.OlmUnverifiedDeviceError = FakeOlmUnverifiedDeviceError
-
-        with patch.dict("sys.modules", {"nio": fake_nio}):
-            result = await adapter.send("!room:example.org", "hello")
-
-        assert result.success is True
-        assert result.message_id == "$event123"
-        adapter._run_e2ee_maintenance.assert_awaited_once()
-        assert fake_client.room_send.await_count == 2
-        first_call = fake_client.room_send.await_args_list[0]
-        second_call = fake_client.room_send.await_args_list[1]
-        assert first_call.kwargs.get("ignore_unverified_devices") is False
-        assert second_call.kwargs.get("ignore_unverified_devices") is True
-
-    @pytest.mark.asyncio
-    async def test_send_retries_after_timeout_in_encrypted_room(self):
-        adapter = _make_adapter()
-        adapter._encryption = True
-
-        class FakeRoomSendResponse:
-            def __init__(self, event_id):
-                self.event_id = event_id
-
-        fake_client = MagicMock()
-        fake_client.room_send = AsyncMock(side_effect=[
-            asyncio.TimeoutError(),
-            FakeRoomSendResponse("$event456"),
-        ])
-        adapter._client = fake_client
-        adapter._run_e2ee_maintenance = AsyncMock()
-
-        fake_nio = MagicMock()
-        fake_nio.RoomSendResponse = FakeRoomSendResponse
-
-        with patch.dict("sys.modules", {"nio": fake_nio}):
-            result = await adapter.send("!room:example.org", "hello")
-
-        assert result.success is True
-        assert result.message_id == "$event456"
-        adapter._run_e2ee_maintenance.assert_awaited_once()
-        assert fake_client.room_send.await_count == 2
-        second_call = fake_client.room_send.await_args_list[1]
-        assert second_call.kwargs.get("ignore_unverified_devices") is True
@@ -171,170 +171,6 @@ class TestCacheImageFromUrl:
        mock_sleep.assert_not_called()


-# ---------------------------------------------------------------------------
-# cache_audio_from_url (base.py)
-# ---------------------------------------------------------------------------
-
-class TestCacheAudioFromUrl:
-    """Tests for gateway.platforms.base.cache_audio_from_url"""
-
-    def test_success_on_first_attempt(self, tmp_path, monkeypatch):
-        """A clean 200 response caches the audio and returns a path."""
-        monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
-
-        fake_response = MagicMock()
-        fake_response.content = b"\x00\x01 fake audio"
-        fake_response.raise_for_status = MagicMock()
-
-        mock_client = AsyncMock()
-        mock_client.get = AsyncMock(return_value=fake_response)
-        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
-        mock_client.__aexit__ = AsyncMock(return_value=False)
-
-        async def run():
-            with patch("httpx.AsyncClient", return_value=mock_client):
-                from gateway.platforms.base import cache_audio_from_url
-                return await cache_audio_from_url(
-                    "http://example.com/voice.ogg", ext=".ogg"
-                )
-
-        path = asyncio.run(run())
-        assert path.endswith(".ogg")
-        mock_client.get.assert_called_once()
-
-    def test_retries_on_timeout_then_succeeds(self, tmp_path, monkeypatch):
-        """A timeout on the first attempt is retried; second attempt succeeds."""
-        monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
-
-        fake_response = MagicMock()
-        fake_response.content = b"audio data"
-        fake_response.raise_for_status = MagicMock()
-
-        mock_client = AsyncMock()
-        mock_client.get = AsyncMock(
-            side_effect=[_make_timeout_error(), fake_response]
-        )
-        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
-        mock_client.__aexit__ = AsyncMock(return_value=False)
-
-        mock_sleep = AsyncMock()
-
-        async def run():
-            with patch("httpx.AsyncClient", return_value=mock_client), \
-                 patch("asyncio.sleep", mock_sleep):
-                from gateway.platforms.base import cache_audio_from_url
-                return await cache_audio_from_url(
-                    "http://example.com/voice.ogg", ext=".ogg", retries=2
-                )
-
-        path = asyncio.run(run())
-        assert path.endswith(".ogg")
-        assert mock_client.get.call_count == 2
-        mock_sleep.assert_called_once()
-
-    def test_retries_on_429_then_succeeds(self, tmp_path, monkeypatch):
-        """A 429 response on the first attempt is retried; second attempt succeeds."""
-        monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
-
-        ok_response = MagicMock()
-        ok_response.content = b"audio data"
-        ok_response.raise_for_status = MagicMock()
-
-        mock_client = AsyncMock()
-        mock_client.get = AsyncMock(
-            side_effect=[_make_http_status_error(429), ok_response]
-        )
-        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
-        mock_client.__aexit__ = AsyncMock(return_value=False)
-
-        async def run():
-            with patch("httpx.AsyncClient", return_value=mock_client), \
-                 patch("asyncio.sleep", new_callable=AsyncMock):
-                from gateway.platforms.base import cache_audio_from_url
-                return await cache_audio_from_url(
-                    "http://example.com/voice.ogg", ext=".ogg", retries=2
-                )
-
-        path = asyncio.run(run())
-        assert path.endswith(".ogg")
-        assert mock_client.get.call_count == 2
-
-    def test_retries_on_500_then_succeeds(self, tmp_path, monkeypatch):
-        """A 500 response on the first attempt is retried; second attempt succeeds."""
-        monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
-
-        ok_response = MagicMock()
-        ok_response.content = b"audio data"
-        ok_response.raise_for_status = MagicMock()
-
-        mock_client = AsyncMock()
-        mock_client.get = AsyncMock(
-            side_effect=[_make_http_status_error(500), ok_response]
-        )
-        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
-        mock_client.__aexit__ = AsyncMock(return_value=False)
-
-        async def run():
-            with patch("httpx.AsyncClient", return_value=mock_client), \
-                 patch("asyncio.sleep", new_callable=AsyncMock):
-                from gateway.platforms.base import cache_audio_from_url
-                return await cache_audio_from_url(
-                    "http://example.com/voice.ogg", ext=".ogg", retries=2
-                )
-
-        path = asyncio.run(run())
-        assert path.endswith(".ogg")
-        assert mock_client.get.call_count == 2
-
-    def test_raises_after_max_retries_exhausted(self, tmp_path, monkeypatch):
-        """Timeout on every attempt raises after all retries are consumed."""
-        monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
-
-        mock_client = AsyncMock()
-        mock_client.get = AsyncMock(side_effect=_make_timeout_error())
-        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
-        mock_client.__aexit__ = AsyncMock(return_value=False)
-
-        async def run():
-            with patch("httpx.AsyncClient", return_value=mock_client), \
-                 patch("asyncio.sleep", new_callable=AsyncMock):
-                from gateway.platforms.base import cache_audio_from_url
-                await cache_audio_from_url(
-                    "http://example.com/voice.ogg", ext=".ogg", retries=2
-                )
-
-        with pytest.raises(httpx.TimeoutException):
-            asyncio.run(run())
-
-        # 3 total calls: initial + 2 retries
-        assert mock_client.get.call_count == 3
-
-    def test_non_retryable_4xx_raises_immediately(self, tmp_path, monkeypatch):
-        """A 404 (non-retryable) is raised immediately without any retry."""
-        monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
-
-        mock_sleep = AsyncMock()
-        mock_client = AsyncMock()
-        mock_client.get = AsyncMock(side_effect=_make_http_status_error(404))
-        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
-        mock_client.__aexit__ = AsyncMock(return_value=False)
-
-        async def run():
-            with patch("httpx.AsyncClient", return_value=mock_client), \
-                 patch("asyncio.sleep", mock_sleep):
-                from gateway.platforms.base import cache_audio_from_url
-                await cache_audio_from_url(
-                    "http://example.com/voice.ogg", ext=".ogg", retries=2
-                )
-
-        with pytest.raises(httpx.HTTPStatusError):
-            asyncio.run(run())
-
-        # Only 1 attempt, no sleep
-        assert mock_client.get.call_count == 1
-        mock_sleep.assert_not_called()
-
-
 # ---------------------------------------------------------------------------
 # Slack mock setup (mirrors existing test_slack.py approach)
 # ---------------------------------------------------------------------------
@@ -62,18 +62,6 @@ class TestMessageEventGetCommand:
        event = MessageEvent(text="/")
        assert event.get_command() == ""

-    def test_command_with_at_botname(self):
-        event = MessageEvent(text="/new@TigerNanoBot")
-        assert event.get_command() == "new"
-
-    def test_command_with_at_botname_and_args(self):
-        event = MessageEvent(text="/compress@TigerNanoBot")
-        assert event.get_command() == "compress"
-
-    def test_command_mixed_case_with_at_botname(self):
-        event = MessageEvent(text="/RESET@TigerNanoBot")
-        assert event.get_command() == "reset"
-

 class TestMessageEventGetCommandArgs:
    def test_command_with_args(self):
@@ -344,7 +344,6 @@ class TestRuntimeDisconnectQueuing:
    async def test_retryable_runtime_error_queued_for_reconnect(self):
        """Retryable runtime errors should add the platform to _failed_platforms."""
        runner = _make_runner()
-        runner.stop = AsyncMock()

        adapter = StubAdapter(succeed=True)
        adapter._set_fatal_error("network_error", "DNS failure", retryable=True)
@@ -372,12 +371,8 @@ class TestRuntimeDisconnectQueuing:
        assert Platform.TELEGRAM not in runner._failed_platforms

    @pytest.mark.asyncio
-    async def test_retryable_error_exits_for_service_restart_when_all_down(self):
-        """Gateway should exit with failure when all platforms fail with retryable errors.
-
-        This lets systemd Restart=on-failure restart the process, which is more
-        reliable than in-process background reconnection after exhausted retries.
-        """
+    async def test_retryable_error_prevents_shutdown_when_queued(self):
+        """Gateway should not shut down if failed platforms are queued for reconnection."""
        runner = _make_runner()
        runner.stop = AsyncMock()

@@ -387,28 +382,7 @@ class TestRuntimeDisconnectQueuing:

        await runner._handle_adapter_fatal_error(adapter)

-        # stop() SHOULD be called — gateway exits for systemd restart
-        runner.stop.assert_called_once()
-        assert runner._exit_with_failure is True
-        assert Platform.TELEGRAM in runner._failed_platforms
-
-    @pytest.mark.asyncio
-    async def test_retryable_error_no_exit_when_other_adapters_still_connected(self):
-        """Gateway should NOT exit if some adapters are still connected."""
-        runner = _make_runner()
-        runner.stop = AsyncMock()
-
-        failing_adapter = StubAdapter(succeed=True)
-        failing_adapter._set_fatal_error("network_error", "DNS failure", retryable=True)
-        runner.adapters[Platform.TELEGRAM] = failing_adapter
-
-        # Another adapter is still connected
-        healthy_adapter = StubAdapter(succeed=True)
-        runner.adapters[Platform.DISCORD] = healthy_adapter
-
-        await runner._handle_adapter_fatal_error(failing_adapter)
-
-        # stop() should NOT have been called — Discord is still up
+        # stop() should NOT have been called since we have platforms queued
        runner.stop.assert_not_called()
        assert Platform.TELEGRAM in runner._failed_platforms

@@ -14,8 +14,8 @@ from gateway.session import SessionSource


 class ProgressCaptureAdapter(BasePlatformAdapter):
-    def __init__(self, platform=Platform.TELEGRAM):
-        super().__init__(PlatformConfig(enabled=True, token="***"), platform)
+    def __init__(self):
+        super().__init__(PlatformConfig(enabled=True, token="fake-token"), Platform.TELEGRAM)
        self.sent = []
        self.edits = []
        self.typing = []
@@ -76,7 +76,7 @@ def _make_runner(adapter):
    GatewayRunner = gateway_run.GatewayRunner

    runner = object.__new__(GatewayRunner)
-    runner.adapters = {adapter.platform: adapter}
+    runner.adapters = {Platform.TELEGRAM: adapter}
    runner._voice_mode = {}
    runner._prefill_messages = []
    runner._ephemeral_system_prompt = ""
@@ -133,87 +133,3 @@ async def test_run_agent_progress_stays_in_originating_topic(monkeypatch, tmp_pa
    ]
    assert adapter.edits
    assert all(call["metadata"] == {"thread_id": "17585"} for call in adapter.typing)
-
-
-@pytest.mark.asyncio
-async def test_run_agent_progress_does_not_use_event_message_id_for_telegram_dm(monkeypatch, tmp_path):
-    """Telegram DM progress must not reuse event message id as thread metadata."""
-    monkeypatch.setenv("HERMES_TOOL_PROGRESS_MODE", "all")
-
-    fake_dotenv = types.ModuleType("dotenv")
-    fake_dotenv.load_dotenv = lambda *args, **kwargs: None
-    monkeypatch.setitem(sys.modules, "dotenv", fake_dotenv)
-
-    fake_run_agent = types.ModuleType("run_agent")
-    fake_run_agent.AIAgent = FakeAgent
-    monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
-
-    adapter = ProgressCaptureAdapter(platform=Platform.TELEGRAM)
-    runner = _make_runner(adapter)
-    gateway_run = importlib.import_module("gateway.run")
-    monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
-    monkeypatch.setattr(gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"})
-
-    source = SessionSource(
-        platform=Platform.TELEGRAM,
-        chat_id="12345",
-        chat_type="dm",
-        thread_id=None,
-    )
-
-    result = await runner._run_agent(
-        message="hello",
-        context_prompt="",
-        history=[],
-        source=source,
-        session_id="sess-2",
-        session_key="agent:main:telegram:dm:12345",
-        event_message_id="777",
-    )
-
-    assert result["final_response"] == "done"
-    assert adapter.sent
-    assert adapter.sent[0]["metadata"] is None
-    assert all(call["metadata"] is None for call in adapter.typing)
-
-
-@pytest.mark.asyncio
-async def test_run_agent_progress_uses_event_message_id_for_slack_dm(monkeypatch, tmp_path):
-    """Slack DM progress should keep event ts fallback threading."""
-    monkeypatch.setenv("HERMES_TOOL_PROGRESS_MODE", "all")
-
-    fake_dotenv = types.ModuleType("dotenv")
-    fake_dotenv.load_dotenv = lambda *args, **kwargs: None
-    monkeypatch.setitem(sys.modules, "dotenv", fake_dotenv)
-
-    fake_run_agent = types.ModuleType("run_agent")
-    fake_run_agent.AIAgent = FakeAgent
-    monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
-
-    adapter = ProgressCaptureAdapter(platform=Platform.SLACK)
-    runner = _make_runner(adapter)
-    gateway_run = importlib.import_module("gateway.run")
-    monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
-    monkeypatch.setattr(gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"})
-
-    source = SessionSource(
-        platform=Platform.SLACK,
-        chat_id="D123",
-        chat_type="dm",
-        thread_id=None,
-    )
-
-    result = await runner._run_agent(
-        message="hello",
-        context_prompt="",
-        history=[],
-        source=source,
-        session_id="sess-3",
-        session_key="agent:main:slack:dm:D123",
-        event_message_id="1234567890.000001",
-    )
-
-    assert result["final_response"] == "done"
-    assert adapter.sent
-    assert adapter.sent[0]["metadata"] == {"thread_id": "1234567890.000001"}
-    assert all(call["metadata"] == {"thread_id": "1234567890.000001"} for call in adapter.typing)
@@ -89,8 +89,7 @@ async def test_runner_queues_retryable_runtime_fatal_for_reconnection(monkeypatc

    await runner._handle_adapter_fatal_error(adapter)

-    # Should shut down with failure — systemd Restart=on-failure will restart
-    runner.stop.assert_awaited_once()
-    assert runner._exit_with_failure is True
+    # Should NOT shut down — platform is queued for reconnection
+    runner.stop.assert_not_awaited()
    assert Platform.WHATSAPP in runner._failed_platforms
    assert runner._failed_platforms[Platform.WHATSAPP]["attempts"] == 0
@@ -304,12 +304,8 @@ async def test_session_hygiene_messages_stay_in_originating_topic(monkeypatch, t
    class FakeCompressAgent:
        def __init__(self, **kwargs):
            self.model = kwargs.get("model")
-            self.session_id = kwargs.get("session_id", "fake-session")
-            self._print_fn = None

        def _compress_context(self, messages, *_args, **_kwargs):
-            # Simulate real _compress_context: create a new session_id
-            self.session_id = f"{self.session_id}_compressed"
            return ([{"role": "assistant", "content": "compressed"}], None)

    fake_run_agent = types.ModuleType("run_agent")
@@ -1,280 +0,0 @@
-"""Tests for SSE client disconnect → agent task cancellation.
-
-When a streaming /v1/chat/completions client disconnects mid-stream
-(network drop, browser tab close), the agent is interrupted via
-agent.interrupt() so it stops making LLM API calls, and the asyncio
-task wrapper is cancelled.
-"""
-
-import asyncio
-import json
-import queue
-from unittest.mock import AsyncMock, MagicMock, patch
-
-import pytest
-
-
-# ---------------------------------------------------------------------------
-# Helpers
-# ---------------------------------------------------------------------------
-
-def _make_adapter():
-    """Build a minimal APIServerAdapter with mocked internals."""
-    from gateway.platforms.api_server import APIServerAdapter
-    from gateway.config import PlatformConfig
-
-    config = PlatformConfig(enabled=True, token="test-key")
-    adapter = APIServerAdapter(config)
-    return adapter
-
-
-def _make_request():
-    """Build a mock aiohttp request."""
-    req = MagicMock()
-    req.headers = {}
-    return req
-
-
-# ---------------------------------------------------------------------------
-# Tests
-# ---------------------------------------------------------------------------
-
-class TestSSEAgentCancelOnDisconnect:
-    """gateway/platforms/api_server.py — _write_sse_chat_completion()"""
-
-    def test_agent_task_cancelled_on_client_disconnect(self):
-        """When response.write raises ConnectionResetError (client dropped),
-        the agent task must be cancelled."""
-        adapter = _make_adapter()
-
-        stream_q = queue.Queue()
-        stream_q.put("hello ")  # Some data already queued
-
-        # Agent task that runs forever (simulates a long LLM call)
-        agent_done = asyncio.Event()
-
-        async def fake_agent():
-            await agent_done.wait()
-            return {"final_response": "done"}, {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
-
-        async def run():
-            from aiohttp import web
-
-            agent_task = asyncio.ensure_future(fake_agent())
-
-            # Mock response that raises ConnectionResetError on second write
-            mock_response = AsyncMock(spec=web.StreamResponse)
-            call_count = 0
-
-            async def write_side_effect(data):
-                nonlocal call_count
-                call_count += 1
-                if call_count >= 2:
-                    raise ConnectionResetError("client disconnected")
-
-            mock_response.write = AsyncMock(side_effect=write_side_effect)
-            mock_response.prepare = AsyncMock()
-
-            with patch.object(type(adapter), '_write_sse_chat_completion',
-                              adapter._write_sse_chat_completion):
-                # Patch StreamResponse creation
-                with patch("gateway.platforms.api_server.web.StreamResponse",
-                           return_value=mock_response):
-                    await adapter._write_sse_chat_completion(
-                        _make_request(), "cmpl-123", "gpt-4", 1234567890,
-                        stream_q, agent_task,
-                    )
-
-            # The critical assertion: agent_task must be cancelled
-            assert agent_task.cancelled() or agent_task.done()
-            # Clean up
-            agent_done.set()
-
-        asyncio.run(run())
-
-    def test_agent_task_not_cancelled_on_normal_completion(self):
-        """On normal stream completion, agent task should NOT be cancelled."""
-        adapter = _make_adapter()
-
-        stream_q = queue.Queue()
-        stream_q.put("hello")
-        stream_q.put(None)  # End-of-stream sentinel
-
-        async def fake_agent():
-            return {"final_response": "done"}, {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
-
-        async def run():
-            from aiohttp import web
-
-            agent_task = asyncio.ensure_future(fake_agent())
-            await asyncio.sleep(0)  # Let agent complete
-
-            mock_response = AsyncMock(spec=web.StreamResponse)
-            mock_response.write = AsyncMock()
-            mock_response.prepare = AsyncMock()
-
-            with patch("gateway.platforms.api_server.web.StreamResponse",
-                       return_value=mock_response):
-                await adapter._write_sse_chat_completion(
-                    _make_request(), "cmpl-456", "gpt-4", 1234567890,
-                    stream_q, agent_task,
-                )
-
-            # Agent should have completed normally, not been cancelled
-            assert agent_task.done()
-            assert not agent_task.cancelled()
-
-        asyncio.run(run())
-
-    def test_broken_pipe_also_cancels_agent(self):
-        """BrokenPipeError (another disconnect variant) also cancels the task."""
-        adapter = _make_adapter()
-
-        stream_q = queue.Queue()
-
-        async def fake_agent():
-            await asyncio.sleep(999)  # Never completes
-            return {}, {}
-
-        async def run():
-            from aiohttp import web
-
-            agent_task = asyncio.ensure_future(fake_agent())
-
-            mock_response = AsyncMock(spec=web.StreamResponse)
-            mock_response.write = AsyncMock(side_effect=BrokenPipeError("pipe broken"))
-            mock_response.prepare = AsyncMock()
-
-            with patch("gateway.platforms.api_server.web.StreamResponse",
-                       return_value=mock_response):
-                await adapter._write_sse_chat_completion(
-                    _make_request(), "cmpl-789", "gpt-4", 1234567890,
-                    stream_q, agent_task,
-                )
-
-            assert agent_task.cancelled() or agent_task.done()
-
-        asyncio.run(run())
-
-    def test_already_done_task_not_cancelled_on_disconnect(self):
-        """If agent already finished before disconnect, don't try to cancel."""
-        adapter = _make_adapter()
-
-        stream_q = queue.Queue()
-        stream_q.put("data")
-
-        async def fake_agent():
-            return {"final_response": "done"}, {}
-
-        async def run():
-            from aiohttp import web
-
-            agent_task = asyncio.ensure_future(fake_agent())
-            await asyncio.sleep(0)  # Let agent complete
-
-            mock_response = AsyncMock(spec=web.StreamResponse)
-            call_count = 0
-
-            async def write_side_effect(data):
-                nonlocal call_count
-                call_count += 1
-                if call_count >= 2:
-                    raise ConnectionResetError("late disconnect")
-
-            mock_response.write = AsyncMock(side_effect=write_side_effect)
-            mock_response.prepare = AsyncMock()
-
-            with patch("gateway.platforms.api_server.web.StreamResponse",
-                       return_value=mock_response):
-                await adapter._write_sse_chat_completion(
-                    _make_request(), "cmpl-done", "gpt-4", 1234567890,
-                    stream_q, agent_task,
-                )
-
-            # Task was already done — should not be cancelled
-            assert agent_task.done()
-            assert not agent_task.cancelled()
-
-        asyncio.run(run())
-
-    def test_agent_interrupt_called_on_disconnect(self):
-        """When the client disconnects, agent.interrupt() must be called
-        so the agent thread stops making LLM API calls."""
-        adapter = _make_adapter()
-
-        stream_q = queue.Queue()
-        stream_q.put("hello ")
-
-        agent_done = asyncio.Event()
-
-        async def fake_agent():
-            await agent_done.wait()
-            return {"final_response": "done"}, {}
-
-        # Mock agent with an interrupt method
-        mock_agent = MagicMock()
-        mock_agent.interrupt = MagicMock()
-
-        async def run():
-            from aiohttp import web
-
-            agent_task = asyncio.ensure_future(fake_agent())
-            agent_ref = [mock_agent]
-
-            mock_response = AsyncMock(spec=web.StreamResponse)
-            call_count = 0
-
-            async def write_side_effect(data):
-                nonlocal call_count
-                call_count += 1
-                if call_count >= 2:
-                    raise ConnectionResetError("client disconnected")
-
-            mock_response.write = AsyncMock(side_effect=write_side_effect)
-            mock_response.prepare = AsyncMock()
-
-            with patch("gateway.platforms.api_server.web.StreamResponse",
-                       return_value=mock_response):
-                await adapter._write_sse_chat_completion(
-                    _make_request(), "cmpl-int", "gpt-4", 1234567890,
-                    stream_q, agent_task, agent_ref,
-                )
-
-            # agent.interrupt() must have been called
-            mock_agent.interrupt.assert_called_once_with("SSE client disconnected")
-            # Clean up
-            agent_done.set()
-
-        asyncio.run(run())
-
-    def test_agent_ref_none_still_cancels_task(self):
-        """When agent_ref is not provided (None), the task is still cancelled
-        on disconnect — just without the interrupt() call."""
-        adapter = _make_adapter()
-
-        stream_q = queue.Queue()
-
-        async def fake_agent():
-            await asyncio.sleep(999)
-            return {}, {}
-
-        async def run():
-            from aiohttp import web
-
-            agent_task = asyncio.ensure_future(fake_agent())
-
-            mock_response = AsyncMock(spec=web.StreamResponse)
-            mock_response.write = AsyncMock(side_effect=BrokenPipeError("gone"))
-            mock_response.prepare = AsyncMock()
-
-            with patch("gateway.platforms.api_server.web.StreamResponse",
-                       return_value=mock_response):
-                # No agent_ref passed — should still handle disconnect cleanly
-                await adapter._write_sse_chat_completion(
-                    _make_request(), "cmpl-noref", "gpt-4", 1234567890,
-                    stream_q, agent_task,
-                )
-
-            assert agent_task.cancelled() or agent_task.done()
-
-        asyncio.run(run())
@@ -315,24 +315,6 @@ class TestFallbackTransportInit:
        transport = tnet.TelegramFallbackTransport(["149.154.167.220", "not-an-ip"])
        assert transport._fallback_ips == ["149.154.167.220"]

-    def test_uses_proxy_env_for_primary_and_fallback_transports(self, monkeypatch):
-        seen_kwargs = []
-
-        def factory(**kwargs):
-            seen_kwargs.append(kwargs.copy())
-            return FakeTransport([], {})
-
-        for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY", "https_proxy", "http_proxy", "all_proxy"):
-            monkeypatch.delenv(key, raising=False)
-        monkeypatch.setenv("HTTPS_PROXY", "http://proxy.example:8080")
-        monkeypatch.setattr(tnet.httpx, "AsyncHTTPTransport", factory)
-
-        transport = tnet.TelegramFallbackTransport(["149.154.167.220"])
-
-        assert transport._fallback_ips == ["149.154.167.220"]
-        assert len(seen_kwargs) == 2
-        assert all(kwargs["proxy"] == "http://proxy.example:8080" for kwargs in seen_kwargs)
-

 class TestFallbackTransportClose:
    @pytest.mark.asyncio
@@ -1,199 +0,0 @@
-"""Tests for Telegram send() thread_id fallback.
-
-When message_thread_id points to a non-existent thread, Telegram returns
-BadRequest('Message thread not found'). Since BadRequest is a subclass of
-NetworkError in python-telegram-bot, the old retry loop treated this as a
-transient error and retried 3 times before silently failing — killing all
-tool progress messages, streaming responses, and typing indicators.
-
-The fix detects "thread not found" BadRequest errors and retries the send
-WITHOUT message_thread_id so the message still reaches the chat.
-"""
-
-import sys
-import types
-from types import SimpleNamespace
-
-import pytest
-
-from gateway.config import PlatformConfig, Platform
-from gateway.platforms.base import SendResult
-
-
-# ── Fake telegram.error hierarchy ──────────────────────────────────────
-# Mirrors the real python-telegram-bot hierarchy:
-#   BadRequest → NetworkError → TelegramError → Exception
-
-
-class FakeNetworkError(Exception):
-    pass
-
-
-class FakeBadRequest(FakeNetworkError):
-    pass
-
-
-# Build a fake telegram module tree so the adapter's internal imports work
-_fake_telegram = types.ModuleType("telegram")
-_fake_telegram_error = types.ModuleType("telegram.error")
-_fake_telegram_error.NetworkError = FakeNetworkError
-_fake_telegram_error.BadRequest = FakeBadRequest
-_fake_telegram.error = _fake_telegram_error
-_fake_telegram_constants = types.ModuleType("telegram.constants")
-_fake_telegram_constants.ParseMode = SimpleNamespace(MARKDOWN_V2="MarkdownV2")
-_fake_telegram.constants = _fake_telegram_constants
-
-
-@pytest.fixture(autouse=True)
-def _inject_fake_telegram(monkeypatch):
-    """Inject fake telegram modules so the adapter can import from them."""
-    monkeypatch.setitem(sys.modules, "telegram", _fake_telegram)
-    monkeypatch.setitem(sys.modules, "telegram.error", _fake_telegram_error)
-    monkeypatch.setitem(sys.modules, "telegram.constants", _fake_telegram_constants)
-
-
-def _make_adapter():
-    from gateway.platforms.telegram import TelegramAdapter
-
-    config = PlatformConfig(enabled=True, token="fake-token")
-    adapter = object.__new__(TelegramAdapter)
-    adapter._config = config
-    adapter._platform = Platform.TELEGRAM
-    adapter._connected = True
-    adapter._dm_topics = {}
-    adapter._dm_topics_config = []
-    adapter._reply_to_mode = "first"
-    adapter._fallback_ips = []
-    adapter._polling_conflict_count = 0
-    adapter._polling_network_error_count = 0
-    adapter._polling_error_callback_ref = None
-    adapter.platform = Platform.TELEGRAM
-    return adapter
-
-
-@pytest.mark.asyncio
-async def test_send_retries_without_thread_on_thread_not_found():
-    """When message_thread_id causes 'thread not found', retry without it."""
-    adapter = _make_adapter()
-
-    call_log = []
-
-    async def mock_send_message(**kwargs):
-        call_log.append(dict(kwargs))
-        tid = kwargs.get("message_thread_id")
-        if tid is not None:
-            raise FakeBadRequest("Message thread not found")
-        return SimpleNamespace(message_id=42)
-
-    adapter._bot = SimpleNamespace(send_message=mock_send_message)
-
-    result = await adapter.send(
-        chat_id="123",
-        content="test message",
-        metadata={"thread_id": "99999"},
-    )
-
-    assert result.success is True
-    assert result.message_id == "42"
-    # First call has thread_id, second call retries without
-    assert len(call_log) == 2
-    assert call_log[0]["message_thread_id"] == 99999
-    assert call_log[1]["message_thread_id"] is None
-
-
-@pytest.mark.asyncio
-async def test_send_raises_on_other_bad_request():
-    """Non-thread BadRequest errors should NOT be retried — they fail immediately."""
-    adapter = _make_adapter()
-
-    async def mock_send_message(**kwargs):
-        raise FakeBadRequest("Chat not found")
-
-    adapter._bot = SimpleNamespace(send_message=mock_send_message)
-
-    result = await adapter.send(
-        chat_id="123",
-        content="test message",
-        metadata={"thread_id": "99999"},
-    )
-
-    assert result.success is False
-    assert "Chat not found" in result.error
-
-
-@pytest.mark.asyncio
-async def test_send_without_thread_id_unaffected():
-    """Normal sends without thread_id should work as before."""
-    adapter = _make_adapter()
-
-    call_log = []
-
-    async def mock_send_message(**kwargs):
-        call_log.append(dict(kwargs))
-        return SimpleNamespace(message_id=100)
-
-    adapter._bot = SimpleNamespace(send_message=mock_send_message)
-
-    result = await adapter.send(
-        chat_id="123",
-        content="test message",
-    )
-
-    assert result.success is True
-    assert len(call_log) == 1
-    assert call_log[0]["message_thread_id"] is None
-
-
-@pytest.mark.asyncio
-async def test_send_retries_network_errors_normally():
-    """Real transient network errors (not BadRequest) should still be retried."""
-    adapter = _make_adapter()
-
-    attempt = [0]
-
-    async def mock_send_message(**kwargs):
-        attempt[0] += 1
-        if attempt[0] < 3:
-            raise FakeNetworkError("Connection reset")
-        return SimpleNamespace(message_id=200)
-
-    adapter._bot = SimpleNamespace(send_message=mock_send_message)
-
-    result = await adapter.send(
-        chat_id="123",
-        content="test message",
-    )
-
-    assert result.success is True
-    assert attempt[0] == 3  # Two retries then success
-
-
-@pytest.mark.asyncio
-async def test_thread_fallback_only_fires_once():
-    """After clearing thread_id, subsequent chunks should also use None."""
-    adapter = _make_adapter()
-
-    call_log = []
-
-    async def mock_send_message(**kwargs):
-        call_log.append(dict(kwargs))
-        tid = kwargs.get("message_thread_id")
-        if tid is not None:
-            raise FakeBadRequest("Message thread not found")
-        return SimpleNamespace(message_id=42)
-
-    adapter._bot = SimpleNamespace(send_message=mock_send_message)
-
-    # Send a long message that gets split into chunks
-    long_msg = "A" * 5000  # Exceeds Telegram's 4096 limit
-    result = await adapter.send(
-        chat_id="123",
-        content=long_msg,
-        metadata={"thread_id": "99999"},
-    )
-
-    assert result.success is True
-    # First chunk: attempt with thread → fail → retry without → succeed
-    # Second chunk: should use thread_id=None directly (effective_thread_id
-    # was cleared per-chunk but the metadata doesn't change between chunks)
-    # The key point: the message was delivered despite the invalid thread
@@ -1,87 +0,0 @@
-"""Tests for webhook adapter dynamic route loading."""
-
-import json
-import os
-import pytest
-from pathlib import Path
-
-from gateway.config import PlatformConfig
-from gateway.platforms.webhook import WebhookAdapter, _DYNAMIC_ROUTES_FILENAME
-
-
-def _make_adapter(routes=None, extra=None):
-    _extra = extra or {}
-    if routes:
-        _extra["routes"] = routes
-    _extra.setdefault("secret", "test-global-secret")
-    config = PlatformConfig(enabled=True, extra=_extra)
-    return WebhookAdapter(config)
-
-
-@pytest.fixture(autouse=True)
-def _isolate(tmp_path, monkeypatch):
-    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
-
-
-class TestDynamicRouteLoading:
-    def test_no_dynamic_file(self):
-        adapter = _make_adapter(routes={"static": {"secret": "s"}})
-        adapter._reload_dynamic_routes()
-        assert "static" in adapter._routes
-        assert len(adapter._dynamic_routes) == 0
-
-    def test_loads_dynamic_routes(self, tmp_path):
-        subs = {"my-hook": {"secret": "dynamic-secret", "prompt": "test", "events": []}}
-        (tmp_path / _DYNAMIC_ROUTES_FILENAME).write_text(json.dumps(subs))
-
-        adapter = _make_adapter(routes={"static": {"secret": "s"}})
-        adapter._reload_dynamic_routes()
-        assert "my-hook" in adapter._routes
-        assert "static" in adapter._routes
-
-    def test_static_takes_precedence(self, tmp_path):
-        (tmp_path / _DYNAMIC_ROUTES_FILENAME).write_text(
-            json.dumps({"conflict": {"secret": "dynamic", "prompt": "dyn"}})
-        )
-        adapter = _make_adapter(routes={"conflict": {"secret": "static", "prompt": "stat"}})
-        adapter._reload_dynamic_routes()
-        assert adapter._routes["conflict"]["secret"] == "static"
-
-    def test_mtime_gated(self, tmp_path):
-        import time
-        path = tmp_path / _DYNAMIC_ROUTES_FILENAME
-        path.write_text(json.dumps({"v1": {"secret": "s"}}))
-
-        adapter = _make_adapter()
-        adapter._reload_dynamic_routes()
-        assert "v1" in adapter._dynamic_routes
-
-        # Same mtime — no reload
-        adapter._dynamic_routes["injected"] = True
-        adapter._reload_dynamic_routes()
-        assert "injected" in adapter._dynamic_routes
-
-        # New write — reloads
-        time.sleep(0.05)
-        path.write_text(json.dumps({"v2": {"secret": "s"}}))
-        adapter._reload_dynamic_routes()
-        assert "v2" in adapter._dynamic_routes
-        assert "v1" not in adapter._dynamic_routes
-
-    def test_file_removal_clears(self, tmp_path):
-        path = tmp_path / _DYNAMIC_ROUTES_FILENAME
-        path.write_text(json.dumps({"temp": {"secret": "s"}}))
-        adapter = _make_adapter()
-        adapter._reload_dynamic_routes()
-        assert "temp" in adapter._dynamic_routes
-
-        path.unlink()
-        adapter._reload_dynamic_routes()
-        assert len(adapter._dynamic_routes) == 0
-
-    def test_corrupted_file(self, tmp_path):
-        (tmp_path / _DYNAMIC_ROUTES_FILENAME).write_text("not json")
-        adapter = _make_adapter(routes={"static": {"secret": "s"}})
-        adapter._reload_dynamic_routes()
-        assert "static" in adapter._routes
-        assert len(adapter._dynamic_routes) == 0
@@ -105,24 +105,3 @@ class TestCmdUpdateBranchFallback:
        commands = [" ".join(str(a) for a in c.args[0]) for c in mock_run.call_args_list]
        pull_cmds = [c for c in commands if "pull" in c]
        assert len(pull_cmds) == 0
-
-    def test_update_non_interactive_skips_migration_prompt(self, mock_args, capsys):
-        """When stdin/stdout aren't TTYs, config migration prompt is skipped."""
-        with patch("shutil.which", return_value=None), patch(
-            "subprocess.run"
-        ) as mock_run, patch("builtins.input") as mock_input, patch(
-            "hermes_cli.config.get_missing_env_vars", return_value=["MISSING_KEY"]
-        ), patch("hermes_cli.config.get_missing_config_fields", return_value=[]), patch(
-            "hermes_cli.config.check_config_version", return_value=(1, 2)
-        ), patch("hermes_cli.main.sys") as mock_sys:
-            mock_sys.stdin.isatty.return_value = False
-            mock_sys.stdout.isatty.return_value = False
-            mock_run.side_effect = _make_run_side_effect(
-                branch="main", verify_ok=True, commit_count="1"
-            )
-
-            cmd_update(mock_args)
-
-            mock_input.assert_not_called()
-            captured = capsys.readouterr()
-            assert "Non-interactive session" in captured.out
@@ -1,7 +1,6 @@
 """Tests for gateway service management helpers."""

 import os
-from pathlib import Path
 from types import SimpleNamespace

 import hermes_cli.gateway as gateway_cli
@@ -153,13 +152,12 @@ class TestLaunchdServiceRecovery:
    def test_launchd_start_reloads_unloaded_job_and_retries(self, tmp_path, monkeypatch):
        plist_path = tmp_path / "ai.hermes.gateway.plist"
        plist_path.write_text(gateway_cli.generate_launchd_plist(), encoding="utf-8")
-        label = gateway_cli.get_launchd_label()

        calls = []

        def fake_run(cmd, check=False, **kwargs):
            calls.append(cmd)
-            if cmd == ["launchctl", "start", label] and calls.count(cmd) == 1:
+            if cmd == ["launchctl", "start", "ai.hermes.gateway"] and calls.count(cmd) == 1:
                raise gateway_cli.subprocess.CalledProcessError(3, cmd, stderr="Could not find service")
            return SimpleNamespace(returncode=0, stdout="", stderr="")

@@ -169,9 +167,9 @@ class TestLaunchdServiceRecovery:
        gateway_cli.launchd_start()

        assert calls == [
-            ["launchctl", "start", label],
+            ["launchctl", "start", "ai.hermes.gateway"],
            ["launchctl", "load", str(plist_path)],
-            ["launchctl", "start", label],
+            ["launchctl", "start", "ai.hermes.gateway"],
        ]

    def test_launchd_status_reports_local_stale_plist_when_unloaded(self, tmp_path, monkeypatch, capsys):
@@ -356,20 +354,6 @@ class TestGeneratedUnitUsesDetectedVenv:
        assert "/venv/" not in unit or "/.venv/" in unit


-class TestGeneratedUnitIncludesLocalBin:
-    """~/.local/bin must be in PATH so uvx/pipx tools are discoverable."""
-
-    def test_user_unit_includes_local_bin_in_path(self):
-        unit = gateway_cli.generate_systemd_unit(system=False)
-        home = str(Path.home())
-        assert f"{home}/.local/bin" in unit
-
-    def test_system_unit_includes_local_bin_in_path(self):
-        unit = gateway_cli.generate_systemd_unit(system=True)
-        # System unit uses the resolved home dir from _system_service_identity
-        assert "/.local/bin" in unit
-
-
 class TestEnsureUserSystemdEnv:
    """Tests for _ensure_user_systemd_env() D-Bus session bus auto-detection."""

@@ -1,13 +1,10 @@
 """
-Tests for skip_confirm and invalidate_cache behavior in /skills install
-and /skills uninstall slash commands.
+Tests for skip_confirm behavior in /skills install and /skills uninstall.

-Slash commands always skip confirmation (input() hangs in TUI).
-Cache invalidation is deferred by default; --now opts into immediate
-invalidation (at the cost of breaking prompt cache mid-session).
+Verifies that --yes / -y bypasses the interactive confirmation prompt
+that hangs inside prompt_toolkit's TUI.

 Based on PR #1595 by 333Alden333 (salvaged).
-Updated for PR #3586 (cache-aware install/uninstall).
 """

 from unittest.mock import patch, MagicMock
@@ -35,43 +32,23 @@ class TestHandleSkillsSlashInstallFlags:
            _, kwargs = mock_install.call_args
            assert kwargs.get("skip_confirm") is True

-    def test_force_flag_sets_force(self):
+    def test_force_flag_sets_force_not_skip(self):
        from hermes_cli.skills_hub import handle_skills_slash
        with patch("hermes_cli.skills_hub.do_install") as mock_install:
            handle_skills_slash("/skills install test/skill --force")
            mock_install.assert_called_once()
            _, kwargs = mock_install.call_args
            assert kwargs.get("force") is True
-            # Slash commands always skip confirmation (input() hangs in TUI)
-            assert kwargs.get("skip_confirm") is True
+            assert kwargs.get("skip_confirm") is False

-    def test_no_flags_still_skips_confirm(self):
-        """Slash commands always skip confirmation — input() hangs in TUI."""
+    def test_no_flags(self):
        from hermes_cli.skills_hub import handle_skills_slash
        with patch("hermes_cli.skills_hub.do_install") as mock_install:
            handle_skills_slash("/skills install test/skill")
            mock_install.assert_called_once()
            _, kwargs = mock_install.call_args
            assert kwargs.get("force") is False
-            assert kwargs.get("skip_confirm") is True
-
-    def test_default_defers_cache_invalidation(self):
-        """Without --now, cache invalidation is deferred to next session."""
-        from hermes_cli.skills_hub import handle_skills_slash
-        with patch("hermes_cli.skills_hub.do_install") as mock_install:
-            handle_skills_slash("/skills install test/skill")
-            mock_install.assert_called_once()
-            _, kwargs = mock_install.call_args
-            assert kwargs.get("invalidate_cache") is False
-
-    def test_now_flag_invalidates_cache(self):
-        """--now opts into immediate cache invalidation."""
-        from hermes_cli.skills_hub import handle_skills_slash
-        with patch("hermes_cli.skills_hub.do_install") as mock_install:
-            handle_skills_slash("/skills install test/skill --now")
-            mock_install.assert_called_once()
-            _, kwargs = mock_install.call_args
-            assert kwargs.get("invalidate_cache") is True
+            assert kwargs.get("skip_confirm") is False


 class TestHandleSkillsSlashUninstallFlags:
@@ -93,32 +70,13 @@ class TestHandleSkillsSlashUninstallFlags:
            _, kwargs = mock_uninstall.call_args
            assert kwargs.get("skip_confirm") is True

-    def test_no_flags_still_skips_confirm(self):
-        """Slash commands always skip confirmation — input() hangs in TUI."""
+    def test_no_flags(self):
        from hermes_cli.skills_hub import handle_skills_slash
        with patch("hermes_cli.skills_hub.do_uninstall") as mock_uninstall:
            handle_skills_slash("/skills uninstall test-skill")
            mock_uninstall.assert_called_once()
            _, kwargs = mock_uninstall.call_args
-            assert kwargs.get("skip_confirm") is True
-
-    def test_default_defers_cache_invalidation(self):
-        """Without --now, cache invalidation is deferred to next session."""
-        from hermes_cli.skills_hub import handle_skills_slash
-        with patch("hermes_cli.skills_hub.do_uninstall") as mock_uninstall:
-            handle_skills_slash("/skills uninstall test-skill")
-            mock_uninstall.assert_called_once()
-            _, kwargs = mock_uninstall.call_args
-            assert kwargs.get("invalidate_cache") is False
-
-    def test_now_flag_invalidates_cache(self):
-        """--now opts into immediate cache invalidation."""
-        from hermes_cli.skills_hub import handle_skills_slash
-        with patch("hermes_cli.skills_hub.do_uninstall") as mock_uninstall:
-            handle_skills_slash("/skills uninstall test-skill --now")
-            mock_uninstall.assert_called_once()
-            _, kwargs = mock_uninstall.call_args
-            assert kwargs.get("invalidate_cache") is True
+            assert kwargs.get("skip_confirm", False) is False


 class TestDoInstallSkipConfirm:
@@ -237,53 +237,3 @@ def test_save_platform_tools_still_preserves_mcp_with_platform_default_present()

    # Deselected configurable toolset removed
    assert "terminal" not in saved
-
-
-# ── Platform / toolset consistency ────────────────────────────────────────────
-
-
-class TestPlatformToolsetConsistency:
-    """Every platform in tools_config.PLATFORMS must have a matching toolset."""
-
-    def test_all_platforms_have_toolset_definitions(self):
-        """Each platform's default_toolset must exist in TOOLSETS."""
-        from hermes_cli.tools_config import PLATFORMS
-        from toolsets import TOOLSETS
-
-        for platform, meta in PLATFORMS.items():
-            ts_name = meta["default_toolset"]
-            assert ts_name in TOOLSETS, (
-                f"Platform {platform!r} references toolset {ts_name!r} "
-                f"which is not defined in toolsets.py"
-            )
-
-    def test_gateway_toolset_includes_all_messaging_platforms(self):
-        """hermes-gateway includes list should cover all messaging platforms."""
-        from hermes_cli.tools_config import PLATFORMS
-        from toolsets import TOOLSETS
-
-        gateway_includes = set(TOOLSETS["hermes-gateway"]["includes"])
-        # Exclude non-messaging platforms from the check
-        non_messaging = {"cli", "api_server"}
-        for platform, meta in PLATFORMS.items():
-            if platform in non_messaging:
-                continue
-            ts_name = meta["default_toolset"]
-            assert ts_name in gateway_includes, (
-                f"Platform {platform!r} toolset {ts_name!r} missing from "
-                f"hermes-gateway includes"
-            )
-
-    def test_skills_config_covers_tools_config_platforms(self):
-        """skills_config.PLATFORMS should have entries for all gateway platforms."""
-        from hermes_cli.tools_config import PLATFORMS as TOOLS_PLATFORMS
-        from hermes_cli.skills_config import PLATFORMS as SKILLS_PLATFORMS
-
-        non_messaging = {"api_server"}
-        for platform in TOOLS_PLATFORMS:
-            if platform in non_messaging:
-                continue
-            assert platform in SKILLS_PLATFORMS, (
-                f"Platform {platform!r} in tools_config but missing from "
-                f"skills_config PLATFORMS"
-            )
@@ -267,8 +267,7 @@ def test_restore_stashed_changes_user_declines_reset(monkeypatch, tmp_path, caps


 def test_restore_stashed_changes_auto_resets_non_interactive(monkeypatch, tmp_path, capsys):
-    """Non-interactive mode auto-resets without prompting and returns False
-    instead of sys.exit(1) so the update can continue (gateway /update path)."""
+    """Non-interactive mode auto-resets without prompting."""
    calls = []

    def fake_run(cmd, **kwargs):
@@ -283,9 +282,9 @@ def test_restore_stashed_changes_auto_resets_non_interactive(monkeypatch, tmp_pa

    monkeypatch.setattr(hermes_main.subprocess, "run", fake_run)

-    result = hermes_main._restore_stashed_changes(["git"], tmp_path, "abc123", prompt_user=False)
+    with pytest.raises(SystemExit, match="1"):
+        hermes_main._restore_stashed_changes(["git"], tmp_path, "abc123", prompt_user=False)

-    assert result is False
    out = capsys.readouterr().out
    assert "Working tree reset to clean state" in out
    reset_calls = [c for c, _ in calls if c[1:3] == ["reset", "--hard"]]
@@ -385,236 +384,3 @@ def test_cmd_update_succeeds_with_extras(monkeypatch, tmp_path):
    install_cmds = [c for c in recorded if "pip" in c and "install" in c]
    assert len(install_cmds) == 1
    assert ".[all]" in install_cmds[0]
-
-
-# ---------------------------------------------------------------------------
-# ff-only fallback to reset --hard on diverged history
-# ---------------------------------------------------------------------------
-
-def _make_update_side_effect(
-    current_branch="main",
-    commit_count="3",
-    ff_only_fails=False,
-    reset_fails=False,
-    fetch_fails=False,
-    fetch_stderr="",
-):
-    """Build a subprocess.run side_effect for cmd_update tests."""
-    recorded = []
-
-    def side_effect(cmd, **kwargs):
-        recorded.append(cmd)
-        joined = " ".join(str(c) for c in cmd)
-        if "fetch" in joined and "origin" in joined:
-            if fetch_fails:
-                return SimpleNamespace(stdout="", stderr=fetch_stderr, returncode=128)
-            return SimpleNamespace(stdout="", stderr="", returncode=0)
-        if "rev-parse" in joined and "--abbrev-ref" in joined:
-            return SimpleNamespace(stdout=f"{current_branch}\n", stderr="", returncode=0)
-        if "checkout" in joined and "main" in joined:
-            return SimpleNamespace(stdout="", stderr="", returncode=0)
-        if "rev-list" in joined:
-            return SimpleNamespace(stdout=f"{commit_count}\n", stderr="", returncode=0)
-        if "--ff-only" in joined:
-            if ff_only_fails:
-                return SimpleNamespace(
-                    stdout="",
-                    stderr="fatal: Not possible to fast-forward, aborting.\n",
-                    returncode=128,
-                )
-            return SimpleNamespace(stdout="Updating abc..def\n", stderr="", returncode=0)
-        if "reset" in joined and "--hard" in joined:
-            if reset_fails:
-                return SimpleNamespace(stdout="", stderr="error: unable to write\n", returncode=1)
-            return SimpleNamespace(stdout="HEAD is now at abc123\n", stderr="", returncode=0)
-        return SimpleNamespace(returncode=0, stdout="", stderr="")
-
-    return side_effect, recorded
-
-
-def test_cmd_update_falls_back_to_reset_when_ff_only_fails(monkeypatch, tmp_path, capsys):
-    """When --ff-only fails (diverged history), update resets to origin/{branch}."""
-    _setup_update_mocks(monkeypatch, tmp_path)
-    monkeypatch.setattr("shutil.which", lambda name: "/usr/bin/uv" if name == "uv" else None)
-
-    side_effect, recorded = _make_update_side_effect(ff_only_fails=True)
-    monkeypatch.setattr(hermes_main.subprocess, "run", side_effect)
-
-    hermes_main.cmd_update(SimpleNamespace())
-
-    reset_calls = [c for c in recorded if "reset" in c and "--hard" in c]
-    assert len(reset_calls) == 1
-    assert reset_calls[0] == ["git", "reset", "--hard", "origin/main"]
-
-    out = capsys.readouterr().out
-    assert "Fast-forward not possible" in out
-
-
-def test_cmd_update_no_reset_when_ff_only_succeeds(monkeypatch, tmp_path):
-    """When --ff-only succeeds, no reset is attempted."""
-    _setup_update_mocks(monkeypatch, tmp_path)
-    monkeypatch.setattr("shutil.which", lambda name: "/usr/bin/uv" if name == "uv" else None)
-
-    side_effect, recorded = _make_update_side_effect()
-    monkeypatch.setattr(hermes_main.subprocess, "run", side_effect)
-
-    hermes_main.cmd_update(SimpleNamespace())
-
-    reset_calls = [c for c in recorded if "reset" in c and "--hard" in c]
-    assert len(reset_calls) == 0
-
-
-# ---------------------------------------------------------------------------
-# Non-main branch → auto-checkout main
-# ---------------------------------------------------------------------------
-
-def test_cmd_update_switches_to_main_from_feature_branch(monkeypatch, tmp_path, capsys):
-    """When on a feature branch, update checks out main before pulling."""
-    _setup_update_mocks(monkeypatch, tmp_path)
-    monkeypatch.setattr("shutil.which", lambda name: "/usr/bin/uv" if name == "uv" else None)
-
-    side_effect, recorded = _make_update_side_effect(current_branch="fix/something")
-    monkeypatch.setattr(hermes_main.subprocess, "run", side_effect)
-
-    hermes_main.cmd_update(SimpleNamespace())
-
-    checkout_calls = [c for c in recorded if "checkout" in c and "main" in c]
-    assert len(checkout_calls) == 1
-
-    out = capsys.readouterr().out
-    assert "fix/something" in out
-    assert "switching to main" in out
-
-
-def test_cmd_update_switches_to_main_from_detached_head(monkeypatch, tmp_path, capsys):
-    """When in detached HEAD state, update checks out main before pulling."""
-    _setup_update_mocks(monkeypatch, tmp_path)
-    monkeypatch.setattr("shutil.which", lambda name: "/usr/bin/uv" if name == "uv" else None)
-
-    side_effect, recorded = _make_update_side_effect(current_branch="HEAD")
-    monkeypatch.setattr(hermes_main.subprocess, "run", side_effect)
-
-    hermes_main.cmd_update(SimpleNamespace())
-
-    checkout_calls = [c for c in recorded if "checkout" in c and "main" in c]
-    assert len(checkout_calls) == 1
-
-    out = capsys.readouterr().out
-    assert "detached HEAD" in out
-
-
-def test_cmd_update_restores_stash_and_branch_when_already_up_to_date(monkeypatch, tmp_path, capsys):
-    """When on a feature branch with no updates, stash is restored and branch switched back."""
-    _setup_update_mocks(monkeypatch, tmp_path)
-    monkeypatch.setattr("shutil.which", lambda name: "/usr/bin/uv" if name == "uv" else None)
-
-    # Enable stash so it returns a ref
-    monkeypatch.setattr(
-        hermes_main, "_stash_local_changes_if_needed",
-        lambda *a, **kw: "abc123deadbeef",
-    )
-    restore_calls = []
-    monkeypatch.setattr(
-        hermes_main, "_restore_stashed_changes",
-        lambda *a, **kw: restore_calls.append(1) or True,
-    )
-
-    side_effect, recorded = _make_update_side_effect(
-        current_branch="fix/something", commit_count="0",
-    )
-    monkeypatch.setattr(hermes_main.subprocess, "run", side_effect)
-
-    hermes_main.cmd_update(SimpleNamespace())
-
-    # Stash should have been restored
-    assert len(restore_calls) == 1
-
-    # Should have checked out back to the original branch
-    checkout_back = [c for c in recorded if "checkout" in c and "fix/something" in c]
-    assert len(checkout_back) == 1
-
-    out = capsys.readouterr().out
-    assert "Already up to date" in out
-
-
-def test_cmd_update_no_checkout_when_already_on_main(monkeypatch, tmp_path):
-    """When already on main, no checkout is needed."""
-    _setup_update_mocks(monkeypatch, tmp_path)
-    monkeypatch.setattr("shutil.which", lambda name: "/usr/bin/uv" if name == "uv" else None)
-
-    side_effect, recorded = _make_update_side_effect()
-    monkeypatch.setattr(hermes_main.subprocess, "run", side_effect)
-
-    hermes_main.cmd_update(SimpleNamespace())
-
-    checkout_calls = [c for c in recorded if "checkout" in c]
-    assert len(checkout_calls) == 0
-
-
-# ---------------------------------------------------------------------------
-# Fetch failure — friendly error messages
-# ---------------------------------------------------------------------------
-
-def test_cmd_update_network_error_shows_friendly_message(monkeypatch, tmp_path, capsys):
-    """Network failures during fetch show a user-friendly message."""
-    _setup_update_mocks(monkeypatch, tmp_path)
-
-    side_effect, _ = _make_update_side_effect(
-        fetch_fails=True,
-        fetch_stderr="fatal: unable to access 'https://...': Could not resolve host: github.com",
-    )
-    monkeypatch.setattr(hermes_main.subprocess, "run", side_effect)
-
-    with pytest.raises(SystemExit, match="1"):
-        hermes_main.cmd_update(SimpleNamespace())
-
-    out = capsys.readouterr().out
-    assert "Network error" in out
-
-
-def test_cmd_update_auth_error_shows_friendly_message(monkeypatch, tmp_path, capsys):
-    """Auth failures during fetch show a user-friendly message."""
-    _setup_update_mocks(monkeypatch, tmp_path)
-
-    side_effect, _ = _make_update_side_effect(
-        fetch_fails=True,
-        fetch_stderr="fatal: Authentication failed for 'https://...'",
-    )
-    monkeypatch.setattr(hermes_main.subprocess, "run", side_effect)
-
-    with pytest.raises(SystemExit, match="1"):
-        hermes_main.cmd_update(SimpleNamespace())
-
-    out = capsys.readouterr().out
-    assert "Authentication failed" in out
-
-
-# ---------------------------------------------------------------------------
-# reset --hard failure — don't attempt stash restore
-# ---------------------------------------------------------------------------
-
-def test_cmd_update_skips_stash_restore_when_reset_fails(monkeypatch, tmp_path, capsys):
-    """When reset --hard fails, stash restore is skipped with a helpful message."""
-    _setup_update_mocks(monkeypatch, tmp_path)
-    # Re-enable stash so it actually returns a ref
-    monkeypatch.setattr(
-        hermes_main, "_stash_local_changes_if_needed",
-        lambda *a, **kw: "abc123deadbeef",
-    )
-    restore_calls = []
-    monkeypatch.setattr(
-        hermes_main, "_restore_stashed_changes",
-        lambda *a, **kw: restore_calls.append(1) or True,
-    )
-
-    side_effect, _ = _make_update_side_effect(ff_only_fails=True, reset_fails=True)
-    monkeypatch.setattr(hermes_main.subprocess, "run", side_effect)
-
-    with pytest.raises(SystemExit, match="1"):
-        hermes_main.cmd_update(SimpleNamespace())
-
-    # Stash restore should NOT have been called
-    assert len(restore_calls) == 0
-
-    out = capsys.readouterr().out
-    assert "preserved in stash" in out
@@ -101,69 +101,6 @@ class TestLaunchdPlistReplace:
        assert replace_idx == run_idx + 1


-class TestLaunchdPlistPath:
-    def test_plist_contains_environment_variables(self):
-        plist = gateway_cli.generate_launchd_plist()
-        assert "<key>EnvironmentVariables</key>" in plist
-        assert "<key>PATH</key>" in plist
-        assert "<key>VIRTUAL_ENV</key>" in plist
-        assert "<key>HERMES_HOME</key>" in plist
-
-    def test_plist_path_includes_venv_bin(self):
-        plist = gateway_cli.generate_launchd_plist()
-        detected = gateway_cli._detect_venv_dir()
-        venv_bin = str(detected / "bin") if detected else str(gateway_cli.PROJECT_ROOT / "venv" / "bin")
-        assert venv_bin in plist
-
-    def test_plist_path_starts_with_venv_bin(self):
-        plist = gateway_cli.generate_launchd_plist()
-        lines = plist.splitlines()
-        for i, line in enumerate(lines):
-            if "<key>PATH</key>" in line.strip():
-                path_value = lines[i + 1].strip()
-                path_value = path_value.replace("<string>", "").replace("</string>", "")
-                detected = gateway_cli._detect_venv_dir()
-                venv_bin = str(detected / "bin") if detected else str(gateway_cli.PROJECT_ROOT / "venv" / "bin")
-                assert path_value.startswith(venv_bin + ":")
-                break
-        else:
-            raise AssertionError("PATH key not found in plist")
-
-    def test_plist_path_includes_node_modules_bin(self):
-        plist = gateway_cli.generate_launchd_plist()
-        node_bin = str(gateway_cli.PROJECT_ROOT / "node_modules" / ".bin")
-        lines = plist.splitlines()
-        for i, line in enumerate(lines):
-            if "<key>PATH</key>" in line.strip():
-                path_value = lines[i + 1].strip()
-                path_value = path_value.replace("<string>", "").replace("</string>", "")
-                assert node_bin in path_value.split(":")
-                break
-        else:
-            raise AssertionError("PATH key not found in plist")
-
-    def test_plist_path_includes_current_env_path(self, monkeypatch):
-        monkeypatch.setenv("PATH", "/custom/bin:/usr/bin:/bin")
-        plist = gateway_cli.generate_launchd_plist()
-        assert "/custom/bin" in plist
-
-    def test_plist_path_deduplicates_venv_bin_when_already_in_path(self, monkeypatch):
-        detected = gateway_cli._detect_venv_dir()
-        venv_bin = str(detected / "bin") if detected else str(gateway_cli.PROJECT_ROOT / "venv" / "bin")
-        monkeypatch.setenv("PATH", f"{venv_bin}:/usr/bin:/bin")
-        plist = gateway_cli.generate_launchd_plist()
-        lines = plist.splitlines()
-        for i, line in enumerate(lines):
-            if "<key>PATH</key>" in line.strip():
-                path_value = lines[i + 1].strip()
-                path_value = path_value.replace("<string>", "").replace("</string>", "")
-                parts = path_value.split(":")
-                assert parts.count(venv_bin) == 1
-                break
-        else:
-            raise AssertionError("PATH key not found in plist")
-
-
 # ---------------------------------------------------------------------------
 # cmd_update — macOS launchd detection
 # ---------------------------------------------------------------------------
@@ -240,33 +177,6 @@ class TestLaunchdPlistRefresh:
        assert any("unload" in s for s in cmd_strs)
        assert any("start" in s for s in cmd_strs)

-    def test_launchd_start_recreates_missing_plist_and_loads_service(self, tmp_path, monkeypatch):
-        """launchd_start self-heals when the plist file is missing entirely."""
-        plist_path = tmp_path / "ai.hermes.gateway.plist"
-        assert not plist_path.exists()
-
-        monkeypatch.setattr(gateway_cli, "get_launchd_plist_path", lambda: plist_path)
-
-        calls = []
-        def fake_run(cmd, check=False, **kwargs):
-            calls.append(cmd)
-            return SimpleNamespace(returncode=0, stdout="", stderr="")
-
-        monkeypatch.setattr(gateway_cli.subprocess, "run", fake_run)
-
-        gateway_cli.launchd_start()
-
-        # Should have created the plist
-        assert plist_path.exists()
-        assert "--replace" in plist_path.read_text()
-
-        cmd_strs = [" ".join(c) for c in calls]
-        # Should load the new plist, then start
-        assert any("load" in s for s in cmd_strs)
-        assert any("start" in s for s in cmd_strs)
-        # Should NOT call unload (nothing to unload)
-        assert not any("unload" in s for s in cmd_strs)
-

 class TestCmdUpdateLaunchdRestart:
    """cmd_update correctly detects and handles launchd on macOS."""
@@ -1,189 +0,0 @@
-"""Tests for hermes_cli/webhook.py — webhook subscription CLI."""
-
-import json
-import os
-import pytest
-from argparse import Namespace
-from pathlib import Path
-
-from hermes_cli.webhook import (
-    webhook_command,
-    _load_subscriptions,
-    _save_subscriptions,
-    _subscriptions_path,
-    _is_webhook_enabled,
-)
-
-
-@pytest.fixture(autouse=True)
-def _isolate(tmp_path, monkeypatch):
-    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
-    # Default: webhooks enabled (most tests need this)
-    monkeypatch.setattr(
-        "hermes_cli.webhook._is_webhook_enabled", lambda: True
-    )
-
-
-def _make_args(**kwargs):
-    defaults = {
-        "webhook_action": None,
-        "name": "",
-        "prompt": "",
-        "events": "",
-        "description": "",
-        "skills": "",
-        "deliver": "log",
-        "deliver_chat_id": "",
-        "secret": "",
-        "payload": "",
-    }
-    defaults.update(kwargs)
-    return Namespace(**defaults)
-
-
-class TestSubscribe:
-    def test_basic_create(self, capsys):
-        webhook_command(_make_args(webhook_action="subscribe", name="test-hook"))
-        out = capsys.readouterr().out
-        assert "Created" in out
-        assert "/webhooks/test-hook" in out
-        subs = _load_subscriptions()
-        assert "test-hook" in subs
-
-    def test_with_options(self, capsys):
-        webhook_command(_make_args(
-            webhook_action="subscribe",
-            name="gh-issues",
-            events="issues,pull_request",
-            prompt="Issue: {issue.title}",
-            deliver="telegram",
-            deliver_chat_id="12345",
-            description="Watch GitHub",
-        ))
-        subs = _load_subscriptions()
-        route = subs["gh-issues"]
-        assert route["events"] == ["issues", "pull_request"]
-        assert route["prompt"] == "Issue: {issue.title}"
-        assert route["deliver"] == "telegram"
-        assert route["deliver_extra"] == {"chat_id": "12345"}
-
-    def test_custom_secret(self):
-        webhook_command(_make_args(
-            webhook_action="subscribe", name="s", secret="my-secret"
-        ))
-        assert _load_subscriptions()["s"]["secret"] == "my-secret"
-
-    def test_auto_secret(self):
-        webhook_command(_make_args(webhook_action="subscribe", name="s"))
-        secret = _load_subscriptions()["s"]["secret"]
-        assert len(secret) > 20
-
-    def test_update(self, capsys):
-        webhook_command(_make_args(webhook_action="subscribe", name="x", prompt="v1"))
-        webhook_command(_make_args(webhook_action="subscribe", name="x", prompt="v2"))
-        out = capsys.readouterr().out
-        assert "Updated" in out
-        assert _load_subscriptions()["x"]["prompt"] == "v2"
-
-    def test_invalid_name(self, capsys):
-        webhook_command(_make_args(webhook_action="subscribe", name="bad name!"))
-        out = capsys.readouterr().out
-        assert "Error" in out or "Invalid" in out
-        assert _load_subscriptions() == {}
-
-
-class TestList:
-    def test_empty(self, capsys):
-        webhook_command(_make_args(webhook_action="list"))
-        out = capsys.readouterr().out
-        assert "No dynamic" in out
-
-    def test_with_entries(self, capsys):
-        webhook_command(_make_args(webhook_action="subscribe", name="a"))
-        webhook_command(_make_args(webhook_action="subscribe", name="b"))
-        capsys.readouterr()  # clear
-        webhook_command(_make_args(webhook_action="list"))
-        out = capsys.readouterr().out
-        assert "2 webhook" in out
-        assert "a" in out
-        assert "b" in out
-
-
-class TestRemove:
-    def test_remove_existing(self, capsys):
-        webhook_command(_make_args(webhook_action="subscribe", name="temp"))
-        webhook_command(_make_args(webhook_action="remove", name="temp"))
-        out = capsys.readouterr().out
-        assert "Removed" in out
-        assert _load_subscriptions() == {}
-
-    def test_remove_nonexistent(self, capsys):
-        webhook_command(_make_args(webhook_action="remove", name="nope"))
-        out = capsys.readouterr().out
-        assert "No subscription" in out
-
-    def test_selective_remove(self):
-        webhook_command(_make_args(webhook_action="subscribe", name="keep"))
-        webhook_command(_make_args(webhook_action="subscribe", name="drop"))
-        webhook_command(_make_args(webhook_action="remove", name="drop"))
-        subs = _load_subscriptions()
-        assert "keep" in subs
-        assert "drop" not in subs
-
-
-class TestPersistence:
-    def test_file_written(self):
-        webhook_command(_make_args(webhook_action="subscribe", name="persist"))
-        path = _subscriptions_path()
-        assert path.exists()
-        data = json.loads(path.read_text())
-        assert "persist" in data
-
-    def test_corrupted_file(self):
-        path = _subscriptions_path()
-        path.parent.mkdir(parents=True, exist_ok=True)
-        path.write_text("broken{{{")
-        assert _load_subscriptions() == {}
-
-
-class TestWebhookEnabledGate:
-    def test_blocks_when_disabled(self, capsys, monkeypatch):
-        monkeypatch.setattr("hermes_cli.webhook._is_webhook_enabled", lambda: False)
-        webhook_command(_make_args(webhook_action="subscribe", name="blocked"))
-        out = capsys.readouterr().out
-        assert "not enabled" in out.lower()
-        assert "hermes gateway setup" in out
-        assert _load_subscriptions() == {}
-
-    def test_blocks_list_when_disabled(self, capsys, monkeypatch):
-        monkeypatch.setattr("hermes_cli.webhook._is_webhook_enabled", lambda: False)
-        webhook_command(_make_args(webhook_action="list"))
-        out = capsys.readouterr().out
-        assert "not enabled" in out.lower()
-
-    def test_allows_when_enabled(self, capsys):
-        # _is_webhook_enabled already patched to True by autouse fixture
-        webhook_command(_make_args(webhook_action="subscribe", name="allowed"))
-        out = capsys.readouterr().out
-        assert "Created" in out
-        assert "allowed" in _load_subscriptions()
-
-    def test_real_check_disabled(self, monkeypatch):
-        monkeypatch.setattr(
-            "hermes_cli.webhook._get_webhook_config",
-            lambda: {},
-        )
-        monkeypatch.setattr(
-            "hermes_cli.webhook._is_webhook_enabled",
-            lambda: bool({}.get("enabled")),
-        )
-        import hermes_cli.webhook as wh_mod
-        assert wh_mod._is_webhook_enabled() is False
-
-    def test_real_check_enabled(self, monkeypatch):
-        monkeypatch.setattr(
-            "hermes_cli.webhook._is_webhook_enabled",
-            lambda: True,
-        )
-        import hermes_cli.webhook as wh_mod
-        assert wh_mod._is_webhook_enabled() is True
@@ -926,8 +926,7 @@ class TestBuildAnthropicKwargs:
        )
        assert "thinking" not in kwargs

-    def test_default_max_tokens_uses_model_output_limit(self):
-        """When max_tokens is None, use the model's native output limit."""
+    def test_default_max_tokens(self):
        kwargs = build_anthropic_kwargs(
            model="claude-sonnet-4-20250514",
            messages=[{"role": "user", "content": "Hi"}],
@@ -935,135 +934,7 @@ class TestBuildAnthropicKwargs:
            max_tokens=None,
            reasoning_config=None,
        )
-        assert kwargs["max_tokens"] == 64_000  # Sonnet 4 output limit
-
-    def test_default_max_tokens_opus_4_6(self):
-        kwargs = build_anthropic_kwargs(
-            model="claude-opus-4-6",
-            messages=[{"role": "user", "content": "Hi"}],
-            tools=None,
-            max_tokens=None,
-            reasoning_config=None,
-        )
-        assert kwargs["max_tokens"] == 128_000
-
-    def test_default_max_tokens_sonnet_4_6(self):
-        kwargs = build_anthropic_kwargs(
-            model="claude-sonnet-4-6",
-            messages=[{"role": "user", "content": "Hi"}],
-            tools=None,
-            max_tokens=None,
-            reasoning_config=None,
-        )
-        assert kwargs["max_tokens"] == 64_000
-
-    def test_default_max_tokens_date_stamped_model(self):
-        """Date-stamped model IDs should resolve via substring match."""
-        kwargs = build_anthropic_kwargs(
-            model="claude-sonnet-4-5-20250929",
-            messages=[{"role": "user", "content": "Hi"}],
-            tools=None,
-            max_tokens=None,
-            reasoning_config=None,
-        )
-        assert kwargs["max_tokens"] == 64_000
-
-    def test_default_max_tokens_older_model(self):
-        kwargs = build_anthropic_kwargs(
-            model="claude-3-5-sonnet-20241022",
-            messages=[{"role": "user", "content": "Hi"}],
-            tools=None,
-            max_tokens=None,
-            reasoning_config=None,
-        )
-        assert kwargs["max_tokens"] == 8_192
-
-    def test_default_max_tokens_unknown_model_uses_highest(self):
-        """Unknown future models should get the highest known limit."""
-        kwargs = build_anthropic_kwargs(
-            model="claude-ultra-5-20260101",
-            messages=[{"role": "user", "content": "Hi"}],
-            tools=None,
-            max_tokens=None,
-            reasoning_config=None,
-        )
-        assert kwargs["max_tokens"] == 128_000
-
-    def test_explicit_max_tokens_overrides_default(self):
-        """User-specified max_tokens should be respected."""
-        kwargs = build_anthropic_kwargs(
-            model="claude-opus-4-6",
-            messages=[{"role": "user", "content": "Hi"}],
-            tools=None,
-            max_tokens=4096,
-            reasoning_config=None,
-        )
-        assert kwargs["max_tokens"] == 4096
-
-    def test_context_length_clamp(self):
-        """max_tokens should be clamped to context_length if it's smaller."""
-        kwargs = build_anthropic_kwargs(
-            model="claude-opus-4-6",  # 128K output
-            messages=[{"role": "user", "content": "Hi"}],
-            tools=None,
-            max_tokens=None,
-            reasoning_config=None,
-            context_length=50000,
-        )
-        assert kwargs["max_tokens"] == 49999  # context_length - 1
-
-    def test_context_length_no_clamp_when_larger(self):
-        """No clamping when context_length exceeds output limit."""
-        kwargs = build_anthropic_kwargs(
-            model="claude-sonnet-4-6",  # 64K output
-            messages=[{"role": "user", "content": "Hi"}],
-            tools=None,
-            max_tokens=None,
-            reasoning_config=None,
-            context_length=200000,
-        )
-        assert kwargs["max_tokens"] == 64_000
-
-
-# ---------------------------------------------------------------------------
-# Model output limit lookup
-# ---------------------------------------------------------------------------
-
-
-class TestGetAnthropicMaxOutput:
-    def test_opus_4_6(self):
-        from agent.anthropic_adapter import _get_anthropic_max_output
-        assert _get_anthropic_max_output("claude-opus-4-6") == 128_000
-
-    def test_opus_4_6_variant(self):
-        from agent.anthropic_adapter import _get_anthropic_max_output
-        assert _get_anthropic_max_output("claude-opus-4-6:1m:fast") == 128_000
-
-    def test_sonnet_4_6(self):
-        from agent.anthropic_adapter import _get_anthropic_max_output
-        assert _get_anthropic_max_output("claude-sonnet-4-6") == 64_000
-
-    def test_sonnet_4_date_stamped(self):
-        from agent.anthropic_adapter import _get_anthropic_max_output
-        assert _get_anthropic_max_output("claude-sonnet-4-20250514") == 64_000
-
-    def test_claude_3_5_sonnet(self):
-        from agent.anthropic_adapter import _get_anthropic_max_output
-        assert _get_anthropic_max_output("claude-3-5-sonnet-20241022") == 8_192
-
-    def test_claude_3_opus(self):
-        from agent.anthropic_adapter import _get_anthropic_max_output
-        assert _get_anthropic_max_output("claude-3-opus-20240229") == 4_096
-
-    def test_unknown_future_model(self):
-        from agent.anthropic_adapter import _get_anthropic_max_output
-        assert _get_anthropic_max_output("claude-ultra-5-20260101") == 128_000
-
-    def test_longest_prefix_wins(self):
-        """'claude-3-5-sonnet' should match before 'claude-3-5'."""
-        from agent.anthropic_adapter import _get_anthropic_max_output
-        # claude-3-5-sonnet (8192) should win over a hypothetical shorter match
-        assert _get_anthropic_max_output("claude-3-5-sonnet-20241022") == 8_192
+        assert kwargs["max_tokens"] == 16384


 # ---------------------------------------------------------------------------
@@ -38,7 +38,6 @@ class TestProviderRegistry:
    @pytest.mark.parametrize("provider_id,name,auth_type", [
        ("copilot-acp", "GitHub Copilot ACP", "external_process"),
        ("copilot", "GitHub Copilot", "api_key"),
-        ("huggingface", "Hugging Face", "api_key"),
        ("zai", "Z.AI / GLM", "api_key"),
        ("kimi-coding", "Kimi / Moonshot", "api_key"),
        ("minimax", "MiniMax", "api_key"),
@@ -88,11 +87,6 @@ class TestProviderRegistry:
        assert pconfig.api_key_env_vars == ("KILOCODE_API_KEY",)
        assert pconfig.base_url_env_var == "KILOCODE_BASE_URL"

-    def test_huggingface_env_vars(self):
-        pconfig = PROVIDER_REGISTRY["huggingface"]
-        assert pconfig.api_key_env_vars == ("HF_TOKEN",)
-        assert pconfig.base_url_env_var == "HF_BASE_URL"
-
    def test_base_urls(self):
        assert PROVIDER_REGISTRY["copilot"].inference_base_url == "https://api.githubcopilot.com"
        assert PROVIDER_REGISTRY["copilot-acp"].inference_base_url == "acp://copilot"
@@ -102,7 +96,6 @@ class TestProviderRegistry:
        assert PROVIDER_REGISTRY["minimax-cn"].inference_base_url == "https://api.minimaxi.com/anthropic"
        assert PROVIDER_REGISTRY["ai-gateway"].inference_base_url == "https://ai-gateway.vercel.sh/v1"
        assert PROVIDER_REGISTRY["kilocode"].inference_base_url == "https://api.kilo.ai/api/gateway"
-        assert PROVIDER_REGISTRY["huggingface"].inference_base_url == "https://router.huggingface.co/v1"

    def test_oauth_providers_unchanged(self):
        """Ensure we didn't break the existing OAuth providers."""
@@ -206,18 +199,6 @@ class TestResolveProvider:
        assert resolve_provider("github-copilot-acp") == "copilot-acp"
        assert resolve_provider("copilot-acp-agent") == "copilot-acp"

-    def test_explicit_huggingface(self):
-        assert resolve_provider("huggingface") == "huggingface"
-
-    def test_alias_hf(self):
-        assert resolve_provider("hf") == "huggingface"
-
-    def test_alias_hugging_face(self):
-        assert resolve_provider("hugging-face") == "huggingface"
-
-    def test_alias_huggingface_hub(self):
-        assert resolve_provider("huggingface-hub") == "huggingface"
-
    def test_unknown_provider_raises(self):
        with pytest.raises(AuthError):
            resolve_provider("nonexistent-provider-xyz")
@@ -254,10 +235,6 @@ class TestResolveProvider:
        monkeypatch.setenv("KILOCODE_API_KEY", "test-kilo-key")
        assert resolve_provider("auto") == "kilocode"

-    def test_auto_detects_hf_token(self, monkeypatch):
-        monkeypatch.setenv("HF_TOKEN", "hf_test_token")
-        assert resolve_provider("auto") == "huggingface"
-
    def test_openrouter_takes_priority_over_glm(self, monkeypatch):
        """OpenRouter API key should win over GLM in auto-detection."""
        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
@@ -731,55 +708,3 @@ class TestKimiMoonshotModelListIsolation:
        coding_models = _PROVIDER_MODELS["kimi-coding"]
        assert "kimi-for-coding" in coding_models
        assert "kimi-k2-thinking-turbo" in coding_models
-
-
-# =============================================================================
-# Hugging Face provider model list tests
-# =============================================================================
-
-class TestHuggingFaceModels:
-    """Verify Hugging Face model lists are consistent across all locations."""
-
-    def test_main_provider_models_has_huggingface(self):
-        from hermes_cli.main import _PROVIDER_MODELS
-        assert "huggingface" in _PROVIDER_MODELS
-        models = _PROVIDER_MODELS["huggingface"]
-        assert len(models) >= 6, "Expected at least 6 curated HF models"
-
-    def test_models_py_has_huggingface(self):
-        from hermes_cli.models import _PROVIDER_MODELS
-        assert "huggingface" in _PROVIDER_MODELS
-        models = _PROVIDER_MODELS["huggingface"]
-        assert len(models) >= 6
-
-    def test_model_lists_match(self):
-        """Model lists in main.py and models.py should be identical."""
-        from hermes_cli.main import _PROVIDER_MODELS as main_models
-        from hermes_cli.models import _PROVIDER_MODELS as models_models
-        assert main_models["huggingface"] == models_models["huggingface"]
-
-    def test_model_metadata_has_context_lengths(self):
-        """Every HF model should have a context length entry."""
-        from hermes_cli.models import _PROVIDER_MODELS
-        from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
-        hf_models = _PROVIDER_MODELS["huggingface"]
-        for model in hf_models:
-            assert model in DEFAULT_CONTEXT_LENGTHS, (
-                f"HF model {model!r} missing from DEFAULT_CONTEXT_LENGTHS"
-            )
-
-    def test_models_use_org_name_format(self):
-        """HF models should use org/name format (e.g. Qwen/Qwen3-235B)."""
-        from hermes_cli.models import _PROVIDER_MODELS
-        for model in _PROVIDER_MODELS["huggingface"]:
-            assert "/" in model, f"HF model {model!r} missing org/ prefix"
-
-    def test_provider_aliases_in_models_py(self):
-        from hermes_cli.models import _PROVIDER_ALIASES
-        assert _PROVIDER_ALIASES.get("hf") == "huggingface"
-        assert _PROVIDER_ALIASES.get("hugging-face") == "huggingface"
-
-    def test_provider_label(self):
-        from hermes_cli.models import _PROVIDER_LABELS
-        assert "huggingface" in _PROVIDER_LABELS
-        assert _PROVIDER_LABELS["huggingface"] == "Hugging Face"
@@ -1,162 +0,0 @@
-"""Tests for the AsyncHttpxClientWrapper.__del__ neuter fix.
-
-The OpenAI SDK's ``AsyncHttpxClientWrapper.__del__`` schedules
-``aclose()`` via ``asyncio.get_running_loop().create_task()``.  When GC
-fires during CLI idle time, prompt_toolkit's event loop picks up the task
-and crashes with "Event loop is closed" because the underlying TCP
-transport is bound to a dead worker loop.
-
-The three-layer defence:
-1. ``neuter_async_httpx_del()`` replaces ``__del__`` with a no-op.
-2. A custom asyncio exception handler silences residual errors.
-3. ``cleanup_stale_async_clients()`` evicts stale cache entries.
-"""
-
-import asyncio
-import threading
-from types import SimpleNamespace
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-
-# ---------------------------------------------------------------------------
-# Layer 1: neuter_async_httpx_del
-# ---------------------------------------------------------------------------
-
-class TestNeuterAsyncHttpxDel:
-    """Verify neuter_async_httpx_del replaces __del__ on the SDK class."""
-
-    def test_del_becomes_noop(self):
-        """After neuter, __del__ should do nothing (no RuntimeError)."""
-        from agent.auxiliary_client import neuter_async_httpx_del
-
-        try:
-            from openai._base_client import AsyncHttpxClientWrapper
-        except ImportError:
-            pytest.skip("openai SDK not installed")
-
-        # Save original so we can restore
-        original_del = AsyncHttpxClientWrapper.__del__
-        try:
-            neuter_async_httpx_del()
-            # The patched __del__ should be a no-op lambda
-            assert AsyncHttpxClientWrapper.__del__ is not original_del
-            # Calling it should not raise, even without a running loop
-            wrapper = MagicMock(spec=AsyncHttpxClientWrapper)
-            AsyncHttpxClientWrapper.__del__(wrapper)  # Should be silent
-        finally:
-            # Restore original to avoid leaking into other tests
-            AsyncHttpxClientWrapper.__del__ = original_del
-
-    def test_neuter_idempotent(self):
-        """Calling neuter twice doesn't break anything."""
-        from agent.auxiliary_client import neuter_async_httpx_del
-
-        try:
-            from openai._base_client import AsyncHttpxClientWrapper
-        except ImportError:
-            pytest.skip("openai SDK not installed")
-
-        original_del = AsyncHttpxClientWrapper.__del__
-        try:
-            neuter_async_httpx_del()
-            first_del = AsyncHttpxClientWrapper.__del__
-            neuter_async_httpx_del()
-            second_del = AsyncHttpxClientWrapper.__del__
-            # Both calls should succeed; the class should have a no-op
-            assert first_del is not original_del
-            assert second_del is not original_del
-        finally:
-            AsyncHttpxClientWrapper.__del__ = original_del
-
-    def test_neuter_graceful_without_sdk(self):
-        """neuter_async_httpx_del doesn't raise if the openai SDK isn't installed."""
-        from agent.auxiliary_client import neuter_async_httpx_del
-
-        with patch.dict("sys.modules", {"openai._base_client": None}):
-            # Should not raise
-            neuter_async_httpx_del()
-
-
-# ---------------------------------------------------------------------------
-# Layer 3: cleanup_stale_async_clients
-# ---------------------------------------------------------------------------
-
-class TestCleanupStaleAsyncClients:
-    """Verify stale cache entries are evicted and force-closed."""
-
-    def test_removes_stale_entries(self):
-        """Entries with a closed loop should be evicted."""
-        from agent.auxiliary_client import (
-            _client_cache,
-            _client_cache_lock,
-            cleanup_stale_async_clients,
-        )
-
-        # Create a loop, close it, make a cache entry
-        loop = asyncio.new_event_loop()
-        loop.close()
-
-        mock_client = MagicMock()
-        # Give it _client attribute for _force_close_async_httpx
-        mock_client._client = MagicMock()
-        mock_client._client.is_closed = False
-
-        key = ("test_stale", True, "", "", id(loop))
-        with _client_cache_lock:
-            _client_cache[key] = (mock_client, "test-model", loop)
-
-        try:
-            cleanup_stale_async_clients()
-            with _client_cache_lock:
-                assert key not in _client_cache, "Stale entry should be removed"
-        finally:
-            # Clean up in case test fails
-            with _client_cache_lock:
-                _client_cache.pop(key, None)
-
-    def test_keeps_live_entries(self):
-        """Entries with an open loop should be preserved."""
-        from agent.auxiliary_client import (
-            _client_cache,
-            _client_cache_lock,
-            cleanup_stale_async_clients,
-        )
-
-        loop = asyncio.new_event_loop()  # NOT closed
-
-        mock_client = MagicMock()
-        key = ("test_live", True, "", "", id(loop))
-        with _client_cache_lock:
-            _client_cache[key] = (mock_client, "test-model", loop)
-
-        try:
-            cleanup_stale_async_clients()
-            with _client_cache_lock:
-                assert key in _client_cache, "Live entry should be preserved"
-        finally:
-            loop.close()
-            with _client_cache_lock:
-                _client_cache.pop(key, None)
-
-    def test_keeps_entries_without_loop(self):
-        """Sync entries (cached_loop=None) should be preserved."""
-        from agent.auxiliary_client import (
-            _client_cache,
-            _client_cache_lock,
-            cleanup_stale_async_clients,
-        )
-
-        mock_client = MagicMock()
-        key = ("test_sync", False, "", "", 0)
-        with _client_cache_lock:
-            _client_cache[key] = (mock_client, "test-model", None)
-
-        try:
-            cleanup_stale_async_clients()
-            with _client_cache_lock:
-                assert key in _client_cache, "Sync entry should be preserved"
-        finally:
-            with _client_cache_lock:
-                _client_cache.pop(key, None)
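The deleted module docstring describes a per-event-loop client cache. Entries are evicted once their loop is closed. Sync entries (no loop) and entries on a live loop survive. Below is a minimal sketch of that eviction rule. It assumes a plain dict of `(client, model, loop)` tuples guarded by a `threading.Lock`, matching the tuples the tests wrote into `_client_cache`. `evict_stale_entries` is a hypothetical name, not `cleanup_stale_async_clients()` itself, and it skips the force-close step the tested function also performs.

```python
import asyncio
import threading
from typing import Any, Dict, Hashable, List, Optional, Tuple

# Hypothetical entry shape, mirroring the tuples the deleted tests stored:
# (client, model_name, event_loop_or_None)
CacheEntry = Tuple[Any, str, Optional[asyncio.AbstractEventLoop]]


def evict_stale_entries(
    cache: Dict[Hashable, CacheEntry], lock: threading.Lock
) -> List[Hashable]:
    """Drop entries bound to a closed event loop and return their keys.

    Entries with loop=None (sync clients) and entries whose loop is still
    open are left in place. Unlike the tested cleanup, this sketch does not
    force-close the evicted clients' HTTP transports.
    """
    with lock:
        # Collect first, delete second: never mutate a dict while iterating it.
        stale = [
            key
            for key, (_client, _model, loop) in cache.items()
            if loop is not None and loop.is_closed()
        ]
        for key in stale:
            del cache[key]
    return stale


if __name__ == "__main__":
    demo_cache: Dict[Hashable, CacheEntry] = {}
    demo_lock = threading.Lock()
    dead = asyncio.new_event_loop()
    dead.close()
    live = asyncio.new_event_loop()
    demo_cache["stale"] = (object(), "test-model", dead)
    demo_cache["live"] = (object(), "test-model", live)
    demo_cache["sync"] = (object(), "test-model", None)
    print(evict_stale_entries(demo_cache, demo_lock))  # ['stale']
    live.close()
```

The lock is held for the whole scan. Any writer that also takes it therefore cannot insert an entry halfway through the eviction.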
@@ -69,12 +69,10 @@ class TestFormatContextPressure:
        assert isinstance(result, str)

    def test_over_100_percent_capped(self):
-        """Progress > 1.0 should cap both bar and percentage text at 100%."""
+        """Progress > 1.0 should not break the bar."""
        line = format_context_pressure(1.05, 100_000, 0.50)
        assert "▰" in line
        assert line.count("▰") == 20
-        assert "100%" in line
-        assert "105%" not in line


 class TestFormatContextPressureGateway:
@@ -102,13 +100,6 @@ class TestFormatContextPressureGateway:
        msg = format_context_pressure_gateway(0.80, 0.50)
        assert "▰" in msg

-    def test_over_100_percent_capped(self):
-        """Progress > 1.0 should cap percentage text at 100%."""
-        msg = format_context_pressure_gateway(1.09, 0.50)
-        assert "100% to compaction" in msg
-        assert "109%" not in msg
-        assert msg.count("▰") == 20
-

 # ---------------------------------------------------------------------------
 # AIAgent context pressure flag tests
@@ -1,154 +0,0 @@
-"""Tests for percentage clamping at 100% across display paths.
-
-PR #3480 capped context pressure percentage at 100% in agent/display.py
-but missed the same unclamped pattern in 4 other files. When token counts
-overshoot the context length (possible during streaming or before
-compression fires), users see >100% in /stats, gateway status, and
-memory tool output.
-"""
-
-import pytest
-
-
-class TestContextCompressorUsagePercent:
-    """agent/context_compressor.py — get_status() usage_percent"""
-
-    def test_usage_percent_capped_at_100(self):
-        """Tokens exceeding context_length should still show max 100%."""
-        from agent.context_compressor import ContextCompressor
-
-        comp = ContextCompressor.__new__(ContextCompressor)
-        comp.last_prompt_tokens = 210_000  # exceeds context_length
-        comp.context_length = 200_000
-        comp.threshold_tokens = 160_000
-        comp.compression_count = 0
-
-        status = comp.get_status()
-        assert status["usage_percent"] <= 100
-
-    def test_usage_percent_normal(self):
-        """Normal usage should show correct percentage."""
-        from agent.context_compressor import ContextCompressor
-
-        comp = ContextCompressor.__new__(ContextCompressor)
-        comp.last_prompt_tokens = 100_000
-        comp.context_length = 200_000
-        comp.threshold_tokens = 160_000
-        comp.compression_count = 0
-
-        status = comp.get_status()
-        assert status["usage_percent"] == 50.0
-
-    def test_usage_percent_zero_context_length(self):
-        """Zero context_length should return 0, not crash."""
-        from agent.context_compressor import ContextCompressor
-
-        comp = ContextCompressor.__new__(ContextCompressor)
-        comp.last_prompt_tokens = 1000
-        comp.context_length = 0
-        comp.threshold_tokens = 0
-        comp.compression_count = 0
-
-        status = comp.get_status()
-        assert status["usage_percent"] == 0
-
-
-class TestMemoryToolPercentClamp:
-    """tools/memory_tool.py — _success_response and _render_block pct"""
-
-    def test_over_limit_clamped_at_100(self):
-        """Percentage should be capped at 100 even if current > limit."""
-        # Simulate the calculation directly
-        current = 5500
-        limit = 5000
-        pct = min(100, int((current / limit) * 100)) if limit > 0 else 0
-        assert pct == 100
-
-    def test_normal_percentage(self):
-        current = 2500
-        limit = 5000
-        pct = min(100, int((current / limit) * 100)) if limit > 0 else 0
-        assert pct == 50
-
-    def test_zero_limit_returns_zero(self):
-        current = 100
-        limit = 0
-        pct = min(100, int((current / limit) * 100)) if limit > 0 else 0
-        assert pct == 0
-
-
-class TestCLIStatsPercentClamp:
-    """cli.py — /stats command percentage"""
-
-    def test_over_context_clamped_at_100(self):
-        """Tokens exceeding context_length should show max 100%."""
-        last_prompt = 210_000
-        ctx_len = 200_000
-        pct = min(100, (last_prompt / ctx_len * 100)) if ctx_len else 0
-        assert pct == 100
-
-    def test_normal_context(self):
-        last_prompt = 100_000
-        ctx_len = 200_000
-        pct = min(100, (last_prompt / ctx_len * 100)) if ctx_len else 0
-        assert pct == 50.0
-
-    def test_zero_context_length(self):
-        last_prompt = 1000
-        ctx_len = 0
-        pct = min(100, (last_prompt / ctx_len * 100)) if ctx_len else 0
-        assert pct == 0
-
-
-class TestGatewayStatsPercentClamp:
-    """gateway/run.py — _format_usage_stats percentage"""
-
-    def test_over_context_clamped_at_100(self):
-        last_prompt_tokens = 210_000
-        context_length = 200_000
-        pct = min(100, last_prompt_tokens / context_length * 100) if context_length else 0
-        assert pct == 100
-
-    def test_normal_context(self):
-        last_prompt_tokens = 150_000
-        context_length = 200_000
-        pct = min(100, last_prompt_tokens / context_length * 100) if context_length else 0
-        assert pct == 75.0
-
-
-class TestSourceLinesAreClamped:
-    """Verify the actual source files have min(100, ...) applied."""
-
-    @staticmethod
-    def _read_file(rel_path: str) -> str:
-        import os
-        base = os.path.dirname(os.path.dirname(__file__))
-        with open(os.path.join(base, rel_path)) as f:
-            return f.read()
-
-    def test_context_compressor_clamped(self):
-        src = self._read_file("agent/context_compressor.py")
-        assert "min(100," in src, (
-            "context_compressor.py usage_percent is not clamped with min(100, ...)"
-        )
-
-    def test_gateway_run_clamped(self):
-        src = self._read_file("gateway/run.py")
-        # Check that the stats handler has min(100, ...)
-        assert "min(100, ctx.last_prompt_tokens" in src, (
-            "gateway/run.py stats pct is not clamped with min(100, ...)"
-        )
-
-    def test_cli_clamped(self):
-        src = self._read_file("cli.py")
-        assert "min(100, (last_prompt" in src, (
-            "cli.py /stats pct is not clamped with min(100, ...)"
-        )
-
-    def test_memory_tool_clamped(self):
-        src = self._read_file("tools/memory_tool.py")
-        # Both _success_response and _render_block should have min(100, ...)
-        count = src.count("min(100, int((current / limit)")
-        assert count >= 2, (
-            f"memory_tool.py has only {count} clamped pct lines, expected >= 2"
-        )
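Every test in the deleted file checks the same guard at a different call site: divide, scale to a percentage, cap with `min(100, ...)`, and return 0 for a zero denominator. The sketch below expresses that rule as two helpers, one float and one integer, matching the two expression shapes the tests grep for. `clamped_percent` and `clamped_percent_int` are hypothetical names. The source-grep tests suggest each file inlines the expression rather than sharing a helper.

```python
def clamped_percent(used: float, total: float) -> float:
    """Return used/total as a percentage, capped at 100.

    A zero (or otherwise falsy) total yields 0 instead of raising
    ZeroDivisionError, mirroring the `if ctx_len else 0` guard in the tests.
    """
    if not total:
        return 0
    return min(100, used / total * 100)


def clamped_percent_int(current: int, limit: int) -> int:
    """Integer variant matching the memory-tool shape `min(100, int(...))`."""
    return min(100, int((current / limit) * 100)) if limit > 0 else 0
```

For example, 210,000 prompt tokens against a 200,000-token context window gives `clamped_percent(210_000, 200_000) == 100` rather than 105.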
@@ -226,42 +226,6 @@ class TestPluginHooks:
        # Should not raise despite 1/0
        mgr.invoke_hook("post_tool_call", tool_name="x", args={}, result="r", task_id="")

-    def test_hook_return_values_collected(self, tmp_path, monkeypatch):
-        """invoke_hook() collects non-None return values from callbacks."""
-        plugins_dir = tmp_path / "hermes_test" / "plugins"
-        _make_plugin_dir(
-            plugins_dir, "ctx_plugin",
-            register_body=(
-                'ctx.register_hook("pre_llm_call", '
-                'lambda **kw: {"context": "memory from plugin"})'
-            ),
-        )
-        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
-
-        mgr = PluginManager()
-        mgr.discover_and_load()
-
-        results = mgr.invoke_hook("pre_llm_call", session_id="s1", user_message="hi",
-                                  conversation_history=[], is_first_turn=True, model="test")
-        assert len(results) == 1
-        assert results[0] == {"context": "memory from plugin"}
-
-    def test_hook_none_returns_excluded(self, tmp_path, monkeypatch):
-        """invoke_hook() excludes None returns from the result list."""
-        plugins_dir = tmp_path / "hermes_test" / "plugins"
-        _make_plugin_dir(
-            plugins_dir, "none_hook",
-            register_body='ctx.register_hook("post_llm_call", lambda **kw: None)',
-        )
-        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
-
-        mgr = PluginManager()
-        mgr.discover_and_load()
-
-        results = mgr.invoke_hook("post_llm_call", session_id="s1",
-                                  user_message="hi", assistant_response="bye", model="test")
-        assert results == []
-
    def test_invalid_hook_name_warns(self, tmp_path, monkeypatch, caplog):
        """Registering an unknown hook name logs a warning."""
        plugins_dir = tmp_path / "hermes_test" / "plugins"
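The removed hook tests pin down two rules for `invoke_hook()`. First, non-None return values from callbacks are collected into a list. Second, `None` returns are dropped. The surviving context also expects a raising callback (`1/0`) not to propagate. Below is a minimal sketch of that dispatch loop. It assumes callbacks receive keyword arguments only. `HookRegistry` is a hypothetical stand-in, not the project's `PluginManager`.

```python
import logging
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

logger = logging.getLogger(__name__)


class HookRegistry:
    """Hypothetical, simplified stand-in for plugin hook dispatch."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)

    def register_hook(self, name: str, callback: Callable[..., Any]) -> None:
        self._hooks[name].append(callback)

    def invoke_hook(self, name: str, **kwargs: Any) -> List[Any]:
        """Call every callback for `name`; return their non-None results in order."""
        results: List[Any] = []
        # .get() avoids inserting an empty list for unknown hook names.
        for callback in self._hooks.get(name, []):
            try:
                value = callback(**kwargs)
            except Exception:
                # A broken plugin must not take down the caller.
                logger.exception("Hook %r callback raised; skipping", name)
                continue
            if value is not None:
                results.append(value)
        return results
```

Returning a list rather than the first value lets several plugins contribute, for example multiple `pre_llm_call` context snippets. `None` remains the conventional "nothing to add" signal.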
@@ -472,7 +472,6 @@ class TestInlineThinkBlockExtraction(unittest.TestCase):
        agent._extract_reasoning = AIAgent._extract_reasoning.__get__(agent)
        agent.verbose_logging = False
        agent.reasoning_callback = None
-        agent.stream_delta_callback = None  # non-streaming by default
        return agent

    def test_single_think_block_extracted(self):
@@ -606,159 +605,5 @@ class TestEndToEndPipeline(unittest.TestCase):
        self.assertIsNone(result["last_reasoning"])


-# ---------------------------------------------------------------------------
-# Duplicate reasoning box prevention (Bug fix: 3 boxes for 1 reasoning)
-# ---------------------------------------------------------------------------
-
-class TestReasoningDeltasFiredFlag(unittest.TestCase):
-    """_build_assistant_message should not re-fire reasoning_callback when
-    reasoning was already streamed via _fire_reasoning_delta."""
-
-    def _make_agent(self):
-        from run_agent import AIAgent
-        agent = AIAgent.__new__(AIAgent)
-        agent.reasoning_callback = None
-        agent.stream_delta_callback = None
-        agent._reasoning_deltas_fired = False
-        agent.verbose_logging = False
-        return agent
-
-    def test_fire_reasoning_delta_sets_flag(self):
-        agent = self._make_agent()
-        captured = []
-        agent.reasoning_callback = lambda t: captured.append(t)
-        self.assertFalse(agent._reasoning_deltas_fired)
-        agent._fire_reasoning_delta("thinking...")
-        self.assertTrue(agent._reasoning_deltas_fired)
-        self.assertEqual(captured, ["thinking..."])
-
-    def test_build_assistant_message_skips_callback_when_already_streamed(self):
-        """When streaming already fired reasoning deltas, the post-stream
-        _build_assistant_message should NOT re-fire the callback."""
-        agent = self._make_agent()
-        captured = []
-        agent.reasoning_callback = lambda t: captured.append(t)
-        agent.stream_delta_callback = lambda t: None  # streaming is active
-
-        # Simulate streaming having fired reasoning
-        agent._reasoning_deltas_fired = True
-
-        msg = SimpleNamespace(
-            content="I'll merge that.",
-            tool_calls=None,
-            reasoning_content="Let me merge the PR.",
-            reasoning=None,
-            reasoning_details=None,
-        )
-        agent._build_assistant_message(msg, "stop")
-
-        # Callback should NOT have been fired again
-        self.assertEqual(captured, [])
-
-    def test_build_assistant_message_skips_callback_when_streaming_active(self):
-        """When streaming is active, callback should NEVER fire from
-        _build_assistant_message — reasoning was already displayed during the
-        stream (either via reasoning_content deltas or content tag extraction).
-        Any missed reasoning is caught by the CLI post-response fallback."""
-        agent = self._make_agent()
-        captured = []
-        agent.reasoning_callback = lambda t: captured.append(t)
-        agent.stream_delta_callback = lambda t: None  # streaming active
-
-        # Even though _reasoning_deltas_fired is False (reasoning came through
-        # content tags, not reasoning_content deltas), callback should not fire
-        agent._reasoning_deltas_fired = False
-
-        msg = SimpleNamespace(
-            content="I'll merge that.",
-            tool_calls=None,
-            reasoning_content="Let me merge the PR.",
-            reasoning=None,
-            reasoning_details=None,
-        )
-        agent._build_assistant_message(msg, "stop")
-
-        # Callback should NOT fire — streaming is active
-        self.assertEqual(captured, [])
-
-    def test_build_assistant_message_fires_callback_without_streaming(self):
-        """When no streaming is active, callback always fires for structured
-        reasoning."""
-        agent = self._make_agent()
-        captured = []
-        agent.reasoning_callback = lambda t: captured.append(t)
-        # No streaming
-        agent.stream_delta_callback = None
-        agent._reasoning_deltas_fired = False
-
-        msg = SimpleNamespace(
-            content="I'll merge that.",
-            tool_calls=None,
-            reasoning_content="Let me merge the PR.",
-            reasoning=None,
-            reasoning_details=None,
-        )
-        agent._build_assistant_message(msg, "stop")
-
-        self.assertEqual(captured, ["Let me merge the PR."])
-
-
-class TestReasoningShownThisTurnFlag(unittest.TestCase):
-    """Post-response reasoning display should be suppressed when reasoning
-    was already shown during streaming in a tool-calling loop."""
-
-    def _make_cli(self):
-        from cli import HermesCLI
-        cli = HermesCLI.__new__(HermesCLI)
-        cli.show_reasoning = True
-        cli.streaming_enabled = True
-        cli._stream_box_opened = False
-        cli._reasoning_box_opened = False
-        cli._reasoning_stream_started = False
-        cli._reasoning_shown_this_turn = False
-        cli._reasoning_buf = ""
-        cli._stream_buf = ""
-        cli._stream_started = False
-        cli._stream_text_ansi = ""
-        cli._stream_prefilt = ""
-        cli._in_reasoning_block = False
-        cli._reasoning_preview_buf = ""
-        return cli
-
-    @patch("cli._cprint")
-    def test_streaming_reasoning_sets_turn_flag(self, mock_cprint):
-        cli = self._make_cli()
-        self.assertFalse(cli._reasoning_shown_this_turn)
-        cli._stream_reasoning_delta("Thinking about it...")
-        self.assertTrue(cli._reasoning_shown_this_turn)
-
-    @patch("cli._cprint")
-    def test_turn_flag_survives_reset_stream_state(self, mock_cprint):
-        """_reasoning_shown_this_turn must NOT be cleared by
-        _reset_stream_state (called at intermediate turn boundaries)."""
-        cli = self._make_cli()
-        cli._stream_reasoning_delta("Thinking...")
-        self.assertTrue(cli._reasoning_shown_this_turn)
-
-        # Simulate intermediate turn boundary (tool call)
-        cli._reset_stream_state()
-
-        # Flag must persist
-        self.assertTrue(cli._reasoning_shown_this_turn)
-
-    @patch("cli._cprint")
-    def test_turn_flag_cleared_before_new_turn(self, mock_cprint):
-        """The turn flag should be reset at the start of a new user turn.
-        This happens outside _reset_stream_state, at the call site."""
-        cli = self._make_cli()
-        cli._reasoning_shown_this_turn = True
-
-        # Simulate new user turn setup
-        cli._reset_stream_state()
-        cli._reasoning_shown_this_turn = False  # done by process_input
-
-        self.assertFalse(cli._reasoning_shown_this_turn)
-
-
 if __name__ == "__main__":
    unittest.main()
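The removed reasoning tests describe when the post-response path may fire `reasoning_callback`. It must never fire while streaming is active (a `stream_delta_callback` is set). It must also never fire after reasoning deltas were already streamed, which is what produced the duplicate reasoning boxes. Below is a minimal sketch of that gate. `ReasoningEmitter`, `fire_reasoning_delta` and `on_final_message` are hypothetical names standing in for `_fire_reasoning_delta` and `_build_assistant_message`. The `reasoning_content`-then-`reasoning` lookup order is an assumption.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


class ReasoningEmitter:
    """Hypothetical reduction of the reasoning-display gate in the deleted tests."""

    def __init__(
        self,
        reasoning_callback: Optional[Callable[[str], Any]] = None,
        stream_delta_callback: Optional[Callable[[str], Any]] = None,
    ) -> None:
        self.reasoning_callback = reasoning_callback
        self.stream_delta_callback = stream_delta_callback
        self._reasoning_deltas_fired = False

    def fire_reasoning_delta(self, text: str) -> None:
        # Record that reasoning reached the UI during streaming, then forward it.
        self._reasoning_deltas_fired = True
        if self.reasoning_callback is not None:
            self.reasoning_callback(text)

    def on_final_message(self, msg: Any) -> None:
        # Assumed lookup order: structured reasoning_content, then reasoning.
        reasoning = getattr(msg, "reasoning_content", None) or getattr(msg, "reasoning", None)
        if not reasoning or self.reasoning_callback is None:
            return
        streaming = self.stream_delta_callback is not None
        if streaming or self._reasoning_deltas_fired:
            return  # the streaming path owns the display; do not show it twice
        self.reasoning_callback(reasoning)


if __name__ == "__main__":
    shown = []
    emitter = ReasoningEmitter(reasoning_callback=shown.append)
    emitter.on_final_message(SimpleNamespace(reasoning_content="Let me merge the PR.", reasoning=None))
    print(shown)  # ['Let me merge the PR.']
```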
@@ -584,164 +584,6 @@ class TestBuildSystemPrompt:
        # Should contain current date info like "Conversation started:"
        assert "Conversation started:" in prompt

-    def test_skills_prompt_derives_available_toolsets_from_loaded_tools(self):
-        tools = _make_tool_defs("web_search", "skills_list", "skill_view", "skill_manage")
-        toolset_map = {
-            "web_search": "web",
-            "skills_list": "skills",
-            "skill_view": "skills",
-            "skill_manage": "skills",
-        }
-
-        with (
-            patch("run_agent.get_tool_definitions", return_value=tools),
-            patch(
-                "run_agent.check_toolset_requirements",
-                side_effect=AssertionError("should not re-check toolset requirements"),
-            ),
-            patch("run_agent.get_toolset_for_tool", create=True, side_effect=toolset_map.get),
-            patch("run_agent.build_skills_system_prompt", return_value="SKILLS_PROMPT") as mock_skills,
-            patch("run_agent.OpenAI"),
-        ):
-            agent = AIAgent(
-                api_key="test-k...7890",
-                quiet_mode=True,
-                skip_context_files=True,
-                skip_memory=True,
-            )
-
-            prompt = agent._build_system_prompt()
-
-        assert "SKILLS_PROMPT" in prompt
-        assert mock_skills.call_args.kwargs["available_tools"] == set(toolset_map)
-        assert mock_skills.call_args.kwargs["available_toolsets"] == {"web", "skills"}
-
-
-class TestToolUseEnforcementConfig:
-    """Tests for the agent.tool_use_enforcement config option."""
-
-    def _make_agent(self, model="openai/gpt-4.1", tool_use_enforcement="auto"):
-        """Create an agent with tools and a specific enforcement config."""
-        with (
-            patch(
-                "run_agent.get_tool_definitions",
-                return_value=_make_tool_defs("terminal", "web_search"),
-            ),
-            patch("run_agent.check_toolset_requirements", return_value={}),
-            patch("run_agent.OpenAI"),
-            patch(
-                "hermes_cli.config.load_config",
-                return_value={"agent": {"tool_use_enforcement": tool_use_enforcement}},
-            ),
-        ):
-            a = AIAgent(
-                model=model,
-                api_key="test-key-1234567890",
-                quiet_mode=True,
-                skip_context_files=True,
-                skip_memory=True,
-            )
-            a.client = MagicMock()
-            return a
-
-    def test_auto_injects_for_gpt(self):
-        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
-        agent = self._make_agent(model="openai/gpt-4.1", tool_use_enforcement="auto")
-        prompt = agent._build_system_prompt()
-        assert TOOL_USE_ENFORCEMENT_GUIDANCE in prompt
-
-    def test_auto_injects_for_codex(self):
-        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
-        agent = self._make_agent(model="openai/codex-mini", tool_use_enforcement="auto")
-        prompt = agent._build_system_prompt()
-        assert TOOL_USE_ENFORCEMENT_GUIDANCE in prompt
-
-    def test_auto_skips_for_claude(self):
-        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
-        agent = self._make_agent(model="anthropic/claude-sonnet-4", tool_use_enforcement="auto")
-        prompt = agent._build_system_prompt()
-        assert TOOL_USE_ENFORCEMENT_GUIDANCE not in prompt
-
-    def test_true_forces_for_all_models(self):
-        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
-        agent = self._make_agent(model="anthropic/claude-sonnet-4", tool_use_enforcement=True)
-        prompt = agent._build_system_prompt()
-        assert TOOL_USE_ENFORCEMENT_GUIDANCE in prompt
-
-    def test_string_true_forces_for_all_models(self):
-        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
-        agent = self._make_agent(model="anthropic/claude-sonnet-4", tool_use_enforcement="true")
-        prompt = agent._build_system_prompt()
-        assert TOOL_USE_ENFORCEMENT_GUIDANCE in prompt
-
-    def test_always_forces_for_all_models(self):
-        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
-        agent = self._make_agent(model="deepseek/deepseek-r1", tool_use_enforcement="always")
-        prompt = agent._build_system_prompt()
-        assert TOOL_USE_ENFORCEMENT_GUIDANCE in prompt
-
-    def test_false_disables_for_gpt(self):
-        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
-        agent = self._make_agent(model="openai/gpt-4.1", tool_use_enforcement=False)
-        prompt = agent._build_system_prompt()
-        assert TOOL_USE_ENFORCEMENT_GUIDANCE not in prompt
-
-    def test_string_false_disables(self):
-        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
-        agent = self._make_agent(model="openai/gpt-4.1", tool_use_enforcement="off")
-        prompt = agent._build_system_prompt()
-        assert TOOL_USE_ENFORCEMENT_GUIDANCE not in prompt
-
-    def test_custom_list_matches(self):
-        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
-        agent = self._make_agent(
-            model="deepseek/deepseek-r1",
-            tool_use_enforcement=["deepseek", "gemini"],
-        )
-        prompt = agent._build_system_prompt()
-        assert TOOL_USE_ENFORCEMENT_GUIDANCE in prompt
-
-    def test_custom_list_no_match(self):
-        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
-        agent = self._make_agent(
-            model="anthropic/claude-sonnet-4",
-            tool_use_enforcement=["deepseek", "gemini"],
-        )
-        prompt = agent._build_system_prompt()
-        assert TOOL_USE_ENFORCEMENT_GUIDANCE not in prompt
-
-    def test_custom_list_case_insensitive(self):
-        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
-        agent = self._make_agent(
-            model="openai/GPT-4.1",
-            tool_use_enforcement=["GPT", "Codex"],
-        )
-        prompt = agent._build_system_prompt()
-        assert TOOL_USE_ENFORCEMENT_GUIDANCE in prompt
-
-    def test_no_tools_never_injects(self):
-        """Even with enforcement=true, no injection when agent has no tools."""
-        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
-        with (
-            patch("run_agent.get_tool_definitions", return_value=[]),
-            patch("run_agent.check_toolset_requirements", return_value={}),
-            patch("run_agent.OpenAI"),
-            patch(
-                "hermes_cli.config.load_config",
-                return_value={"agent": {"tool_use_enforcement": True}},
-            ),
-        ):
-            a = AIAgent(
-                api_key="test-key-1234567890",
-                quiet_mode=True,
-                skip_context_files=True,
-                skip_memory=True,
-                enabled_toolsets=[],
-            )
-            a.client = MagicMock()
-            prompt = a._build_system_prompt()
-            assert TOOL_USE_ENFORCEMENT_GUIDANCE not in prompt
-

 class TestInvalidateSystemPrompt:
    def test_clears_cache(self, agent):
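The removed `TestToolUseEnforcementConfig` cases spell out how `agent.tool_use_enforcement` is interpreted:

- `"auto"` injects the guidance only for GPT and Codex models.
- `True`, `"true"` and `"always"` force it.
- `False` and `"off"` disable it.
- A list matches model names case-insensitively by substring.
- Nothing is injected when the agent has no tools.

Below is a minimal sketch of that decision table. `should_inject_tool_guidance` is a hypothetical function. The `"auto"` pattern list and the fallback for unrecognised strings are assumptions inferred from the test names, not read from `agent/prompt_builder.py`.

```python
from typing import Iterable, Union

EnforcementSetting = Union[bool, str, Iterable[str]]

# Assumptions inferred from the deleted test names, not read from the source.
_AUTO_PATTERNS = ("gpt", "codex")
_FORCE_ON = {"true", "always"}
_FORCE_OFF = {"false", "off"}


def should_inject_tool_guidance(
    model: str, setting: EnforcementSetting, has_tools: bool
) -> bool:
    """Decide whether tool-use enforcement guidance belongs in the system prompt."""
    if not has_tools:
        return False  # nothing to enforce without tools, whatever the setting
    if isinstance(setting, bool):
        return setting
    if isinstance(setting, str):
        value = setting.strip().lower()
        if value in _FORCE_ON:
            return True
        if value in _FORCE_OFF:
            return False
        patterns: Iterable[str] = _AUTO_PATTERNS  # "auto" and unknown strings
    else:
        patterns = setting  # explicit list of model-name substrings
    model_lc = model.lower()
    return any(p.lower() in model_lc for p in patterns)
```

The `bool` check comes before the `str` check so that a YAML boolean and a quoted `"true"` reach the same answer by separate paths.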
@@ -763,7 +605,7 @@ class TestBuildApiKwargs:
        kwargs = agent._build_api_kwargs(messages)
        assert kwargs["model"] == agent.model
        assert kwargs["messages"] is messages
-        assert kwargs["timeout"] == 1800.0
+        assert kwargs["timeout"] == 900.0

    def test_provider_preferences_injected(self, agent):
        agent.providers_allowed = ["Anthropic"]
@@ -1498,11 +1340,19 @@ class TestRunConversation:
        assert result["final_response"] == "Recovered after compression"
        assert result["completed"] is True

-    def test_length_finish_reason_requests_continuation(self, agent):
-        """Normal truncation (partial real content) triggers continuation."""
+    @pytest.mark.parametrize(
+        ("first_content", "second_content", "expected_final"),
+        [
+            ("Part 1 ", "Part 2", "Part 1 Part 2"),
+            ("<think>internal reasoning</think>", "Recovered final answer", "Recovered final answer"),
+        ],
+    )
+    def test_length_finish_reason_requests_continuation(
+        self, agent, first_content, second_content, expected_final
+    ):
        self._setup_agent(agent)
-        first = _mock_response(content="Part 1 ", finish_reason="length")
-        second = _mock_response(content="Part 2", finish_reason="stop")
+        first = _mock_response(content=first_content, finish_reason="length")
+        second = _mock_response(content=second_content, finish_reason="stop")
        agent.client.chat.completions.create.side_effect = [first, second]

        with (
@@ -1514,58 +1364,12 @@ class TestRunConversation:

        assert result["completed"] is True
        assert result["api_calls"] == 2
-        assert result["final_response"] == "Part 1 Part 2"
+        assert result["final_response"] == expected_final

        second_call_messages = agent.client.chat.completions.create.call_args_list[1].kwargs["messages"]
        assert second_call_messages[-1]["role"] == "user"
        assert "truncated by the output length limit" in second_call_messages[-1]["content"]

-    def test_length_thinking_exhausted_skips_continuation(self, agent):
-        """When finish_reason='length' but content is only thinking, skip retries."""
-        self._setup_agent(agent)
-        resp = _mock_response(
-            content="<think>internal reasoning</think>",
-            finish_reason="length",
-        )
-        agent.client.chat.completions.create.return_value = resp
-
-        with (
-            patch.object(agent, "_persist_session"),
-            patch.object(agent, "_save_trajectory"),
-            patch.object(agent, "_cleanup_task_resources"),
-        ):
-            result = agent.run_conversation("hello")
-
-        # Should return immediately — no continuation, only 1 API call
-        assert result["completed"] is False
-        assert result["api_calls"] == 1
-        assert "reasoning" in result["error"].lower()
-        assert "output tokens" in result["error"].lower()
-        # Should have a user-friendly response (not None)
-        assert result["final_response"] is not None
-        assert "Thinking Budget Exhausted" in result["final_response"]
-        assert "/thinkon" in result["final_response"]
-
-    def test_length_empty_content_detected_as_thinking_exhausted(self, agent):
-        """When finish_reason='length' and content is None/empty, detect exhaustion."""
-        self._setup_agent(agent)
-        resp = _mock_response(content=None, finish_reason="length")
-        agent.client.chat.completions.create.return_value = resp
-
-        with (
-            patch.object(agent, "_persist_session"),
-            patch.object(agent, "_save_trajectory"),
-            patch.object(agent, "_cleanup_task_resources"),
-        ):
-            result = agent.run_conversation("hello")
-
-        assert result["completed"] is False
-        assert result["api_calls"] == 1
-        assert "reasoning" in result["error"].lower()
-        # User-friendly message is returned
-        assert result["final_response"] is not None
-        assert "Thinking Budget Exhausted" in result["final_response"]
-

 class TestRetryExhaustion:
    """Regression: retry_count > max_retries was dead code (off-by-one).
@@ -2793,50 +2597,6 @@ class TestStreamingApiCall:
        assert tc[0].function.name == "search"
        assert tc[1].function.name == "read"

-    def test_ollama_reused_index_separate_tool_calls(self, agent):
-        """Ollama sends every tool call at index 0 with different ids.
-
-        Without the fix, names and arguments get concatenated into one slot.
-        """
-        chunks = [
-            _make_chunk(tool_calls=[_make_tc_delta(0, "call_a", "search", '{"q":"hello"}')]),
-            # Second tool call at the SAME index 0, but different id
-            _make_chunk(tool_calls=[_make_tc_delta(0, "call_b", "read_file", '{"path":"x.py"}')]),
-            _make_chunk(finish_reason="tool_calls"),
-        ]
-        agent.client.chat.completions.create.return_value = iter(chunks)
-
-        resp = agent._interruptible_streaming_api_call({"messages": []})
-
-        tc = resp.choices[0].message.tool_calls
-        assert len(tc) == 2, f"Expected 2 tool calls, got {len(tc)}: {[t.function.name for t in tc]}"
-        assert tc[0].function.name == "search"
-        assert tc[0].function.arguments == '{"q":"hello"}'
-        assert tc[0].id == "call_a"
-        assert tc[1].function.name == "read_file"
-        assert tc[1].function.arguments == '{"path":"x.py"}'
-        assert tc[1].id == "call_b"
-
-    def test_ollama_reused_index_streamed_args(self, agent):
-        """Ollama with streamed arguments across multiple chunks at same index."""
-        chunks = [
-            _make_chunk(tool_calls=[_make_tc_delta(0, "call_a", "search", '{"q":')]),
-            _make_chunk(tool_calls=[_make_tc_delta(0, None, None, '"hello"}')]),
-            # New tool call, same index 0
-            _make_chunk(tool_calls=[_make_tc_delta(0, "call_b", "read", '{}')]),
-            _make_chunk(finish_reason="tool_calls"),
-        ]
-        agent.client.chat.completions.create.return_value = iter(chunks)
-
-        resp = agent._interruptible_streaming_api_call({"messages": []})
-
-        tc = resp.choices[0].message.tool_calls
-        assert len(tc) == 2
-        assert tc[0].function.name == "search"
-        assert tc[0].function.arguments == '{"q":"hello"}'
-        assert tc[1].function.name == "read"
-        assert tc[1].function.arguments == '{}'
-
    def test_content_and_tool_calls_together(self, agent):
        chunks = [
            _make_chunk(content="I'll search"),
@@ -493,22 +493,22 @@ def test_minimax_default_url_uses_anthropic_messages(monkeypatch):
    assert resolved["base_url"] == "https://api.minimax.io/anthropic"


-def test_minimax_v1_url_uses_chat_completions(monkeypatch):
-    """MiniMax with /v1 base URL should use chat_completions (user override for regions where /anthropic 404s)."""
+def test_minimax_stale_v1_url_auto_corrected(monkeypatch):
+    """MiniMax with stale /v1 base URL should be auto-corrected to /anthropic."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax")
    monkeypatch.setattr(rp, "_get_model_config", lambda: {})
    monkeypatch.setenv("MINIMAX_API_KEY", "test-minimax-key")
-    monkeypatch.setenv("MINIMAX_BASE_URL", "https://api.minimax.chat/v1")
+    monkeypatch.setenv("MINIMAX_BASE_URL", "https://api.minimax.io/v1")

    resolved = rp.resolve_runtime_provider(requested="minimax")

    assert resolved["provider"] == "minimax"
-    assert resolved["api_mode"] == "chat_completions"
-    assert resolved["base_url"] == "https://api.minimax.chat/v1"
+    assert resolved["api_mode"] == "anthropic_messages"
+    assert resolved["base_url"] == "https://api.minimax.io/anthropic"


-def test_minimax_cn_v1_url_uses_chat_completions(monkeypatch):
-    """MiniMax-CN with /v1 base URL should use chat_completions (user override)."""
+def test_minimax_cn_stale_v1_url_auto_corrected(monkeypatch):
+    """MiniMax-CN with stale /v1 base URL should be auto-corrected to /anthropic."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax-cn")
    monkeypatch.setattr(rp, "_get_model_config", lambda: {})
    monkeypatch.setenv("MINIMAX_CN_API_KEY", "test-minimax-cn-key")
@@ -517,8 +517,8 @@ def test_minimax_cn_v1_url_uses_chat_completions(monkeypatch):
    resolved = rp.resolve_runtime_provider(requested="minimax-cn")

    assert resolved["provider"] == "minimax-cn"
-    assert resolved["api_mode"] == "chat_completions"
-    assert resolved["base_url"] == "https://api.minimaxi.com/v1"
+    assert resolved["api_mode"] == "anthropic_messages"
+    assert resolved["base_url"] == "https://api.minimaxi.com/anthropic"


 def test_minimax_explicit_api_mode_respected(monkeypatch):
@@ -534,8 +534,8 @@ def test_minimax_explicit_api_mode_respected(monkeypatch):
    assert resolved["api_mode"] == "chat_completions"


-def test_alibaba_default_coding_intl_endpoint_uses_chat_completions(monkeypatch):
-    """Alibaba default coding-intl /v1 URL should use chat_completions mode."""
+def test_alibaba_default_anthropic_endpoint_uses_anthropic_messages(monkeypatch):
+    """Alibaba with default /apps/anthropic URL should use anthropic_messages mode."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "alibaba")
    monkeypatch.setattr(rp, "_get_model_config", lambda: {})
    monkeypatch.setenv("DASHSCOPE_API_KEY", "test-dashscope-key")
@@ -544,22 +544,22 @@ def test_alibaba_default_coding_intl_endpoint_uses_chat_completions(monkeypatch)
    resolved = rp.resolve_runtime_provider(requested="alibaba")

    assert resolved["provider"] == "alibaba"
-    assert resolved["api_mode"] == "chat_completions"
-    assert resolved["base_url"] == "https://coding-intl.dashscope.aliyuncs.com/v1"
+    assert resolved["api_mode"] == "anthropic_messages"
+    assert resolved["base_url"] == "https://dashscope-intl.aliyuncs.com/apps/anthropic"


-def test_alibaba_anthropic_endpoint_override_uses_anthropic_messages(monkeypatch):
-    """Alibaba with /apps/anthropic URL override should auto-detect anthropic_messages mode."""
+def test_alibaba_openai_compatible_v1_endpoint_stays_chat_completions(monkeypatch):
+    """Alibaba with /v1 coding endpoint should use chat_completions mode."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "alibaba")
    monkeypatch.setattr(rp, "_get_model_config", lambda: {})
    monkeypatch.setenv("DASHSCOPE_API_KEY", "test-dashscope-key")
-    monkeypatch.setenv("DASHSCOPE_BASE_URL", "https://coding-intl.dashscope.aliyuncs.com/apps/anthropic")
+    monkeypatch.setenv("DASHSCOPE_BASE_URL", "https://coding-intl.dashscope.aliyuncs.com/v1")

    resolved = rp.resolve_runtime_provider(requested="alibaba")

    assert resolved["provider"] == "alibaba"
-    assert resolved["api_mode"] == "anthropic_messages"
-    assert resolved["base_url"] == "https://coding-intl.dashscope.aliyuncs.com/apps/anthropic"
+    assert resolved["api_mode"] == "chat_completions"
+    assert resolved["base_url"] == "https://coding-intl.dashscope.aliyuncs.com/v1"


 def test_named_custom_provider_anthropic_api_mode(monkeypatch):
@@ -362,11 +362,9 @@ class TestStreamingCallbacks:

        # Text before tool call IS fired (we don't know yet it will have tools)
        assert "thinking..." in deltas
-        # Text after tool call IS still routed to stream_delta_callback so that
-        # reasoning tag extraction can fire (PR #3566).  Display-level suppression
-        # of non-reasoning text happens in the CLI's _stream_delta, not here.
-        assert " more text" in deltas
-        # Content is still accumulated in the response
+        # Text after tool call is NOT fired
+        assert " more text" not in deltas
+        # But content is still accumulated in the response
        assert response.choices[0].message.content == "thinking... more text"


@@ -534,121 +532,6 @@ class TestStreamingFallback:
        mock_non_stream.assert_called_once()
        assert mock_close.call_count >= 1

-    @patch("run_agent.AIAgent._interruptible_api_call")
-    @patch("run_agent.AIAgent._create_request_openai_client")
-    @patch("run_agent.AIAgent._close_request_openai_client")
-    def test_sse_connection_lost_retried_as_transient(self, mock_close, mock_create, mock_non_stream):
-        """SSE 'Network connection lost' (APIError w/ no status_code) retries like httpx errors.
-
-        OpenRouter sends {"error":{"message":"Network connection lost."}} as an SSE
-        event when the upstream stream drops.  The OpenAI SDK raises APIError from
-        this.  It should be retried at the streaming level, same as httpx connection
-        errors, before falling back to non-streaming.
-        """
-        from run_agent import AIAgent
-        import httpx
-
-        # Create an APIError that mimics what the OpenAI SDK raises from SSE error events.
-        # Key: no status_code attribute (unlike APIStatusError which has one).
-        from openai import APIError as OAIAPIError
-        sse_error = OAIAPIError(
-            message="Network connection lost.",
-            request=httpx.Request("POST", "https://openrouter.ai/api/v1/chat/completions"),
-            body={"message": "Network connection lost."},
-        )
-
-        mock_client = MagicMock()
-        mock_client.chat.completions.create.side_effect = sse_error
-        mock_create.return_value = mock_client
-
-        fallback_response = SimpleNamespace(
-            id="fallback",
-            model="test",
-            choices=[SimpleNamespace(
-                index=0,
-                message=SimpleNamespace(
-                    role="assistant",
-                    content="fallback after SSE retries",
-                    tool_calls=None,
-                    reasoning_content=None,
-                ),
-                finish_reason="stop",
-            )],
-            usage=None,
-        )
-        mock_non_stream.return_value = fallback_response
-
-        agent = AIAgent(
-            model="test/model",
-            quiet_mode=True,
-            skip_context_files=True,
-            skip_memory=True,
-        )
-        agent.api_mode = "chat_completions"
-        agent._interrupt_requested = False
-
-        response = agent._interruptible_streaming_api_call({})
-
-        assert response.choices[0].message.content == "fallback after SSE retries"
-        # Should retry 3 times (default HERMES_STREAM_RETRIES=2 → 3 attempts)
-        # before falling back to non-streaming
-        assert mock_client.chat.completions.create.call_count == 3
-        mock_non_stream.assert_called_once()
-        # Connection cleanup should happen for each failed retry
-        assert mock_close.call_count >= 2
-
-    @patch("run_agent.AIAgent._interruptible_api_call")
-    @patch("run_agent.AIAgent._create_request_openai_client")
-    @patch("run_agent.AIAgent._close_request_openai_client")
-    def test_sse_non_connection_error_falls_back_immediately(self, mock_close, mock_create, mock_non_stream):
-        """SSE errors that aren't connection-related still fall back immediately (no stream retry)."""
-        from run_agent import AIAgent
-        import httpx
-
-        from openai import APIError as OAIAPIError
-        sse_error = OAIAPIError(
-            message="Invalid model configuration.",
-            request=httpx.Request("POST", "https://openrouter.ai/api/v1/chat/completions"),
-            body={"message": "Invalid model configuration."},
-        )
-
-        mock_client = MagicMock()
-        mock_client.chat.completions.create.side_effect = sse_error
-        mock_create.return_value = mock_client
-
-        fallback_response = SimpleNamespace(
-            id="fallback",
-            model="test",
-            choices=[SimpleNamespace(
-                index=0,
-                message=SimpleNamespace(
-                    role="assistant",
-                    content="fallback no retry",
-                    tool_calls=None,
-                    reasoning_content=None,
-                ),
-                finish_reason="stop",
-            )],
-            usage=None,
-        )
-        mock_non_stream.return_value = fallback_response
-
-        agent = AIAgent(
-            model="test/model",
-            quiet_mode=True,
-            skip_context_files=True,
-            skip_memory=True,
-        )
-        agent.api_mode = "chat_completions"
-        agent._interrupt_requested = False
-
-        response = agent._interruptible_streaming_api_call({})
-
-        assert response.choices[0].message.content == "fallback no retry"
-        # Should NOT retry — goes straight to non-streaming fallback
-        assert mock_client.chat.completions.create.call_count == 1
-        mock_non_stream.assert_called_once()
-

 # ── Test: Reasoning Streaming ────────────────────────────────────────────
