perf(state): merge FTS5 segments on VACUUM + add 'hermes sessions optimize'

The FTS5 indexes (messages_fts, messages_fts_trigram) grow as a series of incremental b-tree segments, one per trigger-driven insert batch. SQLite's automerge caps at ~16 segments, so a long-lived store keeps scanning many segments per MATCH and never collapses them unless the special 'optimize' command runs. Nothing in the codebase ever ran it: vacuum() only fired after a prune that deleted rows, and even then never merged FTS segments.

Changes:
- SessionDB.optimize_fts(): merges each FTS5 index to a single segment, probing for the (optional/lazy) trigram table first so it is safe to call unconditionally. Layout-only: search results and snippet() are unchanged.
- vacuum() now calls optimize_fts() before VACUUM so freed index pages are returned to the OS in the same pass.
- 'hermes sessions optimize' CLI subcommand for on-demand reclamation + segment compaction (previously there was no way to compact the store without a prune deleting rows), with before/after size reporting.

Benchmark (8000 msgs, fragmented to 8 segments/index):
- segments 8 -> 1 on both indexes
- porter MATCH 5.5x faster (0.449 -> 0.081 ms/q)
- trigram MATCH 3.0x faster (0.632 -> 0.207 ms/q)
- 8000 matches before == 8000 after, identical row ids (no functional change)

Orthogonal to the structural FTS-size PRs (#20239 external-content, #27770 optional trigram); segment merge helps regardless of those.

Tests: TestOptimizeFts covers index count, search+snippet preservation, missing-trigram path, and idempotency. Full test_hermes_state.py green (227).
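
For readers unfamiliar with FTS5 maintenance, the merge-then-VACUUM sequence described above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, not the actual `SessionDB.optimize_fts()` implementation: the function names `optimize_fts_sketch` and `vacuum_sketch` are hypothetical, and only the two index names are taken from the commit message. The `INSERT INTO t(t) VALUES ('optimize')` statement is FTS5's own special command for merging an index into a single segment.

```python
import sqlite3

# Index names come from the commit message; everything else is illustrative.
FTS_TABLES = ("messages_fts", "messages_fts_trigram")


def optimize_fts_sketch(conn):
    """Merge each existing FTS5 index into one segment; return the names merged."""
    merged = []
    for name in FTS_TABLES:
        # Probe first: the trigram table is optional and may not exist.
        row = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (name,),
        ).fetchone()
        if row is None:
            continue
        # FTS5 special command: rewrite the index as a single b-tree segment.
        # The name is interpolated from the fixed tuple above, never user input.
        conn.execute(f"INSERT INTO {name}({name}) VALUES ('optimize')")
        merged.append(name)
    # Commit so a following VACUUM is not run inside an open transaction.
    conn.commit()
    return merged


def vacuum_sketch(conn):
    """Merge FTS segments first, then VACUUM so freed pages are reclaimed."""
    optimize_fts_sketch(conn)
    conn.execute("VACUUM")
```

Running the merge before `VACUUM` matters for the ordering described in the commit: pages freed by the segment merge only leave the file once `VACUUM` rebuilds it. The sketch assumes the Python build links an SQLite with FTS5 enabled.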
test: update non-minimax overflow test to match new keep-context behavior
2026-05-29 01:17:51 +05:30 · 2026-05-28 12:26:53 -07:00 · 2026-05-28 12:26:53 -07:00 · 2026-05-28 12:26:53 -07:00 · 2026-05-28 12:10:21 -07:00 · 2026-05-28 11:59:58 -07:00
304 changed files with 12296 additions and 2548 deletions
@@ -78,6 +78,12 @@ mini-swe-agent/
 .nix-stamps/
 result
 website/static/api/skills-index.json
+# skills.json + skills-meta.json are build artifacts emitted by
+# website/scripts/extract-skills.py during prebuild — keep them out of
+# git for the same reason as skills-index.json (large, generated, change
+# every build).
+website/static/api/skills.json
+website/static/api/skills-meta.json
 models-dev-upstream/
 hermes_cli/tui_dist/*
 hermes_cli/scripts/
@@ -0,0 +1,651 @@
+# Hermes Agent v0.15.0 (v2026.5.28)
+
+**Release Date:** May 28, 2026
+**Since v0.14.0:** 1,302 commits · 747 merged PRs · 1,746 files changed · 282,712 insertions · 36,699 deletions · 560+ issues closed (15 P0, 65 P1, 19 security-tagged) · 321 community contributors (including co-authors)
+
+> **The Velocity Release.** Hermes gets dramatically faster — to start, to run, to ship work, and to grow. The 16,083-line `run_agent.py` collapses to 3,821 (-76%) across 14 cohesive `agent/*` modules. Kanban grew into a real multi-agent platform across 104 PRs — orchestrator auto-decomposition, swarm topology, scheduled tasks, worktree-per-task, per-task model overrides. The cold-start perf wave keeps going: another second shaved off launch, 47% fewer per-conversation function calls, `hermes --version` flipping the head-to-head benchmark against Codex CLI. `session_search` is 4,500× faster and free now. Promptware defense lands against Brainworm-class attacks. Bitwarden Secrets Manager replaces N per-provider API keys with one bootstrap token. Skill bundles let one slash command load a whole workflow. The Ink TUI gets a multi-session orchestrator. Two new image_gen providers (Krea 2 Medium + Large, FAL ported to plugin), the Nous-approved MCP catalog with an interactive picker, an OpenHands orchestration skill, ntfy as the 23rd messaging platform, and a deep xAI integration round (Web Search plugin, xai-oauth `hermes proxy` upstream, retired-May-15 model detection + `hermes migrate xai`, natural TTS speech-tag pauses, base_url leak guard, OpenAI-style execution guidance for Grok). 15 P0 + 65 P1 closures alongside.
+
+---
+
+## ✨ Highlights
+
+- **The Big Refactor — `run_agent.py` is no longer 16,000 lines** — The file at the heart of Hermes — the agent conversation loop — has been reduced from 16,083 lines to 3,821 (-76%), with the extracted code redistributed across 14 cohesive modules under `agent/`. Behavior is unchanged: every extraction keeps a thin forwarder on `AIAgent`, every test patch path still works, every external caller is compatible. The reason you care: future Hermes development moves faster, plugin authors can finally grep the codebase, and the file that took 90 seconds to load in your editor opens in a blink. ([#27248](https://github.com/NousResearch/hermes-agent/pull/27248))
+
+- **Kanban grew into a real multi-agent platform — 104 PRs end to end** — Triage auto-decomposes one task into a tree of sub-tasks. `hermes kanban swarm` creates a full Swarm v1 graph in one command — root, parallel workers, gated verifier, gated synthesizer, shared blackboard. Tasks support per-task model overrides (cheap models for boilerplate, expensive ones for hard sub-tasks), board-level default workdirs, per-task worktree paths and branches, scheduled start times, configurable claim TTL, retry fingerprinting, stale-task detection, respawn guards, and a drag-to-delete trash zone. Workers report through `/workers/active`, `/runs/{id}`, and `/inspect` endpoints. ([#27572](https://github.com/NousResearch/hermes-agent/pull/27572), [#28443](https://github.com/NousResearch/hermes-agent/pull/28443), [#28364](https://github.com/NousResearch/hermes-agent/pull/28364), [#28394](https://github.com/NousResearch/hermes-agent/pull/28394), [#28462](https://github.com/NousResearch/hermes-agent/pull/28462), [#28384](https://github.com/NousResearch/hermes-agent/pull/28384), [#28467](https://github.com/NousResearch/hermes-agent/pull/28467), [#28455](https://github.com/NousResearch/hermes-agent/pull/28455), [#28452](https://github.com/NousResearch/hermes-agent/pull/28452), [#28432](https://github.com/NousResearch/hermes-agent/pull/28432), [#28468](https://github.com/NousResearch/hermes-agent/pull/28468), [#28420](https://github.com/NousResearch/hermes-agent/pull/28420))
+
+- **Cold-start perf wave keeps going: another second saved, 47% fewer per-conversation function calls.** New optimization rounds: defer the `openai._base_client` import (-240ms / -17MB on every CLI invocation), hot-path optimizations that cut 47% of per-conversation function calls (399k → 213k for a 31-turn chat), defer the compression-feasibility check (-170 to -290ms on every agent construction), and adaptive subprocess polling (-195ms per tool call, 1+ second per turn). Termux cold start drops from 2.9s to 0.8s. `hermes --version` cold drops 63% (701ms → 258ms), flipping the head-to-head benchmark against Codex CLI from 5/11 wins to 6/11. ([#28864](https://github.com/NousResearch/hermes-agent/pull/28864), [#28866](https://github.com/NousResearch/hermes-agent/pull/28866), [#28957](https://github.com/NousResearch/hermes-agent/pull/28957), [#29006](https://github.com/NousResearch/hermes-agent/pull/29006), [#29419](https://github.com/NousResearch/hermes-agent/pull/29419), [#30121](https://github.com/NousResearch/hermes-agent/pull/30121), [#30609](https://github.com/NousResearch/hermes-agent/pull/30609), [#31968](https://github.com/NousResearch/hermes-agent/pull/31968))
+
+- **`session_search` rebuilt — no LLM, no cost, 4,500× faster** — The old `session_search` was an aux-LLM-powered tool that cost ~$0.30/call and took ~30 seconds to summarize three sessions, sometimes confabulating when the right session wasn't even in the FTS5 hit list. The new shape is one tool with three modes (discovery, scroll, browse) inferred from which args are set — no `mode` parameter, no aux-LLM, no config knob, no companion skill. Discovery is ~20ms instead of ~90s; scroll is ~1ms. Searching your past sessions for context is now free and instant. ([#27590](https://github.com/NousResearch/hermes-agent/pull/27590))
+
+- **Promptware defense — Brainworm-class attacks blocked at three chokepoints** — Inspired by recent Brainworm / Promptware Kill Chain research (Origin HQ, arxiv 2601.09625), Hermes now defends the context window against prompt-injection attacks that try to hijack the agent via tool output, recalled memory, or stored skills. Single source of truth (`tools/threat_patterns.py`) with ~15 new Brainworm/C2 patterns; recalled memory is scanned at load time; tool results get delimiter markers so a malicious file or remote service can't impersonate Hermes' own system content. Paired with a new `security-guidance` plugin that pattern-matches dangerous code writes. ([#32269](https://github.com/NousResearch/hermes-agent/pull/32269), [#33131](https://github.com/NousResearch/hermes-agent/pull/33131), [#9151](https://github.com/NousResearch/hermes-agent/pull/9151))
+
+- **Bitwarden Secrets Manager — one bootstrap token replaces every per-provider API key** — Stop keeping plaintext API keys in `~/.hermes/.env`. Install Bitwarden Secrets Manager (`bws` auto-installs lazily on first use), point Hermes at it with one bootstrap token (`BWS_ACCESS_TOKEN`), and every credential you need comes from Bitwarden at startup. Rotate a key in the Bitwarden web app and the rotation actually takes effect — Bitwarden defaults to source-of-truth so its values overwrite matching env vars on startup. Flip `secrets.bitwarden.override_existing: false` to invert. EU Cloud and self-hosted Bitwarden server URLs supported. Detected credentials are now labeled with their source so you can see at a glance which keys came from Bitwarden vs. the local env. ([#30035](https://github.com/NousResearch/hermes-agent/pull/30035), [#31378](https://github.com/NousResearch/hermes-agent/pull/31378), [#30364](https://github.com/NousResearch/hermes-agent/pull/30364))
+
+- **ntfy as the 23rd messaging platform — push notifications without an account** — ntfy is the self-hostable push-notification service with no signup, no API key, just a topic URL. Hermes now adapts to it as a platform plugin (zero edits to core), so your agent can send you push notifications from any cron job, kanban task completion, or chat `send_message` — to your phone, your watch, your desktop, your homelab. (salvages [#30625](https://github.com/NousResearch/hermes-agent/pull/30625) → originally [#4043](https://github.com/NousResearch/hermes-agent/pull/4043)) ([#30867](https://github.com/NousResearch/hermes-agent/pull/30867))
+
+- **Skill bundles — `/<name>` loads multiple skills at once** — A skill bundle is a named group of skills that loads them all together with one slash command. Set up your "writing day" bundle (humanizer + ideation + obsidian + youtube-content) and `/writing-day` activates all four for the session. Skills Hub now has health checks, a freshness badge, and a watchdog cron. Three new optional skills land: `code-wiki` (Karpathy's LLM-Wiki, persistent indexed dev wiki), `openhands` (delegate to OpenHands for parallel coding agents), and `web-pentest` (OWASP-style web pentest recipes). ([#28373](https://github.com/NousResearch/hermes-agent/pull/28373), [#32345](https://github.com/NousResearch/hermes-agent/pull/32345), [#32240](https://github.com/NousResearch/hermes-agent/pull/32240), [#32261](https://github.com/NousResearch/hermes-agent/pull/32261), [#32265](https://github.com/NousResearch/hermes-agent/pull/32265))
+
+- **TUI session orchestrator — multiple live sessions in one TUI window** — The Ink TUI gained an active-session switcher overlay. List, switch between, refresh, and close multiple live process-local sessions without leaving the TUI; dispatch a new session with a session-scoped model picker. Plus a wave of TUI polish — mouse-tracking DEC mode presets, scrollback preservation across branches and termux, slash-dropdown fixes, x.com link rendering, and CJK / IME input rendering improvements. (salvages [#27642](https://github.com/NousResearch/hermes-agent/pull/27642)) ([#32980](https://github.com/NousResearch/hermes-agent/pull/32980), [#30084](https://github.com/NousResearch/hermes-agent/pull/30084))
+
+- **Two new image_gen providers — Krea 2 Medium + Large, FAL ported to plugin** — Krea joins the image_gen lineup as a built-in plugin: `Krea 2 Medium` ($0.03) and `Krea 2 Large` ($0.06), auto-discovered, selectable via `hermes tools` → Image Generation → Krea. Available through both the native Krea plugin and the FAL.ai catalog. The FAL.ai backend got pulled out of the monolithic image-generation tool into `plugins/image_gen/fal/`, completing the four-way architectural parity already established by web, browser, and video_gen — new image providers are now one file, not a fork. ([#33236](https://github.com/NousResearch/hermes-agent/pull/33236), [#30380](https://github.com/NousResearch/hermes-agent/pull/30380), [#33506](https://github.com/NousResearch/hermes-agent/pull/33506))
+
+- **Nous-approved MCP catalog with interactive picker** — A curated catalog of Nous-vetted MCP servers, mirroring the optional-skills shape. Run `hermes mcp` and you get an interactive picker; install with one keystroke, credentials prompted at install time and written to `~/.hermes/.env`. Ships with the n8n manifest first. Closes the discovery gap that left users hunting GitHub for trusted MCP servers. ([#30870](https://github.com/NousResearch/hermes-agent/pull/30870))
+
+- **OpenHands orchestration skill** — A new optional skill under `optional-skills/autonomous-ai-agents/openhands/` lets the agent delegate coding tasks to the OpenHands CLI alongside `claude-code`, `codex`, and `opencode`. OpenHands is the model-agnostic member of that family — any LiteLLM-supported provider works (OpenAI, Anthropic, OpenRouter, your own), so you can route a sub-task to the cheapest model that can finish it. Drop-in worker for kanban swarms and `/delegate` flows. (closes [#477](https://github.com/NousResearch/hermes-agent/issues/477)) ([#32261](https://github.com/NousResearch/hermes-agent/pull/32261))
+
+- **Deep xAI integration round — Web Search plugin, OAuth proxy upstream, May 15 retirement detection, natural TTS, security hardening** — Six interlocking xAI improvements:
+    - **xAI Web Search** lands as a `plugins/web/xai/` provider, slots alongside Brave / Tavily / Exa / SearXNG / DDGS / Firecrawl — reuses your existing Grok OAuth or `XAI_API_KEY` credentials, no new env vars. ([#29042](https://github.com/NousResearch/hermes-agent/pull/29042))
+    - **`hermes proxy` gains an xAI upstream** — your local OpenAI-compatible endpoint can now be backed by SuperGrok OAuth, no PKCE-refresh code to write in your client. ([#28356](https://github.com/NousResearch/hermes-agent/pull/28356))
+    - **May 15 model retirement detection**: `grok-4`, `grok-4-fast{,-reasoning,-non-reasoning}`, `grok-3`, `grok-code-fast-1`, `grok-imagine-image-pro`, etc. are detected in doctor and at chat startup, and `hermes migrate xai` performs a one-shot config migration to the supported model. No more silent 404s after the retirement date. ([#29277](https://github.com/NousResearch/hermes-agent/pull/29277))
+    - **Opt-in `auto_speech_tags`** for xAI TTS — inserts light `[pause]` tags between paragraphs and sentences for more natural-sounding voice replies. Default OFF. ([#29376](https://github.com/NousResearch/hermes-agent/pull/29376))
+    - **`xai-oauth` `base_url` pinned to `x.ai` origin** — closes a silent credential-leak vector where `XAI_BASE_URL` could repoint OAuth-authenticated inference to an attacker-controlled host. ([#28952](https://github.com/NousResearch/hermes-agent/pull/28952))
+    - **OpenAI-style execution guidance applied to Grok models** — Grok and xai-oauth now get the same family-specific execution discipline block GPT/Codex have, so the model stops claiming completion without tool calls and stops suggesting workarounds instead of using existing tools. ([#27797](https://github.com/NousResearch/hermes-agent/pull/27797))
+    - Plus `x_search` degraded-results surfacing, tier-gated 403 with API-key fallback, a PKCE `code_challenge` round-trip fix, dead-token quarantine on terminal refresh failure, per-request MiniMax-style short-token refresh, and `WKE=unauthenticated` honored at both classifier sites. ([#29484](https://github.com/NousResearch/hermes-agent/pull/29484), [#28351](https://github.com/NousResearch/hermes-agent/pull/28351), [#27560](https://github.com/NousResearch/hermes-agent/pull/27560), [#28116](https://github.com/NousResearch/hermes-agent/pull/28116), [#30619](https://github.com/NousResearch/hermes-agent/pull/30619), [#30872](https://github.com/NousResearch/hermes-agent/pull/30872))
+
+---
+
+## 🏗️ Core Agent & Architecture
+
+### The Big Refactor — `run_agent.py` 16k → 3.8k
+
+- `run_agent.py` from 16,083 → 3,821 lines (-76%), extracted into 14 cohesive `agent/*` modules. `run_conversation` alone was 3,877 lines before the refactor. Every extraction keeps a thin forwarder on `AIAgent`, every test-patch path is preserved, every external caller stays compatible. ([#27248](https://github.com/NousResearch/hermes-agent/pull/27248))
+
+### Agent loop & conversation
+
+- Auxiliary task layered fallback (primary → chain → main agent → graceful fail) on capacity errors (402/429/connection). (salvages [#26811](https://github.com/NousResearch/hermes-agent/pull/26811) + [#26998](https://github.com/NousResearch/hermes-agent/pull/26998)) ([#27625](https://github.com/NousResearch/hermes-agent/pull/27625))
+- Buffer retry/fallback status; surface only on terminal failure (no more noisy "retrying..." spam in mid-run output). ([#33816](https://github.com/NousResearch/hermes-agent/pull/33816))
+- Host contract for external context engines — condenses 5 prior PRs into one extension surface. ([#33750](https://github.com/NousResearch/hermes-agent/pull/33750))
+- Fallback immediately on provider content-policy blocks. ([#33883](https://github.com/NousResearch/hermes-agent/pull/33883))
+- Re-pad `reasoning_content` on cross-provider fallback to require-side providers. (salvage [#33784](https://github.com/NousResearch/hermes-agent/pull/33784)) ([#33795](https://github.com/NousResearch/hermes-agent/pull/33795))
+- Per-turn tool-outcome verifier — patch tool gets indent preservation, CRLF preservation, per-file failure escalation. ([#32273](https://github.com/NousResearch/hermes-agent/pull/32273))
+- Single-knob native vision for custom-provider models. ([#29679](https://github.com/NousResearch/hermes-agent/pull/29679))
+- Background review fork isolated from external memory plugins. ([#27190](https://github.com/NousResearch/hermes-agent/pull/27190))
+- Background review inherits parent toolset config for `tools[]` cache parity. ([#29704](https://github.com/NousResearch/hermes-agent/pull/29704))
+- Recover from providers returning list-type tool content. ([#30259](https://github.com/NousResearch/hermes-agent/pull/30259))
+- Treat partial-stream stub responses as length truncation rather than clean stop. ([#30998](https://github.com/NousResearch/hermes-agent/pull/30998))
+- OpenAI execution guidance applied to xAI Grok / xai-oauth. ([#27797](https://github.com/NousResearch/hermes-agent/pull/27797))
+- ContextVars propagate to concurrent tool worker threads.
+- Preload `jiter` native parser. ([#33692](https://github.com/NousResearch/hermes-agent/pull/33692))
+- Expose context engine tools with saved toolsets. (salvage of [#31194](https://github.com/NousResearch/hermes-agent/pull/31194)) ([#33719](https://github.com/NousResearch/hermes-agent/pull/33719))
+
+### Sessions & memory
+
+- `session_search` rebuilt — single-shape (discovery + scroll + browse), no aux-LLM, ~20ms vs. ~90s. ([#27590](https://github.com/NousResearch/hermes-agent/pull/27590))
+- Opt-in JSON snapshot writer for sessions. (salvage [#29182](https://github.com/NousResearch/hermes-agent/pull/29182)) ([#29278](https://github.com/NousResearch/hermes-agent/pull/29278))
+- Persist `platform_message_id` for recall across gateway restarts. ([#29449](https://github.com/NousResearch/hermes-agent/pull/29449))
+- Inline memory-context mentions stay visible in conversation. ([#28132](https://github.com/NousResearch/hermes-agent/pull/28132))
+- Recalled memory labeled informational, not authoritative. ([#28583](https://github.com/NousResearch/hermes-agent/pull/28583))
+- Memory + context-engine tool injection gated on `enabled_toolsets`. ([#30177](https://github.com/NousResearch/hermes-agent/pull/30177))
+- Guard against external drift in `MEMORY.md` / `USER.md`. ([#30877](https://github.com/NousResearch/hermes-agent/pull/30877))
+- Honcho runtime peer mapping — correctness follow-ups + setup wizard + docs. ([#30077](https://github.com/NousResearch/hermes-agent/pull/30077))
+- Periodic memory logging for leak detection. (salvage of [#17667](https://github.com/NousResearch/hermes-agent/pull/17667)) ([#27102](https://github.com/NousResearch/hermes-agent/pull/27102))
+
+### Codex / Responses-API maturation
+
+- TTFB watchdog for stalled Codex Responses streams. ([#32042](https://github.com/NousResearch/hermes-agent/pull/32042))
+- Actionable hint when stale-call detector fires on known silent-reject pattern. ([#32016](https://github.com/NousResearch/hermes-agent/pull/32016), [#33133](https://github.com/NousResearch/hermes-agent/pull/33133))
+- Drop SDK `responses.stream()` helper; consume events directly. ([#33042](https://github.com/NousResearch/hermes-agent/pull/33042))
+- Gracefully recover from `invalid_encrypted_content`. (salvage of [#10144](https://github.com/NousResearch/hermes-agent/pull/10144)) ([#33035](https://github.com/NousResearch/hermes-agent/pull/33035))
+- Recover Codex Responses streams with null output. ([#32963](https://github.com/NousResearch/hermes-agent/pull/32963), [#33390](https://github.com/NousResearch/hermes-agent/pull/33390))
+- Drop foreign-issuer reasoning and transient `rs_tmp` reasoning replay state. ([#33156](https://github.com/NousResearch/hermes-agent/pull/33156), [#33146](https://github.com/NousResearch/hermes-agent/pull/33146))
+- Codex 429 quota classified as rate-limit, not missing credentials. ([#33168](https://github.com/NousResearch/hermes-agent/pull/33168))
+- Codex chat path falls back to credential_pool when singleton is empty. ([#33189](https://github.com/NousResearch/hermes-agent/pull/33189))
+- Codex re-auth syncs credential_pool. ([#33164](https://github.com/NousResearch/hermes-agent/pull/33164))
+- Omit `tools` key when no tools registered. ([#33409](https://github.com/NousResearch/hermes-agent/pull/33409))
+- Parse Codex image-generation SSE directly. ([#32933](https://github.com/NousResearch/hermes-agent/pull/32933))
+
+---
+
+## 🎛️ Kanban — Multi-Agent Maturation Wave
+
+### Orchestration & dispatch
+
+- Orchestrator-driven auto-decomposition on triage. ([#27572](https://github.com/NousResearch/hermes-agent/pull/27572))
+- Kanban swarm topology helper — `hermes kanban swarm` creates a Swarm v1 graph (root + parallel workers + gated verifier + gated synthesizer + shared blackboard). (salvages [#26791](https://github.com/NousResearch/hermes-agent/pull/26791) by @Niraven) ([#28443](https://github.com/NousResearch/hermes-agent/pull/28443))
+- Dispatcher wires review agents from the review column. ([#28449](https://github.com/NousResearch/hermes-agent/pull/28449))
+- Stale-detection for running tasks in dispatcher. ([#28452](https://github.com/NousResearch/hermes-agent/pull/28452))
+- Respawn guard blocks repeat worker storms. ([#28455](https://github.com/NousResearch/hermes-agent/pull/28455))
+- Respawn guard defers `blocker_auth` instead of auto-blocking. ([#28683](https://github.com/NousResearch/hermes-agent/pull/28683))
+- Cross-profile cron jobs surface in dashboard. ([#28457](https://github.com/NousResearch/hermes-agent/pull/28457))
+- Worker visibility endpoints: `/workers/active`, `/runs/{id}`, `/inspect`. (salvages [#23761](https://github.com/NousResearch/hermes-agent/pull/23761) by @Interstellar-code) ([#28432](https://github.com/NousResearch/hermes-agent/pull/28432))
+
+### Task configuration & scheduling
+
+- Per-task model override. ([#28364](https://github.com/NousResearch/hermes-agent/pull/28364))
+- Board-level default workdir. ([#28394](https://github.com/NousResearch/hermes-agent/pull/28394))
+- Configurable worktree paths and branches. ([#28462](https://github.com/NousResearch/hermes-agent/pull/28462))
+- Scheduled task start times. ([#28384](https://github.com/NousResearch/hermes-agent/pull/28384))
+- Scheduled status for delayed follow-ups. ([#28467](https://github.com/NousResearch/hermes-agent/pull/28467))
+- Trimmed task comments. ([#28399](https://github.com/NousResearch/hermes-agent/pull/28399))
+- Initial-status for human-ops cards. ([#28414](https://github.com/NousResearch/hermes-agent/pull/28414))
+- `max_in_progress` config to cap concurrent running tasks. ([#28420](https://github.com/NousResearch/hermes-agent/pull/28420))
+- Filter tasks by workflow fields. ([#28454](https://github.com/NousResearch/hermes-agent/pull/28454))
+- `--sort` for `hermes kanban list`. ([#28427](https://github.com/NousResearch/hermes-agent/pull/28427))
+- Optional `board` parameter on all MCP tools. ([#28444](https://github.com/NousResearch/hermes-agent/pull/28444))
+- Stamp originating ACP session_id on tasks. ([#28447](https://github.com/NousResearch/hermes-agent/pull/28447))
+- `auto_promote_children` config toggle. ([#28344](https://github.com/NousResearch/hermes-agent/pull/28344))
+- `archive --rm` to hard-delete archived tasks. ([#28355](https://github.com/NousResearch/hermes-agent/pull/28355))
+- Promote dependents when parent is archived. ([#28372](https://github.com/NousResearch/hermes-agent/pull/28372))
+- Promote blocked tasks when parent dependencies complete. ([#28377](https://github.com/NousResearch/hermes-agent/pull/28377))
+- Demote ready children when parent is reopened. ([#28382](https://github.com/NousResearch/hermes-agent/pull/28382))
+- `promote` verb for manual `todo→ready` recovery + bulk `--ids`. (salvage [#29464](https://github.com/NousResearch/hermes-agent/pull/29464)) ([#31334](https://github.com/NousResearch/hermes-agent/pull/31334))
+
+### Dashboard
+
+- Drag-to-delete trash zone + bulk delete. ([#28468](https://github.com/NousResearch/hermes-agent/pull/28468))
+- Surface per-task `model_override` in show + tool output. ([#28442](https://github.com/NousResearch/hermes-agent/pull/28442))
+- Cross-profile notification delivery via `kanban.notification_sources`. ([#28395](https://github.com/NousResearch/hermes-agent/pull/28395))
+- Scratch-workspace deletion warning for users. ([#30949](https://github.com/NousResearch/hermes-agent/pull/30949))
+- Mobile dashboard UX polish. ([#28127](https://github.com/NousResearch/hermes-agent/pull/28127))
+
+### Reliability
+
+- Worker log retention configurable. ([#27867](https://github.com/NousResearch/hermes-agent/pull/27867))
+- Configurable claim TTL. ([#28392](https://github.com/NousResearch/hermes-agent/pull/28392))
+- Fingerprint crash errors to prevent fleet-wide retry exhaustion. ([#28380](https://github.com/NousResearch/hermes-agent/pull/28380))
+- Reset failure counters on `unblock_task`. ([#28379](https://github.com/NousResearch/hermes-agent/pull/28379))
+- Detect cycles in `decompose_triage_task` sibling-link pre-validation. ([#28088](https://github.com/NousResearch/hermes-agent/pull/28088))
+- Surface unusable triage auxiliary model (auto-decompose aware). ([#27871](https://github.com/NousResearch/hermes-agent/pull/27871))
+- Align failure diagnostics with retry limit. ([#27868](https://github.com/NousResearch/hermes-agent/pull/27868))
+- Align worker terminal timeout with task runtime. ([#27864](https://github.com/NousResearch/hermes-agent/pull/27864))
+- Auto-install bundled skills (kanban-worker) on init. ([#28368](https://github.com/NousResearch/hermes-agent/pull/28368))
+- Make legacy task migration idempotent. ([#28397](https://github.com/NousResearch/hermes-agent/pull/28397))
+- Serialize DB initialization. ([#28383](https://github.com/NousResearch/hermes-agent/pull/28383))
+- Persist worker session metadata on completion. ([#28387](https://github.com/NousResearch/hermes-agent/pull/28387))
+- Pass `accept-hooks` to worker chat subprocess. ([#28393](https://github.com/NousResearch/hermes-agent/pull/28393))
+- Preserve worker tools with restricted toolsets. ([#28396](https://github.com/NousResearch/hermes-agent/pull/28396))
+- Avoid unsafe Windows worker Hermes shim resolution. ([#28398](https://github.com/NousResearch/hermes-agent/pull/28398))
+- Sync slash subcommands with live parser. ([#28376](https://github.com/NousResearch/hermes-agent/pull/28376))
+- Show scheduled kanban tasks in dashboard. ([#28400](https://github.com/NousResearch/hermes-agent/pull/28400))
+- Assign single-task kanban decompositions. ([#28401](https://github.com/NousResearch/hermes-agent/pull/28401))
+- Configurable `max_tokens` for kanban specify. ([#28374](https://github.com/NousResearch/hermes-agent/pull/28374))
+- Per-job profile support for cron. ([#28124](https://github.com/NousResearch/hermes-agent/pull/28124))
+- Codex app-server: include every Kanban-pinned path in `writable_roots`. ([#28435](https://github.com/NousResearch/hermes-agent/pull/28435))
+- Cache kanban worker guidance at session init for prompt-cache reuse. ([#28425](https://github.com/NousResearch/hermes-agent/pull/28425))
+
+---
+
+## ⚡ Performance
+
+- `openai._base_client` import deferred — 240ms / 17MB off every CLI cold start. ([#28864](https://github.com/NousResearch/hermes-agent/pull/28864))
+- Agent-loop hot-path optimizations — 47% fewer per-conversation function calls (399k → 213k for 31-turn chat). ([#28866](https://github.com/NousResearch/hermes-agent/pull/28866))
+- Compression-feasibility check deferred — 170-290ms off every agent construction. ([#28957](https://github.com/NousResearch/hermes-agent/pull/28957))
+- Adaptive subprocess poll — ~195ms off every tool call, 1+ second per turn. ([#29006](https://github.com/NousResearch/hermes-agent/pull/29006))
+- Termux TUI cold start speedup. ([#29419](https://github.com/NousResearch/hermes-agent/pull/29419))
+- Termux non-TUI cold start speedup. (salvage [#29438](https://github.com/NousResearch/hermes-agent/pull/29438)) ([#30121](https://github.com/NousResearch/hermes-agent/pull/30121))
+- Termux fast-path version + deferred bare-prompt agent startup. ([#30609](https://github.com/NousResearch/hermes-agent/pull/30609))
+- Cut `hermes --version` wall time 63% (701ms → 258ms), flipping the head-to-head vs Codex CLI. ([#31968](https://github.com/NousResearch/hermes-agent/pull/31968))
+- Date-only timestamp + loud gateway-DB roundtrip logging — improves prompt-cache hit rate. ([#27675](https://github.com/NousResearch/hermes-agent/pull/27675))
+- Cache kanban worker guidance at session init for prompt-cache reuse. ([#28425](https://github.com/NousResearch/hermes-agent/pull/28425))
+
+---
+
+## 🔧 Tool System
+
+### Tool surface
+
+- `patch`: indent preservation, CRLF preservation, per-file failure escalation. ([#32273](https://github.com/NousResearch/hermes-agent/pull/32273))
+- `terminal`: warn at call time when `background=true` runs silently. ([#31289](https://github.com/NousResearch/hermes-agent/pull/31289))
+- `terminal`: nudge homebrewed CI pollers at the tool surface. ([#33142](https://github.com/NousResearch/hermes-agent/pull/33142))
+- `x_search`: surface degraded results + validate dates. ([#29484](https://github.com/NousResearch/hermes-agent/pull/29484))
+- `x_search`: auto-enable toolset when xAI credentials are configured. ([#27376](https://github.com/NousResearch/hermes-agent/pull/27376))
+- `computer_use`: route SOM/vision captures via auxiliary.vision. ([#30126](https://github.com/NousResearch/hermes-agent/pull/30126))
+- `transcription`: reject symlinked audio inputs. ([#10082](https://github.com/NousResearch/hermes-agent/pull/10082))
+- TTS: prevent double `[pause]` in xAI auto speech tags. ([#32237](https://github.com/NousResearch/hermes-agent/pull/32237))
+- TTS: preserve native audio outside Telegram voice delivery. ([#28512](https://github.com/NousResearch/hermes-agent/pull/28512))
+- TTS: opt-in xAI `auto_speech_tags` for speech-tag pauses in natural voice replies. ([#29376](https://github.com/NousResearch/hermes-agent/pull/29376))
+- Voice: chunk oversized CLI recordings. ([#30044](https://github.com/NousResearch/hermes-agent/pull/30044))
+- Voice: honor `PULSE_SERVER` / `PIPEWIRE_REMOTE` inside Docker. ([#22534](https://github.com/NousResearch/hermes-agent/pull/22534))
+
+### Browser
+
+- All cloud browser providers (Browserbase, Anchor, Camofox, Hyperbrowser, etc.) migrated to `image_gen`-style plugins. (salvages [#25580](https://github.com/NousResearch/hermes-agent/pull/25580)) ([#27403](https://github.com/NousResearch/hermes-agent/pull/27403))
+- Auto-launch Chromium-family browser for CDP. ([#29106](https://github.com/NousResearch/hermes-agent/pull/29106))
+- Docker: discover agent-browser Chromium binary at boot. ([#33184](https://github.com/NousResearch/hermes-agent/pull/33184))
+
+### Image generation
+
+- **Krea** provider plugin (Krea 2 Medium + Large). ([#33236](https://github.com/NousResearch/hermes-agent/pull/33236))
+- FAL backend ported to `plugins/image_gen/fal`. (salvage [#27966](https://github.com/NousResearch/hermes-agent/pull/27966)) ([#30380](https://github.com/NousResearch/hermes-agent/pull/30380))
+- Cache xAI ephemeral URL responses to disk. ([#31759](https://github.com/NousResearch/hermes-agent/pull/31759))
+
+### Web search
+
+- **xAI Web Search** as a provider plugin. ([#29042](https://github.com/NousResearch/hermes-agent/pull/29042))
+
+### MCP
+
+- **Nous-approved MCP catalog** with interactive picker. ([#30870](https://github.com/NousResearch/hermes-agent/pull/30870))
+- **TLS client certificate (mTLS) support** for HTTP and SSE MCP servers. ([#33721](https://github.com/NousResearch/hermes-agent/pull/33721))
+- Stdin paste-back fallback for headless OAuth flow. ([#32053](https://github.com/NousResearch/hermes-agent/pull/32053))
+- `skip` at paste prompt bypasses auth without disabling server. ([#32069](https://github.com/NousResearch/hermes-agent/pull/32069))
+- Registry-aware `mcp_` prefix on both ends of round-trip. ([#31700](https://github.com/NousResearch/hermes-agent/pull/31700))
+
+---
+
+## 🧩 Skills Ecosystem
+
+### Skills system
+
+- **Skill bundles** — `/<name>` loads multiple skills. ([#28373](https://github.com/NousResearch/hermes-agent/pull/28373))
+- Skills Hub: health checks, freshness badge, and a watchdog cron. ([#32345](https://github.com/NousResearch/hermes-agent/pull/32345))
+- Opt-in AST deep diagnostics on skill writes. (salvage of [#30918](https://github.com/NousResearch/hermes-agent/pull/30918)) ([#31198](https://github.com/NousResearch/hermes-agent/pull/31198))
+- Bundled/pinned skill protection in background-review prompts. ([#28338](https://github.com/NousResearch/hermes-agent/pull/28338))
+- Show user-modified skill names in bundled skill sync summary. ([#28671](https://github.com/NousResearch/hermes-agent/pull/28671))
+- Load symlinked skill slash commands. ([#27759](https://github.com/NousResearch/hermes-agent/pull/27759))
+- Deduplicate Skills Hub search results by identifier, not name. ([#29490](https://github.com/NousResearch/hermes-agent/pull/29490))
+
+### New skills
+
+- `openhands` — delegate-to-OpenHands orchestration skill (closes [#477](https://github.com/NousResearch/hermes-agent/issues/477)) ([#32261](https://github.com/NousResearch/hermes-agent/pull/32261))
+- `code-wiki` — persistent indexed dev wiki (closes [#486](https://github.com/NousResearch/hermes-agent/issues/486)) ([#32240](https://github.com/NousResearch/hermes-agent/pull/32240))
+- `web-pentest` — OWASP recipes (closes [#400](https://github.com/NousResearch/hermes-agent/issues/400)) ([#32265](https://github.com/NousResearch/hermes-agent/pull/32265))
+- `baoyu-article-illustrator` ([#28287](https://github.com/NousResearch/hermes-agent/pull/28287))
+
+---
+
+## ☁️ Providers
+
+### xAI deep integration
+
+- **xAI Web Search** as a `plugins/web/xai/` provider plugin. ([#29042](https://github.com/NousResearch/hermes-agent/pull/29042))
+- **`hermes proxy` xAI upstream** — OpenAI-compatible local proxy backed by xai-oauth. ([#28356](https://github.com/NousResearch/hermes-agent/pull/28356))
+- **May 15 model retirement detection + `hermes migrate xai`** for grok-4 / grok-3 / grok-code-fast-1 / grok-imagine-image-pro. ([#29277](https://github.com/NousResearch/hermes-agent/pull/29277))
+- **Opt-in `auto_speech_tags`** for natural xAI TTS voice replies. ([#29376](https://github.com/NousResearch/hermes-agent/pull/29376))
+- **xai-oauth base_url pinned to x.ai origin** — closes silent credential-leak vector. ([#28952](https://github.com/NousResearch/hermes-agent/pull/28952))
+- **OpenAI-style execution guidance** applied to Grok / xai-oauth models. ([#27797](https://github.com/NousResearch/hermes-agent/pull/27797))
+- xAI: detect retired May 15 models in doctor/chat startup. ([#29277](https://github.com/NousResearch/hermes-agent/pull/29277))
+- xAI: resolve Grok Build context for OAuth. ([#30579](https://github.com/NousResearch/hermes-agent/pull/30579))
+- xAI OAuth: tier-gated 403 with API-key fallback. ([#28351](https://github.com/NousResearch/hermes-agent/pull/28351))
+- xAI OAuth: PKCE `code_challenge` echo. ([#27560](https://github.com/NousResearch/hermes-agent/pull/27560))
+- xAI OAuth: quarantine dead tokens on terminal refresh failure. ([#28116](https://github.com/NousResearch/hermes-agent/pull/28116))
+- xAI OAuth: honor `WKE=unauthenticated` disambiguator at both classifier sites. ([#30872](https://github.com/NousResearch/hermes-agent/pull/30872))
+- xAI OAuth: accept bare-code manual paste (`state=None`). (closes [#26923](https://github.com/NousResearch/hermes-agent/issues/26923)) ([#33880](https://github.com/NousResearch/hermes-agent/pull/33880))
+- xAI OAuth: fall back to manual paste on loopback timeout. ([#33231](https://github.com/NousResearch/hermes-agent/pull/33231))
+- xAI proxy: handle 429 rate-limit responses in proxy retry path. ([#33743](https://github.com/NousResearch/hermes-agent/pull/33743))
+
+### Other providers
+
+- **OpenAI API as a first-class provider** (distinct from Codex runtime). ([#31898](https://github.com/NousResearch/hermes-agent/pull/31898))
+- **Microsoft Entra ID** auth for Azure Foundry (with 1M Anthropic-Messages beta preserved on Bearer). (salvages [#27509](https://github.com/NousResearch/hermes-agent/pull/27509), [#27022](https://github.com/NousResearch/hermes-agent/pull/27022)) ([#28101](https://github.com/NousResearch/hermes-agent/pull/28101), [#28084](https://github.com/NousResearch/hermes-agent/pull/28084))
+- **OpenRouter** sticky routing — `session_id` passed via `extra_body` so a long-running session keeps landing on the same upstream provider. (@Cybourgeoisie) ([#33939](https://github.com/NousResearch/hermes-agent/pull/33939))
+- Nous: JWT token for inference; stop replaying invalid Nous refresh tokens. (@rewbs) ([#27663](https://github.com/NousResearch/hermes-agent/pull/27663))
+- Nous Portal: one-shot setup, status CLI, and Nous-included markers. ([#30860](https://github.com/NousResearch/hermes-agent/pull/30860))
+- Anthropic adapter: extract 7 helpers from `convert_messages_to_anthropic`. (salvage [#27784](https://github.com/NousResearch/hermes-agent/pull/27784)) ([#30386](https://github.com/NousResearch/hermes-agent/pull/30386))
+- Catalog: add `qwen3.7-max` to Alibaba + Alibaba-Coding-Plan model lists. ([#33129](https://github.com/NousResearch/hermes-agent/pull/33129))
+- opencode-go: route `qwen3.7-max` via `anthropic_messages`. (@beardthelion) ([#32780](https://github.com/NousResearch/hermes-agent/pull/32780))
+- opencode-go: expose Kimi K2 + DeepSeek reasoning controls. ([#30845](https://github.com/NousResearch/hermes-agent/pull/30845))
+- Remove Vercel AI Gateway and Vercel Sandbox.
+- MiniMax OAuth: refresh short-lived access tokens per request. ([#30619](https://github.com/NousResearch/hermes-agent/pull/30619))
+- Codex OAuth: quarantine terminal refresh errors. ([#28118](https://github.com/NousResearch/hermes-agent/pull/28118))
+- Codex: drop dead model slugs that HTTP 400 on ChatGPT Pro. ([#33424](https://github.com/NousResearch/hermes-agent/pull/33424))
+- Codex: sync `manual:device_code` pool entries on re-auth. ([#33744](https://github.com/NousResearch/hermes-agent/pull/33744))
+- MiniMax OAuth: quarantine terminal refresh errors. ([#28119](https://github.com/NousResearch/hermes-agent/pull/28119))
+
+---
+
+## 🔑 Secrets
+
+- **Bitwarden Secrets Manager** integration with lazy `bws` install. ([#30035](https://github.com/NousResearch/hermes-agent/pull/30035))
+- Bitwarden: EU Cloud + self-hosted server URL support. ([#31378](https://github.com/NousResearch/hermes-agent/pull/31378))
+- Label detected credentials with their source (Bitwarden). ([#30364](https://github.com/NousResearch/hermes-agent/pull/30364))
+
+---
+
+## 📱 Messaging Platforms (Gateway)
+
+### Gateway core
+
+- **Deliverable mode** — agents ship artifacts as native uploads from any platform (Slack/Discord/Telegram/Teams/Email). ([#27813](https://github.com/NousResearch/hermes-agent/pull/27813))
+- `hermes send` — pipe any script's output to any messaging platform. (salvage of [#19631](https://github.com/NousResearch/hermes-agent/pull/19631)) ([#27188](https://github.com/NousResearch/hermes-agent/pull/27188))
+- Debounce queued text follow-ups during active sessions. (salvage of [#31235](https://github.com/NousResearch/hermes-agent/pull/31235)) ([#31341](https://github.com/NousResearch/hermes-agent/pull/31341))
+- Plugin-transformed `final_response` delivered through streaming gate. ([#31433](https://github.com/NousResearch/hermes-agent/pull/31433))
+- Refresh cached agent tools on `/reload-mcp`. ([#32815](https://github.com/NousResearch/hermes-agent/pull/32815))
+- Harden kanban + provider cleanup races on long-running workloads. ([#29479](https://github.com/NousResearch/hermes-agent/pull/29479))
+
+### New / reorganized adapters
+
+- **ntfy** — 23rd platform, push notifications, plugin shape, zero core edits. (salvages [#30625](https://github.com/NousResearch/hermes-agent/pull/30625) → [#4043](https://github.com/NousResearch/hermes-agent/pull/4043)) ([#30867](https://github.com/NousResearch/hermes-agent/pull/30867))
+- **Discord** adapter migrated to bundled plugin. (salvage of [#24356](https://github.com/NousResearch/hermes-agent/pull/24356)) ([#30591](https://github.com/NousResearch/hermes-agent/pull/30591))
+- **Mattermost** adapter migrated to bundled plugin. (salvage of [#30916](https://github.com/NousResearch/hermes-agent/pull/30916)) ([#31748](https://github.com/NousResearch/hermes-agent/pull/31748))
+
+### Telegram
+
+- Edit status messages in place instead of appending. (based on [#30141](https://github.com/NousResearch/hermes-agent/pull/30141) by @qike-ms) ([#30864](https://github.com/NousResearch/hermes-agent/pull/30864))
+- Skip-STT audio path + 2GB cap via local Bot API server. ([#28541](https://github.com/NousResearch/hermes-agent/pull/28541))
+- Route image documents (.png/.jpg/.webp/.gif) through vision pipeline. ([#28519](https://github.com/NousResearch/hermes-agent/pull/28519))
+- Route audio file attachments away from STT pipeline. ([#28478](https://github.com/NousResearch/hermes-agent/pull/28478))
+- `disable_topic_auto_rename` gateway flag. ([#28523](https://github.com/NousResearch/hermes-agent/pull/28523))
+- `ignore_root_dm` config to drop messages without `thread_id`. ([#28536](https://github.com/NousResearch/hermes-agent/pull/28536))
+- Chat-scoped auth without sender `user_id`. ([#28525](https://github.com/NousResearch/hermes-agent/pull/28525))
+- Fail-closed auth fallback when `TELEGRAM_ALLOWED_USERS` is empty. ([#28494](https://github.com/NousResearch/hermes-agent/pull/28494))
+- Roll over tool progress bubbles + scope `audio_file_paths`. ([#28482](https://github.com/NousResearch/hermes-agent/pull/28482))
+- Avoid duplicate text after auto-TTS voice replies. ([#28509](https://github.com/NousResearch/hermes-agent/pull/28509))
+- Mark final voice reply notify-worthy so Telegram delivers it audibly. ([#28504](https://github.com/NousResearch/hermes-agent/pull/28504))
+
+### Discord
+
+- Recover Windows voice opus decoding. ([#33182](https://github.com/NousResearch/hermes-agent/pull/33182))
+- `allow_any_attachment` config to accept arbitrary file types. ([#27245](https://github.com/NousResearch/hermes-agent/pull/27245))
+- Transcribe native voice notes. ([#28993](https://github.com/NousResearch/hermes-agent/pull/28993))
+- Define UI view classes after lazy install. ([#28817](https://github.com/NousResearch/hermes-agent/pull/28817))
+
+### Signal / Matrix / Feishu / Slack / WeCom
+
+- Signal: `require_mention` filter for group chats. ([#28574](https://github.com/NousResearch/hermes-agent/pull/28574))
+- Matrix: warn on clock-skew silent message drops. ([#27330](https://github.com/NousResearch/hermes-agent/pull/27330))
+- Matrix E2EE installs full dep set; plugins respect `is_connected`. ([#31688](https://github.com/NousResearch/hermes-agent/pull/31688))
+- Feishu: require webhook auth secret + honor config extras. ([#30746](https://github.com/NousResearch/hermes-agent/pull/30746))
+- Feishu: enforce auth and chat binding for approval buttons. ([#30744](https://github.com/NousResearch/hermes-agent/pull/30744))
+- Slack: socket recovery + Windows restart dedupe. ([#28873](https://github.com/NousResearch/hermes-agent/pull/28873))
+- WeCom: safe-parse untrusted XML. ([#32442](https://github.com/NousResearch/hermes-agent/pull/32442))
+
+### DingTalk / Webhooks / Microsoft Graph
+
+- DingTalk: transcribe native voice notes. ([#28993](https://github.com/NousResearch/hermes-agent/pull/28993))
+- Webhook: enforce `INSECURE_NO_AUTH` safety rail on dynamic route reloads. ([#30863](https://github.com/NousResearch/hermes-agent/pull/30863))
+- Webhook: restrict default toolset capabilities. ([#30745](https://github.com/NousResearch/hermes-agent/pull/30745))
+- Microsoft Graph: harden webhook auth requirements. ([#30169](https://github.com/NousResearch/hermes-agent/pull/30169))
+
+---
+
+## 🖥️ CLI & TUI
+
+### CLI
+
+- `/update` slash command in CLI and TUI. ([#23854](https://github.com/NousResearch/hermes-agent/pull/23854))
+- `hermes update` auto-rolls back when the post-pull syntax check fails. ([#28669](https://github.com/NousResearch/hermes-agent/pull/28669))
+- `--branch` flag for `hermes update`. (@jquesnelle) ([#29591](https://github.com/NousResearch/hermes-agent/pull/29591))
+- `/exit --delete` flag to remove session on quit. (salvage of [#17665](https://github.com/NousResearch/hermes-agent/pull/17665)) ([#27101](https://github.com/NousResearch/hermes-agent/pull/27101))
+- `▶ N` indicator in status bar for running `/background` tasks. ([#27175](https://github.com/NousResearch/hermes-agent/pull/27175))
+- Live background terminal-process count in status bar. ([#32061](https://github.com/NousResearch/hermes-agent/pull/32061))
+- Append session recap to `/status` output. (salvage of [#18587](https://github.com/NousResearch/hermes-agent/pull/18587)) ([#27176](https://github.com/NousResearch/hermes-agent/pull/27176))
+- Configurable paste-collapse thresholds (TUI + CLI). (salvage [#29723](https://github.com/NousResearch/hermes-agent/pull/29723)) ([#32087](https://github.com/NousResearch/hermes-agent/pull/32087))
+- `/resume` accepts position numbers. ([#31709](https://github.com/NousResearch/hermes-agent/pull/31709))
+- Bring tool-call display back — verbose mode, specific failure reasons, todo progress. ([#31293](https://github.com/NousResearch/hermes-agent/pull/31293))
+- Validate runtime token refresh in Qwen auth status. ([#31196](https://github.com/NousResearch/hermes-agent/pull/31196))
+
+### TUI
+
+- **TUI session orchestrator** — multiple live sessions in one TUI window. (salvages [#27642](https://github.com/NousResearch/hermes-agent/pull/27642)) ([#32980](https://github.com/NousResearch/hermes-agent/pull/32980))
+- `mouse_tracking` DEC mode presets. (salvage of [#26681](https://github.com/NousResearch/hermes-agent/pull/26681) by @OutThisLife) ([#30084](https://github.com/NousResearch/hermes-agent/pull/30084))
+- Termux scrollback preservation + touch-friendly defaults. ([#28910](https://github.com/NousResearch/hermes-agent/pull/28910))
+- Full assistant text in scrollback (no history truncation). ([#28829](https://github.com/NousResearch/hermes-agent/pull/28829))
+- Preserve scrollback when branching sessions. ([#30162](https://github.com/NousResearch/hermes-agent/pull/30162))
+- Preserve Python dunder identifiers in markdown. ([#28582](https://github.com/NousResearch/hermes-agent/pull/28582))
+- Active profile shown in TUI prompt. ([#28581](https://github.com/NousResearch/hermes-agent/pull/28581))
+- Improve Charizard completion menu contrast. ([#28346](https://github.com/NousResearch/hermes-agent/pull/28346))
+- Stop slash dropdown chopping last char of `/goal`. ([#31311](https://github.com/NousResearch/hermes-agent/pull/31311))
+- Clipboard copy on Linux/Wayland. ([#29342](https://github.com/NousResearch/hermes-agent/pull/29342))
+- Anchor `splitReasoning` unclosed-tag regex; stop eating last paragraph. ([#29426](https://github.com/NousResearch/hermes-agent/pull/29426))
+- Surface verbose tool details. ([#30225](https://github.com/NousResearch/hermes-agent/pull/30225))
+- Load Linux skills on Termux + salvage @adybag14-cyber's Termux gates. ([#30166](https://github.com/NousResearch/hermes-agent/pull/30166))
+- Handle images with codex app-server. ([#31220](https://github.com/NousResearch/hermes-agent/pull/31220))
+- Refresh virtual transcript on viewport resize. ([#31077](https://github.com/NousResearch/hermes-agent/pull/31077))
+- Ignore late thinking deltas after completion. ([#31055](https://github.com/NousResearch/hermes-agent/pull/31055))
+- Commit composer input bursts immediately. ([#31053](https://github.com/NousResearch/hermes-agent/pull/31053))
+- Log parent gateway lifecycle exits. ([#31051](https://github.com/NousResearch/hermes-agent/pull/31051))
+- Clear TTS env var on voice off + TTS indicator in status bar. ([#30987](https://github.com/NousResearch/hermes-agent/pull/30987))
+- Pass `--expose-gc` as node argv instead of `NODE_OPTIONS`. ([#29998](https://github.com/NousResearch/hermes-agent/pull/29998))
+- Align composer `cursorLayout` with `wrap-ansi` to kill multiline cursor drift. ([#27489](https://github.com/NousResearch/hermes-agent/pull/27489))
+- Harden Terminal.app rendering and color paths. ([#27251](https://github.com/NousResearch/hermes-agent/pull/27251))
+- Keep `/goal` verdict out of compact status row. ([#27971](https://github.com/NousResearch/hermes-agent/pull/27971))
+- Clamp curses color 8 for 8-color terminals (Docker). ([#30260](https://github.com/NousResearch/hermes-agent/pull/30260))
+
+---
+
+## 🔒 Security & Reliability
+
+### Promptware & memory hardening
+
+- **Promptware defense** — shared threat patterns + memory load-time scan + tool-result delimiters. ([#32269](https://github.com/NousResearch/hermes-agent/pull/32269))
+- Expand memory content scanning patterns to parity with skills guard. ([#9151](https://github.com/NousResearch/hermes-agent/pull/9151))
+- Harden Skills Guard multi-word prompt patterns. (@YLChen-007) ([#26852](https://github.com/NousResearch/hermes-agent/pull/26852))
+- Split cron scanner so skill prose stops false-positiving exfil patterns. ([#32339](https://github.com/NousResearch/hermes-agent/pull/32339))
+
+### File safety
+
+- Protect Hermes control-plane files from prompt injection (`auth.json`, `config.yaml`, `webhook_subscriptions.json`, `mcp-tokens/`). (salvages @PratikRai0101's [#14157](https://github.com/NousResearch/hermes-agent/pull/14157)) ([#30397](https://github.com/NousResearch/hermes-agent/pull/30397))
+- Write-deny `<root>/.env` when running under a profile. ([#29687](https://github.com/NousResearch/hermes-agent/pull/29687))
+- Defense-in-depth read-deny on credential stores. (salvages [#17659](https://github.com/NousResearch/hermes-agent/pull/17659) + [#8055](https://github.com/NousResearch/hermes-agent/pull/8055)) ([#30721](https://github.com/NousResearch/hermes-agent/pull/30721))
+- TTS `output_path` traversal + update ZIP symlink reject. (salvage [#6693](https://github.com/NousResearch/hermes-agent/pull/6693) + [#15881](https://github.com/NousResearch/hermes-agent/pull/15881)) ([#32056](https://github.com/NousResearch/hermes-agent/pull/32056))
+- Reject symlinked audio inputs. ([#10082](https://github.com/NousResearch/hermes-agent/pull/10082))
+
+### Credential safety
+
+- Avoid persisting borrowed credential secrets — runtime env-sourced keys no longer leak into `auth.json`. ([#31416](https://github.com/NousResearch/hermes-agent/pull/31416))
+- Validate Nous Portal `inference_base_url` against host allowlist. (salvages [#27612](https://github.com/NousResearch/hermes-agent/pull/27612)) ([#30611](https://github.com/NousResearch/hermes-agent/pull/30611))
+- Harden API server key placeholder handling. ([#30738](https://github.com/NousResearch/hermes-agent/pull/30738))
+- Harden Google Chat OAuth credential persistence. (@Zyrixtrex) ([#24788](https://github.com/NousResearch/hermes-agent/pull/24788))
+- xAI OAuth: pin inference `base_url` to x.ai origin. ([#28952](https://github.com/NousResearch/hermes-agent/pull/28952))
+- Quarantine dead OAuth tokens on terminal refresh failure (xAI, Codex, MiniMax). ([#28116](https://github.com/NousResearch/hermes-agent/pull/28116), [#28118](https://github.com/NousResearch/hermes-agent/pull/28118), [#28119](https://github.com/NousResearch/hermes-agent/pull/28119))
+
+### Supply-chain
+
+- **On-demand supply-chain audit via OSV.dev** — `hermes audit`. ([#31460](https://github.com/NousResearch/hermes-agent/pull/31460))
+- `hermes update` syntax-validates critical files post-pull and auto-rolls back on failure. ([#28669](https://github.com/NousResearch/hermes-agent/pull/28669))
+- `hermes update` quarantines the live `hermes.exe` when a concurrent Windows instance is running. ([#26677](https://github.com/NousResearch/hermes-agent/pull/26677))
+
+### Other hardening
+
+- Restrict default webhook toolset capabilities. ([#30745](https://github.com/NousResearch/hermes-agent/pull/30745))
+- Harden Microsoft Graph webhook auth requirements. ([#30169](https://github.com/NousResearch/hermes-agent/pull/30169))
+- Require source CIDR allowlisting for public msgraph webhook binds. ([#33722](https://github.com/NousResearch/hermes-agent/pull/33722))
+- Require `API_SERVER_KEY` before dispatching API server work. ([#33232](https://github.com/NousResearch/hermes-agent/pull/33232))
+- `env_passthrough`: apply GHSA-rhgp-j443-p4rf filter to `config.yaml` path. (@roadhero) ([#27794](https://github.com/NousResearch/hermes-agent/pull/27794))
+- Dashboard + WeCom: restrict markdown link schemes; safe-parse untrusted XML. ([#32442](https://github.com/NousResearch/hermes-agent/pull/32442))
+- Salvage project-plugin RCE bypass fix from PR [#29311](https://github.com/NousResearch/hermes-agent/pull/29311) (GHSA-5qr3-c538-wm9j). ([#30837](https://github.com/NousResearch/hermes-agent/pull/30837))
+- Cross-profile soft guard on file-write tools + system-prompt hint. ([#31290](https://github.com/NousResearch/hermes-agent/pull/31290))
+- Reject unsafe tar members in Android psutil compatibility installer. ([#33742](https://github.com/NousResearch/hermes-agent/pull/33742))
+- Reject non-regular tar members during tirith auto-install. ([#33786](https://github.com/NousResearch/hermes-agent/pull/33786))
+
+---
+
+## 🪟 Native Windows (Beta Continued)
+
+- Complete Windows bootstrap — `dep_ensure` + `install.ps1` + detection. (@alt-glitch) ([#27845](https://github.com/NousResearch/hermes-agent/pull/27845))
+- `install.ps1`: strip BOM, `-Commit`/`-Tag` pin params, harden git ops. (@jquesnelle) ([#28169](https://github.com/NousResearch/hermes-agent/pull/28169))
+- Consolidate ACP browser bootstrap into `install.{sh,ps1}`. (@alt-glitch) ([#27851](https://github.com/NousResearch/hermes-agent/pull/27851))
+- `hermes update` quarantines live `hermes.exe`. ([#26677](https://github.com/NousResearch/hermes-agent/pull/26677))
+- Discord voice opus decoding on Windows. ([#33182](https://github.com/NousResearch/hermes-agent/pull/33182))
+- Windows Docker Desktop compatible compose file. (@Sunil123135) ([#31031](https://github.com/NousResearch/hermes-agent/pull/31031))
+
+---
+
+## 🖥️ Web Dashboard
+
+- Web dashboard: migrate checkboxes to `@nous-research/ui` + design-system polish. (@austinpickett) ([#28814](https://github.com/NousResearch/hermes-agent/pull/28814))
+- Web dashboard: collapsible sidebar. (@austinpickett) ([#33421](https://github.com/NousResearch/hermes-agent/pull/33421))
+- Dashboard typography & contrast pass. (salvage of [#28832](https://github.com/NousResearch/hermes-agent/pull/28832)) ([#30714](https://github.com/NousResearch/hermes-agent/pull/30714))
+- Skills page: lazy-fetch catalog instead of bundling 34MB into JS. ([#33809](https://github.com/NousResearch/hermes-agent/pull/33809))
+
+---
+
+## 🐳 Docker
+
+- **s6-overlay container supervision** — abstract `ServiceManager` protocol (systemd/launchd/Windows/s6 backends), per-profile gateway supervision in-container, container-restart reconciliation, hadolint/shellcheck CI. (salvage of [#30136](https://github.com/NousResearch/hermes-agent/pull/30136), @benbarclay) ([#31760](https://github.com/NousResearch/hermes-agent/pull/31760))
+- Auto-redirect `gateway run` to supervised mode inside the s6 image. (@benbarclay) ([#33583](https://github.com/NousResearch/hermes-agent/pull/33583))
+- Tee supervised gateway stdout to docker logs. (@benbarclay) ([#33621](https://github.com/NousResearch/hermes-agent/pull/33621))
+- Drop `docker exec` to hermes uid before invoking the CLI. (@benbarclay) ([#33628](https://github.com/NousResearch/hermes-agent/pull/33628))
+- Align HOME for dashboard and s6 gateway services. (@Dusk1e) ([#33481](https://github.com/NousResearch/hermes-agent/pull/33481))
+- Bake build-time git SHA into image so `hermes dump` reports it. (@benbarclay) ([#33655](https://github.com/NousResearch/hermes-agent/pull/33655))
+- `hermes update` prints `docker pull` guidance instead of bogus git error. (@benbarclay) ([#33659](https://github.com/NousResearch/hermes-agent/pull/33659))
+- Upgrade Node to 22 LTS via multi-stage from `node:22-bookworm-slim`. (@benbarclay) ([#33060](https://github.com/NousResearch/hermes-agent/pull/33060))
+- Drop `build-essential` from apt install. (@benbarclay) ([#33028](https://github.com/NousResearch/hermes-agent/pull/33028))
+- Propagate env through s6 to cont-init and main CMD. ([#32412](https://github.com/NousResearch/hermes-agent/pull/32412))
+- Targeted chown to preserve host file ownership in `HERMES_HOME`. ([#33033](https://github.com/NousResearch/hermes-agent/pull/33033))
+- `mkdir HERMES_HOME` as root in stage2 before chown / privilege drop. ([#33078](https://github.com/NousResearch/hermes-agent/pull/33078))
+- chown `ui-tui` and `node_modules` on UID remap so TUI esbuild works. ([#33045](https://github.com/NousResearch/hermes-agent/pull/33045))
+- Include `anthropic`, `bedrock`, `azure-identity` extras in image. ([#30504](https://github.com/NousResearch/hermes-agent/pull/30504))
+- Stop pushing per-commit SHA tags to Docker Hub. ([#29387](https://github.com/NousResearch/hermes-agent/pull/29387))
+- Simplify Docker tagging — push both `:main` and `:latest` on main push. ([#33225](https://github.com/NousResearch/hermes-agent/pull/33225))
+- Test slicing across GitHub Actions jobs. (@ethernet8023) ([#30575](https://github.com/NousResearch/hermes-agent/pull/30575))
+- Discover agent-browser Chromium binary at boot. ([#33184](https://github.com/NousResearch/hermes-agent/pull/33184))
+
+---
+
+## 🌐 API Server
+
+- **Session control API** — `/api/sessions/*` (list/create/read/patch/delete/fork) + SSE-streaming chat. (salvages [#29302](https://github.com/NousResearch/hermes-agent/pull/29302) by @Codename-11 + multimodal followup by @Schwartz10) ([#33134](https://github.com/NousResearch/hermes-agent/pull/33134))
+- `GET /v1/skills` and `/v1/toolsets`. ([#33016](https://github.com/NousResearch/hermes-agent/pull/33016))
+- Coerce stringified booleans in stream/store/approval payloads. (salvage [#26639](https://github.com/NousResearch/hermes-agent/pull/26639)) ([#27293](https://github.com/NousResearch/hermes-agent/pull/27293))
+- Honor `key_env` in auth-failure fallback resolution. ([#30840](https://github.com/NousResearch/hermes-agent/pull/30840))
+
+---
+
+## 🎟️ ACP (VS Code / Zed / JetBrains)
+
+- Session edit auto-approval modes. (salvage of [#27034](https://github.com/NousResearch/hermes-agent/pull/27034)) ([#27862](https://github.com/NousResearch/hermes-agent/pull/27862))
+- Enrich Zed permission cards — command in title + `reject_always`. ([#28148](https://github.com/NousResearch/hermes-agent/pull/28148))
+- Replay session history before responding to `session/load`. ([#26957](https://github.com/NousResearch/hermes-agent/pull/26957), [#26943](https://github.com/NousResearch/hermes-agent/pull/26943))
+- Plugin-transformed `final_response` delivered through streaming gate. ([#31433](https://github.com/NousResearch/hermes-agent/pull/31433))
+
+---
+
+## 🔌 Plugin Surface
+
+- `register_tts_provider()` plugin hook. (salvage of [#30420](https://github.com/NousResearch/hermes-agent/pull/30420)) ([#31745](https://github.com/NousResearch/hermes-agent/pull/31745))
+- `register_transcription_provider()` hook + `stt.providers` command-provider registry. (salvage of [#30493](https://github.com/NousResearch/hermes-agent/pull/30493)) ([#31907](https://github.com/NousResearch/hermes-agent/pull/31907))
+- `register_auxiliary_task()` in PluginContext API. (salvage [#29817](https://github.com/NousResearch/hermes-agent/pull/29817)) ([#31177](https://github.com/NousResearch/hermes-agent/pull/31177))
+- Bundled `security-guidance` plugin. ([#33131](https://github.com/NousResearch/hermes-agent/pull/33131))
+- Discord and Mattermost migrated to bundled plugins. ([#30591](https://github.com/NousResearch/hermes-agent/pull/30591), [#31748](https://github.com/NousResearch/hermes-agent/pull/31748))
+- ntfy as platform plugin. ([#30867](https://github.com/NousResearch/hermes-agent/pull/30867))
+- Surface category-namespaced plugins in `hermes plugins list`. ([#27187](https://github.com/NousResearch/hermes-agent/pull/27187))
+- Plugin discovery failures raised to WARNING level. ([#28318](https://github.com/NousResearch/hermes-agent/pull/28318))
+- `hermes_plugins` included in gateway.log component filter. ([#28313](https://github.com/NousResearch/hermes-agent/pull/28313))
+- Seed plugin extras before `is_connected` gate. ([#31703](https://github.com/NousResearch/hermes-agent/pull/31703))
+- Dashboard: allowlist plugin assets + denylist subprocess-influencing env vars. ([#32277](https://github.com/NousResearch/hermes-agent/pull/32277))
+
+---
+
+## 📦 Distribution & Install
+
+- Install-method stamping + Docker detection. (@alt-glitch) ([#27843](https://github.com/NousResearch/hermes-agent/pull/27843))
+- Nix `#messaging` and `#full` package variants. (@alt-glitch) ([#33108](https://github.com/NousResearch/hermes-agent/pull/33108))
+- Pre-load messaging gateway deps via `--extra messaging`. (salvage [#26394](https://github.com/NousResearch/hermes-agent/pull/26394)) ([#27558](https://github.com/NousResearch/hermes-agent/pull/27558))
+- Avoid piping installer directly into `iex` (Windows). ([#28347](https://github.com/NousResearch/hermes-agent/pull/28347))
+- Ship bundled skills in wheel. ([#28421](https://github.com/NousResearch/hermes-agent/pull/28421))
+- Ship dashboard plugin assets in wheel. ([#28406](https://github.com/NousResearch/hermes-agent/pull/28406))
+- Make Camofox lazy-installed instead of eager. ([#27055](https://github.com/NousResearch/hermes-agent/pull/27055))
+- Wire STT lazy-install into transcription_tools.py. ([#30256](https://github.com/NousResearch/hermes-agent/pull/30256))
+
+---
+
+## 🐛 Notable Bug Fixes (highlights only)
+
+- Match bare custom provider by active base URL in `hermes model`. ([#28908](https://github.com/NousResearch/hermes-agent/pull/28908))
+- Route `auxiliary.vision.provider=openai` to api.openai.com, skip text-only main. ([#31452](https://github.com/NousResearch/hermes-agent/pull/31452))
+- Lint: skip per-file shell linter when LSP will handle the file. ([#29054](https://github.com/NousResearch/hermes-agent/pull/29054))
+- Treat empty credential pool entries as unauthenticated in `/model` picker. ([#28312](https://github.com/NousResearch/hermes-agent/pull/28312))
+- Reverted within the release window: Firecrawl integration tag, send_message @username auto-mentions, Telegram quick-command-only menus, Telegram pin-on-turn.
+
+---
+
+## 🧪 Testing
+
+- Disarm lazy-install probe so `_HAS_FASTER_WHISPER` patches work. ([#30334](https://github.com/NousResearch/hermes-agent/pull/30334))
+- Cover default board dashboard pin. ([#28361](https://github.com/NousResearch/hermes-agent/pull/28361))
+- Cover `_task_dict` `task_age` fallback. ([#28365](https://github.com/NousResearch/hermes-agent/pull/28365))
+- Allowlist `tmp_path` for `kanban_notify` artifact delivery tests. ([#30851](https://github.com/NousResearch/hermes-agent/pull/30851), [#30852](https://github.com/NousResearch/hermes-agent/pull/30852))
+- Cover null output stream terminal events in Codex. ([#33137](https://github.com/NousResearch/hermes-agent/pull/33137))
+
+---
+
+## 📚 Documentation
+
+- **30-day docs overhaul** — full correctness audit, every PR in the window covered, Nous Portal weave, sidebar reorg. ([#33782](https://github.com/NousResearch/hermes-agent/pull/33782))
+- Dedicated Nous Portal integration page and setup guide. ([#31296](https://github.com/NousResearch/hermes-agent/pull/31296))
+- Providers: move Nous Portal first, Google Gemini OAuth last. ([#31287](https://github.com/NousResearch/hermes-agent/pull/31287))
+- Rewrote the `session_search` docs for the single-shape tool. ([#27840](https://github.com/NousResearch/hermes-agent/pull/27840))
+- Kanban: document failure_limit, max_retries, inline create shortcuts, goals & kanban settings. ([#28357](https://github.com/NousResearch/hermes-agent/pull/28357), [#28358](https://github.com/NousResearch/hermes-agent/pull/28358), [#28359](https://github.com/NousResearch/hermes-agent/pull/28359), [#28360](https://github.com/NousResearch/hermes-agent/pull/28360), [#28362](https://github.com/NousResearch/hermes-agent/pull/28362))
+- Kanban Codex lane skill. ([#28430](https://github.com/NousResearch/hermes-agent/pull/28430))
+- xAI OAuth: note X Premium+ also unlocks Grok OAuth. ([#29055](https://github.com/NousResearch/hermes-agent/pull/29055))
+- Docs site: Docker audio bridge notes, "Installing more tools in the container", xurl auth HOME in Docker.
+- Email: clarify gateway vs Himalaya setup. (@helix4u) ([#33634](https://github.com/NousResearch/hermes-agent/pull/33634))
+- Auth docs: replace stale `hermes login` references with `hermes auth add`. ([#32859](https://github.com/NousResearch/hermes-agent/pull/32859))
+
+---
+
+## 👥 Contributors
+
+### Core
+- @teknium1 (lead)
+
+### Notable salvages & cherry-picks
+
+- **@benbarclay** — s6-overlay container supervision (29 commits salvaged), Node 22 LTS upgrade, build-essential cleanup, `gateway run` auto-redirect in s6, tee supervised stdout to docker logs, `hermes update` Docker guidance, build-time SHA stamping
+- **@OutThisLife** — `mouse_tracking` DEC mode presets
+- **@jquesnelle** — Windows installer hardening, `--branch` flag for `hermes update`, install.ps1 BOM strip / commit-pin
+- **@alt-glitch** — Windows `dep_ensure` bootstrap, Nix package variants (`.#messaging`, `.#full`), install-method stamping, ACP browser bootstrap consolidation
+- **@austinpickett** — `/update` slash command, dashboard checkboxes → `@nous-research/ui`, mobile dashboard polish, collapsible sidebar
+- **@ethernet8023** — CI test slicing across GH Actions jobs, TUI clipboard copy fix
+- **@kshitijk4poor** — doctor section banner + fail-and-issue helpers extraction, post-tag salvage cluster (curator-fallout, kanban SQLite hardening, install world-readable uv dirs, xAI bare-code paste)
+- **@rewbs** — Nous JWT inference switch + refresh-token replay fix
+- **@Codename-11** + **@Schwartz10** — session control API (REST + SSE + multimodal followup)
+- **@Niraven** — kanban swarm topology helper
+- **@Interstellar-code** — kanban worker visibility endpoints
+- **@adybag14-cyber** — termux cold-start optimizations (multiple PRs)
+- **@qike-ms** — Telegram in-place status edits design
+- **@sprmn24** — ntfy adapter
+- **@Jaaneek** — xAI Web Search provider plugin
+- **@yannsunn** — xAI upstream adapter for `hermes proxy`
+- **@Cybourgeoisie** — OpenRouter sticky routing via session_id
+- **@memosr** — Nous Portal base_url allowlist validation
+- **@Sunil123135** — Windows Docker Desktop compose file
+- **@Dusk1e** — Docker HOME alignment for dashboard + s6 gateway services
+- **@beardthelion** — opencode-go anthropic_messages routing
+- **@YLChen-007** — Skills Guard multi-word prompt patterns
+- **@roadhero** — env_passthrough GHSA-rhgp-j443-p4rf filter
+- **@Zyrixtrex** — Google Chat OAuth credential persistence hardening
+- **@briandevans**, **@tomqiaozc** — defense-in-depth read-deny on credential stores
+- **@PratikRai0101** — control-plane file write protection
+- **@helix4u**, **@Bartok9**, **@zccyman** — auxiliary fallback ladder components
+- **@ms-alan**, **@ticketclosed-wontfix**, **@donovan-yohan** — TUI session orchestrator + follow-ups
+- **@daimon-nous[bot]** — cron per-job profile support
+- **@bisko** — re-pad `reasoning_content` on cross-provider fallback
+
+### All Contributors
+
+@02356abc, @0xchainer, @0xDevNinja, @0xjackyang, @0xsir0000, @0z1-ghb, @8bit64k, @aaronlab, @AceWattGit,
+@ACR27, @adam91holt, @AdamPlatin123, @Ade5954, @AdityaRajeshGadgil, @adybag14-cyber, @AhmetArif0, @ai-hana-ai,
+@alaamohanad169-ship-it, @alber70g, @albert748, @alt-glitch, @aqilaziz, @argabor, @asdlem, @austinpickett,
+@avifenesh, @awizemann, @B0Tch1, @Bartok9, @BaxBit, @Beandon13, @beardthelion, @benbarclay, @bensargotest-sys,
+@binhnt92, @bird, @bisko, @BlackishGreen33, @booker1207, @bradhallett, @briandevans, @Brixyy, @brndnsvr,
+@BROCCOLO1D, @btorresgil, @burjorjee, @carltonawong, @Carry00, @chaconne67, @chdlc, @chromalinx, @ChyuWei,
+@CipherFrame, @cmullins70, @CNSeniorious000, @codeblackhole1024, @Codename-11, @colin-chang, @counterposition,
+@cresslank, @CryptoByz, @cyb0rgk1tty, @Cybourgeoisie, @daizhonggeng, @darvsum, @davidcampbelldc, @deas,
+@dgians, @dillweed, @DoGMaTiiC, @donovan-yohan, @draplater, @Drexuxux, @dskwe, @dsr-restyn, @Dusk1e,
+@dusterbloom, @duyua9, @egilewski, @el-analista, @eliteworkstation94-ai, @eloklam, @EloquentBrush0x, @emonty,
+@emozilla, @erhnysr, @erikengervall, @Erosika, @ether-btc, @ethernet8023, @EvilHumphrey, @fabiosiqueira,
+@falasi, @falconexe, @fardoche6, @felix-windsor, @Fewmanism, @ffr31mr, @flamiinngo, @flanny7, @flooryyyy,
+@fonhal, @francip, @fujinice, @gianfrancopiana, @glennc, @Glucksberg, @godlin-gh, @Grogger, @guillaumemeyer,
+@Gutslabs, @H-Ali13381, @hanzckernel, @haran2001, @hawknewton, @hayka-pacha, @hehehe0803, @helix4u, @HenkDz,
+@Hermes, @hermesagent26, @Hinotoi-agent, @hongchen1993, @honor2030, @houenyang-momo, @ht1072, @hueilau,
+@iamfoz, @ilonagaja509-glitch, @InB4DevOps, @indigokarasu, @Interstellar-code, @iqdoctor, @iRonin, @Jaaneek,
+@JabberELF, @jacevys, @jackey8616, @jackjin1997, @jdelmerico, @jfuenmayor, @Jiahui-Gu, @JimLiu, @joe102084,
+@JohnC1009, @jonpol01, @Jpalmer95, @Julientalbot, @justemu, @justincc, @jvinals, @karthikeyann, @kasunvinod,
+@kchuang1015, @kenyonxu, @khungate, @kiranvk-2011, @kjames2001, @konsisumer, @kpadilha, @kriscolab,
+@krislidimo, @kronexoi, @kshitijk4poor, @kunci115, @Kylejeong2, @kylekahraman, @LaPhilosophie, @leeseoki0,
+@lemassykoi, @Lempkey, @LeonJS, @LeonSGP43, @lidge-jun, @LifeJiggy, @liuhao1024, @LizerAIDev, @loicnico96,
+@loongfay, @m0n3r0, @malaiwah, @matthewlai, @mavrickdeveloper, @maxmilian, @McClean-Edison, @memosr,
+@Mind-Dragon, @momowind, @MoonJuhan, @MoonRay305, @moortekweb-art, @MorAlekss, @ms-alan, @Nami4D,
+@nehaaprasaad, @nekwo, @nftpoetrist, @NickLarcombe, @nidhi-singh02, @Niraven, @nnnet, @noctilust, @novax635,
+@nthrow, @nv-kasikritc, @nycomar, @OCWC22, @oemtalks, @OmX, @ooovenenoso, @orcool, @oseftg, @outsourc-e,
+@OutThisLife, @Paperclip, @PaTTeeL, @pepelax, @phoenixshen, @Pluviobyte, @pnascimento9596, @pochi-gio, @pr7426,
+@PratikRai0101, @Prithvi1994, @psionic73, @ptichalouf, @Que0x, @QuenVix, @quocanh261997, @qWaitCrypto, @Qwinty,
+@r266-tech, @rak135, @rdasilva1016-ui, @rewbs, @roadhero, @rodrigoeqnit, @RonHillDev, @roycepersonalassistant,
+@rudi193-cmd, @RyanRana, @sadiksaifi, @samahn0601, @samggggflynn, @SamuelZ12, @sanghyuk-seo-nexcube,
+@Saurav0989, @savanne-kham, @Schrotti77, @Schwartz10, @SerenityTn, @sgtworkman, @sharziki, @shaun0927,
+@shellybotmoyer, @shunsuke-hikiyama, @SimbaKingjoe, @SimoKiihamaki, @sir-ad, @Slimydog21, @slowtokki0409,
+@Soju06, @someaka, @soynchux, @sprmn24, @Stark-X, @steezkelly, @stepanov1975, @stephenschoettler,
+@stevehq26-bot, @steveonjava, @Strontvod, @subtract0, @Sunil123135, @superearn-fisher, @Sylw3ster, @tchanee,
+@that-ambuj, @thedavidmurray, @TheOnlyMika, @therahul-yo, @thewillhuang, @ticketclosed-wontfix, @Timur00Kh,
+@tomqiaozc, @Tosko4, @Tranquil-Flow, @tw2818, @uzunkuyruk, @vaddisrinivas, @vanthinh6886, @vgocoder,
+@victorGPT, @vynxevainglory-ai, @waefrebeorn, @walli, @wangpuv, @wanwan2qq, @wesleysimplicio, @worlldz,
+@wpengpeng168, @WuKongAI-CMU, @wuli666, @Wysie, @xxxigm, @yannsunn, @YanzhongSu, @YarrowQiao, @ygd58,
+@YLChen-007, @yoniebans, @yu-xin-c, @YuanHanzhong, @zapabob, @zccyman, @ziliangpeng, @zwolniony, @Zyrixtrex
+
+---
+
+**Full Changelog**: [v2026.5.16...v2026.5.28](https://github.com/NousResearch/hermes-agent/compare/v2026.5.16...v2026.5.28)
@@ -1,7 +1,7 @@
 {
  "id": "hermes-agent",
  "name": "Hermes Agent",
-  "version": "0.14.0",
+  "version": "0.15.0",
  "description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
  "repository": "https://github.com/NousResearch/hermes-agent",
  "website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
@@ -9,7 +9,7 @@
  "license": "MIT",
  "distribution": {
    "uvx": {
-      "package": "hermes-agent[acp]==0.14.0",
+      "package": "hermes-agent[acp]==0.15.0",
      "args": ["hermes-acp"]
    }
  }
@@ -4,3 +4,5 @@ These modules contain pure utility functions and self-contained classes
 that were previously embedded in the 3,600-line run_agent.py. Extracting
 them makes run_agent.py focused on the AIAgent orchestrator class.
 """
+
+from . import jiter_preload as _jiter_preload  # noqa: F401
@@ -1522,6 +1522,7 @@ def init_agent(
                platform=agent.platform or "cli",
                model=agent.model,
                context_length=getattr(agent.context_compressor, "context_length", 0),
+                conversation_id=getattr(agent, "_gateway_session_key", None),
            )
        except Exception as _ce_err:
            _ra().logger.debug("Context engine on_session_start: %s", _ce_err)
@@ -1994,6 +1994,36 @@ def copy_reasoning_content_for_api(agent, source_msg: dict, api_msg: dict) -> No
    api_msg.pop("reasoning_content", None)


+def reapply_reasoning_echo_for_provider(agent, api_messages: list) -> int:
+    """Re-pad assistant turns with reasoning_content for the active provider.
+
+    ``api_messages`` is built once, before the retry loop, while the *primary*
+    provider is active.  If a mid-conversation fallback then switches to a
+    provider that requires the echo-back (DeepSeek / Kimi / MiMo thinking
+    mode), assistant turns built while the prior provider did NOT need it go
+    out without ``reasoning_content`` and the new provider rejects them with
+    HTTP 400 ("The reasoning_content in the thinking mode must be passed back").
+
+    Calling this immediately before building the request kwargs re-applies the
+    pad against the *current* provider.  It is idempotent and a no-op unless
+    ``_needs_thinking_reasoning_pad()`` is True for the active provider, so it
+    is safe to call every iteration and covers every fallback path.
+
+    Returns the number of assistant turns that gained reasoning_content.
+    """
+    if not agent._needs_thinking_reasoning_pad():
+        return 0
+    padded = 0
+    for api_msg in api_messages:
+        if api_msg.get("role") != "assistant":
+            continue
+        if api_msg.get("reasoning_content"):
+            continue
+        copy_reasoning_content_for_api(agent, api_msg, api_msg)
+        if api_msg.get("reasoning_content"):
+            padded += 1
+    return padded
+

 def _iter_pool_sockets(client: Any):
    """Yield raw sockets reachable from an OpenAI/httpx client pool.
@@ -77,16 +77,16 @@ ADAPTIVE_EFFORT_MAP = {
 # xhigh as a distinct level between high and max; older adaptive-thinking
 # models (4.6) reject it with a 400.  Keep this substring list in sync with
 # the Anthropic migration guide as new model families ship.
-_XHIGH_EFFORT_SUBSTRINGS = ("4-7", "4.7")
+_XHIGH_EFFORT_SUBSTRINGS = ("4-7", "4.7", "4-8", "4.8")

 # Models where extended thinking is deprecated/removed (4.6+ behavior: adaptive
 # is the only supported mode; 4.7 additionally forbids manual thinking entirely
 # and drops temperature/top_p/top_k).
-_ADAPTIVE_THINKING_SUBSTRINGS = ("4-6", "4.6", "4-7", "4.7")
+_ADAPTIVE_THINKING_SUBSTRINGS = ("4-6", "4.6", "4-7", "4.7", "4-8", "4.8")

 # Models where temperature/top_p/top_k return 400 if set to non-default values.
 # This is the Opus 4.7 contract; future 4.x+ models are expected to follow it.
-_NO_SAMPLING_PARAMS_SUBSTRINGS = ("4-7", "4.7")
+_NO_SAMPLING_PARAMS_SUBSTRINGS = ("4-7", "4.7", "4-8", "4.8")
 _FAST_MODE_SUPPORTED_SUBSTRINGS = ("opus-4-6", "opus-4.6")

 # ── Max output token limits per Anthropic model ───────────────────────
@@ -94,6 +94,8 @@ _FAST_MODE_SUPPORTED_SUBSTRINGS = ("opus-4-6", "opus-4.6")
 # max_tokens as a mandatory field.  Previously we hardcoded 16384, which
 # starves thinking-enabled models (thinking tokens count toward the limit).
 _ANTHROPIC_OUTPUT_LIMITS = {
+    # Claude 4.8
+    "claude-opus-4-8":   128_000,
    # Claude 4.7
    "claude-opus-4-7":   128_000,
    # Claude 4.6
@@ -2244,11 +2244,15 @@ def _is_payment_error(exc: Exception) -> bool:
    # but sometimes wrap them in 429 or other codes.
    # Daily quota exhaustion from Bedrock, Vertex AI, and similar providers
    # uses different language but is semantically identical to credit exhaustion.
-    if status in {402, 429, None}:
+    if status in {402, 404, 429, None}:
        if any(kw in err_lower for kw in (
            "credits", "insufficient funds",
            "can only afford", "billing",
            "payment required",
+            "out of funds", "run out of funds",
+            "balance_depleted", "no usable credits",
+            "model_not_supported_on_free_tier",
+            "not available on the free tier",
            # Daily / monthly / weekly quota exhaustion keywords
            "quota exceeded", "quota_exceeded",
            "too many tokens per day", "daily limit",
@@ -2260,6 +2264,18 @@ def _is_payment_error(exc: Exception) -> bool:
    return False


+def _nous_portal_account_has_fresh_paid_access() -> bool:
+    """Return True only when a fresh Nous account API lookup reports that paid access is allowed."""
+    try:
+        from hermes_cli.nous_account import get_nous_portal_account_info
+
+        account_info = get_nous_portal_account_info(force_fresh=True)
+        return account_info.paid_service_access is True
+    except Exception as exc:
+        logger.debug("Auxiliary Nous paid-entitlement refresh check failed: %s", exc)
+        return False
+
+
 def _is_rate_limit_error(exc: Exception) -> bool:
    """Detect rate-limit errors that warrant provider fallback.

@@ -2288,6 +2304,10 @@ def _is_rate_limit_error(exc: Exception) -> bool:
        if not any(kw in err_lower for kw in (
            "credits", "insufficient funds", "billing",
            "payment required", "can only afford",
+            "out of funds", "run out of funds",
+            "balance_depleted", "no usable credits",
+            "model_not_supported_on_free_tier",
+            "not available on the free tier",
        )):
            return True
    return False
@@ -4937,6 +4957,41 @@ def call_llm(
            resolved_provider == "nous"
            or base_url_host_matches(_base_info, "inference-api.nousresearch.com")
        )
+        if (
+            _is_payment_error(first_err)
+            and client_is_nous
+            and _nous_portal_account_has_fresh_paid_access()
+        ):
+            refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
+                cache_provider=resolved_provider or "nous",
+                model=final_model,
+                async_mode=False,
+                base_url=resolved_base_url,
+                api_key=resolved_api_key,
+                api_mode=resolved_api_mode,
+                main_runtime=main_runtime,
+                is_vision=(task == "vision"),
+            )
+            if refreshed_client is not None:
+                logger.info(
+                    "Auxiliary %s: refreshed Nous runtime credentials after paid account check, retrying",
+                    task or "call",
+                )
+                if refreshed_model and refreshed_model != kwargs.get("model"):
+                    kwargs["model"] = refreshed_model
+                try:
+                    return _validate_llm_response(
+                        refreshed_client.chat.completions.create(**kwargs), task)
+                except Exception as retry_err:
+                    if not (
+                        _is_auth_error(retry_err)
+                        or _is_payment_error(retry_err)
+                        or _is_connection_error(retry_err)
+                        or _is_rate_limit_error(retry_err)
+                    ):
+                        raise
+                    first_err = retry_err
+
        if _is_auth_error(first_err) and client_is_nous:
            refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
                cache_provider=resolved_provider or "nous",
@@ -5339,6 +5394,40 @@ async def async_call_llm(
            resolved_provider == "nous"
            or base_url_host_matches(_client_base, "inference-api.nousresearch.com")
        )
+        if (
+            _is_payment_error(first_err)
+            and client_is_nous
+            and _nous_portal_account_has_fresh_paid_access()
+        ):
+            refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
+                cache_provider=resolved_provider or "nous",
+                model=final_model,
+                async_mode=True,
+                base_url=resolved_base_url,
+                api_key=resolved_api_key,
+                api_mode=resolved_api_mode,
+                is_vision=(task == "vision"),
+            )
+            if refreshed_client is not None:
+                logger.info(
+                    "Auxiliary %s (async): refreshed Nous runtime credentials after paid account check, retrying",
+                    task or "call",
+                )
+                if refreshed_model and refreshed_model != kwargs.get("model"):
+                    kwargs["model"] = refreshed_model
+                try:
+                    return _validate_llm_response(
+                        await refreshed_client.chat.completions.create(**kwargs), task)
+                except Exception as retry_err:
+                    if not (
+                        _is_auth_error(retry_err)
+                        or _is_payment_error(retry_err)
+                        or _is_connection_error(retry_err)
+                        or _is_rate_limit_error(retry_err)
+                    ):
+                        raise
+                    first_err = retry_err
+
        if _is_auth_error(first_err) and client_is_nous:
            refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
                cache_provider=resolved_provider or "nous",
@@ -403,13 +403,13 @@ def interruptible_api_call(agent, api_kwargs: dict):
                _elapsed, _ttfb_timeout, api_kwargs.get("model", "unknown"),
            )
            if _silent_hint:
-                agent._emit_status(
+                agent._buffer_status(
                    f"⚠️ No first byte from provider in {int(_elapsed)}s "
                    f"(codex stream, model: {api_kwargs.get('model', 'unknown')}). "
                    f"Reconnecting. {_silent_hint}"
                )
            else:
-                agent._emit_status(
+                agent._buffer_status(
                    f"⚠️ No first byte from provider in {int(_elapsed)}s "
                    f"(codex stream, model: {api_kwargs.get('model', 'unknown')}). "
                    f"Reconnecting."
@@ -455,7 +455,7 @@ def interruptible_api_call(agent, api_kwargs: dict):
                api_kwargs.get("model", "unknown"),
                f"{_est_tokens_for_codex_watchdog:,}",
            )
-            agent._emit_status(
+            agent._buffer_status(
                f"⚠️ Codex stream sent no events for {int(_event_stale_elapsed)}s "
                f"after first byte (model: {api_kwargs.get('model', 'unknown')}). "
                f"Reconnecting."
@@ -493,13 +493,13 @@ def interruptible_api_call(agent, api_kwargs: dict):
                api_kwargs.get("model", "unknown"), f"{_est_ctx:,}",
            )
            if _silent_hint:
-                agent._emit_status(
+                agent._buffer_status(
                    f"⚠️ No response from provider for {int(_elapsed)}s "
                    f"(non-streaming, model: {api_kwargs.get('model', 'unknown')}). "
                    f"{_silent_hint}"
                )
            else:
-                agent._emit_status(
+                agent._buffer_status(
                    f"⚠️ No response from provider for {int(_elapsed)}s "
                    f"(non-streaming, model: {api_kwargs.get('model', 'unknown')}). "
                    f"Aborting call."
@@ -1262,7 +1262,7 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
                api_mode=agent.api_mode,
            )

-        agent._emit_status(
+        agent._buffer_status(
            f"🔄 Primary model failed — switching to fallback: "
            f"{fb_model} via {fb_provider}"
        )
@@ -2251,7 +2251,7 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
                            mid_tool_call=False,
                            diag=request_client_holder.get("diag"),
                        )
-                        agent._emit_status(
+                        agent._buffer_status(
                            "❌ Provider returned malformed streaming data after "
                            f"{_max_stream_retries + 1} attempts. "
                            "The provider may be experiencing issues — "
@@ -2358,7 +2358,7 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
                _stale_elapsed, _stream_stale_timeout,
                api_kwargs.get("model", "unknown"), f"{_est_ctx:,}",
            )
-            agent._emit_status(
+            agent._buffer_status(
                f"⚠️ No response from provider for {int(_stale_elapsed)}s "
                f"(model: {api_kwargs.get('model', 'unknown')}, "
                f"context: ~{_est_ctx:,} tokens). "
@@ -71,7 +71,12 @@ class ContextEngine(ABC):
    def update_from_response(self, usage: Dict[str, Any]) -> None:
        """Update tracked token usage from an API response.

-        Called after every LLM call with the usage dict from the response.
+        Called after every LLM call with a normalized usage dict. The legacy
+        keys ``prompt_tokens``, ``completion_tokens``, and ``total_tokens``
+        are always present. Newer hosts also include canonical buckets:
+        ``input_tokens``, ``output_tokens``, ``cache_read_tokens``,
+        ``cache_write_tokens``, and ``reasoning_tokens``. Engines should
+        treat those fields as optional for compatibility with older hosts.
        """

    @abstractmethod
@@ -421,6 +421,7 @@ def compress_context(
                agent.session_id or "",
                boundary_reason="compression",
                old_session_id=_old_sid,
+                conversation_id=getattr(agent, "_gateway_session_key", None),
            )
    except Exception as _ce_err:
        logger.debug("context engine on_session_start (compression): %s", _ce_err)
@@ -49,9 +49,8 @@ from agent.model_metadata import (
    MINIMUM_CONTEXT_LENGTH,
    estimate_messages_tokens_rough,
    estimate_request_tokens_rough,
-    get_next_probe_tier,
+    get_context_length_from_provider_error,
    parse_available_output_tokens_from_error,
-    parse_context_limit_from_error,
    save_context_length,
 )
 from agent.nous_rate_guard import (
@@ -127,6 +126,106 @@ def _ra():
    return run_agent


+def _nous_entitlement_message(capability: str) -> str:
+    try:
+        from hermes_cli.nous_account import (
+            format_nous_portal_entitlement_message,
+            get_nous_portal_account_info,
+        )
+
+        account_info = get_nous_portal_account_info(force_fresh=True)
+        message = format_nous_portal_entitlement_message(
+            account_info,
+            capability=capability,
+        )
+        return message or ""
+    except Exception:
+        return ""
+
+
+def _print_nous_entitlement_guidance(agent, capability: str) -> bool:
+    message = _nous_entitlement_message(capability)
+    if not message:
+        return False
+    for line in message.splitlines():
+        agent._vprint(f"{agent.log_prefix}   💡 {line}", force=True)
+    return True
+
+
+def _is_nous_inference_route(provider: str, base_url: str) -> bool:
+    provider = (provider or "").strip().lower()
+    if provider == "nous":
+        return True
+    base = str(base_url or "")
+    return (
+        base_url_host_matches(base, "inference-api.nousresearch.com")
+        or base_url_host_matches(base, "inference.nousresearch.com")
+    )
+
+
+def _billing_or_entitlement_message(
+    *,
+    capability: str,
+    provider: str,
+    base_url: str,
+    model: str,
+) -> str:
+    if _is_nous_inference_route(provider, base_url):
+        return _nous_entitlement_message(capability)
+
+    provider_label = (provider or "").strip() or "the selected provider"
+    model_label = (model or "").strip() or "the selected model"
+    lines = [
+        (
+            f"{provider_label} reported that billing, credits, or account "
+            f"entitlement is exhausted for {model_label}."
+        ),
+        "Add credits or update billing with that provider, then retry.",
+    ]
+    if base_url_host_matches(str(base_url or ""), "openrouter.ai"):
+        lines.append("OpenRouter credits: https://openrouter.ai/settings/credits")
+    lines.append("You can switch providers temporarily with /model <model> --provider <provider>.")
+    return "\n".join(lines)
+
+
+def _print_billing_or_entitlement_guidance(
+    agent,
+    *,
+    capability: str,
+    provider: str,
+    base_url: str,
+    model: str,
+) -> bool:
+    message = _billing_or_entitlement_message(
+        capability=capability,
+        provider=provider,
+        base_url=base_url,
+        model=model,
+    )
+    if not message:
+        return False
+    for line in message.splitlines():
+        agent._vprint(f"{agent.log_prefix}   💡 {line}", force=True)
+    return True
+
+
+def _try_refresh_nous_paid_entitlement_credentials(agent) -> bool:
+    """Refresh Nous runtime credentials after a fresh paid-entitlement check."""
+    try:
+        from hermes_cli.auth import NOUS_INFERENCE_AUTH_MODE_LEGACY
+        from hermes_cli.nous_account import get_nous_portal_account_info
+
+        account_info = get_nous_portal_account_info(force_fresh=True)
+        if account_info.paid_service_access is not True:
+            return False
+        return agent._try_refresh_nous_client_credentials(
+            force=False,
+            inference_auth_mode=NOUS_INFERENCE_AUTH_MODE_LEGACY,
+        )
+    except Exception:
+        return False
+
+
 def _restore_or_build_system_prompt(agent, system_message, conversation_history):
    """Restore the cached system prompt from the session DB or build it fresh.

@@ -1017,6 +1116,7 @@ def run_conversation(
        codex_auth_retry_attempted=False
        anthropic_auth_retry_attempted=False
        nous_auth_retry_attempted=False
+        nous_paid_entitlement_refresh_attempted=False
        copilot_auth_retry_attempted=False
        thinking_sig_retry_attempted = False
        invalid_encrypted_content_retry_attempted = False
@@ -1050,17 +1150,18 @@ def run_conversation(
                            f"Nous Portal rate limit active — "
                            f"resets in {_fmt_nous_remaining(_nous_remaining)}."
                        )
-                        agent._vprint(
-                            f"{agent.log_prefix}⏳ {_nous_msg} Trying fallback...",
-                            force=True,
+                        agent._buffer_vprint(
+                            f"⏳ {_nous_msg} Trying fallback..."
                        )
-                        agent._emit_status(f"⏳ {_nous_msg}")
+                        agent._buffer_status(f"⏳ {_nous_msg}")
                        if agent._try_activate_fallback():
                            retry_count = 0
                            compression_attempts = 0
                            primary_recovery_attempted = False
                            continue
-                        # No fallback available — return with clear message
+                        # No fallback available — surface buffered context
+                        # so user sees the rate-limit message that led here.
+                        agent._flush_status_buffer()
                        agent._persist_session(messages, conversation_history)
                        return {
                            "final_response": (
@@ -1082,6 +1183,14 @@ def run_conversation(

            try:
                agent._reset_stream_delivery_tracking()
+                # api_messages is built once, before this retry loop, while the
+                # primary provider is active.  A mid-conversation fallback can
+                # switch to an echo-requiring provider (DeepSeek / Kimi / MiMo) that
+                # rejects assistant turns lacking reasoning_content.  Re-apply the
+                # echo-back pad for the *current* provider here (idempotent no-op
+                # unless the active provider needs it) so the fallback request
+                # isn't sent with stale, primary-shaped reasoning fields.
+                agent._reapply_reasoning_echo_for_provider(api_messages)
                api_kwargs = agent._build_api_kwargs(api_messages)
                if agent._force_ascii_payload:
                    _sanitize_structure_non_ascii(api_kwargs)
@@ -1275,9 +1384,10 @@ def run_conversation(
                            error_details.append("response.choices is empty")

                if response_invalid:
-                    # Stop spinner before printing error messages
+                    # Stop spinner silently — retry status is now buffered
+                    # and only surfaced if every retry+fallback exhausts.
                    if thinking_spinner:
-                        thinking_spinner.stop("(´;ω;`) oops, retrying...")
+                        thinking_spinner.stop("")
                        thinking_spinner = None
                    if agent.thinking_callback:
                        agent.thinking_callback("")
@@ -1290,7 +1400,7 @@ def run_conversation(
                    # rate-limit symptom.  Switch to fallback immediately
                    # rather than retrying with extended backoff.
                    if agent._fallback_index < len(agent._fallback_chain):
-                        agent._emit_status("⚠️ Empty/malformed response — switching to fallback...")
+                        agent._buffer_status("⚠️ Empty/malformed response — switching to fallback...")
                    if agent._try_activate_fallback():
                        retry_count = 0
                        compression_attempts = 0
@@ -1352,20 +1462,22 @@ def run_conversation(
                    else:
                        _failure_hint = f"response time {api_duration:.1f}s"

-                    agent._vprint(f"{agent.log_prefix}⚠️  Invalid API response (attempt {retry_count}/{max_retries}): {', '.join(error_details)}", force=True)
-                    agent._vprint(f"{agent.log_prefix}   🏢 Provider: {provider_name}", force=True)
+                    agent._buffer_vprint(f"⚠️  Invalid API response (attempt {retry_count}/{max_retries}): {', '.join(error_details)}")
+                    agent._buffer_vprint(f"   🏢 Provider: {provider_name}")
                    cleaned_provider_error = agent._clean_error_message(error_msg)
-                    agent._vprint(f"{agent.log_prefix}   📝 Provider message: {cleaned_provider_error}", force=True)
-                    agent._vprint(f"{agent.log_prefix}   ⏱️  {_failure_hint}", force=True)
+                    agent._buffer_vprint(f"   📝 Provider message: {cleaned_provider_error}")
+                    agent._buffer_vprint(f"   ⏱️  {_failure_hint}")
                    
                    if retry_count >= max_retries:
                        # Try fallback before giving up
-                        agent._emit_status(f"⚠️ Max retries ({max_retries}) for invalid responses — trying fallback...")
+                        agent._buffer_status(f"⚠️ Max retries ({max_retries}) for invalid responses — trying fallback...")
                        if agent._try_activate_fallback():
                            retry_count = 0
                            compression_attempts = 0
                            primary_recovery_attempted = False
                            continue
+                        # Terminal — flush the buffered retry trace so the user sees what happened.
+                        agent._flush_status_buffer()
                        agent._emit_status(f"❌ Max retries ({max_retries}) exceeded for invalid responses. Giving up.")
                        logger.error(f"{agent.log_prefix}Invalid API response after {max_retries} retries.")
                        agent._persist_session(messages, conversation_history)
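The hunks above route retry diagnostics through `_buffer_status` / `_buffer_vprint` and call `_flush_status_buffer()` only on terminal paths. The following is a minimal sketch of that buffer-then-flush idea, assuming a plain in-memory list. The class `StatusBuffer` and its method names are hypothetical and are not the PR's actual implementation.

```python
from typing import Callable, List


class StatusBuffer:
    """Hypothetical stand-in for the agent's retry-status buffer."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self._emit = emit              # sink used when the trace is surfaced
        self._pending: List[str] = []  # messages held back during retries

    def buffer(self, message: str) -> None:
        # Hold the message instead of printing it immediately.
        self._pending.append(message)

    def flush(self) -> None:
        # Terminal path: surface everything in original order, then reset.
        for message in self._pending:
            self._emit(message)
        self._pending.clear()

    def clear(self) -> None:
        # Success path: recovered retries are never shown to the user.
        self._pending.clear()


shown: List[str] = []
buf = StatusBuffer(shown.append)
buf.buffer("attempt 1 failed")
buf.clear()   # recovered via retry: nothing surfaced
buf.buffer("attempt 2 failed")
buf.buffer("no fallback available")
buf.flush()   # exhausted: trace surfaced in order
```

The design choice this illustrates is that a user who recovers automatically sees no retry noise, while a user who hits a terminal failure still gets the full trace that led to it.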
@@ -1379,7 +1491,7 @@ def run_conversation(
                    
                    # Backoff before retry — jittered exponential: 5s base, 120s cap
                    wait_time = jittered_backoff(retry_count, base_delay=5.0, max_delay=120.0)
-                    agent._vprint(f"{agent.log_prefix}⏳ Retrying in {wait_time:.1f}s ({_failure_hint})...", force=True)
+                    agent._buffer_vprint(f"⏳ Retrying in {wait_time:.1f}s ({_failure_hint})...")
                    logger.warning(f"Invalid API response (retry {retry_count}/{max_retries}): {', '.join(error_details)} | Provider: {provider_name}")
                    
                    # Sleep in small increments to stay responsive to interrupts
@@ -1606,14 +1718,14 @@ def run_conversation(
                        if assistant_message is not None and _trunc_has_tool_calls:
                            if truncated_tool_call_retries < 1:
                                truncated_tool_call_retries += 1
-                                agent._vprint(
-                                    f"{agent.log_prefix}⚠️  Truncated tool call detected — retrying API call...",
-                                    force=True,
+                                agent._buffer_vprint(
+                                    "⚠️  Truncated tool call detected — retrying API call..."
                                )
                                # Don't append the broken response to messages;
                                # just re-run the same API call from the current
                                # message state, giving the model another chance.
                                continue
+                            agent._flush_status_buffer()
                            agent._vprint(
                                f"{agent.log_prefix}⚠️  Truncated tool call response detected again — refusing to execute incomplete tool arguments.",
                                force=True,
@@ -1647,6 +1759,7 @@ def run_conversation(
                        }
                    else:
                        # First message was truncated - mark as failed
+                        agent._flush_status_buffer()
                        agent._vprint(f"{agent.log_prefix}❌ First response truncated - cannot recover", force=True)
                        agent._persist_session(messages, conversation_history)
                        return {
@@ -1668,10 +1781,19 @@ def run_conversation(
                    prompt_tokens = canonical_usage.prompt_tokens
                    completion_tokens = canonical_usage.output_tokens
                    total_tokens = canonical_usage.total_tokens
+                    # Forward canonical token + cache buckets so context engines
+                    # can make decisions on cache hit ratios / reasoning costs,
+                    # not just legacy aggregate tokens. Legacy keys stay for
+                    # back-compat with engines that only read prompt/completion/total.
                    usage_dict = {
                        "prompt_tokens": prompt_tokens,
                        "completion_tokens": completion_tokens,
                        "total_tokens": total_tokens,
+                        "input_tokens": canonical_usage.input_tokens,
+                        "output_tokens": canonical_usage.output_tokens,
+                        "cache_read_tokens": canonical_usage.cache_read_tokens,
+                        "cache_write_tokens": canonical_usage.cache_write_tokens,
+                        "reasoning_tokens": canonical_usage.reasoning_tokens,
                    }
                    agent.context_compressor.update_from_response(usage_dict)
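The extra keys forwarded in `usage_dict` let a context engine reason about cache behaviour while older engines keep reading the legacy keys. As a rough sketch of how a consumer might use them, the helper below computes a cache-hit ratio. The function `cache_hit_ratio` is hypothetical, and it assumes (the diff does not confirm this) that the input, cache-read, and cache-write buckets are disjoint.

```python
from typing import Mapping, Optional


def cache_hit_ratio(usage: Mapping[str, Optional[int]]) -> Optional[float]:
    """Fraction of prompt-side tokens served from cache, or None if unknown.

    Assumption: input, cache-read, and cache-write buckets do not overlap.
    """
    read = usage.get("cache_read_tokens") or 0
    write = usage.get("cache_write_tokens") or 0
    fresh = usage.get("input_tokens") or 0
    total = read + write + fresh
    if total == 0:
        return None  # legacy-only payload or empty response
    return read / total


# A payload with only the legacy keys, and one with the forwarded buckets.
legacy = {"prompt_tokens": 1000, "completion_tokens": 50, "total_tokens": 1050}
extended = {
    **legacy,
    "input_tokens": 200,
    "cache_read_tokens": 750,
    "cache_write_tokens": 50,
}
```

Reading the new keys with `.get()` keeps such a consumer working against payloads that only carry the legacy aggregate keys.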

@@ -1789,6 +1911,11 @@ def run_conversation(
                        )
                
                has_retried_429 = False  # Reset on success
+                # Note: don't clear the retry buffer here — an "API call
+                # success" only means we got bytes back, not that we got
+                # usable content. Empty responses still loop through the
+                # empty-retry path below; the buffer is cleared when
+                # genuinely successful content is detected later (~L4127).
                # Clear Nous rate limit state on successful request —
                # proves the limit has reset and other sessions can
                # resume hitting Nous.
@@ -1815,9 +1942,10 @@ def run_conversation(
                break

            except Exception as api_error:
-                # Stop spinner before printing error messages
+                # Stop spinner silently — retry status is buffered and
+                # only flushed when every retry+fallback is exhausted.
                if thinking_spinner:
-                    thinking_spinner.stop("(╥_╥) error, retrying...")
+                    thinking_spinner.stop("")
                    thinking_spinner = None
                if agent.thinking_callback:
                    agent.thinking_callback("")
@@ -1872,14 +2000,12 @@ def run_conversation(
                    if _surrogates_found or _is_surrogate_error:
                        agent._unicode_sanitization_passes += 1
                        if _surrogates_found:
-                            agent._vprint(
-                                f"{agent.log_prefix}⚠️  Stripped invalid surrogate characters from messages. Retrying...",
-                                force=True,
+                            agent._buffer_vprint(
+                                "⚠️  Stripped invalid surrogate characters from messages. Retrying..."
                            )
                        else:
-                            agent._vprint(
-                                f"{agent.log_prefix}⚠️  Surrogate encoding error — retrying after full-payload sanitization...",
-                                force=True,
+                            agent._buffer_vprint(
+                                "⚠️  Surrogate encoding error — retrying after full-payload sanitization..."
                            )
                        continue
                    if _is_ascii_codec:
@@ -2093,6 +2219,23 @@ def run_conversation(
                    classified.should_rotate_credential, classified.should_fallback,
                )

+                if (
+                    classified.reason == FailoverReason.billing
+                    and _is_nous_inference_route(
+                        getattr(agent, "provider", "") or "",
+                        getattr(agent, "base_url", "") or "",
+                    )
+                    and not nous_paid_entitlement_refresh_attempted
+                ):
+                    nous_paid_entitlement_refresh_attempted = True
+                    if _try_refresh_nous_paid_entitlement_credentials(agent):
+                        agent._vprint(
+                            f"{agent.log_prefix}🔐 Nous paid access verified — "
+                            "refreshed runtime credentials and retrying request...",
+                            force=True,
+                        )
+                        continue
+
                recovered_with_pool, has_retried_429 = agent._recover_with_credential_pool(
                    status_code=status_code,
                    has_retried_429=has_retried_429,
@@ -2190,7 +2333,7 @@ def run_conversation(
                    codex_auth_retry_attempted = True
                    if agent._try_refresh_codex_client_credentials(force=True):
                        _label = "xAI OAuth" if agent.provider == "xai-oauth" else "Codex"
-                        agent._vprint(f"{agent.log_prefix}🔐 {_label} auth refreshed after 401. Retrying request...")
+                        agent._buffer_vprint(f"🔐 {_label} auth refreshed after 401. Retrying request...")
                        continue
                if (
                    agent.api_mode == "chat_completions"
@@ -2217,7 +2360,8 @@ def run_conversation(
                    print(f"{agent.log_prefix}🔐 Nous 401 — Portal authentication failed.")
                    if _body_text:
                        print(f"{agent.log_prefix}   Response: {_body_text}")
-                    print(f"{agent.log_prefix}   Most likely: Portal OAuth expired, account out of credits, or agent key revoked.")
+                    if not _print_nous_entitlement_guidance(agent, "Nous model access"):
+                        print(f"{agent.log_prefix}   Most likely: Portal OAuth expired, account out of credits, or agent key revoked.")
                    print(f"{agent.log_prefix}   Troubleshooting:")
                    print(f"{agent.log_prefix}     • Re-authenticate: hermes auth add nous")
                    print(f"{agent.log_prefix}     • Check credits / billing: https://portal.nousresearch.com")
@@ -2230,7 +2374,7 @@ def run_conversation(
                ):
                    copilot_auth_retry_attempted = True
                    if agent._try_refresh_copilot_client_credentials():
-                        agent._vprint(f"{agent.log_prefix}🔐 Copilot credentials refreshed after 401. Retrying request...")
+                        agent._buffer_vprint("🔐 Copilot credentials refreshed after 401. Retrying request...")
                        continue
                if (
                    agent.api_mode == "anthropic_messages"
@@ -2405,41 +2549,37 @@ def run_conversation(
                _base = getattr(agent, "base_url", "unknown")
                _model = getattr(agent, "model", "unknown")
                _status_code_str = f" [HTTP {status_code}]" if status_code else ""
-                agent._vprint(f"{agent.log_prefix}⚠️  API call failed (attempt {retry_count}/{max_retries}): {error_type}{_status_code_str}", force=True)
-                agent._vprint(f"{agent.log_prefix}   🔌 Provider: {_provider}  Model: {_model}", force=True)
-                agent._vprint(f"{agent.log_prefix}   🌐 Endpoint: {_base}", force=True)
-                agent._vprint(f"{agent.log_prefix}   📝 Error: {_error_summary}", force=True)
+                agent._buffer_vprint(f"⚠️  API call failed (attempt {retry_count}/{max_retries}): {error_type}{_status_code_str}")
+                agent._buffer_vprint(f"   🔌 Provider: {_provider}  Model: {_model}")
+                agent._buffer_vprint(f"   🌐 Endpoint: {_base}")
+                agent._buffer_vprint(f"   📝 Error: {_error_summary}")
                if status_code and status_code < 500:
                    _err_body = getattr(api_error, "body", None)
                    _err_body_str = str(_err_body)[:300] if _err_body else None
                    if _err_body_str:
-                        agent._vprint(f"{agent.log_prefix}   📋 Details: {_err_body_str}", force=True)
-                agent._vprint(f"{agent.log_prefix}   ⏱️  Elapsed: {elapsed_time:.2f}s  Context: {len(api_messages)} msgs, ~{approx_tokens:,} tokens")
+                        agent._buffer_vprint(f"   📋 Details: {_err_body_str}")
+                agent._buffer_vprint(f"   ⏱️  Elapsed: {elapsed_time:.2f}s  Context: {len(api_messages)} msgs, ~{approx_tokens:,} tokens")

                # Actionable hint for OpenRouter "no tool endpoints" error.
-                # This fires regardless of whether fallback succeeds — the
-                # user needs to know WHY their model failed so they can fix
-                # their provider routing, not just silently fall back.
+                # Buffered like the rest of the retry trace — surfaced only
+                # if every retry+fallback exhausts.  Avoids spamming users
+                # who recover automatically via fallback.
                if (
                    agent._is_openrouter_url()
                    and "support tool use" in error_msg
                ):
-                    agent._vprint(
-                        f"{agent.log_prefix}   💡 No OpenRouter providers for {_model} support tool calling with your current settings.",
-                        force=True,
+                    agent._buffer_vprint(
+                        f"   💡 No OpenRouter providers for {_model} support tool calling with your current settings."
                    )
                    if agent.providers_allowed:
-                        agent._vprint(
-                            f"{agent.log_prefix}      Your provider_routing.only restriction is filtering out tool-capable providers.",
-                            force=True,
+                        agent._buffer_vprint(
+                            "      Your provider_routing.only restriction is filtering out tool-capable providers."
                        )
-                        agent._vprint(
-                            f"{agent.log_prefix}      Try removing the restriction or adding providers that support tools for this model.",
-                            force=True,
+                        agent._buffer_vprint(
+                            "      Try removing the restriction or adding providers that support tools for this model."
                        )
-                    agent._vprint(
-                        f"{agent.log_prefix}      Check which providers support tools: https://openrouter.ai/models/{_model}",
-                        force=True,
+                    agent._buffer_vprint(
+                        f"      Check which providers support tools: https://openrouter.ai/models/{_model}"
                    )

                # Check for interrupt before deciding to retry
@@ -2489,11 +2629,10 @@ def run_conversation(
                            # user later enables extra usage the 1M limit
                            # should come back automatically.
                            compressor._context_probe_persistable = False
-                        agent._vprint(
-                            f"{agent.log_prefix}⚠️  Anthropic long-context tier "
+                        agent._buffer_vprint(
+                            f"⚠️  Anthropic long-context tier "
                            f"requires extra usage — reducing context: "
-                            f"{old_ctx:,} → {_reduced_ctx:,} tokens",
-                            force=True,
+                            f"{old_ctx:,} → {_reduced_ctx:,} tokens"
                        )

                    compression_attempts += 1
@@ -2509,7 +2648,7 @@ def run_conversation(
                        # messages to the new session, not skipping them.
                        conversation_history = None
                        if len(messages) < original_len or old_ctx > _reduced_ctx:
-                            agent._emit_status(
+                            agent._buffer_status(
                                f"🗜️ Context reduced to {_reduced_ctx:,} tokens "
                                f"(was {old_ctx:,}), retrying..."
                            )
@@ -2538,7 +2677,12 @@ def run_conversation(
                        base_url=getattr(agent, "base_url", None),
                    )
                    if not pool_may_recover:
-                        agent._emit_status("⚠️ Rate limited — switching to fallback provider...")
+                        if classified.reason == FailoverReason.billing:
+                            agent._buffer_status(
+                                "⚠️ Billing or credits exhausted — switching to fallback provider..."
+                            )
+                        else:
+                            agent._buffer_status("⚠️ Rate limited — switching to fallback provider...")
                        if agent._try_activate_fallback(reason=classified.reason):
                            retry_count = 0
                            compression_attempts = 0
@@ -2650,6 +2794,8 @@ def run_conversation(
                if is_payload_too_large:
                    compression_attempts += 1
                    if compression_attempts > max_compression_attempts:
+                        # Terminal — surface the buffered retry trace.
+                        agent._flush_status_buffer()
                        agent._vprint(f"{agent.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached for payload-too-large error.", force=True)
                        agent._vprint(f"{agent.log_prefix}   💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
                        logger.error(f"{agent.log_prefix}413 compression failed after {max_compression_attempts} attempts.")
@@ -2663,7 +2809,7 @@ def run_conversation(
                            "failed": True,
                            "compression_exhausted": True,
                        }
-                    agent._emit_status(f"⚠️  Request payload too large (413) — compression attempt {compression_attempts}/{max_compression_attempts}...")
+                    agent._buffer_status(f"⚠️  Request payload too large (413) — compression attempt {compression_attempts}/{max_compression_attempts}...")

                    original_len = len(messages)
                    messages, active_system_prompt = agent._compress_context(
@@ -2676,11 +2822,14 @@ def run_conversation(
                    conversation_history = None

                    if len(messages) < original_len:
-                        agent._emit_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
+                        agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
                        time.sleep(2)  # Brief pause between compression retries
                        restart_with_compressed_messages = True
                        break
                    else:
+                        # Terminal — surface buffered context so the user
+                        # sees what compression attempts were made.
+                        agent._flush_status_buffer()
                        agent._vprint(f"{agent.log_prefix}❌ Payload too large and cannot compress further.", force=True)
                        agent._vprint(f"{agent.log_prefix}   💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
                        logger.error(f"{agent.log_prefix}413 payload too large. Cannot compress further.")
@@ -2724,16 +2873,16 @@ def run_conversation(
                        # touching context_length or triggering compression.
                        safe_out = max(1, available_out - 64)  # small safety margin
                        agent._ephemeral_max_output_tokens = safe_out
-                        agent._vprint(
-                            f"{agent.log_prefix}⚠️  Output cap too large for current prompt — "
+                        agent._buffer_vprint(
+                            f"⚠️  Output cap too large for current prompt — "
                            f"retrying with max_tokens={safe_out:,} "
-                            f"(available_tokens={available_out:,}; context_length unchanged at {old_ctx:,})",
-                            force=True,
+                            f"(available_tokens={available_out:,}; context_length unchanged at {old_ctx:,})"
                        )
                        # Still count against compression_attempts so we don't
                        # loop forever if the error keeps recurring.
                        compression_attempts += 1
                        if compression_attempts > max_compression_attempts:
+                            agent._flush_status_buffer()
                            agent._vprint(f"{agent.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached.", force=True)
                            agent._vprint(f"{agent.log_prefix}   💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
                            logger.error(f"{agent.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
@@ -2750,9 +2899,13 @@ def run_conversation(
                        restart_with_compressed_messages = True
                        break

-                    # Error is about the INPUT being too large — reduce context_length.
-                    # Try to parse the actual limit from the error message
-                    parsed_limit = parse_context_limit_from_error(error_msg)
+                    # Error is about the INPUT being too large.  Only reduce
+                    # context_length when the provider explicitly reports the
+                    # real lower limit.  If the provider only says "input
+                    # exceeds the context window", keep the configured window
+                    # and try compression; guessing probe tiers can incorrectly
+                    # turn a user-configured 1M window into 256K/128K/64K.
+                    new_ctx = get_context_length_from_provider_error(error_msg, old_ctx)
                    _provider_lower = (getattr(agent, "provider", "") or "").lower()
                    _base_lower = (getattr(agent, "base_url", "") or "").rstrip("/").lower()
                    is_minimax_provider = (
@@ -2764,24 +2917,12 @@ def run_conversation(
                    )
                    minimax_delta_only_overflow = (
                        is_minimax_provider
-                        and parsed_limit is None
+                        and new_ctx is None
                        and "context window exceeds limit (" in error_msg
                    )
-                    if parsed_limit and parsed_limit < old_ctx:
-                        new_ctx = parsed_limit
-                        agent._vprint(f"{agent.log_prefix}Context limit detected from API: {new_ctx:,} tokens (was {old_ctx:,})", force=True)
-                    elif minimax_delta_only_overflow:
-                        new_ctx = old_ctx
-                        agent._vprint(
-                            f"{agent.log_prefix}Provider reported overflow amount only; "
-                            f"keeping context_length at {old_ctx:,} tokens and compressing.",
-                            force=True,
-                        )
-                    else:
-                        # Step down to the next probe tier
-                        new_ctx = get_next_probe_tier(old_ctx)

-                    if new_ctx and new_ctx < old_ctx:
+                    if new_ctx is not None:
+                        agent._buffer_vprint(f"Context limit detected from API: {new_ctx:,} tokens (was {old_ctx:,})")
                        compressor.update_model(
                            model=agent.model,
                            context_length=new_ctx,
@@ -2791,23 +2932,26 @@ def run_conversation(
                            api_mode=agent.api_mode,
                        )
                        # Context probing flags — only set on built-in
-                        # compressor (plugin engines manage their own).
+                        # compressor (plugin engines manage their own).  This
+                        # value came from the provider, so it is safe to cache.
                        if hasattr(compressor, "_context_probed"):
                            compressor._context_probed = True
-                            # Only persist limits parsed from the provider's
-                            # error message (a real number).  Guessed fallback
-                            # tiers from get_next_probe_tier() should stay
-                            # in-memory only — persisting them pollutes the
-                            # cache with wrong values.
-                            compressor._context_probe_persistable = bool(
-                                parsed_limit and parsed_limit == new_ctx
-                            )
-                        agent._vprint(f"{agent.log_prefix}⚠️  Context length exceeded — stepping down: {old_ctx:,} → {new_ctx:,} tokens", force=True)
+                            compressor._context_probe_persistable = True
+                        agent._buffer_vprint(f"⚠️  Context length exceeded — using provider limit: {old_ctx:,} → {new_ctx:,} tokens")
+                    elif minimax_delta_only_overflow:
+                        agent._buffer_vprint(
+                            f"Provider reported overflow amount only; "
+                            f"keeping context_length at {old_ctx:,} tokens and compressing."
+                        )
                    else:
-                        agent._vprint(f"{agent.log_prefix}⚠️  Context length exceeded at minimum tier — attempting compression...", force=True)
+                        agent._buffer_vprint(
+                            f"⚠️  Context length exceeded, but provider did not report a max context length; "
+                            f"keeping context_length at {old_ctx:,} tokens and compressing."
+                        )

                    compression_attempts += 1
                    if compression_attempts > max_compression_attempts:
+                        agent._flush_status_buffer()
                        agent._vprint(f"{agent.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached.", force=True)
                        agent._vprint(f"{agent.log_prefix}   💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
                        logger.error(f"{agent.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
@@ -2821,7 +2965,7 @@ def run_conversation(
                            "failed": True,
                            "compression_exhausted": True,
                        }
-                    agent._emit_status(f"🗜️ Context too large (~{approx_tokens:,} tokens) — compressing ({compression_attempts}/{max_compression_attempts})...")
+                    agent._buffer_status(f"🗜️ Context too large (~{approx_tokens:,} tokens) — compressing ({compression_attempts}/{max_compression_attempts})...")

                    original_len = len(messages)
                    messages, active_system_prompt = agent._compress_context(
@@ -2835,12 +2979,13 @@ def run_conversation(

                    if len(messages) < original_len or new_ctx and new_ctx < old_ctx:
                        if len(messages) < original_len:
-                            agent._emit_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
+                            agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
                        time.sleep(2)  # Brief pause between compression retries
                        restart_with_compressed_messages = True
                        break
                    else:
                        # Can't compress further and already at minimum tier
+                        agent._flush_status_buffer()
                        agent._vprint(f"{agent.log_prefix}❌ Context length exceeded and cannot compress further.", force=True)
                        agent._vprint(f"{agent.log_prefix}   💡 The conversation has accumulated too much content. Try /new to start fresh, or /compress to manually trigger compression.", force=True)
                        logger.error(f"{agent.log_prefix}Context length exceeded: {approx_tokens:,} tokens. Cannot compress further.")
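The overflow hunks above replace probe-tier guessing with a simpler rule: shrink the context window only when the provider states an explicit limit, otherwise keep the configured window and compress. A small sketch of that decision follows. Both function names are hypothetical, and the "strictly lower than the current window" check is an assumption about how `get_context_length_from_provider_error` behaves, not something the diff shows.

```python
from typing import Optional


def provider_reported_limit(reported: Optional[int], current: int) -> Optional[int]:
    # Accept only an explicit, strictly lower limit from the provider.
    if reported is not None and 0 < reported < current:
        return reported
    return None  # nothing usable reported: keep the window, rely on compression


def next_context_length(reported: Optional[int], current: int) -> int:
    # Never guess a smaller tier; fall back to the configured window.
    limit = provider_reported_limit(reported, current)
    return current if limit is None else limit
```

Under this rule a user-configured 1M window stays at 1M when the error message carries no number, which is the failure mode the comment in the hunk describes.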
@@ -2929,7 +3074,10 @@ def run_conversation(
                if is_client_error:
                    # Try fallback before aborting — a different provider
                    # may not have the same issue (rate limit, auth, etc.)
-                    agent._emit_status(f"⚠️ Non-retryable error (HTTP {status_code}) — trying fallback...")
+                    if classified.reason == FailoverReason.content_policy_blocked:
+                        agent._buffer_status("⚠️ Provider safety filter blocked this request — trying fallback...")
+                    else:
+                        agent._buffer_status(f"⚠️ Non-retryable error (HTTP {status_code}) — trying fallback...")
                    if agent._try_activate_fallback():
                        retry_count = 0
                        compression_attempts = 0
@@ -2939,16 +3087,38 @@ def run_conversation(
                        agent._dump_api_request_debug(
                            api_kwargs, reason="non_retryable_client_error", error=api_error,
                        )
-                    agent._emit_status(
-                        f"❌ Non-retryable error (HTTP {status_code}): "
-                        f"{agent._summarize_api_error(api_error)}"
-                    )
+                    # Terminal — flush buffered context so the user sees
+                    # what was tried before the abort.
+                    agent._flush_status_buffer()
+                    if classified.reason == FailoverReason.content_policy_blocked:
+                        agent._emit_status(
+                            f"❌ Provider safety filter blocked this request: "
+                            f"{agent._summarize_api_error(api_error)}"
+                        )
+                    else:
+                        agent._emit_status(
+                            f"❌ Non-retryable error (HTTP {status_code}): "
+                            f"{agent._summarize_api_error(api_error)}"
+                        )
                    agent._vprint(f"{agent.log_prefix}❌ Non-retryable client error (HTTP {status_code}). Aborting.", force=True)
                    agent._vprint(f"{agent.log_prefix}   🔌 Provider: {_provider}  Model: {_model}", force=True)
                    agent._vprint(f"{agent.log_prefix}   🌐 Endpoint: {_base}", force=True)
                    # Actionable guidance for common auth errors
                    if classified.is_auth or classified.reason == FailoverReason.billing:
-                        if _provider in {"openai-codex", "xai-oauth", "nous"} and status_code == 401:
+                        if classified.reason == FailoverReason.billing and _print_billing_or_entitlement_guidance(
+                            agent,
+                            capability="model access",
+                            provider=_provider,
+                            base_url=str(_base),
+                            model=_model,
+                        ):
+                            pass
+                        elif _provider == "nous" and _print_nous_entitlement_guidance(
+                            agent,
+                            "Nous model access",
+                        ):
+                            pass
+                        elif _provider in {"openai-codex", "xai-oauth", "nous"} and status_code == 401:
                            if _provider == "openai-codex":
                                agent._vprint(f"{agent.log_prefix}   💡 Codex OAuth token was rejected (HTTP 401). Your token may have been", force=True)
                                agent._vprint(f"{agent.log_prefix}      refreshed by another client (Codex CLI, VS Code). To fix:", force=True)
@@ -2976,6 +3146,28 @@ def run_conversation(
                                agent._vprint(f"{agent.log_prefix}      • Check credits: https://openrouter.ai/settings/credits", force=True)
                    else:
                        agent._vprint(f"{agent.log_prefix}   💡 This type of error won't be fixed by retrying.", force=True)
+                    # Content-policy blocks deserve their own actionable
+                    # guidance — neither "fix your API key" nor "retry won't
+                    # help" tells the user what to actually do. The provider
+                    # has refused this specific prompt, so the recovery is
+                    # either a rephrase or routing to a different model.
+                    if classified.reason == FailoverReason.content_policy_blocked:
+                        agent._vprint(
+                            f"{agent.log_prefix}   💡 The provider's safety filter rejected this specific prompt.",
+                            force=True,
+                        )
+                        agent._vprint(
+                            f"{agent.log_prefix}      • Try rephrasing the request, narrowing the context, or splitting into smaller steps.",
+                            force=True,
+                        )
+                        agent._vprint(
+                            f"{agent.log_prefix}      • Configure a fallback provider so future blocks route automatically:",
+                            force=True,
+                        )
+                        agent._vprint(
+                            f"{agent.log_prefix}        hermes fallback add   (interactive picker — same as `hermes model`)",
+                            force=True,
+                        )
                    logger.error(f"{agent.log_prefix}Non-retryable client error: {api_error}")
                    # Skip session persistence when the error is likely
                    # context-overflow related (status 400 + large session).
@@ -2990,6 +3182,23 @@ def run_conversation(
                        )
                    else:
                        agent._persist_session(messages, conversation_history)
+                    if classified.reason == FailoverReason.content_policy_blocked:
+                        _summary = agent._summarize_api_error(api_error)
+                        _policy_response = (
+                            f"⚠️  The model provider's safety filter blocked this request "
+                            f"(not a Hermes/gateway failure).\n\n"
+                            f"Provider message: {_summary}\n\n"
+                            f"Try rephrasing the request, narrowing the context, or "
+                            f"adding a fallback provider with `hermes fallback add`."
+                        )
+                        return {
+                            "final_response": _policy_response,
+                            "messages": messages,
+                            "api_calls": api_call_count,
+                            "completed": False,
+                            "failed": True,
+                            "error": f"content_policy_blocked: {_summary}",
+                        }
                    return {
                        "final_response": None,
                        "messages": messages,
@@ -3011,14 +3220,32 @@ def run_conversation(
                        retry_count = 0
                        continue
                    # Try fallback before giving up entirely
-                    agent._emit_status(f"⚠️ Max retries ({max_retries}) exhausted — trying fallback...")
+                    agent._buffer_status(f"⚠️ Max retries ({max_retries}) exhausted — trying fallback...")
                    if agent._try_activate_fallback():
                        retry_count = 0
                        compression_attempts = 0
                        primary_recovery_attempted = False
                        continue
+                    # Terminal — flush buffered retry/fallback trace.
+                    agent._flush_status_buffer()
                    _final_summary = agent._summarize_api_error(api_error)
-                    if is_rate_limited:
+                    _billing_guidance = ""
+                    if classified.reason == FailoverReason.billing:
+                        agent._emit_status(f"❌ Billing or credits exhausted — {_final_summary}")
+                        _billing_guidance = _billing_or_entitlement_message(
+                            capability="model access",
+                            provider=_provider,
+                            base_url=str(_base),
+                            model=_model,
+                        )
+                        _print_billing_or_entitlement_guidance(
+                            agent,
+                            capability="model access",
+                            provider=_provider,
+                            base_url=str(_base),
+                            model=_model,
+                        )
+                    elif is_rate_limited:
                        agent._emit_status(f"❌ Rate limited after {max_retries} retries — {_final_summary}")
                    else:
                        agent._emit_status(f"❌ API failed after {max_retries} retries — {_final_summary}")
@@ -3063,7 +3290,12 @@ def run_conversation(
                            api_kwargs, reason="max_retries_exhausted", error=api_error,
                        )
                    agent._persist_session(messages, conversation_history)
-                    _final_response = f"API call failed after {max_retries} retries: {_final_summary}"
+                    if classified.reason == FailoverReason.billing:
+                        _final_response = f"Billing or credits exhausted: {_final_summary}"
+                        if _billing_guidance:
+                            _final_response += f"\n\n{_billing_guidance}"
+                    else:
+                        _final_response = f"API call failed after {max_retries} retries: {_final_summary}"
                    if _is_stream_drop:
                        _final_response += (
                            "\n\nThe provider's stream connection keeps "
@@ -3095,9 +3327,9 @@ def run_conversation(
                                pass
                wait_time = _retry_after if _retry_after else jittered_backoff(retry_count, base_delay=2.0, max_delay=60.0)
                if is_rate_limited:
-                    agent._emit_status(f"⏱️ Rate limited. Waiting {wait_time:.1f}s (attempt {retry_count + 1}/{max_retries})...")
+                    agent._buffer_status(f"⏱️ Rate limited. Waiting {wait_time:.1f}s (attempt {retry_count + 1}/{max_retries})...")
                else:
-                    agent._emit_status(f"⏳ Retrying in {wait_time:.1f}s (attempt {retry_count}/{max_retries})...")
+                    agent._buffer_status(f"⏳ Retrying in {wait_time:.1f}s (attempt {retry_count}/{max_retries})...")
                logger.warning(
                    "Retrying API call in %ss (attempt %s/%s) %s error=%s",
                    wait_time,
@@ -3256,14 +3488,15 @@ def run_conversation(
            if has_incomplete_scratchpad(assistant_message.content or ""):
                agent._incomplete_scratchpad_retries += 1
                
-                agent._vprint(f"{agent.log_prefix}⚠️  Incomplete <REASONING_SCRATCHPAD> detected (opened but never closed)")
+                agent._buffer_vprint("⚠️  Incomplete <REASONING_SCRATCHPAD> detected (opened but never closed)")
                
                if agent._incomplete_scratchpad_retries <= 2:
-                    agent._vprint(f"{agent.log_prefix}🔄 Retrying API call ({agent._incomplete_scratchpad_retries}/2)...")
+                    agent._buffer_vprint(f"🔄 Retrying API call ({agent._incomplete_scratchpad_retries}/2)...")
                    # Don't add the broken message, just retry
                    continue
                else:
                    # Max retries - discard this turn and save as partial
+                    agent._flush_status_buffer()
                    agent._vprint(f"{agent.log_prefix}❌ Max retries (2) for incomplete scratchpad. Saving as partial.", force=True)
                    agent._incomplete_scratchpad_retries = 0
                    
@@ -3371,9 +3604,10 @@ def run_conversation(
                    available = ", ".join(sorted(agent.valid_tool_names))
                    invalid_name = invalid_tool_calls[0]
                    invalid_preview = invalid_name[:80] + "..." if len(invalid_name) > 80 else invalid_name
-                    agent._vprint(f"{agent.log_prefix}⚠️  Unknown tool '{invalid_preview}' — sending error to model for agent-correction ({agent._invalid_tool_retries}/3)")
+                    agent._buffer_vprint(f"⚠️  Unknown tool '{invalid_preview}' — sending error to model for self-correction ({agent._invalid_tool_retries}/3)")

                    if agent._invalid_tool_retries >= 3:
+                        agent._flush_status_buffer()
                        agent._vprint(f"{agent.log_prefix}❌ Max retries (3) for invalid tool calls exceeded. Stopping as partial.", force=True)
                        agent._invalid_tool_retries = 0
                        agent._persist_session(messages, conversation_history)
@@ -3457,16 +3691,16 @@ def run_conversation(
                    agent._invalid_json_retries += 1

                    tool_name, error_msg = invalid_json_args[0]
-                    agent._vprint(f"{agent.log_prefix}⚠️  Invalid JSON in tool call arguments for '{tool_name}': {error_msg}")
+                    agent._buffer_vprint(f"⚠️  Invalid JSON in tool call arguments for '{tool_name}': {error_msg}")

                    if agent._invalid_json_retries < 3:
-                        agent._vprint(f"{agent.log_prefix}🔄 Retrying API call ({agent._invalid_json_retries}/3)...")
+                        agent._buffer_vprint(f"🔄 Retrying API call ({agent._invalid_json_retries}/3)...")
                        # Don't add anything to messages, just retry the API call
                        continue
                    else:
                        # Instead of returning partial, inject tool error results so the model can recover.
                        # Using tool results (not user messages) preserves role alternation.
-                        agent._vprint(f"{agent.log_prefix}⚠️  Injecting recovery tool results for invalid JSON...")
+                        agent._buffer_vprint("⚠️  Injecting recovery tool results for invalid JSON...")
                        agent._invalid_json_retries = 0  # Reset for next attempt
                        
                        # Append the assistant message with its (broken) tool_calls
@@ -3774,7 +4008,7 @@ def run_conversation(
                            "Empty response after tool calls — nudging model "
                            "to continue processing"
                        )
-                        agent._emit_status(
+                        agent._buffer_status(
                            "⚠️ Model returned empty after tool calls — "
                            "nudging to continue"
                        )
@@ -3820,7 +4054,7 @@ def run_conversation(
                            "prefilling to continue (%d/2)",
                            agent._thinking_prefill_retries,
                        )
-                        agent._emit_status(
+                        agent._buffer_status(
                            f"↻ Thinking-only response — prefilling to continue "
                            f"({agent._thinking_prefill_retries}/2)"
                        )
@@ -3855,7 +4089,7 @@ def run_conversation(
                            "retry %d/3 (model=%s)",
                            agent._empty_content_retries, agent.model,
                        )
-                        agent._emit_status(
+                        agent._buffer_status(
                            f"⚠️ Empty response from model — retrying "
                            f"({agent._empty_content_retries}/3)"
                        )
@@ -3874,13 +4108,13 @@ def run_conversation(
                            agent._empty_content_retries, agent.model,
                            agent.provider,
                        )
-                        agent._emit_status(
+                        agent._buffer_status(
                            "⚠️ Model returning empty responses — "
                            "switching to fallback provider..."
                        )
                        if agent._try_activate_fallback():
                            agent._empty_content_retries = 0
-                            agent._emit_status(
+                            agent._buffer_status(
                                f"↻ Switched to fallback: {agent.model} "
                                f"({agent.provider})"
                            )
@@ -3894,6 +4128,9 @@ def run_conversation(
                    # Exhausted retries and fallback chain (or no
                    # fallback configured).  Fall through to the
                    # "(empty)" terminal.
+                    # Surface the buffered retry/fallback trace so the
+                    # user can see what was attempted before "(empty)".
+                    agent._flush_status_buffer()
                    _turn_exit_reason = "empty_response_exhausted"
                    reasoning_text = agent._extract_reasoning(assistant_message)
                    agent._drop_trailing_empty_response_scaffolding(messages)
@@ -3938,6 +4175,9 @@ def run_conversation(
                # Reset retry counter/signature on successful content
                agent._empty_content_retries = 0
                agent._thinking_prefill_retries = 0
+                # Successful content reached — drop any buffered retry
+                # status from earlier failed attempts in this turn.
+                agent._clear_status_buffer()

                if (
                    agent.api_mode == "codex_responses"
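Reviewer note: the hunks above move retry and fallback chatter from `_emit_status` to `_buffer_status`, flush it on terminal failures, and clear it once content arrives. The helper implementations are not part of this excerpt, so the following is only a minimal sketch of the semantics the call sites imply. `StatusBuffer` and its method names are hypothetical stand-ins, not the agent's actual code.

```python
from typing import Callable, List


class StatusBuffer:
    """Hypothetical stand-in for the agent's buffered status helpers."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self._emit = emit  # sink that actually shows a line to the user
        self._pending: List[str] = []

    def buffer_status(self, message: str) -> None:
        # Hold the line back; nothing is shown yet.
        self._pending.append(message)

    def flush(self) -> None:
        # Terminal failure: show everything that was tried, in order.
        for message in self._pending:
            self._emit(message)
        self._pending.clear()

    def clear(self) -> None:
        # Success: drop the retry trace without showing it.
        self._pending.clear()


shown: List[str] = []
buf = StatusBuffer(shown.append)

# A turn that recovers: the retry line is dropped and never shown.
buf.buffer_status("Retrying in 2.0s (attempt 1/3)...")
buf.clear()
recovered_output = list(shown)

# A turn that fails: the buffered trace is shown before the final error.
buf.buffer_status("Retrying in 2.0s (attempt 1/3)...")
buf.buffer_status("Max retries (3) exhausted, trying fallback...")
buf.flush()
failed_output = list(shown)
```

Under this reading, a turn that eventually succeeds shows no retry noise, while a turn that aborts still shows the full trace of what was attempted.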
@@ -904,10 +904,6 @@ def get_cute_tool_message(
            extra = f" +{len(urls)-1}" if len(urls) > 1 else ""
            return _wrap(f"┊ 📄 fetch     {_trunc(domain, 35)}{extra}  {dur}")
        return _wrap(f"┊ 📄 fetch     pages  {dur}")
-    if tool_name == "web_crawl":
-        url = args.get("url", "")
-        domain = url.replace("https://", "").replace("http://", "").split("/")[0]
-        return _wrap(f"┊ 🕸️  crawl     {_trunc(domain, 35)}  {dur}")
    if tool_name == "terminal":
        return _wrap(f"┊ 💻 $         {_trunc(args.get('command', ''), 42)}  {dur}")
    if tool_name == "process":
@@ -44,9 +44,10 @@ class FailoverReason(enum.Enum):
    payload_too_large = "payload_too_large"  # 413 — compress payload
    image_too_large = "image_too_large"   # Native image part exceeds provider's per-image limit — shrink and retry

-    # Model
+    # Model / provider policy
    model_not_found = "model_not_found"  # 404 or invalid model — fallback to different model
    provider_policy_blocked = "provider_policy_blocked"  # Aggregator (e.g. OpenRouter) blocked the only endpoint due to account data/privacy policy
+    content_policy_blocked = "content_policy_blocked"  # Provider safety filter rejected this prompt — deterministic per-request, don't retry unchanged

    # Request format
    format_error = "format_error"        # 400 bad request — abort or strip + retry
@@ -97,13 +98,20 @@ _BILLING_PATTERNS = [
    "insufficient_quota",
    "insufficient balance",
    "credit balance",
+    "credits exhausted",
    "credits have been exhausted",
+    "no usable credits",
    "top up your credits",
    "payment required",
    "billing hard limit",
    "exceeded your current quota",
    "account is deactivated",
    "plan does not include",
+    "out of funds",
+    "run out of funds",
+    "balance_depleted",
+    "model_not_supported_on_free_tier",
+    "not available on the free tier",
 ]

 # Patterns that indicate rate limiting (transient, will resolve)
@@ -282,6 +290,45 @@ _PROVIDER_POLICY_BLOCKED_PATTERNS = [
    "no endpoints found matching your data policy",
 ]

+# Provider content-policy / safety-filter blocks. Distinct from
+# ``provider_policy_blocked`` above (which is an OpenRouter *account*-level
+# data/privacy guardrail) — these are *per-prompt* safety decisions made by
+# the upstream model provider. They are deterministic for the unchanged
+# request, so retrying the same prompt three times just reproduces the same
+# block and burns paid attempts on a refusal. The recovery is to switch to a
+# configured fallback model/provider immediately, or surface the block to
+# the user with actionable guidance if no fallback exists.
+#
+# Patterns are intentionally narrow — each phrase is a verbatim string from
+# a specific provider's safety pipeline, not a generic word like "policy" or
+# "violation" that could collide with billing/auth/format errors:
+#   • OpenAI Codex cybersecurity refusal (gpt-5.5, the case from #18028)
+#   • OpenAI moderation refusal ("violates our usage policies", with
+#     "usage policies" disambiguating from billing's "exceeded ... policy")
+#   • Anthropic safety refusal ("prompt was flagged by ... safety system")
+#   • OpenAI Responses content filter
+_CONTENT_POLICY_BLOCKED_PATTERNS = [
+    # OpenAI Codex (#18028) — message may arrive without an HTTP status
+    "flagged for possible cybersecurity risk",
+    "trusted access for cyber",
+    # OpenAI moderation — chat completions / responses
+    "violates our usage policies",
+    "violates openai's usage policies",
+    "your request was flagged by",
+    # Anthropic safety system
+    "prompt was flagged by our safety",
+    "responses cannot be generated due to safety",
+    # Generic content-filter wording seen on Azure / OpenAI Responses.
+    # ``content_filter`` (underscore) is the OpenAI-standard error/finish
+    # token surfaced verbatim by their SDKs when a request is blocked.
+    # ``responsibleaipolicyviolation`` is Azure OpenAI's error code.
+    # Deliberately NOT matching the space variant ("content filter") — it
+    # appears in benign config descriptions and tooltip text that providers
+    # echo back; the underscore form is provider-specific enough.
+    "content_filter",
+    "responsibleaipolicyviolation",
+]
+
 # Auth patterns (non-status-code signals)
 _AUTH_PATTERNS = [
    "invalid api key",
@@ -485,6 +532,20 @@ def classify_api_error(

    # ── 1. Provider-specific patterns (highest priority) ────────────

+    # Provider content-policy / safety-filter block. The provider has made a
+    # deterministic refusal decision about THIS prompt — retrying unchanged
+    # just reproduces the same refusal and burns paid attempts. Must run
+    # before status-based classification so a 400 safety block isn't
+    # downgraded to a generic ``format_error`` and a status-less block
+    # (OpenAI Codex SDK can raise without one) isn't left in the retryable
+    # ``unknown`` bucket. See issue #18028.
+    if any(p in error_msg for p in _CONTENT_POLICY_BLOCKED_PATTERNS):
+        return _result(
+            FailoverReason.content_policy_blocked,
+            retryable=False,
+            should_fallback=True,
+        )
+
    # Anthropic thinking block signature invalid (400).
    # Don't gate on provider — OpenRouter proxies Anthropic errors, so the
    # provider may be "openrouter" even though the error is Anthropic-specific.
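Reviewer note: the ordering argument in the comment above (pattern check first, status code second) can be illustrated with a small standalone sketch. It assumes the error text is lowercased before matching. The `Classification` type, the `classify` function, and the retry/fallback values chosen for `format_error` and `unknown` are illustrative assumptions, not the real `classify_api_error` implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Two of the phrases from _CONTENT_POLICY_BLOCKED_PATTERNS in this diff.
CONTENT_POLICY_PATTERNS = [
    "flagged for possible cybersecurity risk",
    "content_filter",
]


@dataclass(frozen=True)
class Classification:
    reason: str
    retryable: bool
    should_fallback: bool


def classify(error_text: str, status_code: Optional[int] = None) -> Classification:
    error_msg = error_text.lower()  # assumption: matching runs on lowercased text
    # The pattern check runs first, so a 400 or a status-less error that
    # carries a safety phrase is never downgraded to a generic bucket.
    if any(p in error_msg for p in CONTENT_POLICY_PATTERNS):
        return Classification("content_policy_blocked", retryable=False, should_fallback=True)
    if status_code == 400:
        # Assumed values for illustration only.
        return Classification("format_error", retryable=False, should_fallback=False)
    return Classification("unknown", retryable=True, should_fallback=False)


blocked_400 = classify("This request was flagged for possible cybersecurity risk", 400)
blocked_no_status = classify("Error code: content_filter")
plain_400 = classify("Bad request: missing field 'model'", 400)
space_variant = classify("Adjust your content filter settings")
```

The last call shows why the underscore form matters: the space variant "content filter" does not match and falls through to the default bucket.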
@@ -690,8 +751,13 @@ def _classify_by_status(
        )

    if status_code == 403:
-        # OpenRouter 403 "key limit exceeded" is actually billing
-        if "key limit exceeded" in error_msg or "spending limit" in error_msg:
+        # OpenRouter 403 "key limit exceeded" is actually billing. Other
+        # providers also use 403 for account-plan or credit exhaustion.
+        if (
+            "key limit exceeded" in error_msg
+            or "spending limit" in error_msg
+            or any(p in error_msg for p in _BILLING_PATTERNS)
+        ):
            return result_fn(
                FailoverReason.billing,
                retryable=False,
@@ -708,6 +774,17 @@ def _classify_by_status(
        return _classify_402(error_msg, result_fn)

    if status_code == 404:
+        # Nous API currently surfaces HA/NAS credit depletion as a paid model
+        # becoming unavailable on the Free Tier, returned as 404 rather than
+        # 402. Treat that as entitlement/billing exhaustion, not a missing
+        # model, so the retry loop can show credit/top-up guidance.
+        if any(p in error_msg for p in _BILLING_PATTERNS):
+            return result_fn(
+                FailoverReason.billing,
+                retryable=False,
+                should_rotate_credential=True,
+                should_fallback=True,
+            )
        # OpenRouter policy-block 404 — distinct from "model not found".
        # The model exists; the user's account privacy setting excludes the
        # only endpoint serving it. Falling back to another provider won't
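Reviewer note: the 403 and 404 branches above now share the billing phrase list. A reduced sketch of that routing follows. The function name `classify_status` and the non-billing outcomes (`"auth"` for other 403s, `"unknown"` for other codes) are assumptions made for illustration, and the OpenRouter policy-block 404 branch is omitted.

```python
# A subset of the phrases added to _BILLING_PATTERNS in this diff.
BILLING_PATTERNS = [
    "no usable credits",
    "out of funds",
    "not available on the free tier",
]


def classify_status(status_code: int, error_text: str) -> str:
    error_msg = error_text.lower()
    has_billing_phrase = any(p in error_msg for p in BILLING_PATTERNS)

    if status_code == 403:
        if (
            "key limit exceeded" in error_msg
            or "spending limit" in error_msg
            or has_billing_phrase
        ):
            return "billing"
        return "auth"  # assumed default for other 403s

    if status_code == 404:
        # Credit depletion reported as a paid model being unavailable on
        # the free tier is treated as billing, not as a missing model.
        if has_billing_phrase:
            return "billing"
        return "model_not_found"

    return "unknown"


free_tier_404 = classify_status(404, "Model is not available on the Free Tier")
missing_404 = classify_status(404, "The model 'foo' does not exist")
funds_403 = classify_status(403, "Your account has run out of funds")
```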
@@ -973,7 +1050,15 @@ def _classify_by_error_code(
            should_rotate_credential=True,
        )

-    if code_lower in {"insufficient_quota", "billing_not_active", "payment_required"}:
+    if code_lower in {
+        "insufficient_quota",
+        "billing_not_active",
+        "payment_required",
+        "insufficient_credits",
+        "no_usable_credits",
+        "balance_depleted",
+        "model_not_supported_on_free_tier",
+    }:
        return result_fn(
            FailoverReason.billing,
            retryable=False,
@@ -0,0 +1,39 @@
+"""Best-effort early import for the OpenAI SDK's native streaming parser.
+
+The OpenAI SDK imports ``jiter`` while constructing streaming chat-completion
+responses.  On some Windows installs the native extension can be imported
+directly from the Hermes venv, but the first import fails when it happens later
+inside the threaded streaming request path.  Loading it once during agent
+package import avoids that import-order failure while preserving the normal
+SDK error path for genuinely missing or broken installs.
+"""
+
+from __future__ import annotations
+
+import importlib
+
+_JITER_PRELOADED = False
+_JITER_PRELOAD_ERROR: Exception | None = None
+
+
+def preload_jiter_native_extension() -> bool:
+    """Import jiter's native extension early if it is available."""
+
+    global _JITER_PRELOADED, _JITER_PRELOAD_ERROR
+
+    if _JITER_PRELOADED:
+        return True
+
+    try:
+        importlib.import_module("jiter.jiter")
+        from jiter import from_json as _from_json  # noqa: F401
+    except Exception as exc:
+        _JITER_PRELOAD_ERROR = exc
+        return False
+
+    _JITER_PRELOADED = True
+    _JITER_PRELOAD_ERROR = None
+    return True
+
+
+preload_jiter_native_extension()
@@ -141,6 +141,8 @@ DEFAULT_CONTEXT_LENGTHS = {
    # fuzzy-match collisions (e.g. "anthropic/claude-sonnet-4" is a
    # substring of "anthropic/claude-sonnet-4.6").
    # OpenRouter-prefixed models resolve via OpenRouter live API or models.dev.
+    "claude-opus-4-8": 1000000,
+    "claude-opus-4.8": 1000000,
    "claude-opus-4-7": 1000000,
    "claude-opus-4.7": 1000000,
    "claude-opus-4-6": 1000000,
@@ -911,12 +913,33 @@ def parse_context_limit_from_error(error_msg: str) -> Optional[int]:
    return None


+def get_context_length_from_provider_error(
+    error_msg: str,
+    current_context_length: int,
+) -> Optional[int]:
+    """Return a provider-reported lower context limit, if one is present.
+
+    Context-overflow recovery must not invent a new model window size.  Some
+    providers only say that the input exceeds the context window without
+    reporting the actual maximum.  In that case callers should keep the
+    configured context length and try compression only, rather than stepping
+    down through guessed probe tiers (1M → 256K → 128K → ...).
+    """
+    parsed_limit = parse_context_limit_from_error(error_msg)
+    if parsed_limit is None:
+        return None
+    if parsed_limit < current_context_length:
+        return parsed_limit
+    return None
+
+
 def parse_available_output_tokens_from_error(error_msg: str) -> Optional[int]:
    """Detect an "output cap too large" error and return how many output tokens are available.

    Background — two distinct context errors exist:
      1. "Prompt too long"  — the INPUT itself exceeds the context window.
-           Fix: compress history and/or halve context_length.
+           Fix: compress history, and only reduce context_length if the
+           provider explicitly reports the actual lower limit.
      2. "max_tokens too large" — input is fine, but input + requested_output > window.
           Fix: reduce max_tokens (the output cap) for this call.
           Do NOT touch context_length — the window hasn't shrunk.
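Reviewer note: the keep-context behavior introduced by `get_context_length_from_provider_error` can be checked with a small standalone sketch. `parse_limit` below is a hypothetical, much simpler stand-in for `parse_context_limit_from_error`, whose body is not shown in this excerpt; the error strings are invented examples.

```python
import re
from typing import Optional


def parse_limit(error_msg: str) -> Optional[int]:
    # Hypothetical stand-in parser: only recognizes phrases such as
    # "maximum context length is 131072 tokens".
    match = re.search(r"maximum context length is (\d+)", error_msg.lower())
    return int(match.group(1)) if match else None


def lower_limit_from_error(error_msg: str, current: int) -> Optional[int]:
    parsed = parse_limit(error_msg)
    # Only a provider-reported limit that is strictly lower than the
    # configured one is accepted; anything else leaves the window alone.
    if parsed is not None and parsed < current:
        return parsed
    return None


reported = lower_limit_from_error(
    "This model's maximum context length is 131072 tokens", 1_000_000
)
vague = lower_limit_from_error("Input exceeds the context window", 1_000_000)
not_lower = lower_limit_from_error(
    "maximum context length is 1000000 tokens", 200_000
)
```

In the vague case the caller gets `None`, which per the docstring means "keep the configured context length and try compression only".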
@@ -406,19 +406,14 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
    if "eyJ" in text:
        text = _JWT_RE.sub(lambda m: _mask_token(m.group(0)), text)

-    # URL userinfo (http(s)://user:pass@host) — redact for non-DB schemes.
-    # DB schemes are handled above by _DB_CONNSTR_RE.
-    if "://" in text:
-        text = _redact_url_userinfo(text)
-
-        # URL query params containing opaque tokens (?access_token=…&code=…)
-        if "?" in text:
-            text = _redact_url_query_params(text)
-
-    # HTTP access logs can contain relative request targets with query params
-    # and no URL scheme, e.g. `"POST /hook?password=... HTTP/1.1"`.
-    if "?" in text and "=" in text and _has_http_method_substring(text):
-        text = _redact_http_request_target_query_params(text)
+    # NOTE: Web-URL redaction (query params + userinfo + HTTP access-log
+    # request targets) is intentionally OFF. Many legitimate workflows pass
+    # opaque tokens through query strings — magic-link checkouts, OAuth
+    # callbacks the agent is meant to follow, pre-signed share URLs — and
+    # blanket-redacting param values by name breaks those skills mid-flow.
+    # Known credential shapes (sk-, ghp_, JWTs, etc.) inside URLs are still
+    # caught by _PREFIX_RE and _JWT_RE above. DB connection-string passwords
+    # are still caught by _DB_CONNSTR_RE.

    # Form-urlencoded bodies (only triggers on clean k=v&k=v inputs).
    if "&" in text and "=" in text:
@@ -258,7 +258,7 @@ def emit_stream_drop(
        except Exception:
            pass
    try:
-        agent._emit_status(
+        agent._buffer_status(
            f"⚠️ {provider} stream {kind} ({type(error).__name__}){_suffix} "
            f"— reconnecting, retry {attempt}/{max_attempts}"
        )
@@ -83,6 +83,34 @@ _UTC_NOW = lambda: datetime.now(timezone.utc)
 # Official docs snapshot entries. Models whose published pricing and cache
 # semantics are stable enough to encode exactly.
 _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
+    # ── Anthropic Claude 4.8 ─────────────────────────────────────────────
+    # Same $5/$25 base pricing as 4.6/4.7.  Fast-mode variant is a separate
+    # model ID with 2x premium (vs the 6x premium on older Opus generations).
+    # Source: https://openrouter.ai/anthropic/claude-opus-4.8
+    (
+        "anthropic",
+        "claude-opus-4-8",
+    ): PricingEntry(
+        input_cost_per_million=Decimal("5.00"),
+        output_cost_per_million=Decimal("25.00"),
+        cache_read_cost_per_million=Decimal("0.50"),
+        cache_write_cost_per_million=Decimal("6.25"),
+        source="official_docs_snapshot",
+        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
+        pricing_version="anthropic-pricing-2026-05",
+    ),
+    (
+        "anthropic",
+        "claude-opus-4-8-fast",
+    ): PricingEntry(
+        input_cost_per_million=Decimal("10.00"),
+        output_cost_per_million=Decimal("50.00"),
+        cache_read_cost_per_million=Decimal("1.00"),
+        cache_write_cost_per_million=Decimal("12.50"),
+        source="official_docs_snapshot",
+        source_url="https://openrouter.ai/anthropic/claude-opus-4.8-fast",
+        pricing_version="anthropic-pricing-2026-05",
+    ),
    # ── Anthropic Claude 4.7 ─────────────────────────────────────────────
    # Opus 4.5/4.6/4.7 share $5/$25 pricing (new tokenizer, up to 35% more
    # tokens for the same text).
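Reviewer note: as a sanity check on the new per-million rates, here is a short worked example using `Decimal`. The `estimate_cost` helper and the assumption that the token buckets are disjoint (cached reads not also counted as input) are illustrative only; the project's actual cost function is not shown in this excerpt.

```python
from decimal import Decimal
from typing import Dict

MILLION = Decimal(1_000_000)

# Per-million rates copied from the claude-opus-4-8 entry above.
RATES: Dict[str, Decimal] = {
    "input": Decimal("5.00"),
    "output": Decimal("25.00"),
    "cache_read": Decimal("0.50"),
    "cache_write": Decimal("6.25"),
}


def estimate_cost(tokens: Dict[str, int]) -> Decimal:
    # Assumes each bucket is billed independently at its own rate.
    return sum(
        (RATES[kind] * count / MILLION for kind, count in tokens.items()),
        Decimal("0"),
    )


# 200k input = $1.00, 10k output = $0.25, 1M cache reads = $0.50
cost = estimate_cost({"input": 200_000, "output": 10_000, "cache_read": 1_000_000})
```

Using `Decimal` throughout avoids the binary floating-point rounding that would creep in with `float` rates.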
@@ -61,14 +61,14 @@ from typing import Any, Dict, List


 class WebSearchProvider(abc.ABC):
-    """Abstract base class for a web search/extract/crawl backend.
+    """Abstract base class for a web search/extract backend.

    Subclasses must implement :meth:`is_available` and at least one of
-    :meth:`search` / :meth:`extract` / :meth:`crawl`. The
-    :meth:`supports_search` / :meth:`supports_extract` / :meth:`supports_crawl`
-    capability flags let the registry route each tool call to the right
-    provider, and let multi-capability providers (Firecrawl, Tavily, Exa,
-    …) advertise multiple capabilities from a single class.
+    :meth:`search` / :meth:`extract`. The :meth:`supports_search` /
+    :meth:`supports_extract` capability flags let the registry route each
+    tool call to the right provider, and let multi-capability providers
+    (Firecrawl, Tavily, Exa, …) advertise multiple capabilities from a
+    single class.
    """

    @property
@@ -113,22 +113,6 @@ class WebSearchProvider(abc.ABC):
        """
        return False

-    def supports_crawl(self) -> bool:
-        """Return True if this provider implements :meth:`crawl`.
-
-        Crawl differs from extract in that the agent provides a *seed URL*
-        and the provider walks linked pages on its own — useful for
-        documentation sites where the agent doesn't know all relevant
-        URLs upfront. Tavily is the only built-in backend that natively
-        crawls today; Firecrawl provides a similar capability that we
-        don't currently surface as a tool.
-
-        Providers that don't crawl should leave this as False; the
-        dispatcher in :func:`tools.web_tools.web_crawl_tool` will fall
-        back to its auxiliary-model summarization path.
-        """
-        return False
-
    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
        """Execute a web search.

@@ -173,26 +157,6 @@ class WebSearchProvider(abc.ABC):
            f"{self.name} does not support extract (override supports_extract)"
        )

-    def crawl(self, url: str, **kwargs: Any) -> Any:
-        """Crawl a seed URL and return results.
-
-        Override when :meth:`supports_crawl` returns True. The default
-        raises NotImplementedError; callers should gate on
-        :meth:`supports_crawl` before calling.
-
-        Return shape: ``{"results": [{"url": str, "title": str,
-        "content": str, ...}, ...]}`` matching what
-        :func:`tools.web_tools.web_crawl_tool` post-processing expects.
-
-        Implementations MAY be ``async def``.
-
-        ``kwargs`` may carry forward-compat fields (e.g. ``max_depth``,
-        ``include_domains``) — implementations should ignore unknown keys.
-        """
-        raise NotImplementedError(
-            f"{self.name} does not support crawl (override supports_crawl)"
-        )
-
    def get_setup_schema(self) -> Dict[str, Any]:
        """Return provider metadata for the ``hermes tools`` picker.

@@ -11,7 +11,7 @@ Active selection
 ----------------
 The active provider is chosen by configuration with this precedence:

-1. ``web.search_backend`` / ``web.extract_backend`` / ``web.crawl_backend``
+1. ``web.search_backend`` / ``web.extract_backend``
   (per-capability override).
 2. ``web.backend`` (shared fallback).
 3. If exactly one capability-eligible provider is registered AND available,
@@ -24,10 +24,10 @@ The active provider is chosen by configuration with this precedence:
 5. Otherwise ``None`` — the tool surfaces a helpful error pointing at
   ``hermes tools``.

-The capability filter (``supports_search`` / ``supports_extract`` /
-``supports_crawl``) is applied at every step so a search-only provider
-(``brave-free``) configured as ``web.extract_backend`` correctly falls
-through to an extract-capable backend.
+The capability filter (``supports_search`` / ``supports_extract``) is
+applied at every step so a search-only provider (``brave-free``)
+configured as ``web.extract_backend`` correctly falls through to an
+extract-capable backend.
 """

 from __future__ import annotations
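Reviewer sketch of the capability fall-through described in the docstring above. It is a simplified illustration, not the registry's actual `_resolve`: `Provider` and `resolve` are hypothetical names, and the single-provider and legacy-preference rules are collapsed into "first available eligible provider in registration order".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Provider:
    """Hypothetical stand-in for a WebSearchProvider subclass."""
    name: str
    search: bool = False
    extract: bool = False
    available: bool = True

    def supports(self, capability: str) -> bool:
        # Unknown capabilities (e.g. the removed "crawl") are never supported.
        return {"search": self.search, "extract": self.extract}.get(capability, False)


def resolve(configured: Optional[str], capability: str,
            registry: Dict[str, Provider]) -> Optional[Provider]:
    """Pick a provider for one capability, filtering by capability at every step."""
    def eligible(p: Provider) -> bool:
        return p.supports(capability) and p.available

    # Explicit configuration wins, but only if it can serve this capability.
    chosen = registry.get(configured) if configured else None
    if chosen is not None and eligible(chosen):
        return chosen
    # Simplified fallback (assumption): first eligible provider in registration order.
    for p in registry.values():
        if eligible(p):
            return p
    return None


registry = {
    "brave-free": Provider("brave-free", search=True),
    "firecrawl": Provider("firecrawl", search=True, extract=True),
}
```

With this registry, a search-only `brave-free` configured for extract falls through to `firecrawl`, and any request for the removed crawl capability resolves to `None`.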
@@ -131,7 +131,7 @@ _LEGACY_PREFERENCE = (


 def _resolve(configured: Optional[str], *, capability: str) -> Optional[WebSearchProvider]:
-    """Resolve the active provider for a capability ("search" | "extract" | "crawl").
+    """Resolve the active provider for a capability ("search" | "extract").

    Resolution rules (in order):

@@ -168,8 +168,6 @@ def _resolve(configured: Optional[str], *, capability: str) -> Optional[WebSearc
            return bool(p.supports_search())
        if capability == "extract":
            return bool(p.supports_extract())
-        if capability == "crawl":
-            return bool(p.supports_crawl())
        return False

    def _is_available_safe(p: WebSearchProvider) -> bool:
@@ -241,21 +239,6 @@ def get_active_extract_provider() -> Optional[WebSearchProvider]:
    return _resolve(explicit, capability="extract")


-def get_active_crawl_provider() -> Optional[WebSearchProvider]:
-    """Resolve the currently-active web crawl provider.
-
-    Reads ``web.crawl_backend`` (preferred) or ``web.backend`` (shared
-    fallback) from config.yaml; falls back per the module docstring.
-
-    Crawl is a niche capability — among built-in providers only Tavily and
-    Firecrawl implement it. Callers should expect ``None`` and fall back to
-    a different strategy (e.g. summarize-via-LLM) when neither is
-    configured.
-    """
-    explicit = _read_config_key("web", "crawl_backend") or _read_config_key("web", "backend")
-    return _resolve(explicit, capability="crawl")
-
-
 def _reset_for_tests() -> None:
    """Clear the registry. **Test-only.**"""
    with _lock:
@@ -168,7 +168,7 @@ from hermes_cli.browser_connect import (
    try_launch_chrome_debug,
 )
 from hermes_cli.env_loader import load_hermes_dotenv
-from utils import base_url_host_matches, is_truthy_value
+from utils import base_url_host_matches

 _hermes_home = get_hermes_home()
 _project_env = Path(__file__).parent / '.env'
@@ -3747,7 +3747,7 @@ class HermesCLI:
            percent_label = f"{percent}%" if percent is not None else "--"
            duration_label = snapshot["duration"]

-            yolo_active = bool(os.getenv("HERMES_YOLO_MODE"))
+            yolo_active = self._is_session_yolo_active()
            if width < 52:
                text = f"⚕ {snapshot['model_short']} · {duration_label}"
                if yolo_active:
@@ -3808,7 +3808,7 @@ class HermesCLI:
            # line and produce duplicated status bar rows over long sessions.
            width = self._get_tui_terminal_width()
            duration_label = snapshot["duration"]
-            yolo_active = bool(os.getenv("HERMES_YOLO_MODE"))
+            yolo_active = self._is_session_yolo_active()

            if width < 52:
                frags = [
@@ -6907,6 +6907,7 @@ class HermesCLI:
            pass

        # Switch to the new session
+        self._transfer_session_yolo(self.session_id, new_session_id)
        self.session_id = new_session_id
        self.session_start = now
        self._pending_title = None
@@ -7586,8 +7587,19 @@ class HermesCLI:
        parts = cmd_original.split(None, 1)  # split off '/model'
        raw_args = parts[1].strip() if len(parts) > 1 else ""

-        # Parse --provider and --global flags
-        model_input, explicit_provider, persist_global = parse_model_flags(raw_args)
+        # Parse --provider, --global, and --refresh flags
+        model_input, explicit_provider, persist_global, force_refresh = parse_model_flags(raw_args)
+
+        # --refresh: wipe the on-disk picker cache before building the
+        # provider list. Forces a live re-fetch of every authed provider's
+        # /v1/models endpoint on this open.
+        if force_refresh:
+            try:
+                from hermes_cli.models import clear_provider_models_cache
+                clear_provider_models_cache()
+                _cprint("  Cleared model picker cache. Refreshing...")
+            except Exception:
+                pass

        # Single inventory context — replaces the inline config-slice the
        # dashboard / TUI used to duplicate. Overlay live session state
@@ -7626,6 +7638,7 @@ class HermesCLI:
                _cprint("")
                _cprint("  /model <name>                        switch model")
                _cprint("  /model --provider <slug>             switch provider")
+                _cprint("  /model --refresh                     re-fetch live model lists")
                return

            self._open_model_picker(
@@ -9607,20 +9620,92 @@ class HermesCLI:
        }
        _cprint(labels.get(self.tool_progress_mode, ""))

-    def _toggle_yolo(self):
-        """Toggle YOLO mode — skip all dangerous command approval prompts."""
-        import os
-        from hermes_cli.colors import Colors as _Colors
+    def _transfer_session_yolo(self, old_session_id: str, new_session_id: str) -> None:
+        """Move YOLO bypass state from an old session key to a new one.

-        current = is_truthy_value(os.environ.get("HERMES_YOLO_MODE"))
-        if current:
-            os.environ.pop("HERMES_YOLO_MODE", None)
+        Called whenever ``self.session_id`` is reassigned mid-run — ``/branch``
+        forks into a new session, and auto-compression rotates the agent's
+        session id into a fresh continuation session. Without this transfer
+        the user's ``/yolo ON`` toggle would silently revert on the very next
+        turn (the same UX failure mode that motivated this entire fix), since
+        ``_session_yolo`` is keyed by session id.
+
+        Mirrors ``tui_gateway/server.py`` (~line 1297-1305) which performs the
+        same transfer for the TUI's session-rename path. No-op when YOLO
+        wasn't enabled or when the ids match.
+        """
+        if not old_session_id or not new_session_id or old_session_id == new_session_id:
+            return
+        try:
+            from tools.approval import (
+                disable_session_yolo,
+                enable_session_yolo,
+                is_session_yolo_enabled,
+            )
+        except Exception:
+            return
+        if is_session_yolo_enabled(old_session_id):
+            enable_session_yolo(new_session_id)
+            disable_session_yolo(old_session_id)
+
+    def _is_session_yolo_active(self) -> bool:
+        """Whether YOLO bypass is currently enabled for this CLI session.
+
+        Reads from ``tools.approval._session_yolo`` (the same set that
+        ``enable_session_yolo`` / ``disable_session_yolo`` write to) so the
+        status bar reflects the actual bypass state instead of a stale env
+        var. Also honors the process-start ``--yolo`` flag, which freezes
+        ``HERMES_YOLO_MODE`` into ``_YOLO_MODE_FROZEN`` before tool imports
+        happen.
+        """
+        try:
+            from tools.approval import (
+                _YOLO_MODE_FROZEN,
+                is_session_yolo_enabled,
+            )
+        except Exception:
+            return False
+        if _YOLO_MODE_FROZEN:
+            return True
+        # Use ``getattr`` so test fixtures that build a CLI via ``__new__``
+        # (skipping ``__init__``) don't trip an AttributeError here; the
+        # status-bar builders swallow exceptions silently but lose every
+        # field after the failure.
+        session_key = getattr(self, "session_id", None) or "default"
+        return is_session_yolo_enabled(session_key)
+
+    def _toggle_yolo(self):
+        """Toggle YOLO mode — skip all dangerous command approval prompts.
+
+        Per-session toggle that mirrors the gateway and TUI ``/yolo`` handlers
+        (see ``gateway/run.py:_handle_yolo_command`` and
+        ``tui_gateway/server.py`` key=="yolo"). We deliberately do NOT mutate
+        ``HERMES_YOLO_MODE`` here — that env var is read once at module import
+        time into ``tools.approval._YOLO_MODE_FROZEN`` to keep prompt-injected
+        skills from flipping the bypass mid-session, so setting it after CLI
+        startup is a silent no-op. Routing through ``enable_session_yolo`` /
+        ``disable_session_yolo`` gives the same auditable, per-session bypass
+        the other surfaces have. ``run_conversation`` binds
+        ``self.session_id`` as the active approval session key via
+        ``set_current_session_key`` so the bypass takes effect on the very
+        next dangerous command in this run.
+        """
+        from hermes_cli.colors import Colors as _Colors
+        from tools.approval import (
+            disable_session_yolo,
+            enable_session_yolo,
+            is_session_yolo_enabled,
+        )
+
+        session_key = self.session_id or "default"
+        if is_session_yolo_enabled(session_key):
+            disable_session_yolo(session_key)
            _cprint(
                f"  ⚠ YOLO mode {_Colors.BOLD}{_Colors.RED}OFF{_Colors.RESET}"
                " — dangerous commands will require approval."
            )
        else:
-            os.environ["HERMES_YOLO_MODE"] = "1"
+            enable_session_yolo(session_key)
            _cprint(
                f"  ⚡ YOLO mode {_Colors.BOLD}{_Colors.GREEN}ON{_Colors.RESET}"
                " — all commands auto-approved. Use with caution."
@@ -10667,7 +10752,8 @@ class HermesCLI:
        if not reqs.get("stt_available", reqs.get("stt_key_set")):
            raise RuntimeError(
                "Voice mode requires an STT provider for transcription.\n"
-                "Option 1: pip install faster-whisper  (free, local)\n"
+                "Option 1: uv pip install faster-whisper  "
+                "(free, local; `pip install faster-whisper` also works if pip is on PATH)\n"
                "Option 2: Set GROQ_API_KEY (free tier)\n"
                "Option 3: Set VOICE_TOOLS_OPENAI_KEY (paid)"
            )
@@ -11756,6 +11842,23 @@ class HermesCLI:
                    set_secret_capture_callback(self._secret_capture_callback)
                except Exception:
                    pass
+                # Bind this turn's approval session key into the contextvar so
+                # ``tools.approval.is_current_session_yolo_enabled()`` resolves
+                # against the same key that ``/yolo`` toggles under (see
+                # ``_toggle_yolo`` → ``enable_session_yolo(self.session_id)``).
+                # Mirrors ``tui_gateway/server.py`` and ``gateway/run.py`` which
+                # bind the same contextvar before invoking the agent.
+                try:
+                    from tools.approval import (
+                        reset_current_session_key,
+                        set_current_session_key,
+                    )
+                    _approval_session_token = set_current_session_key(
+                        self.session_id or "default"
+                    )
+                except Exception:
+                    reset_current_session_key = None  # type: ignore[assignment]
+                    _approval_session_token = None
                agent_message = _voice_prefix + message if _voice_prefix else message
                # Prepend pending model switch note so the model knows about the switch
                _msn = getattr(self, '_pending_model_switch_note', None)
@@ -11797,6 +11900,15 @@ class HermesCLI:
                        set_secret_capture_callback(None)
                    except Exception:
                        pass
+                    # Release the per-turn approval session key. ``_session_yolo``
+                    # state itself is preserved across turns (so /yolo persists
+                    # for the whole CLI run); we just unbind the contextvar so a
+                    # reused thread doesn't see stale identity on its next run.
+                    if _approval_session_token is not None and reset_current_session_key is not None:
+                        try:
+                            reset_current_session_key(_approval_session_token)
+                        except Exception:
+                            pass

            # Start agent in background thread (daemon so it cannot keep the
            # process alive when the user closes the terminal tab — SIGHUP
@@ -11927,6 +12039,7 @@ class HermesCLI:
                and getattr(self.agent, "session_id", None)
                and self.agent.session_id != self.session_id
            ):
+                self._transfer_session_yolo(self.session_id, self.agent.session_id)
                self.session_id = self.agent.session_id
                self._pending_title = None
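Reviewer sketch of why the transfer call above is needed when the session id rotates. It assumes the bypass state is a plain set keyed by session id, as the `_is_session_yolo_active` docstring suggests; the function bodies here are illustrative and are not copied from `tools.approval`.

```python
from typing import Set

# Assumed shape of tools.approval._session_yolo: a set of session keys.
_session_yolo: Set[str] = set()


def enable_session_yolo(key: str) -> None:
    _session_yolo.add(key)


def disable_session_yolo(key: str) -> None:
    _session_yolo.discard(key)


def is_session_yolo_enabled(key: str) -> bool:
    return key in _session_yolo


def transfer_session_yolo(old: str, new: str) -> None:
    """Carry the bypass flag over when the session id changes."""
    if not old or not new or old == new:
        return
    if is_session_yolo_enabled(old):
        enable_session_yolo(new)
        disable_session_yolo(old)
```

Without the transfer, a lookup under the new id misses the set and the toggle appears to revert on the next turn.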

@@ -14967,6 +15080,39 @@ def main(
                    time.sleep(_grace)
        except Exception:
            pass  # never block signal handling
+        # Kanban worker exit path (#28181): SIGTERM hits a dispatcher-spawned
+        # worker that's likely in a non-daemon thread waiting on a child
+        # subprocess in _wait_for_process. Raising KeyboardInterrupt only
+        # unwinds the main thread; the worker thread keeps running, the
+        # process gets reparented to init, and the dispatcher's _pid_alive
+        # check returns True forever — task stuck in 'running' indefinitely.
+        # Skip the controlled-unwind dance and call os._exit(0) so the kernel
+        # reclaims the PID immediately and detect_crashed_workers can reclaim
+        # the stale claim on the next tick. Flush logging + stdout/stderr
+        # first so the final debug trace isn't lost; SIGALRM deadman guards
+        # the flush against any rare blocking-I/O case (the reporter measured
+        # flush in <1ms; the alarm is a failsafe, not the common path).
+        if os.environ.get("HERMES_KANBAN_TASK"):
+            try:
+                import signal as _sig_mod
+                if hasattr(_sig_mod, "SIGALRM"):
+                    # Deadman timer: if the flush below blocks for more than
+                    # 2s, SIGALRM force-exits (alarm() replaces any earlier alarm).
+                    _sig_mod.signal(_sig_mod.SIGALRM, lambda *_: os._exit(0))
+                    _sig_mod.alarm(2)
+            except Exception:
+                pass
+            try:
+                import logging as _lg
+                _lg.shutdown()
+            except Exception:
+                pass
+            for _stream in (sys.stdout, sys.stderr):
+                try:
+                    _stream.flush()
+                except Exception:
+                    pass
+            os._exit(0)
        raise KeyboardInterrupt()
    try:
        import signal as _signal
@@ -24,7 +24,8 @@ Exposes an HTTP server with endpoints:

 Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
 AnythingLLM, NextChat, ChatBox, etc.) can connect to hermes-agent
-through this adapter by pointing at http://localhost:8642/v1.
+through this adapter by pointing at http://localhost:8642/v1 and
+authenticating with API_SERVER_KEY.

 Requires:
 - aiohttp (already available in the gateway)
@@ -844,11 +845,11 @@ class APIServerAdapter(BasePlatformAdapter):
        Validate Bearer token from Authorization header.

        Returns None if auth is OK, or a 401 web.Response on failure.
-        If no API key is configured, all requests are allowed (only when API
-        server is local).
+        connect() refuses to start the API server without API_SERVER_KEY, so
+        the no-key branch only exists for tests or unsupported manual wiring.
        """
        if not self._api_key:
-            return None  # No key configured — allow all (local-only use)
+            return None

        auth_header = request.headers.get("Authorization", "")
        if auth_header.startswith("Bearer "):
@@ -4099,11 +4100,13 @@ class APIServerAdapter(BasePlatformAdapter):
            if hasattr(sweep_task, "add_done_callback"):
                sweep_task.add_done_callback(self._background_tasks.discard)

-            # Refuse to start network-accessible without authentication
-            if is_network_accessible(self._host) and not self._api_key:
+            # Refuse to start without authentication. The API server can
+            # dispatch terminal-capable agent work, so every deployment needs
+            # an explicit API_SERVER_KEY regardless of bind address.
+            if not self._api_key:
                logger.error(
-                    "[%s] Refusing to start: binding to %s requires API_SERVER_KEY. "
-                    "Set API_SERVER_KEY or use the default 127.0.0.1.",
+                    "[%s] Refusing to start: API_SERVER_KEY is required for the API server, "
+                    "including loopback-only binds on %s.",
                    self.name, self._host,
                )
                return False
@@ -4141,14 +4144,6 @@ class APIServerAdapter(BasePlatformAdapter):
            await self._site.start()

            self._mark_connected()
-            if not self._api_key:
-                logger.warning(
-                    "[%s] ⚠️  No API key configured (API_SERVER_KEY / platforms.api_server.key). "
-                    "All requests will be accepted without authentication. "
-                    "Set an API key for production deployments to prevent "
-                    "unauthorized access to sessions, responses, and cron jobs.",
-                    self.name,
-                )
            logger.info(
                "[%s] API server listening on http://%s:%d (model: %s)",
                self.name, self._host, self._port, self._model_name,
@@ -829,6 +829,13 @@ _HERMES_HOME = get_hermes_home()
 MEDIA_DELIVERY_ALLOW_DIRS_ENV = "HERMES_MEDIA_ALLOW_DIRS"
 MEDIA_DELIVERY_TRUST_RECENT_ENV = "HERMES_MEDIA_TRUST_RECENT_FILES"
 MEDIA_DELIVERY_TRUST_RECENT_SECONDS_ENV = "HERMES_MEDIA_TRUST_RECENT_SECONDS"
+# Strict mode toggles the original allowlist+recency path-validation behavior.
+# Off by default — symmetric with inbound (we accept any document type the
+# user uploads), and with the denylist still blocking obvious credential /
+# system paths. Operators running public-facing gateways where prompt
+# injection from one user could exfiltrate the host's secrets to that same
+# user should set this to true.
+MEDIA_DELIVERY_STRICT_ENV = "HERMES_MEDIA_DELIVERY_STRICT"
 MEDIA_DELIVERY_SAFE_ROOTS = (
    IMAGE_CACHE_DIR,
    AUDIO_CACHE_DIR,
@@ -918,6 +925,21 @@ def _media_delivery_recency_seconds() -> float:
    return float(_MEDIA_DELIVERY_TRUST_RECENT_DEFAULT_SECONDS)


+def _media_delivery_strict_mode() -> bool:
+    """Return True when path validation should require allowlist/recency match.
+
+    Off by default. In non-strict mode, ``validate_media_delivery_path``
+    accepts any existing regular file that isn't under the credential /
+    system-path denylist — restoring the pre-#29523 behavior for the
+    single-user case. Strict mode preserves the original
+    allowlist+recency-window logic for operators running public-facing
+    gateways where prompt injection from one user shouldn't be able to
+    exfiltrate the host's secrets to that same user.
+    """
+    raw = os.environ.get(MEDIA_DELIVERY_STRICT_ENV, "0").strip().lower()
+    return raw in ("1", "true", "yes", "on")
+
+
 def _media_delivery_denied_paths() -> List[Path]:
    """Return absolute denylist paths under which delivery is never allowed."""
    denied = [Path(p) for p in _MEDIA_DELIVERY_DENIED_PREFIXES]
@@ -972,10 +994,22 @@ def _path_is_within(path: Path, root: Path) -> bool:
 def validate_media_delivery_path(path: str) -> Optional[str]:
    """Return a safe absolute file path for native media delivery, else None.

-    MEDIA tags and bare local paths in model output are untrusted text. Only
-    existing regular files under Hermes-managed media caches, or roots the
-    operator explicitly allowlists, may be uploaded as native attachments.
-    Symlinks are resolved before the containment check.
+    Default mode (single-user / private gateway): accept any existing regular
+    file that isn't under the credential / system-path denylist
+    (``_MEDIA_DELIVERY_DENIED_PREFIXES`` + ``~/.ssh``, ``~/.aws``, etc.).
+    This matches the symmetry of inbound delivery — Telegram/Discord/Slack
+    will hand the agent any file the user uploads, and the agent can hand
+    back any file that isn't a credential.
+
+    Strict mode (opt-in via ``gateway.strict`` in ``config.yaml`` or
+    ``HERMES_MEDIA_DELIVERY_STRICT=1``): the file MUST live under a
+    Hermes-managed cache, under an operator-allowlisted root
+    (``HERMES_MEDIA_ALLOW_DIRS``), or be freshly produced inside the
+    configured recency window. Suitable for public-facing bots where
+    prompt injection from one user shouldn't be able to exfiltrate the
+    host's secrets to that same user.
+
+    Symlinks are resolved before any containment / denylist check.
    """
    if not path:
        return None
@@ -999,6 +1033,8 @@ def validate_media_delivery_path(path: str) -> Optional[str]:
    if not resolved.is_file():
        return None

+    # Cache / operator allowlist is always honored — these are unconditionally
+    # trusted regardless of mode.
    for root in _media_delivery_allowed_roots():
        try:
            resolved_root = root.expanduser().resolve(strict=False)
@@ -1007,9 +1043,18 @@ def validate_media_delivery_path(path: str) -> Optional[str]:
        if _path_is_within(resolved, resolved_root):
            return str(resolved)

-    # Outside the cache/operator allowlist: fall back to recency-based trust
-    # for files the agent has just produced (e.g. ``pandoc -o /tmp/report.pdf``
-    # or ``write_file("/home/user/report.pdf", ...)``). System paths and
+    # Non-strict mode (default): accept anything not on the denylist.
+    # The denylist still blocks /etc, /proc, ~/.ssh, ~/.aws, ~/.hermes/.env,
+    # ~/.hermes/auth.json, etc. — so the obvious prompt-injection sites
+    # (``MEDIA:/etc/passwd``, ``MEDIA:~/.ssh/id_rsa``) remain rejected.
+    if not _media_delivery_strict_mode():
+        if _path_under_denied_prefix(resolved):
+            return None
+        return str(resolved)
+
+    # Strict mode: fall back to recency-based trust for freshly-produced
+    # files (e.g. ``pandoc -o /tmp/report.pdf`` or
+    # ``write_file("/home/user/report.pdf", ...)``). System paths and
    # credential locations remain blocked even when "recent" — see
    # ``_MEDIA_DELIVERY_DENIED_PREFIXES`` for the denylist.
    window = _media_delivery_recency_seconds()
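Reviewer sketch of the decision order this hunk introduces, reduced to a pure function so the two modes are easy to compare. `decide_delivery` and its parameters are hypothetical; the real `validate_media_delivery_path` also resolves symlinks and checks that the file exists, which this sketch leaves out.

```python
from pathlib import Path
from typing import Iterable, Optional


def _within(path: Path, root: Path) -> bool:
    """Lexical containment check on already-resolved paths."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def decide_delivery(resolved: Path, *, allowed_roots: Iterable[Path],
                    denied_prefixes: Iterable[Path], strict: bool,
                    is_recent: bool) -> Optional[str]:
    """Return the path if delivery is allowed, else None."""
    # Caches and operator-allowlisted roots are trusted in both modes.
    if any(_within(resolved, root) for root in allowed_roots):
        return str(resolved)
    # The denylist blocks credential and system paths in both modes.
    if any(_within(resolved, bad) for bad in denied_prefixes):
        return None
    # Default mode: anything not denied is accepted.
    if not strict:
        return str(resolved)
    # Strict mode: only freshly produced files pass.
    return str(resolved) if is_recent else None
```

The only behavioral difference between the modes is the last step: strict mode still requires the recency window for files outside the trusted roots.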
@@ -25,6 +25,7 @@ from gateway.platforms.base import (
    MessageEvent,
    MessageType,
    SendResult,
+    is_network_accessible,
 )

 logger = logging.getLogger(__name__)
@@ -132,12 +133,24 @@ class MSGraphWebhookAdapter(BasePlatformAdapter):
    def set_notification_scheduler(self, scheduler: Optional[NotificationScheduler]) -> None:
        self._notification_scheduler = scheduler

+    def _source_allowlist_required_but_missing(self) -> bool:
+        return is_network_accessible(self._host) and not self._allowed_source_networks
+
    async def connect(self) -> bool:
        if self._client_state is None:
            logger.error(
                "[msgraph_webhook] Refusing to start without extra.client_state configured"
            )
            return False
+        if self._source_allowlist_required_but_missing():
+            logger.error(
+                "[msgraph_webhook] Refusing to start: binding to %s requires "
+                "extra.allowed_source_cidrs. Configure the Microsoft Graph "
+                "source CIDRs or bind to loopback (127.0.0.1/::1) behind a "
+                "tunnel or reverse proxy.",
+                self._host,
+            )
+            return False

        app = web.Application()
        app.router.add_get(self._health_path, self._handle_health)
@@ -177,6 +190,8 @@ class MSGraphWebhookAdapter(BasePlatformAdapter):
        return {"name": chat_id, "type": "webhook"}

    async def _handle_health(self, request: "web.Request") -> "web.Response":
+        if not self._source_ip_allowed(request):
+            return web.Response(status=403)
        return web.json_response(
            {
                "status": "ok",
@@ -271,9 +286,12 @@ class MSGraphWebhookAdapter(BasePlatformAdapter):
    def _source_ip_allowed(self, request: "web.Request") -> bool:
        """Return True if the request's source IP is in the configured allowlist.

-        When ``allowed_source_cidrs`` is empty (the default), everything is
-        allowed — preserves behavior for dev tunnels / localhost setups.
+        Loopback-only binds may omit ``allowed_source_cidrs`` for local reverse
+        proxies and dev tunnels. Network-accessible binds fail closed until an
+        explicit CIDR allowlist is configured.
        """
+        if self._source_allowlist_required_but_missing():
+            return False
        if not self._allowed_source_networks:
            return True
        peer = request.remote or ""
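Reviewer sketch of the fail-closed rule in `_source_ip_allowed`, written against the standard `ipaddress` module. `is_loopback_bind` is a hypothetical stand-in for the negation of `is_network_accessible`, whose implementation is not part of this diff, and rejecting unparsable peers is an assumption about the lines that follow this hunk.

```python
import ipaddress
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def is_loopback_bind(host: str) -> bool:
    """Hypothetical stand-in for `not is_network_accessible(host)`."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return host == "localhost"


def source_ip_allowed(bind_host: str, peer: str, allowed: List[Network]) -> bool:
    if not allowed:
        # No allowlist: acceptable only for loopback-only binds.
        return is_loopback_bind(bind_host)
    try:
        addr = ipaddress.ip_address(peer)
    except ValueError:
        # Assumption: a peer that does not parse as an IP is rejected.
        return False
    return any(addr in net for net in allowed)
```

A wildcard bind with an empty allowlist now rejects every request, which matches `connect()` refusing to start in that configuration.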
@@ -932,9 +932,14 @@ if _config_path.exists():
            _redact = _security_cfg.get("redact_secrets")
            if _redact is not None:
                os.environ["HERMES_REDACT_SECRETS"] = str(_redact).lower()
-        # Gateway settings (media delivery allowlist + recency trust)
+        # Gateway settings (media delivery allowlist + recency trust + strict mode)
        _gateway_cfg = _cfg.get("gateway", {})
        if isinstance(_gateway_cfg, dict):
+            _strict = _gateway_cfg.get("strict")
+            if _strict is not None:
+                os.environ["HERMES_MEDIA_DELIVERY_STRICT"] = (
+                    "1" if _strict else "0"
+                )
            _allow_dirs = _gateway_cfg.get("media_delivery_allow_dirs")
            if _allow_dirs:
                if isinstance(_allow_dirs, str):
@@ -8017,7 +8022,8 @@ class GatewayRunner:
                                "🎤 I received your voice message but can't transcribe it — "
                                "no speech-to-text provider is configured.\n\n"
                                "To enable voice: install faster-whisper "
-                                "(`pip install faster-whisper` in the Hermes venv) "
+                                "(`uv pip install faster-whisper` in the Hermes venv; "
+                                "`pip install faster-whisper` also works if pip is on PATH) "
                                "and set `stt.enabled: true` in config.yaml, "
                                "then /restart the gateway."
                            )
@@ -10240,8 +10246,16 @@ class GatewayRunner:

        raw_args = event.get_command_args().strip()

-        # Parse --provider and --global flags
-        model_input, explicit_provider, persist_global = parse_model_flags(raw_args)
+        # Parse --provider, --global, and --refresh flags
+        model_input, explicit_provider, persist_global, force_refresh = parse_model_flags(raw_args)
+
+        # --refresh: bust the disk cache so the picker shows live data.
+        if force_refresh:
+            try:
+                from hermes_cli.models import clear_provider_models_cache
+                clear_provider_models_cache()
+            except Exception:
+                pass

        # Read current model/provider from config
        current_model = ""
@@ -18192,6 +18206,72 @@ class GatewayRunner:
        return response


+def _run_planned_stop_watcher(
+    stop_event: threading.Event,
+    runner,
+    loop: asyncio.AbstractEventLoop,
+    shutdown_handler,
+    *,
+    poll_interval: float = 0.5,
+) -> None:
+    """Poll for the planned-stop marker and trigger graceful shutdown.
+
+    On Windows, ``asyncio.add_signal_handler`` raises NotImplementedError
+    for SIGTERM/SIGINT, so the standard signal-driven shutdown path
+    never runs when ``hermes gateway stop`` signals the gateway. The
+    consequence is that the drain loop is skipped — in-flight agent
+    sessions are killed mid-turn and ``resume_pending`` is never set,
+    so the next gateway boot has no idea those sessions need to be
+    auto-resumed (issue #33778, v0.13.0 session-resume feature broken
+    on native Windows).
+
+    This watcher runs on every platform (cheap, defensive) and bridges
+    the gap on Windows by translating a filesystem marker into the
+    same shutdown-handler invocation a real SIGTERM would have produced
+    on POSIX. The CLI's ``hermes_cli.gateway_windows.stop()`` writes
+    the marker via ``write_planned_stop_marker(pid)`` and then waits
+    for the gateway PID to exit; this watcher is what makes that
+    exit happen cleanly.
+
+    On POSIX this is normally a no-op safety net: the signal handler
+    usually consumes the marker file before this watcher sees it, because
+    it runs as soon as the signal is delivered.
+
+    Args:
+        stop_event: set by start_gateway() during normal shutdown
+            to tell the watcher to exit.
+        runner: the GatewayRunner instance; we check ``_running`` and
+            ``_draining`` to avoid triggering shutdown when the gateway
+            is not running or is already draining.
+        loop: the asyncio event loop the shutdown handler must run on.
+        shutdown_handler: same callable that's wired to SIGTERM —
+            tolerates a ``None`` signal argument (planned stop case)
+            and consumes the marker via
+            ``consume_planned_stop_marker_for_self()``.
+        poll_interval: seconds between marker checks. 0.5s gives a
+            responsive shutdown without burning CPU.
+    """
+    from gateway.status import _get_planned_stop_marker_path
+    marker_path = _get_planned_stop_marker_path()
+    while not stop_event.is_set():
+        try:
+            if (
+                marker_path.exists()
+                and not getattr(runner, "_draining", False)
+                and getattr(runner, "_running", False)
+            ):
+                # Drive the same path as a real signal handler.
+                # Pass signal=None — the handler tolerates that and consumes
+                # the marker via consume_planned_stop_marker_for_self,
+                # which also validates target_pid + start_time match us.
+                loop.call_soon_threadsafe(shutdown_handler, None)
+                # Done: the handler will set _draining, so stop polling now.
+                break
+        except Exception as _e:
+            logger.debug("Planned-stop watcher tick error: %s", _e)
+        stop_event.wait(poll_interval)
+
+
 def _start_cron_ticker(stop_event: threading.Event, adapters=None, loop=None, interval: int = 60):
    """
    Background thread that ticks the cron scheduler at a regular interval.
@@ -18596,7 +18676,28 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
                pass
    else:
        logger.info("Skipping signal handlers (not running in main thread).")
-    
+
+    # Windows fallback: asyncio.add_signal_handler raises NotImplementedError
+    # on Windows, so `hermes gateway stop`'s SIGTERM (which Python maps to
+    # TerminateProcess on Windows) never invokes shutdown_signal_handler.
+    # That means the drain loop never runs, mark_resume_pending never fires,
+    # and sessions are silently lost across restarts (issue #33778).
+    #
+    # The fix is a marker-polling thread: `hermes gateway stop` writes the
+    # planned-stop marker BEFORE killing, and this thread notices it and
+    # drives the same shutdown path the signal handler would have.  Runs
+    # on every platform (cheap, defensive) so non-signal-bearing
+    # environments (Windows native, sandboxed CI runners that mask
+    # SIGTERM) still get a clean drain.
+    _planned_stop_watcher_stop = threading.Event()
+    _planned_stop_watcher_thread = threading.Thread(
+        target=_run_planned_stop_watcher,
+        args=(_planned_stop_watcher_stop, runner, loop, shutdown_signal_handler),
+        daemon=True,
+        name="planned-stop-watcher",
+    )
+    _planned_stop_watcher_thread.start()
+
    # Claim the PID file BEFORE bringing up any platform adapters.
    # This closes the --replace race window: two concurrent `gateway run
    # --replace` invocations both pass the termination-wait above, but
@@ -18674,6 +18775,10 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
    cron_stop.set()
    cron_thread.join(timeout=5)

+    # Stop the planned-stop watcher (daemon=True so this is belt-and-suspenders).
+    _planned_stop_watcher_stop.set()
+    _planned_stop_watcher_thread.join(timeout=2)
+
    # Close MCP server connections
    try:
        from tools.mcp_tool import shutdown_mcp_servers
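
Illustrative sketch (not part of the patch): the marker-polling watcher added above can be reduced to a small standalone example. It assumes a hypothetical marker file name and a hypothetical handler, and omits the ``_running`` / ``_draining`` guards and the PID validation that the real handler performs. It only shows how a background thread hands a shutdown request to the event loop through ``loop.call_soon_threadsafe``.

```python
import asyncio
import tempfile
import threading
from pathlib import Path


def watch_for_marker(stop_event, marker_path, loop, handler, poll_interval=0.05):
    """Poll for marker_path and schedule handler(None) on the loop once it appears."""
    while not stop_event.is_set():
        if marker_path.exists():
            # Hand off to the event loop thread; never call the handler directly
            # from this watcher thread.
            loop.call_soon_threadsafe(handler, None)
            break
        stop_event.wait(poll_interval)


async def _demo_async(marker_path):
    loop = asyncio.get_running_loop()
    shutdown_requested = asyncio.Event()
    received = []

    def handler(sig):
        # Hypothetical stand-in for the gateway's shutdown handler.
        received.append(sig)
        shutdown_requested.set()

    stop_event = threading.Event()
    watcher = threading.Thread(
        target=watch_for_marker,
        args=(stop_event, marker_path, loop, handler),
        daemon=True,
    )
    watcher.start()

    # Simulate the CLI writing the planned-stop marker.
    marker_path.write_text("planned stop")
    await asyncio.wait_for(shutdown_requested.wait(), timeout=2.0)

    # Normal shutdown: set the event so the watcher exits if it is still polling.
    stop_event.set()
    watcher.join(timeout=2.0)
    return received


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # "planned-stop.marker" is a made-up file name for this sketch.
        return asyncio.run(_demo_async(Path(tmp) / "planned-stop.marker"))
```

Calling ``demo()`` runs one full cycle: the marker is written, the watcher notices it, and the handler runs on the loop with ``None`` as its signal argument.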
@@ -552,11 +552,6 @@ class GatewayStreamConsumer:
                    self._last_edit_time = time.monotonic()

                if got_done:
-                    # Record that the final content reached the user even
-                    # if the cosmetic final edit below fails.
-                    if current_update_visible and self._accumulated:
-                        self._final_content_delivered = True
-
                    # Final edit without cursor. If progressive editing failed
                    # mid-stream, send a single continuation/fallback message
                    # here instead of letting the base gateway path send the
@@ -573,6 +568,7 @@ class GatewayStreamConsumer:
                            # final edit — but only for adapters that don't
                            # need an explicit finalize signal.
                            self._final_response_sent = True
+                            self._final_content_delivered = True
                        elif self._message_id:
                            # Either the mid-stream edit didn't run (no
                            # visible update this tick) OR the adapter needs
@@ -580,8 +576,12 @@ class GatewayStreamConsumer:
                            self._final_response_sent = await self._send_or_edit(
                                self._accumulated, finalize=True,
                            )
+                            if self._final_response_sent:
+                                self._final_content_delivered = True
                        elif not self._already_sent:
                            self._final_response_sent = await self._send_or_edit(self._accumulated)
+                            if self._final_response_sent:
+                                self._final_content_delivered = True
                    return

                if commentary_text is not None:
@@ -641,6 +641,7 @@ class GatewayStreamConsumer:
            # "Let me search…") had been delivered, not the real answer.
            if _best_effort_ok and not self._final_response_sent:
                self._final_response_sent = True
+                self._final_content_delivered = True
        except Exception as e:
            logger.error("Stream consumer error: %s", e)

@@ -778,6 +779,7 @@ class GatewayStreamConsumer:
                        pass
                self._already_sent = True
                self._final_response_sent = True
+                self._final_content_delivered = True
                return

        raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096)
@@ -814,11 +816,13 @@ class GatewayStreamConsumer:

            if not result or not result.success:
                if sent_any_chunk:
-                    # Some continuation text already reached the user. Suppress
-                    # the base gateway final-send path so we don't resend the
-                    # full response and create another duplicate.
+                    # Some continuation text already reached the user, but not
+                    # the full response. Do NOT set _final_response_sent — the
+                    # base gateway final-send path should still deliver the
+                    # complete response so the user gets the full answer.
+                    # Suppress only _already_sent to avoid a duplicate send
+                    # of the same partial content.
                    self._already_sent = True
-                    self._final_response_sent = True
                    self._message_id = last_message_id
                    self._last_sent_text = last_successful_chunk
                    self._fallback_prefix = ""
@@ -856,6 +860,7 @@ class GatewayStreamConsumer:
        self._message_id = last_message_id
        self._already_sent = True
        self._final_response_sent = True
+        self._final_content_delivered = True
        self._last_sent_text = chunks[-1]
        self._fallback_prefix = ""

@@ -14,8 +14,8 @@ Provides subcommands for:
 import os
 import sys

-__version__ = "0.14.0"
-__release_date__ = "2026.5.16"
+__version__ = "0.15.0"
+__release_date__ = "2026.5.28"


 def _ensure_utf8():
@@ -802,16 +802,18 @@ def format_auth_error(error: Exception) -> str:
        return f"{error} Run `hermes model` to re-authenticate."

    if error.code == "subscription_required":
-        return (
-            "No active paid subscription found on Nous Portal. "
-            "Please purchase/activate a subscription, then retry."
-        )
+        if error.provider == "nous":
+            return _format_nous_entitlement_auth_error(error)
+        return "No active paid subscription found. Please purchase/activate a subscription, then retry."

    if error.code == "insufficient_credits":
-        return (
-            "Subscription credits are exhausted. "
-            "Top up/renew credits in Nous Portal, then retry."
-        )
+        if error.provider == "nous":
+            return _format_nous_entitlement_auth_error(error)
+        return "Subscription credits are exhausted. Top up/renew credits, then retry."
+
+    if error.code in {"subscription_expired", "no_usable_credits", "account_missing"}:
+        if error.provider == "nous":
+            return _format_nous_entitlement_auth_error(error)

    if error.code == "temporarily_unavailable":
        return f"{error} Please retry in a few seconds."
@@ -819,6 +821,25 @@ def format_auth_error(error: Exception) -> str:
    return str(error)


+def _format_nous_entitlement_auth_error(error: AuthError) -> str:
+    try:
+        from hermes_cli.nous_account import (
+            format_nous_portal_entitlement_message,
+            get_nous_portal_account_info,
+        )
+
+        account_info = get_nous_portal_account_info(force_fresh=True)
+        message = format_nous_portal_entitlement_message(
+            account_info,
+            capability="Nous model access",
+        )
+        if message:
+            return message
+    except Exception:
+        pass
+    return f"{error} Check credits or billing in Nous Portal, then retry."
+
+
 def _token_fingerprint(token: Any) -> Optional[str]:
    """Return a short hash fingerprint for telemetry without leaking token bytes."""
    if not isinstance(token, str):
@@ -3160,6 +3181,9 @@ def _prompt_manual_callback_paste(redirect_uri: str) -> dict:
    print("not on your laptop) — that is expected.  Copy the FULL URL")
    print("from your browser's address bar of that failed page and paste")
    print("it below.  A bare '?code=...&state=...' fragment also works.")
+    print("If the consent page shows the authorization code in-page")
+    print("(xAI's current behavior) rather than redirecting, paste the")
+    print("bare code value on its own.")
    print("───────────────────────────────────────────────────────────────")
    try:
        raw = input("Callback URL: ")
@@ -3291,16 +3315,38 @@ def _sync_codex_pool_entries(
    tokens: Dict[str, str],
    last_refresh: Optional[str],
 ) -> None:
-    """Mirror a fresh Codex re-auth into the credential_pool singleton entries.
+    """Mirror a fresh Codex re-auth into the credential_pool OAuth entries.

    The runtime selects credentials from ``credential_pool.openai-codex``, not
    from ``providers.openai-codex.tokens``.  A re-auth invalidates the prior
-    OAuth pair server-side, but the pool's ``device_code`` entry keeps holding
-    the now-consumed refresh token plus any stale error markers — so the next
-    request spends a dead token and gets a 401 ``token_invalidated``.  Update
-    the singleton-seeded entries in lockstep with the provider tokens and clear
-    the error state so the fresh credentials take effect immediately.  Manual
-    (``manual:*``) entries are independent credentials and are left untouched.
+    OAuth pair server-side, but pool entries keep holding the now-consumed
+    refresh token plus any stale error markers — so the next request spends a
+    dead token and gets a 401 ``token_invalidated``.
+
+    What gets refreshed:
+
+    * ``device_code`` — the singleton-seeded entry written by the device-code
+      OAuth flow when the user logged in via ``hermes setup`` / the model
+      picker.  Always synced with the fresh tokens.
+    * ``manual:device_code`` — entries created by ``hermes auth add openai-codex``
+      that use the same device-code OAuth mechanism.  An interactive re-auth
+      proves the user owns the ChatGPT account, so it is safe (and expected)
+      to refresh these entries too.  Without this, a user who once ran the
+      ``hermes auth add`` workaround for #33000 would silently leave that
+      manual entry stale on every subsequent re-auth, recreating the issue
+      reported in #33538.
+
+    What does NOT get refreshed:
+
+    * ``manual:api_key`` and any other non-device-code manual sources — those
+      are independent credentials (an explicit API key, a different ChatGPT
+      account, etc.) and must not be overwritten by a single re-auth.
+
+    Error markers (``last_status``, ``last_error_*``) are also cleared on
+    every device-code-backed entry — even those whose tokens we did not
+    rewrite — so that an interactive re-auth gives every relevant pool entry
+    a fresh selection chance instead of leaving them marked unhealthy from a
+    pre-re-auth 401.
    """
    access_token = tokens.get("access_token")
    if not access_token:
@@ -3312,8 +3358,15 @@ def _sync_codex_pool_entries(
    entries = pool.get("openai-codex")
    if not isinstance(entries, list):
        return
+    # Sources whose tokens should be rewritten by a fresh Codex device-code
+    # OAuth re-auth.  ``manual:api_key`` and unknown sources are intentionally
+    # excluded — they represent independent credentials.
+    REFRESHABLE_SOURCES = {"device_code", "manual:device_code"}
    for entry in entries:
-        if not isinstance(entry, dict) or entry.get("source") != "device_code":
+        if not isinstance(entry, dict):
+            continue
+        source = entry.get("source")
+        if source not in REFRESHABLE_SOURCES:
            continue
        entry["access_token"] = access_token
        if refresh_token:
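
Illustrative sketch (not part of the patch): the source filter in this hunk can be tried in isolation. The function name ``sync_entries`` and the sample entries are hypothetical. Only the ``REFRESHABLE_SOURCES`` set and the skip logic mirror the diff, and the error-marker clearing described in the docstring is left out.

```python
REFRESHABLE_SOURCES = {"device_code", "manual:device_code"}


def sync_entries(entries, access_token, refresh_token=None):
    """Rewrite tokens on device-code-backed entries; return how many were updated."""
    updated = 0
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        if entry.get("source") not in REFRESHABLE_SOURCES:
            # manual:api_key and unknown sources are independent credentials.
            continue
        entry["access_token"] = access_token
        if refresh_token:
            entry["refresh_token"] = refresh_token
        updated += 1
    return updated


sample_entries = [
    {"source": "device_code", "access_token": "old-a"},
    {"source": "manual:device_code", "access_token": "old-b"},
    {"source": "manual:api_key", "access_token": "key-c"},
    "not-a-dict",
]
updated_count = sync_entries(sample_entries, "new-token", "new-refresh")
```

With the sample data, the two device-code entries pick up the new tokens while the ``manual:api_key`` entry and the malformed item are left alone.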
@@ -5627,6 +5680,8 @@ def _empty_nous_auth_status() -> Dict[str, Any]:
        "access_expires_at": None,
        "agent_key_expires_at": None,
        "has_refresh_token": False,
+        "inference_credential_present": False,
+        "credential_source": None,
    }


@@ -5655,24 +5710,36 @@ def _snapshot_nous_pool_status() -> Dict[str, Any]:
            return (agent_exp, access_exp, -priority)

        entry = max(entries, key=_entry_sort_key)
-        access_token = (
-            getattr(entry, "access_token", None)
-            or getattr(entry, "runtime_api_key", "")
-        )
-        if not access_token:
+        runtime_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
+        if not runtime_key:
            return _empty_nous_auth_status()
+        access_token = getattr(entry, "access_token", None)
+        auth_type = str(getattr(entry, "auth_type", "") or "").strip().lower()
+        refresh_token = getattr(entry, "refresh_token", None)
+        is_portal_oauth = bool(access_token) and (
+            auth_type.startswith("oauth") or bool(refresh_token)
+        )
+        label = getattr(entry, "label", "unknown")
+        portal_status_url = None
+        if is_portal_oauth:
+            portal_status_url = (
+                getattr(entry, "portal_base_url", None)
+                or DEFAULT_NOUS_PORTAL_URL
+            )

        return {
-            "logged_in": True,
-            "portal_base_url": getattr(entry, "portal_base_url", None)
-            or getattr(entry, "base_url", None),
+            "logged_in": is_portal_oauth,
+            "portal_base_url": portal_status_url,
            "inference_base_url": getattr(entry, "inference_base_url", None)
+            or getattr(entry, "runtime_base_url", None)
            or getattr(entry, "base_url", None),
-            "access_token": access_token,
+            "access_token": access_token if is_portal_oauth else None,
            "access_expires_at": getattr(entry, "expires_at", None),
            "agent_key_expires_at": getattr(entry, "agent_key_expires_at", None),
-            "has_refresh_token": bool(getattr(entry, "refresh_token", None)),
-            "source": f"pool:{getattr(entry, 'label', 'unknown')}",
+            "has_refresh_token": bool(refresh_token),
+            "inference_credential_present": True,
+            "credential_source": f"pool:{label}",
+            "source": f"pool:{label}",
        }
    except Exception:
        return _empty_nous_auth_status()
@@ -5755,6 +5822,10 @@ def _compute_nous_auth_status() -> Dict[str, Any]:
            "agent_key_expires_at": state.get("agent_key_expires_at"),
            "has_refresh_token": bool(state.get("refresh_token")),
            "access_token": state.get("access_token"),
+            "inference_credential_present": bool(
+                state.get("access_token") or state.get("agent_key")
+            ),
+            "credential_source": "auth_store",
            "source": "auth_store",
        }
        try:
@@ -5772,6 +5843,8 @@ def _compute_nous_auth_status() -> Dict[str, Any]:
                    or refreshed_state.get("agent_key_expires_at")
                    or base_status.get("agent_key_expires_at"),
                    "has_refresh_token": bool(refreshed_state.get("refresh_token")),
+                    "inference_credential_present": True,
+                    "credential_source": "auth_store",
                    "source": f"runtime:{creds.get('source', 'portal')}",
                    "key_id": creds.get("key_id"),
                }
@@ -6283,6 +6356,7 @@ def _prompt_model_selection(
    pricing: Optional[Dict[str, Dict[str, str]]] = None,
    unavailable_models: Optional[List[str]] = None,
    portal_url: str = "",
+    unavailable_message: str = "",
 ) -> Optional[str]:
    """Interactive model selection. Puts current_model first with a marker. Returns chosen model ID or None.

@@ -6374,18 +6448,22 @@ def _prompt_model_selection(
        choices.append("  Enter custom model name")
        choices.append("  Skip (keep current)")

+        _upgrade_url = (portal_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
+        unavailable_footer = unavailable_message.strip()
+        if not unavailable_footer and _unavailable:
+            unavailable_footer = f"Upgrade at {_upgrade_url} for paid models"
+
        # Print the unavailable block BEFORE the menu via regular print().
        # simple_term_menu pads title lines to terminal width (causes wrapping),
        # so we keep the title minimal and use stdout for the static block.
        # clear_screen=False means our printed output stays visible above.
-        _upgrade_url = (portal_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
        if _unavailable:
            print(menu_title)
            print()
            for mid in _unavailable:
                print(f"{_DIM}     {_label(mid)}{_RESET}")
            print()
-            print(f"{_DIM}  ── Upgrade at {_upgrade_url} for paid models ──{_RESET}")
+            print(f"{_DIM}  ── {unavailable_footer} ──{_RESET}")
            print()
            effective_title = "Available free models:"
        else:
@@ -6427,8 +6505,11 @@ def _prompt_model_selection(

    if _unavailable:
        _upgrade_url = (portal_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
+        unavailable_footer = unavailable_message.strip() or (
+            f"Unavailable models (requires paid tier — upgrade at {_upgrade_url})"
+        )
        print()
-        print(f"  {_DIM}── Unavailable models (requires paid tier — upgrade at {_upgrade_url}) ──{_RESET}")
+        print(f"  {_DIM}── {unavailable_footer} ──{_RESET}")
        for mid in _unavailable:
            print(f"  {'':>{num_width}}  {_DIM}{_label(mid)}{_RESET}")
    print()
@@ -6777,6 +6858,12 @@ def _xai_oauth_loopback_login(
    remote VM).  The same PKCE verifier, ``state``, and ``nonce`` are
    used for both paths so the upstream-side OAuth flow is identical.
    """
+    def _stdin_supports_manual_paste() -> bool:
+        try:
+            return bool(getattr(sys.stdin, "isatty", lambda: False)())
+        except Exception:
+            return False
+
    discovery = _xai_oauth_discovery(timeout_seconds)
    authorization_endpoint = discovery["authorization_endpoint"]
    token_endpoint = discovery["token_endpoint"]
@@ -6840,12 +6927,28 @@ def _xai_oauth_loopback_login(
                else:
                    print("Could not open the browser automatically; use the URL above.")

-            callback = _xai_wait_for_callback(
-                server,
-                thread,
-                callback_result,
-                timeout_seconds=max(30.0, timeout_seconds * 9),
-            )
+            try:
+                callback = _xai_wait_for_callback(
+                    server,
+                    thread,
+                    callback_result,
+                    timeout_seconds=max(30.0, timeout_seconds * 9),
+                )
+            except AuthError as exc:
+                if (
+                    getattr(exc, "code", "") != "xai_callback_timeout"
+                    or not _stdin_supports_manual_paste()
+                ):
+                    raise
+                print()
+                print("xAI loopback callback timed out.")
+                print("If your browser reached a failed 127.0.0.1 callback page,")
+                print("paste that FULL callback URL below to continue this login.")
+                print("You can also re-run with `--manual-paste` to skip the")
+                print("loopback listener from the start.")
+                callback = _prompt_manual_callback_paste(redirect_uri)
+                if callback.get("code") is None and callback.get("error") is None:
+                    raise exc
        except Exception:
            try:
                server.shutdown()
@@ -6865,7 +6968,21 @@ def _xai_oauth_loopback_login(
            provider="xai-oauth",
            code="xai_authorization_failed",
        )
-    if callback.get("state") != state:
+    callback_state = callback.get("state")
+    # Manual-paste bare-code path: when a user pastes only the opaque
+    # authorization code (no ``code=``/``state=`` query parameters),
+    # ``_parse_pasted_callback`` returns ``state=None``.  xAI's consent
+    # page renders the code in-page rather than redirecting through the
+    # 127.0.0.1 callback, so on many remote setups (Cloud Shell, headless
+    # VPS, container consoles) the bare code is the only thing the user
+    # can obtain.  PKCE (code_verifier) still binds the exchange to this
+    # client, so the local state-equality check is redundant on the
+    # bare-code path — we substitute the locally generated state to keep
+    # the rest of the validation chain (and the token exchange) unchanged.
+    # See #26923 (AccursedGalaxy comment, 2026-05-20).
+    if callback_state is None and manual_paste:
+        callback_state = state
+    if callback_state != state:
        raise AuthError(
            "xAI authorization failed: state mismatch.",
            provider="xai-oauth",
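
Illustrative sketch (not part of the patch): the bare-code state handling in this hunk, pulled out into a hypothetical helper. ``resolve_callback_state`` and the use of ``ValueError`` are assumptions for the example; the real code raises ``AuthError`` inline.

```python
def resolve_callback_state(callback, expected_state, manual_paste):
    """Return the state to validate, substituting the local one for bare-code pastes."""
    callback_state = callback.get("state")
    if callback_state is None and manual_paste:
        # Bare authorization code pasted by hand: no state was supplied, and
        # PKCE still binds the exchange to this client.
        callback_state = expected_state
    if callback_state != expected_state:
        raise ValueError("state mismatch")
    return callback_state


def state_check_fails(callback, expected_state, manual_paste):
    """Small helper for the examples below: True when validation rejects the callback."""
    try:
        resolve_callback_state(callback, expected_state, manual_paste)
    except ValueError:
        return True
    return False
```

A bare code is accepted only on the manual-paste path. A missing state on the loopback path, or a wrong state on either path, is still rejected.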
@@ -7626,8 +7743,9 @@ def _nous_device_code_login(
            portal_url = auth_state.get(
                "portal_base_url", DEFAULT_NOUS_PORTAL_URL
            ).rstrip("/")
+            message = format_auth_error(exc)
            print()
-            print("Your Nous Portal account does not have an active subscription.")
+            print(message)
            print(f"  Subscribe here: {portal_url}/billing")
            print()
            print("After subscribing, run `hermes model` again to finish setup.")
@@ -7737,11 +7855,30 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:

            print()
            unavailable_models: list = []
+            unavailable_message = ""
            if model_ids:
                pricing = get_pricing_for_provider("nous")
-                free_tier = check_nous_free_tier()
+                # Force fresh account data for model selection so recent credit
+                # purchases are reflected immediately.
+                free_tier = check_nous_free_tier(force_fresh=True)
                _portal_for_recs = auth_state.get("portal_base_url", "")
                if free_tier:
+                    try:
+                        from hermes_cli.nous_account import (
+                            format_nous_portal_entitlement_message,
+                            get_nous_portal_account_info,
+                        )
+
+                        _account_info = get_nous_portal_account_info(force_fresh=True)
+                        unavailable_message = (
+                            format_nous_portal_entitlement_message(
+                                _account_info,
+                                capability="paid Nous models",
+                            )
+                            or ""
+                        )
+                    except Exception:
+                        unavailable_message = ""
                    # The Portal's freeRecommendedModels endpoint is the
                    # source of truth for what's free *right now*. Augment
                    # the curated list with anything new the Portal flags
@@ -7768,11 +7905,12 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
                    model_ids, pricing=pricing,
                    unavailable_models=unavailable_models,
                    portal_url=_portal,
+                    unavailable_message=unavailable_message,
                )
            elif unavailable_models:
                _url = (_portal or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
                print("No free models currently available.")
-                print(f"Upgrade at {_url} to access paid models.")
+                print(unavailable_message or f"Upgrade at {_url} to access paid models.")
            else:
                print("No curated models available for Nous Portal.")
        except Exception as exc:
@@ -512,6 +512,7 @@ def _quick_snapshot_root(hermes_home: Optional[Path] = None) -> Path:
 def create_quick_snapshot(
    label: Optional[str] = None,
    hermes_home: Optional[Path] = None,
+    keep: Optional[int] = None,
 ) -> Optional[str]:
    """Create a quick state snapshot of critical files.

@@ -585,8 +586,10 @@ def create_quick_snapshot(
    with open(snap_dir / "manifest.json", "w", encoding="utf-8") as f:
        json.dump(meta, f, indent=2)

-    # Auto-prune
-    _prune_quick_snapshots(root, keep=_QUICK_DEFAULT_KEEP)
+    # Auto-prune. Defaults preserve historical manual /snapshot behavior; callers
+    # with known high-churn safety snapshots (for example pre-update) can pass a
+    # smaller keep value so large state.db copies do not accumulate indefinitely.
+    _prune_quick_snapshots(root, keep=_QUICK_DEFAULT_KEEP if keep is None else keep)

    logger.info("State snapshot created: %s (%d files)", snap_id, len(manifest))
    return snap_id
@@ -123,7 +123,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("config", "Show current configuration", "Configuration",
               cli_only=True),
    CommandDef("model", "Switch model for this session", "Configuration",
-               aliases=("provider",), args_hint="[model] [--provider name] [--global]"),
+               aliases=("provider",), args_hint="[model] [--provider name] [--global] [--refresh]"),
    CommandDef("codex-runtime", "Toggle codex app-server runtime for OpenAI/Codex models",
               "Configuration", aliases=("codex_runtime",),
               args_hint="[auto|codex_app_server]"),
@@ -1806,6 +1806,21 @@ DEFAULT_CONFIG = {
    # Gateway settings — control how messaging platforms (Telegram, Discord,
    # Slack, etc.) deliver agent-produced files as native attachments.
    "gateway": {
+        # When false (default), any file path the agent emits is delivered
+        # as a native attachment as long as it isn't under the credential /
+        # system-path denylist (/etc, /proc, ~/.ssh, ~/.aws, ~/.hermes/.env,
+        # auth.json, etc.). This matches the symmetry of inbound delivery
+        # — we accept any document type the user uploads, and the agent
+        # can hand back any file that isn't a credential.
+        #
+        # When true, fall back to the older allowlist+recency-window
+        # behavior: files must live under the Hermes cache, under
+        # ``media_delivery_allow_dirs``, or be freshly produced inside the
+        # ``trust_recent_files_seconds`` window. Recommended for
+        # public-facing gateways where prompt injection from one user
+        # shouldn't be able to exfiltrate the host's secrets to that same
+        # user. Bridged to HERMES_MEDIA_DELIVERY_STRICT.
+        "strict": False,
        # Extra directories from which model-emitted bare file paths may be
        # uploaded as native gateway attachments. Files inside the Hermes
        # cache (~/.hermes/cache/{documents,images,audio,video,screenshots})
@@ -1813,7 +1828,7 @@ DEFAULT_CONFIG = {
        # (project dirs, scratch dirs, mounted shares). Accepts a list of
        # absolute paths or a single os.pathsep-separated string. Bridged
        # to HERMES_MEDIA_ALLOW_DIRS at gateway startup. Tilde paths are
-        # expanded.
+        # expanded. Honored in both default and strict mode.
        "media_delivery_allow_dirs": [],
        # When true, files whose mtime is within ``trust_recent_files_seconds``
        # of "now" are trusted for native delivery even outside the cache /
@@ -1821,10 +1836,12 @@ DEFAULT_CONFIG = {
        # PDFs the agent writes into a working directory. System paths
        # (/etc, /proc, ~/.ssh, ~/.aws, etc.) remain blocked regardless.
        # Disable to fall back to pure-allowlist mode. Bridged to
-        # HERMES_MEDIA_TRUST_RECENT_FILES.
+        # HERMES_MEDIA_TRUST_RECENT_FILES. Only consulted when ``strict``
+        # is true; in default mode the denylist alone gates delivery.
        "trust_recent_files": True,
        # Recency window in seconds. 600 (10 min) comfortably covers a
        # multi-tool agent turn. Bridged to HERMES_MEDIA_TRUST_RECENT_SECONDS.
+        # Only consulted when ``strict`` is true.
        "trust_recent_files_seconds": 600,
    },

@@ -2505,10 +2522,10 @@ OPTIONAL_ENV_VARS = {
        "advanced": True,
    },
    "TAVILY_API_KEY": {
-        "description": "Tavily API key for AI-native web search, extract, and crawl",
+        "description": "Tavily API key for AI-native web search and extract",
        "prompt": "Tavily API key",
        "url": "https://app.tavily.com/home",
-        "tools": ["web_search", "web_extract", "web_crawl"],
+        "tools": ["web_search", "web_extract"],
        "password": True,
        "category": "tool",
    },
@@ -2992,8 +3009,8 @@ OPTIONAL_ENV_VARS = {
        "advanced": True,
    },
    "API_SERVER_KEY": {
-        "description": "Bearer token for API server authentication. Required for non-loopback binding; server refuses to start without it. On loopback (127.0.0.1), all requests are allowed if empty.",
-        "prompt": "API server auth key (required for network access)",
+        "description": "Bearer token for API server authentication. Required whenever the API server is enabled; server refuses to start without it.",
+        "prompt": "API server auth key",
        "url": None,
        "password": True,
        "category": "messaging",
@@ -3008,7 +3025,7 @@ OPTIONAL_ENV_VARS = {
        "advanced": True,
    },
    "API_SERVER_HOST": {
-        "description": "Host/bind address for the API server (default: 127.0.0.1). Use 0.0.0.0 for network access — server refuses to start without API_SERVER_KEY.",
+        "description": "Host/bind address for the API server (default: 127.0.0.1). API_SERVER_KEY is still required even on loopback binds.",
        "prompt": "API server host",
        "url": None,
        "password": False,
@@ -1014,12 +1014,70 @@ def start() -> None:
    _report_gateway_start(f"direct spawn (PID {pid})")


-def stop() -> None:
-    """Stop the gateway. Tries /End on the scheduled task, then kills any stragglers."""
-    _assert_windows()
-    from hermes_cli.gateway import kill_gateway_processes
+def _drain_gateway_pid(pid: int, drain_timeout: float) -> bool:
+    """Write the planned-stop marker and wait for the gateway PID to exit.

-    stopped_any = False
+    Windows cannot deliver POSIX signals to a Python asyncio loop
+    (``loop.add_signal_handler`` raises NotImplementedError), so writing
+    the marker is the ONLY way to ask a running gateway to drain
+    in-flight agents and persist ``resume_pending`` before exit. The
+    gateway's planned-stop watcher thread (gateway/run.py) polls for
+    the marker and drives the same shutdown path the SIGTERM handler
+    would have on POSIX.
+
+    Returns True if the PID exited within the timeout, False if it
+    didn't (caller should escalate to schtasks /End + taskkill).
+    """
+    if pid <= 0:
+        return False
+    try:
+        from gateway.status import write_planned_stop_marker, _pid_exists
+    except ImportError:
+        return False
+
+    try:
+        write_planned_stop_marker(pid)
+    except Exception:
+        # Best-effort: if the marker can't be written, we have no choice
+        # but to fall through to a hard kill.  Caller decides escalation.
+        pass
+
+    deadline = time.monotonic() + max(drain_timeout, 1.0)
+    while time.monotonic() < deadline:
+        if not _pid_exists(pid):
+            return True
+        time.sleep(0.5)
+    return False
+
+
+def stop() -> None:
+    """Stop the gateway.
+
+    Writes the planned-stop marker first so the gateway can drain
+    in-flight agents and persist ``resume_pending`` before exit (the
+    gateway's marker-watcher thread picks this up — Windows asyncio
+    can't deliver SIGTERM to the loop, so the marker is our only IPC).
+    Then escalates: ``schtasks /End`` (kills the scheduled-task tree)
+    + ``kill_gateway_processes`` (forced only if the drain timed out).
+    """
+    _assert_windows()
+    from hermes_cli.gateway import kill_gateway_processes, _get_restart_drain_timeout
+    from gateway.status import get_running_pid
+
+    # Phase 1: ask the running gateway (if any) to drain itself by writing
+    # the planned-stop marker, then wait briefly for it to exit cleanly.
+    # On clean exit, sessions land with resume_pending=True and the next
+    # boot will auto-resume them.
+    pid = get_running_pid()
+    drained = False
+    if pid is not None:
+        try:
+            drain_timeout = float(_get_restart_drain_timeout() or 30.0)
+        except Exception:
+            drain_timeout = 30.0
+        drained = _drain_gateway_pid(pid, drain_timeout)
+
+    stopped_any = drained
    if is_task_registered():
        code, _out, err = _exec_schtasks(["/End", "/TN", get_task_name()])
        # schtasks returns nonzero when the task isn't currently running — don't treat that as an error.
@@ -1028,12 +1086,19 @@ def stop() -> None:
        elif "not running" not in (err or "").lower():
            print(f"⚠ schtasks /End returned code {code}: {err.strip()}")

-    killed = kill_gateway_processes(all_profiles=False)
+    # Phase 3: hard-kill any strays.  When drain succeeded this is a no-op;
+    # when drain timed out this is the escalation that ensures the PID
+    # actually exits.  force is set only when the drain failed, so that
+    # taskkill /T /F walks the descendant tree (browser helpers, etc.).
+    killed = kill_gateway_processes(all_profiles=False, force=not drained)
    if killed:
        stopped_any = True
        print(f"✓ Killed {killed} gateway process(es)")
    if stopped_any:
-        print("✓ Gateway stopped")
+        if drained:
+            print("✓ Gateway stopped (drained cleanly)")
+        else:
+            print("✓ Gateway stopped")
    else:
        print("✗ No gateway was running")
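Reviewer sketch (not part of the patch): the drain-then-escalate flow in stop() reduced to a standalone function, assuming the marker write, the liveness probe, and the hard kill are injected as callables. The names stop_with_escalation, request_drain, is_alive, and hard_kill are hypothetical and exist only for illustration.

```python
import time
from typing import Callable


def stop_with_escalation(
    request_drain: Callable[[], None],
    is_alive: Callable[[], bool],
    hard_kill: Callable[[], None],
    timeout: float,
    poll_interval: float = 0.5,
) -> str:
    """Ask a process to drain, poll for its exit, and hard-kill on timeout."""
    try:
        # Best-effort, like the marker write: a failure here just means we
        # will end up on the escalation path below.
        request_drain()
    except Exception:
        pass

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_alive():
            return "drained"
        time.sleep(poll_interval)

    # The process outlived the drain window, so escalate.
    hard_kill()
    return "killed"
```

Injecting the three callables keeps the timing logic testable without a real gateway process or a Windows host.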

@@ -71,6 +71,7 @@ new locking.
 from __future__ import annotations

 import contextlib
+import hashlib
 import json
 import os
 import re
@@ -982,6 +983,89 @@ CREATE INDEX IF NOT EXISTS idx_notify_task           ON kanban_notify_subs(task_
 _INITIALIZED_PATHS: set[str] = set()
 _INIT_LOCK = threading.RLock()
 _SQLITE_HEADER = b"SQLite format 3\x00"
+DEFAULT_BUSY_TIMEOUT_MS = 120_000
+
+
+def _resolve_busy_timeout_ms() -> int:
+    """Return the SQLite busy timeout for Kanban connections.
+
+    Kanban is the shared cross-profile dispatch bus, so worker stampedes are
+    expected.  A long busy timeout lets SQLite serialize writers via WAL rather
+    than surfacing transient ``database is locked`` failures during bursts.
+    """
+    raw = os.environ.get("HERMES_KANBAN_BUSY_TIMEOUT_MS", "").strip()
+    if raw:
+        try:
+            parsed = int(raw)
+        except ValueError:
+            parsed = 0
+        if parsed > 0:
+            return parsed
+    return DEFAULT_BUSY_TIMEOUT_MS
+
+
+def _sqlite_connect(path: Path) -> sqlite3.Connection:
+    """Open a Kanban SQLite connection with consistent lock waiting."""
+    busy_timeout_ms = _resolve_busy_timeout_ms()
+    conn = sqlite3.connect(
+        str(path),
+        isolation_level=None,
+        timeout=busy_timeout_ms / 1000.0,
+    )
+    # ``sqlite3.connect(timeout=...)`` normally maps to busy_timeout, but set
+    # the PRAGMA explicitly so it is observable and survives future wrapper
+    # changes. Parameter binding is not supported for PRAGMA assignments.
+    conn.execute(f"PRAGMA busy_timeout={busy_timeout_ms}")
+    return conn
+
+
+@contextlib.contextmanager
+def _cross_process_init_lock(path: Path):
+    """Serialize first-connect WAL/schema/integrity setup across processes.
+
+    ``_INIT_LOCK`` only protects threads inside one Python process. During a
+    dispatcher burst, many worker processes can all hit a fresh/legacy board at
+    once and each process has an empty ``_INITIALIZED_PATHS`` cache. This file
+    lock serializes header validation, integrity probing, WAL activation, and
+    additive migrations to one process at a time across the whole host while
+    leaving normal post-init DB usage concurrent under SQLite WAL.
+    """
+    path.parent.mkdir(parents=True, exist_ok=True)
+    lock_path = path.with_name(path.name + ".init.lock")
+    handle = lock_path.open("a+b")
+    try:
+        if _IS_WINDOWS:
+            import msvcrt
+
+            # Lock a single byte in the sidecar file. ``msvcrt.locking`` starts
+            # at the current file position, so seek explicitly before both
+            # lock and unlock.  Opening in append/read binary mode guarantees
+            # the file exists; the byte-range lock itself is the primitive, so
+            # no payload is written. LK_LOCK retries ~10s, then raises OSError.
+            handle.seek(0)
+            locking = getattr(msvcrt, "locking")
+            lock_mode = getattr(msvcrt, "LK_LOCK")
+            locking(handle.fileno(), lock_mode, 1)
+        else:
+            import fcntl
+
+            fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
+        yield
+    finally:
+        try:
+            if _IS_WINDOWS:
+                import msvcrt
+
+                handle.seek(0)
+                locking = getattr(msvcrt, "locking")
+                unlock_mode = getattr(msvcrt, "LK_UNLCK")
+                locking(handle.fileno(), unlock_mode, 1)
+            else:
+                import fcntl
+
+                fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
+        finally:
+            handle.close()


 def _looks_like_tls_record_at(data: bytes, offset: int) -> bool:
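Reviewer sketch (not part of the patch): a minimal way to check the busy-timeout handling in isolation, assuming the same environment variable name and default as the hunk above. The helper names resolve_busy_timeout_ms and open_with_busy_timeout, and the injectable env mapping, are hypothetical.

```python
import os
import sqlite3
from typing import Mapping, Optional

DEFAULT_BUSY_TIMEOUT_MS = 120_000


def resolve_busy_timeout_ms(env: Optional[Mapping[str, str]] = None) -> int:
    """Return a positive override from the environment, else the default."""
    env = os.environ if env is None else env
    raw = env.get("HERMES_KANBAN_BUSY_TIMEOUT_MS", "").strip()
    try:
        parsed = int(raw)
    except ValueError:
        # Empty or non-numeric values fall back to the default.
        return DEFAULT_BUSY_TIMEOUT_MS
    return parsed if parsed > 0 else DEFAULT_BUSY_TIMEOUT_MS


def open_with_busy_timeout(path: str, timeout_ms: int) -> sqlite3.Connection:
    """Open a connection and set busy_timeout explicitly so it is observable."""
    conn = sqlite3.connect(path, isolation_level=None, timeout=timeout_ms / 1000.0)
    # PRAGMA assignments do not accept bound parameters, so coerce to int.
    conn.execute(f"PRAGMA busy_timeout={int(timeout_ms)}")
    return conn
```

Reading the value back with ``PRAGMA busy_timeout`` on the returned connection is one way to confirm the setting took effect.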
@@ -1055,14 +1139,21 @@ class KanbanDbCorruptError(RuntimeError):


 def _backup_corrupt_db(path: Path) -> Optional[Path]:
-    """Copy a corrupt DB (and its WAL/SHM sidecars) to a timestamped backup.
+    """Copy a corrupt DB (and its WAL/SHM sidecars) to a content-addressed backup.
+
+    The backup filename is deterministic in the main DB's sha256, so repeated
+    quarantines of the same corrupt bytes (gateway restarts, dispatcher retries,
+    multi-profile fleets all hitting the same shared DB) reuse one backup
+    instead of amplifying disk usage by N. If the corrupt bytes actually
+    change between attempts — e.g. a partial repair or further damage — the
+    fingerprint changes and a separate backup is preserved.

    Returns the backup path of the main DB file, or ``None`` if the copy
    itself failed (the caller still raises loudly in that case).

-    Writes are confined to the original DB's parent directory. The
-    backup basename is derived purely from ``path.name``, never from
-    caller-supplied directory segments — no traversal is possible.
+    Writes are confined to the original DB's parent directory. The backup
+    basename is derived purely from ``path.name`` and a content hash, never
+    from caller-supplied directory segments — no traversal is possible.
    """
    # Resolve once and pin the parent so subsequent path operations cannot
    # escape it. ``Path.resolve()`` collapses any ``..`` segments and
@@ -1070,32 +1161,31 @@ def _backup_corrupt_db(path: Path) -> Optional[Path]:
    resolved = path.resolve()
    parent = resolved.parent
    base_name = resolved.name  # basename only
-    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-    candidate = parent / f"{base_name}.corrupt.{stamp}.bak"
-    # Defensive: candidate must still be inside parent after construction.
-    # f-string interpolation of ``base_name`` cannot escape ``parent``
-    # because ``base_name`` is itself a resolved basename, but assert it
-    # anyway so static analyzers can see the containment guarantee.
-    if candidate.parent != parent:
-        return None
-    counter = 0
-    while candidate.exists():
-        counter += 1
-        candidate = parent / f"{base_name}.corrupt.{stamp}.{counter}.bak"
-        if candidate.parent != parent:
-            return None
+    digest = hashlib.sha256()
    try:
-        shutil.copy2(resolved, candidate)
+        with resolved.open("rb") as handle:
+            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
+                digest.update(chunk)
    except OSError:
        return None
+    token = digest.hexdigest()[:16]
+    candidate = parent / f"{base_name}.corrupt.{token}.bak"
+    # Defensive: candidate must still be inside parent after construction.
+    if candidate.parent != parent:
+        return None
+    if not candidate.exists():
+        try:
+            shutil.copy2(resolved, candidate)
+        except OSError:
+            return None
    for suffix in ("-wal", "-shm"):
        sidecar = parent / (base_name + suffix)
        if sidecar.parent != parent or not sidecar.exists():
            continue
+        sidecar_backup = parent / (candidate.name + suffix)
+        if sidecar_backup.parent != parent or sidecar_backup.exists():
+            continue
        try:
-            sidecar_backup = parent / (candidate.name + suffix)
-            if sidecar_backup.parent != parent:
-                continue
            shutil.copy2(sidecar, sidecar_backup)
        except OSError:
            pass
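Reviewer sketch (not part of the patch): the content-addressed naming scheme on its own, without the sidecar handling or the parent-directory checks. The function name content_addressed_backup is hypothetical; the 16-hex-character token and the ``.corrupt.<token>.bak`` suffix follow the hunk above.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


def content_addressed_backup(path: Path) -> Optional[Path]:
    """Copy path to a backup named after its sha256, reusing an existing copy."""
    digest = hashlib.sha256()
    try:
        with path.open("rb") as handle:
            # Hash in 1 MiB chunks so large files are never loaded whole.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
    except OSError:
        return None
    token = digest.hexdigest()[:16]
    candidate = path.with_name(f"{path.name}.corrupt.{token}.bak")
    if not candidate.exists():
        try:
            shutil.copy2(path, candidate)
        except OSError:
            return None
    return candidate
```

Calling it twice on unchanged bytes returns the same path and leaves a single backup on disk; changing the bytes yields a second, separately named backup.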
@@ -1142,7 +1232,7 @@ def _guard_existing_db_is_healthy(path: Path) -> None:
        return
    reason: Optional[str] = None
    try:
-        probe = sqlite3.connect(str(resolved), timeout=5, isolation_level=None)
+        probe = _sqlite_connect(resolved)
        try:
            row = probe.execute("PRAGMA integrity_check").fetchone()
        finally:
@@ -1188,51 +1278,52 @@ def connect(
    else:
        path = kanban_db_path(board=board)
    path.parent.mkdir(parents=True, exist_ok=True)
-    # Cheap byte-level check first — catches the #29507 TLS-overwrite shape
-    # and other invalid-header cases without opening a sqlite connection.
-    _validate_sqlite_header(path)
-    # Full integrity probe — catches corruption past the header (malformed
-    # pages, broken internal metadata). Cached per-path after first success
-    # via _INITIALIZED_PATHS so it only runs once per process per path.
-    _guard_existing_db_is_healthy(path)
-    resolved = str(path.resolve())
-    conn = sqlite3.connect(str(path), isolation_level=None, timeout=30)
-    try:
-        conn.row_factory = sqlite3.Row
-        with _INIT_LOCK:
-            # WAL activation can take an exclusive lock while SQLite creates the
-            # sidecar files for a fresh database. Keep it in the same process-local
-            # critical section as schema initialization so concurrent gateway
-            # startup threads do not race before _INITIALIZED_PATHS is populated.
-            # WAL doesn't work on network filesystems (NFS/SMB/FUSE). Shared helper
-            # falls back to DELETE with one WARNING so kanban stays usable there.
-            # See hermes_state._WAL_INCOMPAT_MARKERS for detection logic.
-            from hermes_state import apply_wal_with_fallback
-            apply_wal_with_fallback(conn, db_label=f"kanban.db ({path.name})")
-            # FULL (was NORMAL): fsync before each checkpoint to narrow the
-            # crash window that can leave a b-tree page header torn.
-            conn.execute("PRAGMA synchronous=FULL")
-            conn.execute("PRAGMA wal_autocheckpoint=100")
-            conn.execute("PRAGMA foreign_keys=ON")
-            # Zero freed pages so a later torn write cannot expose stale
-            # cell content; persisted in the DB header for new DBs.
-            conn.execute("PRAGMA secure_delete=ON")
-            # Surface corrupt cells as read errors instead of silent
-            # wrong-data returns.
-            conn.execute("PRAGMA cell_size_check=ON")
-            needs_init = resolved not in _INITIALIZED_PATHS
-            if needs_init:
-                # Idempotent: runs CREATE TABLE IF NOT EXISTS + the additive
-                # migrations. Cached so subsequent connect() calls in the same
-                # process are cheap. The lock prevents same-process dispatcher
-                # threads from racing through the additive ALTER TABLE pass with
-                # stale PRAGMA snapshots during gateway startup.
-                conn.executescript(SCHEMA_SQL)
-                _migrate_add_optional_columns(conn)
-                _INITIALIZED_PATHS.add(resolved)
-    except Exception:
-        conn.close()
-        raise
+    with _cross_process_init_lock(path):
+        # Cheap byte-level check first — catches the #29507 TLS-overwrite shape
+        # and other invalid-header cases without opening a sqlite connection.
+        _validate_sqlite_header(path)
+        # Full integrity probe — catches corruption past the header (malformed
+        # pages, broken internal metadata). Cached per-path after first success
+        # via _INITIALIZED_PATHS so it only runs once per process per path.
+        _guard_existing_db_is_healthy(path)
+        resolved = str(path.resolve())
+        conn = _sqlite_connect(path)
+        try:
+            conn.row_factory = sqlite3.Row
+            with _INIT_LOCK:
+                # WAL activation can take an exclusive lock while SQLite creates the
+                # sidecar files for a fresh database. Keep it in the same process-local
+                # critical section as schema initialization so concurrent gateway
+                # startup threads do not race before _INITIALIZED_PATHS is populated.
+                # WAL doesn't work on network filesystems (NFS/SMB/FUSE). Shared helper
+                # falls back to DELETE with one WARNING so kanban stays usable there.
+                # See hermes_state._WAL_INCOMPAT_MARKERS for detection logic.
+                from hermes_state import apply_wal_with_fallback
+                apply_wal_with_fallback(conn, db_label=f"kanban.db ({path.name})")
+                # FULL (was NORMAL): fsync before each checkpoint to narrow the
+                # crash window that can leave a b-tree page header torn.
+                conn.execute("PRAGMA synchronous=FULL")
+                conn.execute("PRAGMA wal_autocheckpoint=100")
+                conn.execute("PRAGMA foreign_keys=ON")
+                # Zero freed pages so a later torn write cannot expose stale
+                # cell content; persisted in the DB header for new DBs.
+                conn.execute("PRAGMA secure_delete=ON")
+                # Surface corrupt cells as read errors instead of silent
+                # wrong-data returns.
+                conn.execute("PRAGMA cell_size_check=ON")
+                needs_init = resolved not in _INITIALIZED_PATHS
+                if needs_init:
+                    # Idempotent: runs CREATE TABLE IF NOT EXISTS + the additive
+                    # migrations. Cached so subsequent connect() calls in the same
+                    # process are cheap. The lock prevents same-process dispatcher
+                    # threads from racing through the additive ALTER TABLE pass with
+                    # stale PRAGMA snapshots during gateway startup.
+                    conn.executescript(SCHEMA_SQL)
+                    _migrate_add_optional_columns(conn)
+                    _INITIALIZED_PATHS.add(resolved)
+        except Exception:
+            conn.close()
+            raise
    return conn


@@ -4299,21 +4390,20 @@ def reap_worker_zombies() -> "list[int]":
    Returns the list of reaped PIDs. Safe to call when there are no
    children (returns []). No-op on Windows.
    """
-    if os.name == "nt":
-        return []
    reaped: "list[int]" = []
-    try:
-        while True:
-            try:
-                pid, status = os.waitpid(-1, os.WNOHANG)
-            except ChildProcessError:
-                break
-            if pid == 0:
-                break
-            _record_worker_exit(pid, status)
-            reaped.append(pid)
-    except Exception:
-        pass
+    if os.name != "nt":
+        try:
+            while True:
+                try:
+                    pid, status = os.waitpid(-1, os.WNOHANG)
+                except ChildProcessError:
+                    break
+                if pid == 0:
+                    break
+                _record_worker_exit(pid, status)
+                reaped.append(pid)
+        except Exception:
+            pass
    return reaped


@@ -2117,6 +2117,13 @@ def cmd_postinstall(args):
 def cmd_model(args):
    """Select default model — starts with provider selection, then model picker."""
    _require_tty("model")
+    if getattr(args, "refresh", False):
+        try:
+            from hermes_cli.models import clear_provider_models_cache
+            clear_provider_models_cache()
+            print("  Cleared model picker cache.")
+        except Exception:
+            pass
    select_provider_and_model(args=args)


@@ -2997,6 +3004,7 @@ def _model_flow_nous(config, current_model="", args=None):
    """Nous Portal provider: ensure logged in, then pick model."""
    from hermes_cli.auth import (
        get_provider_auth_state,
+        NOUS_INFERENCE_AUTH_MODE_LEGACY,
        _prompt_model_selection,
        _save_model_choice,
        _update_config_for_provider,
@@ -3092,8 +3100,21 @@ def _model_flow_nous(config, current_model="", args=None):
    # Fetch live pricing (non-blocking — returns empty dict on failure)
    pricing = get_pricing_for_provider("nous")

-    # Check if user is on free tier
-    free_tier = check_nous_free_tier()
+    # Force fresh account data for model selection so recent credit purchases
+    # are reflected immediately.
+    free_tier = check_nous_free_tier(force_fresh=True)
+    if not free_tier:
+        try:
+            refreshed_creds = resolve_nous_runtime_credentials(
+                min_key_ttl_seconds=5 * 60,
+                inference_auth_mode=NOUS_INFERENCE_AUTH_MODE_LEGACY,
+            )
+            if refreshed_creds:
+                creds = refreshed_creds
+        except Exception:
+            # Runtime inference has its own paid-entitlement recovery path; do
+            # not block model selection if this opportunistic remint fails.
+            pass

    # Resolve portal URL early — needed both for upgrade links and for the
    # freeRecommendedModels endpoint below.
@@ -3115,7 +3136,24 @@ def _model_flow_nous(config, current_model="", args=None):
    # newly-launched paid models surface in the picker too — independent
    # of CLI release cadence.
    unavailable_models: list[str] = []
+    unavailable_message = ""
    if free_tier:
+        try:
+            from hermes_cli.nous_account import (
+                format_nous_portal_entitlement_message,
+                get_nous_portal_account_info,
+            )
+
+            _account_info = get_nous_portal_account_info(force_fresh=True)
+            unavailable_message = (
+                format_nous_portal_entitlement_message(
+                    _account_info,
+                    capability="paid Nous models",
+                )
+                or ""
+            )
+        except Exception:
+            unavailable_message = ""
        model_ids, pricing = union_with_portal_free_recommendations(
            model_ids, pricing, _nous_portal_url,
        )
@@ -3137,7 +3175,7 @@ def _model_flow_nous(config, current_model="", args=None):
            from hermes_cli.auth import DEFAULT_NOUS_PORTAL_URL

            _url = (_nous_portal_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
-            print(f"Upgrade at {_url} to access paid models.")
+            print(unavailable_message or f"Upgrade at {_url} to access paid models.")
        return

    print(
@@ -3150,6 +3188,7 @@ def _model_flow_nous(config, current_model="", args=None):
        pricing=pricing,
        unavailable_models=unavailable_models,
        portal_url=_nous_portal_url,
+        unavailable_message=unavailable_message,
    )
    if selected:
        _save_model_choice(selected)
@@ -6477,6 +6516,104 @@ def _web_ui_build_needed(web_dir: Path) -> bool:
    return False


+def _run_with_idle_timeout(
+    cmd: list[str],
+    cwd: Path,
+    *,
+    idle_timeout_seconds: int = 180,
+    indent: str = "    ",
+) -> subprocess.CompletedProcess:
+    """Run a subprocess that streams output, with an idle-output timeout.
+
+    Issue #33788: ``npm run build`` (Vite) was invoked with
+    ``capture_output=True`` and no timeout. On low-memory hosts (notably
+    WSL2 with the default 4 GB cap) the build can stall or sit silent for
+    minutes; users see a frozen terminal, assume the update is hung, and
+    reboot — leaving the editable install in a half-state with the
+    ``hermes`` launcher present but ``hermes_cli`` not importable.
+
+    This helper fixes both halves: stdout is streamed (so the user sees
+    progress), and if no bytes have appeared on stdout/stderr for
+    ``idle_timeout_seconds``, the process is terminated and the call
+    returns with a non-zero ``returncode``. The caller's existing
+    stale-dist fallback (#23817) takes over from there.
+
+    Returns a ``CompletedProcess`` with merged stdout (text) and an integer
+    returncode; stderr is empty unless the process failed to spawn (rc 127).
+    Never raises on idle timeout; failure is reported via the returncode.
+    """
+    merged_chunks: list[str] = []
+    last_output_ts = _time.monotonic()
+    lock = threading.Lock()
+
+    try:
+        proc = subprocess.Popen(
+            cmd,
+            cwd=cwd,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.STDOUT,
+            text=True,
+            encoding="utf-8",
+            errors="replace",
+            bufsize=1,
+        )
+    except OSError as exc:
+        # E.g. npm not on PATH between the which() check and now.
+        return subprocess.CompletedProcess(cmd, 127, stdout="", stderr=str(exc))
+
+    def _reader() -> None:
+        nonlocal last_output_ts
+        assert proc.stdout is not None
+        for line in proc.stdout:
+            try:
+                print(f"{indent}{line.rstrip()}", flush=True)
+            except UnicodeEncodeError:
+                # Windows cp1252 fallback — same pattern as _say().
+                enc = getattr(sys.stdout, "encoding", None) or "ascii"
+                safe = line.rstrip().encode(enc, errors="replace").decode(enc, errors="replace")
+                print(f"{indent}{safe}", flush=True)
+            with lock:
+                merged_chunks.append(line)
+                last_output_ts = _time.monotonic()
+
+    reader_thread = threading.Thread(target=_reader, daemon=True)
+    reader_thread.start()
+
+    idle_killed = False
+    while True:
+        try:
+            rc = proc.wait(timeout=5)
+            break
+        except subprocess.TimeoutExpired:
+            with lock:
+                idle = _time.monotonic() - last_output_ts
+            if idle > idle_timeout_seconds:
+                idle_killed = True
+                proc.terminate()
+                try:
+                    rc = proc.wait(timeout=3)
+                except subprocess.TimeoutExpired:
+                    proc.kill()
+                    rc = proc.wait()
+                break
+
+    # Drain reader so we don't leak the stdout file descriptor.
+    reader_thread.join(timeout=2)
+
+    combined = "".join(merged_chunks)
+    if idle_killed:
+        msg = (
+            f"\n  ⚠ Build produced no output for {idle_timeout_seconds}s — terminated.\n"
+            "    Common causes: out-of-memory on a low-RAM host (WSL/container),\n"
+            "    a stuck Node process, or an antivirus scan stalling I/O.\n"
+        )
+        combined += msg
+        # Force a non-zero rc even if terminate() raced with a clean exit.
+        if rc == 0:
+            rc = 124  # GNU `timeout` convention
+    return subprocess.CompletedProcess(cmd, rc, stdout=combined, stderr="")
+
+
 def _run_npm_install_deterministic(
    npm: str,
    cwd: Path,
@@ -6582,31 +6719,26 @@ def _build_web_ui(web_dir: Path, *, fatal: bool = False) -> bool:
        if fatal:
            _say("  Run manually:  cd web && npm install && npm run build")
        return False
-    # First attempt
-    r2 = subprocess.run(
-        [npm, "run", "build"],
-        cwd=web_dir,
-        capture_output=True,
-        text=True,
-        encoding="utf-8",
-        errors="replace",
-    )
+    # First attempt — stream output via idle-timeout helper (issue #33788).
+    # capture_output=True on a long Vite build looks identical to a hang;
+    # users react by rebooting, which leaves the editable install in a
+    # half-state. Streaming + idle-kill makes failures observable AND
+    # recoverable (the stale-dist fallback below handles the kill path).
+    r2 = _run_with_idle_timeout([npm, "run", "build"], cwd=web_dir)
    if r2.returncode != 0:
        # Retry once after a short delay — covers boot-time races on Windows
        # (antivirus scanning Node.js binaries, npm cache not ready, transient
        # I/O when launched via Scheduled Task at logon). See issue #23817.
        _time.sleep(3)
-        r2 = subprocess.run(
-            [npm, "run", "build"],
-            cwd=web_dir,
-            capture_output=True,
-            text=True,
-            encoding="utf-8",
-            errors="replace",
-        )
+        r2 = _run_with_idle_timeout([npm, "run", "build"], cwd=web_dir)

    if r2.returncode != 0:
-        stderr_preview = (r2.stderr or "").strip()
+        # _run_with_idle_timeout merges stderr into stdout; older callers
+        # using subprocess.run kept them split. Pull from whichever has
+        # content so the error surfaces regardless of which path produced
+        # the CompletedProcess.
+        build_output = (r2.stderr or "") + (r2.stdout or "")
+        stderr_preview = build_output.strip()
        stderr_tail = "\n  ".join(stderr_preview.splitlines()[-10:]) if stderr_preview else ""
        dist_dir = web_dir.parent / "hermes_cli" / "web_dist"
        dist_index = dist_dir / "index.html"
@@ -7097,6 +7229,11 @@ def _update_via_zip(args):
        _install_python_dependencies_with_optional_fallback(pip_cmd)

    _update_node_dependencies()
+    # Core (Python deps + git pull / ZIP extract) is now complete; the CLI
+    # is functional from this point onward. The web UI build below is
+    # optional — a failure here only affects ``hermes dashboard``. Make
+    # that visible so users don't panic and reboot mid-build (#33788).
+    print("→ Core update complete. Building dashboard (optional)...")
    _build_web_ui(PROJECT_ROOT / "web")

    # Sync skills
@@ -8125,37 +8262,18 @@ def _install_psutil_android_compat(
    nothing is persisted in the repository.

    Stopgap: remove this once https://github.com/giampaolo/psutil/pull/2762
-    merges and ships in a release. ``scripts/install_psutil_android.py``
-    contains the same logic for ``scripts/install.sh`` (fresh installs).
-    Both copies should be removed together.
+    merges and ships in a release. The standalone installer script uses the
+    same shared helper and should be removed together.
    """
-    import tarfile
    import tempfile
    import urllib.request
-
-    psutil_url = (
-        "https://files.pythonhosted.org/packages/aa/c6/"
-        "d1ddf4abb55e93cebc4f2ed8b5d6dbad109ecb8d63748dd2b20ab5e57ebe/"
-        "psutil-7.2.2.tar.gz"
-    )
+    from hermes_cli.psutil_android import PSUTIL_URL, prepare_patched_psutil_sdist

    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        archive = tmp_path / "psutil.tar.gz"
-        urllib.request.urlretrieve(psutil_url, archive)
-        with tarfile.open(archive) as tar:
-            tar.extractall(tmp_path)
-
-        src_root = next(
-            p for p in tmp_path.iterdir() if p.is_dir() and p.name.startswith("psutil-")
-        )
-        common_py = src_root / "psutil" / "_common.py"
-        content = common_py.read_text(encoding="utf-8")
-        marker = 'LINUX = sys.platform.startswith("linux")'
-        replacement = 'LINUX = sys.platform.startswith(("linux", "android"))'
-        if marker not in content:
-            raise RuntimeError("psutil Android compatibility patch marker not found")
-        common_py.write_text(content.replace(marker, replacement), encoding="utf-8")
+        urllib.request.urlretrieve(PSUTIL_URL, archive)
+        src_root = prepare_patched_psutil_sdist(archive, tmp_path)

        _run_install_with_heartbeat(
            install_cmd_prefix + ["install", "--no-build-isolation", str(src_root)],
@@ -9020,7 +9138,7 @@ def _cmd_update_impl(args, gateway_mode: bool):
        try:
            from hermes_cli.backup import create_quick_snapshot

-            snap_id = create_quick_snapshot(label="pre-update")
+            snap_id = create_quick_snapshot(label="pre-update", keep=1)
            if snap_id:
                print(f"  ✓ Pre-update snapshot: {snap_id}")
        except Exception as exc:
@@ -9190,6 +9308,10 @@ def _cmd_update_impl(args, gateway_mode: bool):
        _refresh_active_lazy_features()

        _update_node_dependencies()
+        # See note above (ZIP path): core is now complete, web UI build is
+        # optional from a CLI perspective. Telegraphing this avoids the
+        # "stuck at webui-build → reboot → broken install" trap (#33788).
+        print("→ Core update complete. Building dashboard (optional)...")
        _build_web_ui(PROJECT_ROOT / "web")

        print()
@@ -11196,6 +11318,11 @@ def main():
        help="Select default model and provider",
        description="Interactively select your inference provider and default model",
    )
+    model_parser.add_argument(
+        "--refresh",
+        action="store_true",
+        help="Wipe the model picker disk cache and re-fetch every provider's live /v1/models list.",
+    )
    model_parser.add_argument(
        "--portal-url",
        help="Portal base URL for Nous login (default: production portal)",
@@ -12541,6 +12668,11 @@ Examples:
        ],
    )
    skills_search.add_argument("--limit", type=int, default=10, help="Max results")
+    skills_search.add_argument(
+        "--json",
+        action="store_true",
+        help="Output JSON instead of a table (full identifiers, scripting-friendly)",
+    )

    skills_install = skills_subparsers.add_parser("install", help="Install a skill")
    skills_install.add_argument(
@@ -13259,6 +13391,11 @@ Examples:
        "--yes", "-y", action="store_true", help="Skip confirmation"
    )

+    sessions_subparsers.add_parser(
+        "optimize",
+        help="Reclaim disk space: merge FTS5 segments + VACUUM (no data change)",
+    )
+
    sessions_subparsers.add_parser("stats", help="Show session store statistics")

    sessions_rename = sessions_subparsers.add_parser(
@@ -13431,6 +13568,39 @@ Examples:
            relaunch(["--resume", selected_id])
            return  # won't reach here after execvp

+        elif action == "optimize":
+            db_path = db.db_path
+            before_mb = (
+                os.path.getsize(db_path) / (1024 * 1024)
+                if db_path.exists()
+                else 0.0
+            )
+            print("Optimizing session store (FTS merge + VACUUM)…")
+            try:
+                # vacuum() merges FTS5 segments (optimize_fts) then VACUUMs.
+                # Probe the index count first for the summary line.
+                n = sum(
+                    1
+                    for t in db._FTS_TABLES
+                    if db._fts_table_exists(t)
+                )
+                db.vacuum()
+            except Exception as e:
+                print(f"Error: optimization failed: {e}")
+                db.close()
+                return
+            after_mb = (
+                os.path.getsize(db_path) / (1024 * 1024)
+                if db_path.exists()
+                else 0.0
+            )
+            saved = before_mb - after_mb
+            print(f"Optimized {n} FTS index(es).")
+            print(
+                f"Database size: {before_mb:.1f} MB -> {after_mb:.1f} MB "
+                f"(reclaimed {saved:.1f} MB)"
+            )
+
        elif action == "stats":
            total = db.session_count()
            msgs = db.message_count()
@@ -294,32 +294,39 @@ class CustomAutoResult:
 # Flag parsing
 # ---------------------------------------------------------------------------

-def parse_model_flags(raw_args: str) -> tuple[str, str, bool]:
-    """Parse --provider and --global flags from /model command args.
+def parse_model_flags(raw_args: str) -> tuple[str, str, bool, bool]:
+    """Parse --provider, --global, and --refresh flags from /model command args.

-    Returns (model_input, explicit_provider, is_global).
+    Returns (model_input, explicit_provider, is_global, force_refresh).

    Examples::

-        "sonnet"                         -> ("sonnet", "", False)
-        "sonnet --global"                -> ("sonnet", "", True)
-        "sonnet --provider anthropic"    -> ("sonnet", "anthropic", False)
-        "--provider my-ollama"           -> ("", "my-ollama", False)
-        "sonnet --provider anthropic --global" -> ("sonnet", "anthropic", True)
+        "sonnet"                         -> ("sonnet", "", False, False)
+        "sonnet --global"                -> ("sonnet", "", True, False)
+        "sonnet --provider anthropic"    -> ("sonnet", "anthropic", False, False)
+        "--provider my-ollama"           -> ("", "my-ollama", False, False)
+        "--refresh"                      -> ("", "", False, True)
+        "sonnet --provider anthropic --global" -> ("sonnet", "anthropic", True, False)
    """
    is_global = False
    explicit_provider = ""
+    force_refresh = False

    # Normalize Unicode dashes (Telegram/iOS auto-converts -- to em/en dash)
    # A single Unicode dash before a flag keyword becomes "--"
    import re as _re
-    raw_args = _re.sub(r'[\u2012\u2013\u2014\u2015](provider|global)', r'--\1', raw_args)
+    raw_args = _re.sub(r'[\u2012\u2013\u2014\u2015](provider|global|refresh)', r'--\1', raw_args)

    # Extract --global
    if "--global" in raw_args:
        is_global = True
        raw_args = raw_args.replace("--global", "").strip()

+    # Extract --refresh (bust the model picker disk cache before listing)
+    if "--refresh" in raw_args:
+        force_refresh = True
+        raw_args = raw_args.replace("--refresh", "").strip()
+
    # Extract --provider <name>
    parts = raw_args.split()
    i = 0
@@ -333,7 +340,7 @@ def parse_model_flags(raw_args: str) -> tuple[str, str, bool]:
            i += 1

    model_input = " ".join(filtered).strip()
-    return (model_input, explicit_provider, is_global)
+    return (model_input, explicit_provider, is_global, force_refresh)


 # ---------------------------------------------------------------------------
@@ -1079,6 +1086,7 @@ def list_authenticated_providers(
    from hermes_cli.models import (
        OPENROUTER_MODELS, _PROVIDER_MODELS,
        _MODELS_DEV_PREFERRED, _merge_with_models_dev, provider_model_ids,
+        cached_provider_model_ids,
        get_curated_nous_model_ids,
    )

@@ -1239,13 +1247,15 @@ def list_authenticated_providers(
        if not has_creds:
            continue

-        # Use curated list, falling back to models.dev if no curated list.
-        # For preferred providers, merge models.dev entries into the curated
-        # catalog so newly released models (e.g. mimo-v2.5-pro on opencode-go)
-        # show up in the picker without requiring a Hermes release.
-        model_ids = curated.get(hermes_id, [])
-        if hermes_id in _MODELS_DEV_PREFERRED:
-            model_ids = _merge_with_models_dev(hermes_id, model_ids)
+        # Unified pathway: route through cached_provider_model_ids() so the
+        # /model picker sees the SAME list `hermes model` would build, with
+        # disk caching so the picker opens quickly. Falls back to the
+        # curated static list when the live fetcher returns nothing.
+        model_ids = cached_provider_model_ids(hermes_id)
+        if not model_ids:
+            model_ids = curated.get(hermes_id, [])
+            if hermes_id in _MODELS_DEV_PREFERRED:
+                model_ids = _merge_with_models_dev(hermes_id, model_ids)
        total = len(model_ids)
        top = model_ids[:max_models]

@@ -1351,25 +1361,27 @@ def list_authenticated_providers(
            # matches what the user's authenticated Codex/Copilot backend
            # actually serves — including ChatGPT-Pro-only Codex slugs
            # (e.g. gpt-5.3-codex-spark) that aren't in the static curated
-            # catalog. ``provider_model_ids()`` falls back to the curated
-            # list when the live endpoint is unreachable, so this is safe
-            # for unauthenticated and offline cases too.
-            model_ids = provider_model_ids(hermes_slug)
+            # catalog. ``cached_provider_model_ids()`` falls back to the
+            # curated list when the live endpoint is unreachable, so this
+            # is safe for unauthenticated and offline cases too.
+            model_ids = cached_provider_model_ids(hermes_slug)
        # For aws_sdk providers (bedrock), use live discovery so the list
        # reflects the active region (eu.*, ap.*) not the static us.* list.
        elif overlay.auth_type == "aws_sdk":
            try:
-                from agent.bedrock_adapter import bedrock_model_ids_or_none
-                _ids = bedrock_model_ids_or_none()
-                model_ids = _ids if _ids is not None else (curated.get(hermes_slug, []) or curated.get(pid, []))
+                _ids = cached_provider_model_ids(hermes_slug)
+                model_ids = _ids if _ids else (curated.get(hermes_slug, []) or curated.get(pid, []))
            except Exception:
                model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
        else:
-            # Use curated list — look up by Hermes slug, fall back to overlay key
-            model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
-            # Merge with models.dev for preferred providers (same rationale as above).
-            if hermes_slug in _MODELS_DEV_PREFERRED:
-                model_ids = _merge_with_models_dev(hermes_slug, model_ids)
+            # Unified pathway — see Section 1 rationale. Fall back to the
+            # curated dict (with models.dev merge for preferred providers)
+            # when the live fetcher comes up empty.
+            model_ids = cached_provider_model_ids(hermes_slug)
+            if not model_ids:
+                model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
+                if hermes_slug in _MODELS_DEV_PREFERRED:
+                    model_ids = _merge_with_models_dev(hermes_slug, model_ids)
        total = len(model_ids)
        top = model_ids[:max_models]

@@ -1436,13 +1448,15 @@ def list_authenticated_providers(
        # region (eu.*, us.*, ap.*) instead of the hardcoded us.* static list.
        if _cp_config and getattr(_cp_config, "auth_type", "") == "aws_sdk":
            try:
-                from agent.bedrock_adapter import bedrock_model_ids_or_none
-                _ids = bedrock_model_ids_or_none()
-                _cp_model_ids = _ids if _ids is not None else curated.get(_cp.slug, [])
+                _ids = cached_provider_model_ids(_cp.slug)
+                _cp_model_ids = _ids if _ids else curated.get(_cp.slug, [])
            except Exception:
                _cp_model_ids = curated.get(_cp.slug, [])
        else:
-            _cp_model_ids = curated.get(_cp.slug, [])
+            # Unified pathway — same as sections 1 and 2.
+            _cp_model_ids = cached_provider_model_ids(_cp.slug)
+            if not _cp_model_ids:
+                _cp_model_ids = curated.get(_cp.slug, [])
        _cp_total = len(_cp_model_ids)
        _cp_top = _cp_model_ids[:max_models]

@@ -32,6 +32,8 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
 # Fallback OpenRouter snapshot used when the live catalog is unavailable.
 # (model_id, display description shown in menus)
 OPENROUTER_MODELS: list[tuple[str, str]] = [
+    ("anthropic/claude-opus-4.8",              ""),
+    ("anthropic/claude-opus-4.8-fast",         "2x price, higher output speed"),
    ("anthropic/claude-opus-4.7",              ""),
    ("anthropic/claude-opus-4.6",              ""),
    ("anthropic/claude-sonnet-4.6",            ""),
@@ -139,6 +141,7 @@ def _xai_curated_models() -> list[str]:

 _PROVIDER_MODELS: dict[str, list[str]] = {
    "nous": [
+        "anthropic/claude-opus-4.8",
        "anthropic/claude-opus-4.7",
        "anthropic/claude-opus-4.6",
        "anthropic/claude-sonnet-4.6",
@@ -290,6 +293,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "MiniMax-M2",
    ],
    "anthropic": [
+        "claude-opus-4-8",
        "claude-opus-4-7",
        "claude-opus-4-6",
        "claude-sonnet-4-6",
@@ -518,9 +522,19 @@ def fetch_nous_account_tier(access_token: str, portal_base_url: str = "") -> dic
 def is_nous_free_tier(account_info: dict[str, Any]) -> bool:
    """Return True if the account info indicates a free (unpaid) tier.

-    Checks ``subscription.monthly_charge == 0``.  Returns False when
-    the field is missing or unparseable (assumes paid — don't block users).
+    Prefer the Portal's explicit ``paid_service_access.allowed`` entitlement
+    decision.  Legacy payloads fall back to ``subscription.monthly_charge == 0``.
+    Returns False when both signals are missing or unparseable.
    """
+    paid_access = account_info.get("paid_service_access")
+    if isinstance(paid_access, dict):
+        allowed = paid_access.get("allowed")
+        if isinstance(allowed, bool):
+            return not allowed
+        paid = paid_access.get("paid_access")
+        if isinstance(paid, bool):
+            return not paid
+
    sub = account_info.get("subscription")
    if not isinstance(sub, dict):
        return False
@@ -699,40 +713,28 @@ _FREE_TIER_CACHE_TTL: int = 180  # seconds (3 minutes)
 _free_tier_cache: tuple[bool, float] | None = None  # (result, timestamp)


-def check_nous_free_tier() -> bool:
+def check_nous_free_tier(*, force_fresh: bool = False) -> bool:
    """Check if the current Nous Portal user is on a free (unpaid) tier.

    Results are cached for ``_FREE_TIER_CACHE_TTL`` seconds to avoid
    hitting the Portal API on every call.  The cache is short-lived so
    that an account upgrade is reflected within a few minutes.

-    Returns False (assume paid) on any error — never blocks paying users.
+    Returns True only when entitlement is known to be free.  Unknown/error
+    states return False so this compatibility wrapper does not block users.
    """
    global _free_tier_cache
    now = time.monotonic()
-    if _free_tier_cache is not None:
+    if not force_fresh and _free_tier_cache is not None:
        cached_result, cached_at = _free_tier_cache
        if now - cached_at < _FREE_TIER_CACHE_TTL:
            return cached_result

    try:
-        from hermes_cli.auth import get_provider_auth_state, resolve_nous_runtime_credentials
+        from hermes_cli.nous_account import get_nous_portal_account_info

-        # Ensure we have a fresh token (triggers refresh if needed)
-        resolve_nous_runtime_credentials(min_key_ttl_seconds=60)
-
-        state = get_provider_auth_state("nous")
-        if not state:
-            _free_tier_cache = (False, now)
-            return False
-        access_token = state.get("access_token", "")
-        portal_url = state.get("portal_base_url", "")
-        if not access_token:
-            _free_tier_cache = (False, now)
-            return False
-
-        account_info = fetch_nous_account_tier(access_token, portal_url)
-        result = is_nous_free_tier(account_info)
+        account_info = get_nous_portal_account_info(force_fresh=force_fresh)
+        result = account_info.is_free_tier
        _free_tier_cache = (result, now)
        return result
    except Exception:
@@ -2045,6 +2047,12 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
                    return live
        except Exception:
            pass
+        # Live failed (or no creds). Fall back to the docs-hosted manifest
+        # — NOT the in-repo _PROVIDER_MODELS["nous"] snapshot — so newly
+        # added Portal models still surface without a Hermes release.
+        manifest_ids = get_curated_nous_model_ids()
+        if manifest_ids:
+            return manifest_ids
    if normalized == "stepfun":
        try:
            from hermes_cli.auth import resolve_api_key_provider_credentials
@@ -2148,6 +2156,206 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
    return curated_static


+# ---------------------------------------------------------------------------
+# Generic disk cache for provider_model_ids() — keeps /model picker fast.
+# ---------------------------------------------------------------------------
+#
+# Without this layer, every /model picker open re-fetches every authed
+# provider's /v1/models endpoint. For a user with many providers (anthropic +
+# openai + copilot + gemini + huggingface + ...) that's 2+ seconds of cold
+# HTTP roundtrips just to render the provider list.
+#
+# Cache strategy:
+#   - One JSON file at $HERMES_HOME/provider_models_cache.json
+#   - Per-provider entries keyed by (provider, credential fingerprint)
+#   - Credential fingerprint = blake2b digest of the provider's env-var values
+#     and credential-file mtimes. Swap your OPENAI_API_KEY and it invalidates.
+#   - 1h TTL by default. `force_refresh=True` skips the cache entirely
+#     and overwrites it on success.
+#   - Only NON-EMPTY results are cached. An empty/None response from a
+#     transient network error never gets pinned.
+#   - Cache file is best-effort. Any read/write error degrades silently
+#     to a live fetch — the picker keeps working.
+
+_PROVIDER_MODELS_CACHE_TTL = 3600  # 1h
+
+
+def _provider_models_cache_path() -> Path:
+    from hermes_constants import get_hermes_home
+    return get_hermes_home() / "provider_models_cache.json"
+
+
+def _credential_fingerprint(provider: str) -> str:
+    """Return a short hash representing the credentials that
+    ``provider_model_ids(provider)`` would see right now.
+
+    Rotating any of the relevant env vars invalidates the cached entry
+    for that provider. We hash AT LEAST the api-key + base-url env vars
+    declared in ``PROVIDER_REGISTRY``. For OAuth-backed providers
+    (codex, copilot, anthropic-via-claude-code, nous portal), the
+    relevant tokens live in ``$HERMES_HOME/auth.json`` and external
+    credential files. Rather than parse every shape, we additionally
+    fold the mtime of those files into the fingerprint so refreshes
+    after re-auth bust the cache.
+    """
+    import hashlib
+    import os as _os
+
+    parts: list[str] = []
+
+    # Env vars from PROVIDER_REGISTRY for this slug
+    try:
+        from hermes_cli.auth import PROVIDER_REGISTRY
+        pcfg = PROVIDER_REGISTRY.get(provider)
+        if pcfg is not None:
+            for ev in getattr(pcfg, "api_key_env_vars", ()) or ():
+                parts.append(f"{ev}={_os.environ.get(ev, '')}")
+            bev = getattr(pcfg, "base_url_env_var", "") or ""
+            if bev:
+                parts.append(f"{bev}={_os.environ.get(bev, '')}")
+    except Exception:
+        pass
+
+    # OAuth / external-file mtimes that change on re-auth
+    try:
+        from hermes_constants import get_hermes_home
+        for rel in ("auth.json", "credentials.json"):
+            p = get_hermes_home() / rel
+            try:
+                parts.append(f"{rel}@{p.stat().st_mtime_ns}")
+            except FileNotFoundError:
+                parts.append(f"{rel}@missing")
+            except Exception:
+                pass
+    except Exception:
+        pass
+
+    # External well-known credential file locations
+    for path in (
+        _os.path.expanduser("~/.codex/auth.json"),
+        _os.path.expanduser("~/.claude/.credentials.json"),
+        _os.path.expanduser("~/.config/github-copilot/hosts.json"),
+        _os.path.expanduser("~/.minimax/credentials.json"),
+    ):
+        try:
+            mt = _os.stat(path).st_mtime_ns
+            parts.append(f"{path}@{mt}")
+        except FileNotFoundError:
+            parts.append(f"{path}@missing")
+        except Exception:
+            pass
+
+    blob = "|".join(parts).encode("utf-8", errors="replace")
+    # blake2b for cache-key fingerprinting only — not for credential storage.
+    # We never reverse this hash; collisions are harmless (worst case: cache
+    # miss → live re-fetch). Use blake2b instead of sha256 here because
+    # CodeQL's `py/weak-sensitive-data-hashing` rule flags sha256 over env
+    # vars whose names contain "API_KEY" / "TOKEN" even when the hash is
+    # used as an identity fingerprint, not for password storage. blake2b
+    # (used unkeyed here) supports keyed hashing and isn't flagged.
+    return hashlib.blake2b(blob, digest_size=8).hexdigest()
+
+
+def _load_provider_models_cache() -> dict:
+    """Return the full cache dict, or {} on any error."""
+    try:
+        path = _provider_models_cache_path()
+        if not path.exists():
+            return {}
+        with open(path, encoding="utf-8") as f:
+            data = json.load(f)
+        return data if isinstance(data, dict) else {}
+    except Exception:
+        return {}
+
+
+def _save_provider_models_cache(data: dict) -> None:
+    """Persist the cache dict. Best-effort — silent on any error."""
+    try:
+        from utils import atomic_json_write
+        path = _provider_models_cache_path()
+        path.parent.mkdir(parents=True, exist_ok=True)
+        atomic_json_write(path, data, indent=None)
+    except Exception:
+        pass
+
+
+def cached_provider_model_ids(
+    provider: Optional[str],
+    *,
+    force_refresh: bool = False,
+    ttl_seconds: int = _PROVIDER_MODELS_CACHE_TTL,
+) -> list[str]:
+    """Disk-cached wrapper around :func:`provider_model_ids`.
+
+    Hits the cache when fresh; otherwise calls the live function and
+    persists a non-empty result. Always returns a list (never None).
+    """
+    normalized = normalize_provider(provider) or (provider or "")
+    if not normalized:
+        return []
+
+    cache = _load_provider_models_cache()
+    fp = _credential_fingerprint(normalized)
+    entry = cache.get(normalized)
+    now = time.time()
+
+    if (
+        not force_refresh
+        and isinstance(entry, dict)
+        and entry.get("fp") == fp
+        and isinstance(entry.get("models"), list)
+        and entry["models"]
+        and (now - float(entry.get("at", 0))) < ttl_seconds
+    ):
+        return list(entry["models"])
+
+    # Cache miss / stale / forced refresh — call the live path.
+    live = provider_model_ids(normalized, force_refresh=force_refresh)
+    if live:
+        cache[normalized] = {
+            "fp": fp,
+            "at": now,
+            "models": list(live),
+        }
+        _save_provider_models_cache(cache)
+        return list(live)
+
+    # Live fetch returned nothing. If we have a stale entry with the
+    # SAME fingerprint, prefer it over an empty result — stale data
+    # beats no data when the network is flaky.
+    if (
+        isinstance(entry, dict)
+        and entry.get("fp") == fp
+        and isinstance(entry.get("models"), list)
+        and entry["models"]
+    ):
+        return list(entry["models"])
+    return list(live or [])
+
+
+def clear_provider_models_cache(provider: Optional[str] = None) -> None:
+    """Drop a single provider's cache entry, or wipe the whole cache.
+
+    ``provider=None`` wipes everything; otherwise only that provider's
+    entry is removed. Used by ``/model --refresh`` and
+    ``hermes model --refresh``.
+    """
+    try:
+        if provider is None:
+            path = _provider_models_cache_path()
+            if path.exists():
+                path.unlink()
+            return
+        cache = _load_provider_models_cache()
+        normalized = normalize_provider(provider) or provider or ""
+        if normalized in cache:
+            del cache[normalized]
+            _save_provider_models_cache(cache)
+    except Exception:
+        pass
+
+
 def _fetch_anthropic_models(timeout: float = 5.0) -> Optional[list[str]]:
    """Fetch available models from the Anthropic /v1/models endpoint.

@@ -0,0 +1,678 @@
+"""Normalized Nous Portal account entitlement helpers."""
+
+from __future__ import annotations
+
+import hashlib
+import json
+import time
+import urllib.request
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from typing import Any, Literal, Optional
+
+
+NousAccountInfoSource = Literal["jwt", "account_api", "inference_key", "none", "error"]
+
+_ACCOUNT_INFO_CACHE_TTL = 60
+_account_info_cache: tuple[str, float, "NousPortalAccountInfo"] | None = None
+
+
+@dataclass(frozen=True)
+class NousPortalSubscriptionInfo:
+    plan: Optional[str] = None
+    tier: Optional[int] = None
+    monthly_charge: Optional[float] = None
+    current_period_end: Optional[str] = None
+    credits_remaining: Optional[float] = None
+    rollover_credits: Optional[float] = None
+
+
+@dataclass(frozen=True)
+class NousPaidServiceAccessInfo:
+    allowed: Optional[bool] = None
+    paid_access: Optional[bool] = None
+    reason: Optional[str] = None
+    organisation_id: Optional[str] = None
+    effective_at_ms: Optional[int] = None
+    has_active_subscription: Optional[bool] = None
+    active_subscription_is_paid: Optional[bool] = None
+    subscription_tier: Optional[int] = None
+    subscription_monthly_charge: Optional[float] = None
+    subscription_credits_remaining: Optional[float] = None
+    purchased_credits_remaining: Optional[float] = None
+    total_usable_credits: Optional[float] = None
+
+
+@dataclass(frozen=True)
+class NousPortalAccountInfo:
+    logged_in: bool
+    source: NousAccountInfoSource
+    fresh: bool
+    user_id: Optional[str] = None
+    org_id: Optional[str] = None
+    client_id: Optional[str] = None
+    product_id: Optional[str] = None
+    nous_client: Optional[str] = None
+    portal_base_url: Optional[str] = None
+    inference_base_url: Optional[str] = None
+    inference_credential_present: bool = False
+    credential_source: Optional[str] = None
+    expires_at: Optional[datetime] = None
+    email: Optional[str] = None
+    privy_did: Optional[str] = None
+    subscription: Optional[NousPortalSubscriptionInfo] = None
+    paid_service_access: Optional[bool] = None
+    paid_service_access_info: Optional[NousPaidServiceAccessInfo] = None
+    raw_claims: Optional[dict[str, Any]] = None
+    raw_account: Optional[dict[str, Any]] = None
+    error: Optional[str] = None
+
+    @property
+    def is_paid(self) -> bool:
+        return self.paid_service_access is True
+
+    @property
+    def is_free_tier(self) -> bool:
+        return self.paid_service_access is False
+
+    @property
+    def tool_gateway_entitled(self) -> bool:
+        return self.paid_service_access is True
+
+
+def nous_portal_billing_url(account_info: Optional[NousPortalAccountInfo] = None) -> str:
+    """Return the billing URL for a normalized Nous account snapshot."""
+    try:
+        from hermes_cli.auth import DEFAULT_NOUS_PORTAL_URL
+    except Exception:
+        DEFAULT_NOUS_PORTAL_URL = "https://portal.nousresearch.com"
+
+    base = None
+    if account_info is not None:
+        base = account_info.portal_base_url
+    if not isinstance(base, str) or not base.strip():
+        base = DEFAULT_NOUS_PORTAL_URL
+    return f"{base.rstrip('/')}/billing"
+
+
+def format_nous_portal_entitlement_message(
+    account_info: Optional[NousPortalAccountInfo],
+    *,
+    capability: str = "this feature",
+    include_refresh_hint: bool = True,
+) -> Optional[str]:
+    """Return user-facing guidance for a missing Nous paid entitlement.
+
+    ``None`` means the account is known to have paid service access.  The
+    message intentionally works from normalized entitlement fields rather than
+    subscription price alone: purchased credits without a subscription still
+    count as paid access, while a paid subscription with exhausted usable
+    credits does not.
+    """
+    billing_url = nous_portal_billing_url(account_info)
+
+    if account_info is not None and account_info.paid_service_access is True:
+        return None
+
+    if account_info is None:
+        return (
+            f"Hermes could not verify your Nous Portal entitlement, so {capability} "
+            f"is unavailable. Run `hermes model` to refresh your login, or check "
+            f"billing at {billing_url}."
+        )
+
+    if not account_info.logged_in:
+        if account_info.inference_credential_present:
+            return (
+                f"Nous inference credentials are configured, but Hermes cannot verify "
+                f"your Nous Portal paid access for {capability}. Log in with "
+                f"`hermes model` to enable Portal-managed features. Billing and "
+                f"credits are managed at {billing_url}."
+            )
+        return (
+            f"Log in to Nous Portal to use {capability}: run `hermes model`. "
+            f"Billing and credits are managed at {billing_url}."
+        )
+
+    if account_info.paid_service_access is None:
+        detail = (
+            f"Hermes could not verify your Nous Portal paid access, so {capability} "
+            f"is unavailable."
+        )
+        if account_info.error:
+            detail += f" Account lookup failed: {account_info.error}."
+        if include_refresh_hint:
+            detail += " Run `hermes model` to refresh your session."
+        detail += f" Check billing at {billing_url}."
+        return detail
+
+    access = account_info.paid_service_access_info
+    reason = access.reason if access else None
+    if reason == "account_missing":
+        return (
+            f"Hermes could not find a Nous Portal account or organisation for this "
+            f"login, so {capability} is unavailable. Run `hermes model` to "
+            f"authenticate again; if the problem persists, contact Nous support."
+        )
+
+    if reason == "no_usable_credits" or account_info.paid_service_access is False:
+        message = _no_paid_access_message(account_info, capability, billing_url)
+        if include_refresh_hint and not account_info.fresh:
+            message += " If you recently bought credits, run `hermes model` to refresh Hermes."
+        return message
+
+    return (
+        f"Your Nous Portal account does not currently have paid service access, "
+        f"so {capability} is unavailable. Add credits or update billing at {billing_url}."
+    )
+
+
+def _no_paid_access_message(
+    account_info: NousPortalAccountInfo,
+    capability: str,
+    billing_url: str,
+) -> str:
+    access = account_info.paid_service_access_info
+    has_active_subscription = access.has_active_subscription if access else None
+    active_subscription_is_paid = access.active_subscription_is_paid if access else None
+    total_usable = access.total_usable_credits if access else None
+    subscription_credits = access.subscription_credits_remaining if access else None
+    purchased_credits = access.purchased_credits_remaining if access else None
+
+    if has_active_subscription and active_subscription_is_paid:
+        credit_detail = _credit_detail(total_usable, subscription_credits, purchased_credits)
+        return (
+            f"Your Nous Portal credits are exhausted{credit_detail}, so {capability} "
+            f"is unavailable. Top up or renew credits at {billing_url}."
+        )
+
+    if has_active_subscription and active_subscription_is_paid is False:
+        return (
+            f"Your current Nous Portal plan does not include paid service access, "
+            f"so {capability} is unavailable. Upgrade or add credits at {billing_url}."
+        )
+
+    if has_active_subscription is False:
+        credit_detail = _credit_detail(total_usable, subscription_credits, purchased_credits)
+        return (
+            f"Your Nous Portal account has no active subscription or usable credits"
+            f"{credit_detail}, so {capability} is unavailable. Subscribe or add credits "
+            f"at {billing_url}."
+        )
+
+    credit_detail = _credit_detail(total_usable, subscription_credits, purchased_credits)
+    return (
+        f"Your Nous Portal account has no usable paid credits{credit_detail}, so "
+        f"{capability} is unavailable. Add credits or update billing at {billing_url}."
+    )
+
+
+def _credit_detail(
+    total_usable: Optional[float],
+    subscription_credits: Optional[float],
+    purchased_credits: Optional[float],
+) -> str:
+    parts: list[str] = []
+    if total_usable is not None:
+        parts.append(f"usable ${total_usable:.2f}")
+    if subscription_credits is not None:
+        parts.append(f"subscription ${subscription_credits:.2f}")
+    if purchased_credits is not None:
+        parts.append(f"purchased ${purchased_credits:.2f}")
+    if not parts:
+        return ""
+    return f" ({', '.join(parts)})"
+
+
+def reset_nous_portal_account_info_cache() -> None:
+    """Clear the short-lived account-info cache used by tests."""
+    global _account_info_cache
+    _account_info_cache = None
+
+
+def get_nous_portal_account_info(
+    *,
+    force_fresh: bool = False,
+    min_jwt_ttl_seconds: int = 60,
+) -> NousPortalAccountInfo:
+    """Return normalized Nous Portal account entitlement information.
+
+    By default, a valid unexpired OAuth access JWT is used as a low-latency
+    local account snapshot. ``force_fresh=True`` always calls
+    ``/api/oauth/account`` and bypasses the short-lived cache. JWT claims are
+    decoded locally for UX gating only; server APIs remain authoritative.
+    """
+    try:
+        from hermes_cli.auth import get_provider_auth_state
+
+        state = get_provider_auth_state("nous") or {}
+    except Exception as exc:
+        return _error_info(error=exc, logged_in=False)
+
+    access_token = state.get("access_token")
+    portal_base_url = _portal_base_url(state)
+    if not isinstance(access_token, str) or not access_token.strip():
+        pool_oauth_info = _info_from_oauth_pool(
+            force_fresh=force_fresh,
+            min_jwt_ttl_seconds=min_jwt_ttl_seconds,
+            portal_base_url=portal_base_url,
+        )
+        if pool_oauth_info is not None:
+            return pool_oauth_info
+        pool_info = _info_from_inference_key_pool(portal_base_url)
+        if pool_info is not None:
+            return pool_info
+        return NousPortalAccountInfo(
+            logged_in=False,
+            source="none",
+            fresh=False,
+            portal_base_url=portal_base_url,
+        )
+
+    if not force_fresh:
+        jwt_info = _info_from_valid_jwt(
+            access_token,
+            state=state,
+            portal_base_url=portal_base_url,
+            min_jwt_ttl_seconds=min_jwt_ttl_seconds,
+        )
+        if jwt_info is not None:
+            return jwt_info
+
+    return _fresh_account_info(
+        state=state,
+        force_fresh=force_fresh,
+        portal_base_url=portal_base_url,
+    )
+
+
+def _fresh_account_info(
+    *,
+    state: dict[str, Any],
+    force_fresh: bool,
+    portal_base_url: Optional[str],
+) -> NousPortalAccountInfo:
+    global _account_info_cache
+
+    try:
+        from hermes_cli.auth import get_provider_auth_state, resolve_nous_access_token
+
+        access_token = resolve_nous_access_token()
+        refreshed_state = get_provider_auth_state("nous") or state
+        portal_base_url = _portal_base_url(refreshed_state) or portal_base_url
+        cache_key = _cache_key(access_token, portal_base_url)
+
+        if not force_fresh and _account_info_cache is not None:
+            cached_key, cached_at, cached_info = _account_info_cache
+            if cached_key == cache_key and (time.monotonic() - cached_at) < _ACCOUNT_INFO_CACHE_TTL:
+                return cached_info
+
+        payload = _fetch_nous_account_info(access_token, portal_base_url)
+        if not payload:
+            return _error_info(
+                error="empty_account_response",
+                logged_in=True,
+                portal_base_url=portal_base_url,
+            )
+        if isinstance(payload.get("error"), str):
+            return _error_info(
+                error=payload.get("error") or "account_response_error",
+                logged_in=True,
+                portal_base_url=portal_base_url,
+                raw_account=payload,
+            )
+
+        info = _info_from_account_payload(
+            payload,
+            state=refreshed_state,
+            portal_base_url=portal_base_url,
+        )
+        _account_info_cache = (cache_key, time.monotonic(), info)
+        return info
+    except Exception as exc:
+        return _error_info(
+            error=exc,
+            logged_in=bool(state.get("access_token")),
+            portal_base_url=portal_base_url,
+        )
+
+
+def _info_from_inference_key_pool(
+    portal_base_url: Optional[str],
+) -> Optional[NousPortalAccountInfo]:
+    """Return an explicit unknown-entitlement snapshot for opaque Nous keys."""
+    try:
+        entry = _select_nous_pool_entry()
+        if entry is None:
+            return None
+        runtime_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
+        if not isinstance(runtime_key, str) or not runtime_key.strip():
+            return None
+
+        return NousPortalAccountInfo(
+            logged_in=False,
+            source="inference_key",
+            fresh=False,
+            portal_base_url=(
+                getattr(entry, "portal_base_url", None)
+                or portal_base_url
+            ),
+            inference_base_url=(
+                getattr(entry, "inference_base_url", None)
+                or getattr(entry, "runtime_base_url", None)
+                or getattr(entry, "base_url", None)
+            ),
+            inference_credential_present=True,
+            credential_source=f"pool:{getattr(entry, 'label', 'unknown')}",
+            error="portal_oauth_missing",
+        )
+    except Exception:
+        return None
+
+
+def _info_from_oauth_pool(
+    *,
+    force_fresh: bool,
+    min_jwt_ttl_seconds: int,
+    portal_base_url: Optional[str],
+) -> Optional[NousPortalAccountInfo]:
+    try:
+        entry = _select_nous_pool_entry()
+    except Exception:
+        return None
+    if entry is None or not _pool_entry_is_portal_oauth(entry):
+        return None
+
+    access_token = getattr(entry, "access_token", None)
+    if not isinstance(access_token, str) or not access_token.strip():
+        return None
+
+    entry_portal_url = (
+        getattr(entry, "portal_base_url", None)
+        or portal_base_url
+    )
+    state = {
+        "access_token": access_token,
+        "client_id": getattr(entry, "client_id", None),
+        "inference_base_url": (
+            getattr(entry, "inference_base_url", None)
+            or getattr(entry, "runtime_base_url", None)
+            or getattr(entry, "base_url", None)
+        ),
+        "agent_key": getattr(entry, "agent_key", None),
+        "credential_source": f"pool:{getattr(entry, 'label', 'unknown')}",
+    }
+
+    if not force_fresh:
+        jwt_info = _info_from_valid_jwt(
+            access_token,
+            state=state,
+            portal_base_url=entry_portal_url,
+            min_jwt_ttl_seconds=min_jwt_ttl_seconds,
+        )
+        if jwt_info is not None:
+            return jwt_info
+
+    try:
+        payload = _fetch_nous_account_info(access_token, entry_portal_url)
+    except Exception as exc:
+        return _error_info(
+            error=exc,
+            logged_in=True,
+            portal_base_url=entry_portal_url,
+        )
+    if not payload:
+        return _error_info(
+            error="empty_account_response",
+            logged_in=True,
+            portal_base_url=entry_portal_url,
+        )
+    if isinstance(payload.get("error"), str):
+        return _error_info(
+            error=payload.get("error") or "account_response_error",
+            logged_in=True,
+            portal_base_url=entry_portal_url,
+            raw_account=payload,
+        )
+    return _info_from_account_payload(
+        payload,
+        state=state,
+        portal_base_url=entry_portal_url,
+    )
+
+
+def _select_nous_pool_entry() -> Optional[Any]:
+    from agent.credential_pool import load_pool
+
+    pool = load_pool("nous")
+    if not pool or not pool.has_credentials():
+        return None
+    entries = list(pool.entries())
+    if not entries:
+        return None
+
+    def _entry_sort_key(entry: Any) -> tuple[float, float, int]:
+        agent_exp = _parse_iso_timestamp(getattr(entry, "agent_key_expires_at", None)) or 0.0
+        access_exp = _parse_iso_timestamp(getattr(entry, "expires_at", None)) or 0.0
+        priority = int(getattr(entry, "priority", 0) or 0)
+        return (agent_exp, access_exp, -priority)
+
+    return max(entries, key=_entry_sort_key)
+
+
+def _pool_entry_is_portal_oauth(entry: Any) -> bool:
+    access_token = getattr(entry, "access_token", None)
+    if not isinstance(access_token, str) or not access_token.strip():
+        return False
+    auth_type = str(getattr(entry, "auth_type", "") or "").strip().lower()
+    refresh_token = getattr(entry, "refresh_token", None)
+    return auth_type.startswith("oauth") or bool(refresh_token)
+
+
+def _fetch_nous_account_info(
+    access_token: str,
+    portal_base_url: Optional[str] = None,
+) -> dict[str, Any]:
+    base = (portal_base_url or "https://portal.nousresearch.com").rstrip("/")
+    url = f"{base}/api/oauth/account"
+    headers = {
+        "Authorization": f"Bearer {access_token}",
+        "Accept": "application/json",
+    }
+    req = urllib.request.Request(url, headers=headers)
+    with urllib.request.urlopen(req, timeout=8) as resp:
+        payload = json.loads(resp.read().decode())
+    return payload if isinstance(payload, dict) else {}
+
+
+def _info_from_valid_jwt(
+    token: str,
+    *,
+    state: dict[str, Any],
+    portal_base_url: Optional[str],
+    min_jwt_ttl_seconds: int,
+) -> Optional[NousPortalAccountInfo]:
+    try:
+        from hermes_cli.auth import _decode_jwt_claims
+    except Exception:
+        return None
+
+    claims = _decode_jwt_claims(token)
+    if not claims:
+        return None
+
+    exp = _coerce_float(claims.get("exp"))
+    if exp is None or exp <= time.time() + max(0, int(min_jwt_ttl_seconds)):
+        return None
+
+    paid_access = _coerce_bool(claims.get("paid_access"))
+    subscription_tier = _coerce_int(claims.get("subscription_tier"))
+    access_info = NousPaidServiceAccessInfo(
+        allowed=paid_access,
+        paid_access=paid_access,
+        organisation_id=_coerce_str(claims.get("org_id")),
+        subscription_tier=subscription_tier,
+    )
+
+    return NousPortalAccountInfo(
+        logged_in=True,
+        source="jwt",
+        fresh=False,
+        user_id=_coerce_str(claims.get("sub")),
+        org_id=_coerce_str(claims.get("org_id")),
+        client_id=_coerce_str(claims.get("client_id") or state.get("client_id")),
+        product_id=_coerce_str(claims.get("product_id")),
+        nous_client=_coerce_str(claims.get("nous_client")),
+        portal_base_url=portal_base_url,
+        inference_base_url=_coerce_str(state.get("inference_base_url")),
+        inference_credential_present=True,
+        credential_source=_coerce_str(state.get("credential_source")) or "auth_store",
+        expires_at=datetime.fromtimestamp(exp, tz=timezone.utc),
+        paid_service_access=paid_access,
+        paid_service_access_info=access_info,
+        raw_claims=dict(claims),
+    )
+
+
+def _info_from_account_payload(
+    payload: dict[str, Any],
+    *,
+    state: dict[str, Any],
+    portal_base_url: Optional[str],
+) -> NousPortalAccountInfo:
+    user = payload.get("user") if isinstance(payload.get("user"), dict) else {}
+    organisation = (
+        payload.get("organisation")
+        if isinstance(payload.get("organisation"), dict)
+        else {}
+    )
+    subscription = _subscription_from_payload(payload.get("subscription"))
+    access = _paid_service_access_from_payload(payload.get("paid_service_access"))
+    paid_access = access.allowed if access else None
+    if paid_access is None and access is not None:
+        paid_access = access.paid_access
+
+    return NousPortalAccountInfo(
+        logged_in=True,
+        source="account_api",
+        fresh=True,
+        org_id=_coerce_str(organisation.get("id")) or (access.organisation_id if access else None),
+        client_id=_coerce_str(state.get("client_id")),
+        portal_base_url=portal_base_url,
+        inference_base_url=_coerce_str(state.get("inference_base_url")),
+        inference_credential_present=bool(state.get("access_token") or state.get("agent_key")),
+        credential_source=_coerce_str(state.get("credential_source")) or "auth_store",
+        email=_coerce_str(user.get("email")),
+        privy_did=_coerce_str(user.get("privy_did")),
+        subscription=subscription,
+        paid_service_access=paid_access,
+        paid_service_access_info=access,
+        raw_account=dict(payload),
+    )
+
+
+def _subscription_from_payload(value: Any) -> Optional[NousPortalSubscriptionInfo]:
+    if not isinstance(value, dict):
+        return None
+    return NousPortalSubscriptionInfo(
+        plan=_coerce_str(value.get("plan")),
+        tier=_coerce_int(value.get("tier")),
+        monthly_charge=_coerce_float(value.get("monthly_charge")),
+        current_period_end=_coerce_str(value.get("current_period_end")),
+        credits_remaining=_coerce_float(value.get("credits_remaining")),
+        rollover_credits=_coerce_float(value.get("rollover_credits")),
+    )
+
+
+def _paid_service_access_from_payload(value: Any) -> Optional[NousPaidServiceAccessInfo]:
+    if not isinstance(value, dict):
+        return None
+    allowed = _coerce_bool(value.get("allowed"))
+    paid_access = _coerce_bool(value.get("paid_access"))
+    return NousPaidServiceAccessInfo(
+        allowed=allowed,
+        paid_access=paid_access,
+        reason=_coerce_str(value.get("reason")),
+        organisation_id=_coerce_str(value.get("organisation_id")),
+        effective_at_ms=_coerce_int(value.get("effective_at_ms")),
+        has_active_subscription=_coerce_bool(value.get("has_active_subscription")),
+        active_subscription_is_paid=_coerce_bool(value.get("active_subscription_is_paid")),
+        subscription_tier=_coerce_int(value.get("subscription_tier")),
+        subscription_monthly_charge=_coerce_float(value.get("subscription_monthly_charge")),
+        subscription_credits_remaining=_coerce_float(value.get("subscription_credits_remaining")),
+        purchased_credits_remaining=_coerce_float(value.get("purchased_credits_remaining")),
+        total_usable_credits=_coerce_float(value.get("total_usable_credits")),
+    )
+
+
+def _error_info(
+    *,
+    error: object,
+    logged_in: bool,
+    portal_base_url: Optional[str] = None,
+    raw_account: Optional[dict[str, Any]] = None,
+) -> NousPortalAccountInfo:
+    return NousPortalAccountInfo(
+        logged_in=logged_in,
+        source="error",
+        fresh=False,
+        portal_base_url=portal_base_url,
+        raw_account=raw_account,
+        error=str(error),
+    )
+
+
+def _portal_base_url(state: dict[str, Any]) -> Optional[str]:
+    value = state.get("portal_base_url")
+    if not isinstance(value, str) or not value.strip():
+        return None
+    return value.strip().rstrip("/")
+
+
+def _cache_key(access_token: str, portal_base_url: Optional[str]) -> str:
+    digest = hashlib.sha256(access_token.encode("utf-8")).hexdigest()
+    return f"{portal_base_url or ''}:{digest}"
+
+
+def _parse_iso_timestamp(value: Any) -> Optional[float]:
+    if not isinstance(value, str) or not value:
+        return None
+    text = value.strip()
+    if text.endswith("Z"):
+        text = text[:-1] + "+00:00"
+    try:
+        return datetime.fromisoformat(text).timestamp()
+    except Exception:
+        return None
+
+
+def _coerce_str(value: Any) -> Optional[str]:
+    if isinstance(value, str) and value:
+        return value
+    return None
+
+
+def _coerce_bool(value: Any) -> Optional[bool]:
+    return value if isinstance(value, bool) else None
+
+
+def _coerce_int(value: Any) -> Optional[int]:
+    if isinstance(value, bool):
+        return None
+    try:
+        if value is None:
+            return None
+        return int(value)
+    except (TypeError, ValueError):
+        return None
+
+
+def _coerce_float(value: Any) -> Optional[float]:
+    if isinstance(value, bool):
+        return None
+    try:
+        if value is None:
+            return None
+        return float(value)
+    except (TypeError, ValueError):
+        return None
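
Review note on the JWT fast path in `_info_from_valid_jwt`: the token is only trusted as a local snapshot when its `exp` claim is later than "now plus a safety margin". The sketch below illustrates that check in isolation. It is a minimal approximation, not the code from `hermes_cli.auth`: `decode_claims_unverified`, `jwt_is_usable`, and `make_token` are hypothetical names introduced here, and unlike `_coerce_float` it accepts only numeric `exp` values.

```python
import base64
import json
import time
from typing import Any, Optional


def decode_claims_unverified(token: str) -> Optional[dict[str, Any]]:
    """Decode the JWT payload segment WITHOUT verifying the signature.

    Hypothetical helper: suitable for UX gating only, never for authorization.
    """
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload_b64 = parts[1]
    # base64url payloads omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    except ValueError:  # covers binascii.Error, JSON and Unicode decode errors
        return None
    return claims if isinstance(claims, dict) else None


def jwt_is_usable(
    token: str,
    min_ttl_seconds: int = 60,
    now: Optional[float] = None,
) -> bool:
    """Return True when the token outlives the requested safety margin."""
    claims = decode_claims_unverified(token)
    if not claims:
        return False
    exp = claims.get("exp")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(exp, bool) or not isinstance(exp, (int, float)):
        return False
    current = time.time() if now is None else now
    return exp > current + max(0, int(min_ttl_seconds))


def make_token(claims: dict[str, Any]) -> str:
    """Build an unsigned demo token (for this example only)."""

    def b64(obj: dict[str, Any]) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{b64({'alg': 'none'})}.{b64(claims)}.sig"


demo = make_token({"exp": 1000, "paid_access": True})
print(jwt_is_usable(demo, min_ttl_seconds=60, now=900))  # 1000 > 960
print(jwt_is_usable(demo, min_ttl_seconds=60, now=950))  # 1000 > 1010 fails
```

Passing `now` explicitly keeps the check deterministic in tests. Because the signature is never verified here, a snapshot like this should only drive UI hints, which matches the docstring's statement that server APIs remain authoritative.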
@@ -6,8 +6,8 @@ from dataclasses import dataclass
 from pathlib import Path
 from typing import Dict, Iterable, Optional, Set

-from hermes_cli.auth import get_nous_auth_status
 from hermes_cli.config import get_env_value, load_config
+from hermes_cli.nous_account import NousPortalAccountInfo, get_nous_portal_account_info
 from tools.managed_tool_gateway import is_managed_tool_gateway_ready
 from utils import is_truthy_value
 from tools.tool_backend_helpers import (
@@ -53,6 +53,7 @@ class NousSubscriptionFeatures:
    nous_auth_present: bool
    provider_is_nous: bool
    features: Dict[str, NousFeatureState]
+    account_info: Optional[NousPortalAccountInfo] = None

    @property
    def web(self) -> NousFeatureState:
@@ -227,6 +228,8 @@ def _resolve_browser_feature_state(

 def get_nous_subscription_features(
    config: Optional[Dict[str, object]] = None,
+    *,
+    force_fresh: bool = False,
 ) -> NousSubscriptionFeatures:
    if config is None:
        config = load_config() or {}
@@ -235,12 +238,19 @@ def get_nous_subscription_features(
    provider_is_nous = str(model_cfg.get("provider") or "").strip().lower() == "nous"

    try:
-        nous_status = get_nous_auth_status()
+        if force_fresh:
+            account_info = get_nous_portal_account_info(force_fresh=True)
+        else:
+            account_info = get_nous_portal_account_info()
    except Exception:
-        nous_status = {}
+        account_info = None

-    managed_tools_flag = managed_nous_tools_enabled()
-    nous_auth_present = bool(nous_status.get("logged_in"))
+    managed_tools_flag = bool(
+        account_info
+        and account_info.logged_in
+        and account_info.paid_service_access is True
+    )
+    nous_auth_present = bool(account_info and account_info.logged_in)
    subscribed = provider_is_nous or nous_auth_present

    web_tool_enabled = _toolset_enabled(config, "web")
@@ -317,6 +327,7 @@ def get_nous_subscription_features(
        modal_mode,
        has_direct=direct_modal,
        managed_ready=managed_modal_available,
+        managed_enabled=managed_tools_flag,
    )

    web_managed = web_backend == "firecrawl" and managed_web_available and not direct_firecrawl
@@ -483,6 +494,7 @@ def get_nous_subscription_features(
        nous_auth_present=nous_auth_present,
        provider_is_nous=provider_is_nous,
        features=features,
+        account_info=account_info,
    )


@@ -493,11 +505,15 @@ def apply_nous_managed_defaults(
    config: Dict[str, object],
    *,
    enabled_toolsets: Optional[Iterable[str]] = None,
+    force_fresh: bool = False,
 ) -> set[str]:
-    if not managed_nous_tools_enabled():
+    features = get_nous_subscription_features(config, force_fresh=force_fresh)
+    if not (
+        features.account_info
+        and features.account_info.logged_in
+        and features.account_info.paid_service_access is True
+    ):
        return set()
-
-    features = get_nous_subscription_features(config)
    if not features.provider_is_nous:
        return set()

@@ -594,6 +610,8 @@ _ALL_GATEWAY_KEYS = ("web", "image_gen", "tts", "browser")

 def get_gateway_eligible_tools(
    config: Optional[Dict[str, object]] = None,
+    *,
+    force_fresh: bool = False,
 ) -> tuple[list[str], list[str], list[str]]:
    """Return (unconfigured, has_direct, already_managed) tool key lists.

@@ -604,7 +622,11 @@ def get_gateway_eligible_tools(
    All lists are empty when the user is not a paid Nous subscriber or
    is not using Nous as their provider.
    """
-    if not managed_nous_tools_enabled():
+    if force_fresh:
+        managed_enabled = managed_nous_tools_enabled(force_fresh=True)
+    else:
+        managed_enabled = managed_nous_tools_enabled()
+    if not managed_enabled:
        return [], [], []

    if config is None:
@@ -695,7 +717,11 @@ def apply_gateway_defaults(
    return changed


-def prompt_enable_tool_gateway(config: Dict[str, object]) -> set[str]:
+def prompt_enable_tool_gateway(
+    config: Dict[str, object],
+    *,
+    force_fresh: bool = True,
+) -> set[str]:
    """If eligible tools exist, prompt the user to enable the Tool Gateway.

    Uses prompt_choice() with a description parameter so the curses TUI
@@ -704,7 +730,10 @@ def prompt_enable_tool_gateway(config: Dict[str, object]) -> set[str]:
    Returns the set of tools that were enabled, or empty set if the user
    declined or no tools were eligible.
    """
-    unconfigured, has_direct, already_managed = get_gateway_eligible_tools(config)
+    unconfigured, has_direct, already_managed = get_gateway_eligible_tools(
+        config,
+        force_fresh=force_fresh,
+    )
    if not unconfigured and not has_direct:
        return set()
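
Review note on the gating expression used in this file: `paid_service_access` is tri-state (`True`, `False`, or `None` when the entitlement is unknown), so the hunks compare with `is True` rather than relying on truthiness alone. The sketch below shows the behaviour with a trimmed stand-in; `AccountSnapshot` and `managed_tools_allowed` are hypothetical names, not part of the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountSnapshot:
    """Hypothetical, trimmed stand-in for NousPortalAccountInfo."""

    logged_in: bool
    paid_service_access: Optional[bool] = None  # None means "unknown"


def managed_tools_allowed(info: Optional[AccountSnapshot]) -> bool:
    """Enable managed tools only for a logged-in account with confirmed paid access."""
    return bool(info and info.logged_in and info.paid_service_access is True)


print(managed_tools_allowed(None))
print(managed_tools_allowed(AccountSnapshot(logged_in=True)))
print(managed_tools_allowed(AccountSnapshot(logged_in=True, paid_service_access=True)))
```

Treating `None` as "not allowed" means an account lookup that fails or returns an opaque inference key never enables the managed gateway by accident.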

@@ -864,12 +864,35 @@ def _discover_memory_providers() -> list[tuple[str, str]]:


 def _discover_context_engines() -> list[tuple[str, str]]:
-    """Return [(name, description), ...] for available context engines."""
+    """Return [(name, description), ...] for available context engines.
+
+    Includes repo-shipped engines from ``plugins/context_engine/`` AND
+    plugin-registered engines (third-party engines installed as Hermes
+    plugins via ``ctx.register_context_engine``). Repo-shipped descriptions
+    win when a plugin-registered engine collides on name.
+    """
+    engines: list[tuple[str, str]] = []
+    seen: set[str] = set()
+
    try:
        from plugins.context_engine import discover_context_engines
-        return [(name, desc) for name, desc, _avail in discover_context_engines()]
+        for name, desc, _avail in discover_context_engines():
+            if name not in seen:
+                engines.append((name, desc))
+                seen.add(name)
    except Exception:
-        return []
+        pass
+
+    try:
+        from hermes_cli.plugins import discover_plugins, get_plugin_context_engine
+        discover_plugins()
+        plugin_engine = get_plugin_context_engine()
+        if plugin_engine and getattr(plugin_engine, "name", None) and plugin_engine.name not in seen:
+            engines.append((plugin_engine.name, "installed plugin"))
+    except Exception:
+        pass
+
+    return engines


 def _get_current_memory_provider() -> str:
@@ -79,7 +79,7 @@ class XAIGrokAdapter(UpstreamAdapter):
        failed_credential: UpstreamCredential,
        status_code: int,
    ) -> Optional[UpstreamCredential]:
-        if status_code != 401:
+        if status_code not in {401, 429}:
            return None

        with self._lock:
@@ -87,16 +87,25 @@ class XAIGrokAdapter(UpstreamAdapter):
            if pool is None:
                return None

-            refreshed = pool.try_refresh_current()
-            if refreshed is None:
+            if status_code == 429:
+                # Mark the rate-limited key with its 1-hour cooldown and rotate
+                # to the next available credential. Returns None when the pool
+                # has no other key to offer — the 429 will flow back to the client.
                refreshed = pool.mark_exhausted_and_rotate(status_code=status_code)
+            else:
+                refreshed = pool.try_refresh_current()
+                if refreshed is None:
+                    refreshed = pool.mark_exhausted_and_rotate(status_code=status_code)
            if refreshed is None:
                return None

            retry_cred = self._credential_from_entry(refreshed)
            if retry_cred.bearer == failed_credential.bearer:
                return None
-            logger.info("proxy: xAI upstream rejected bearer; retrying with refreshed pool credential")
+            logger.info(
+                "proxy: xAI upstream returned %s; retrying with rotated pool credential",
+                status_code,
+            )
            return retry_cred

    def _load_pool(self) -> Optional[CredentialPool]:
@@ -206,7 +206,7 @@ def create_app(adapter: UpstreamAdapter) -> "web.Application":
            return session_or_response
        session = session_or_response

-        if upstream_resp.status == 401:
+        if upstream_resp.status in {401, 429}:
            try:
                retry_cred = adapter.get_retry_credential(
                    failed_credential=cred,
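
Review note on the two hunks above: a 401 first attempts a token refresh and only then rotates, while a 429 skips the refresh and rotates straight away, since refreshing a rate-limited key would not help. The sketch below models that decision with an in-memory pool. `FakePool` and `pick_retry_key` are hypothetical; only the method names `try_refresh_current` and `mark_exhausted_and_rotate` come from the diff, and their real behaviour in `CredentialPool` may differ.

```python
from typing import Optional


class FakePool:
    """Hypothetical stand-in for the credential pool used by the adapter."""

    def __init__(self, keys: list[str], refreshable: bool = False) -> None:
        self.keys = list(keys)
        self.index = 0
        self.refreshable = refreshable
        self.exhausted: set[int] = set()

    def try_refresh_current(self) -> Optional[str]:
        # Pretend an OAuth refresh succeeded only when the pool allows it.
        if not self.refreshable:
            return None
        self.keys[self.index] += "-refreshed"
        return self.keys[self.index]

    def mark_exhausted_and_rotate(self, status_code: int) -> Optional[str]:
        # Put the current key on cooldown and move to the next usable one.
        self.exhausted.add(self.index)
        for i, key in enumerate(self.keys):
            if i not in self.exhausted:
                self.index = i
                return key
        return None


def pick_retry_key(pool: FakePool, failed_key: str, status_code: int) -> Optional[str]:
    """Return a different key to retry with, or None to surface the error."""
    if status_code not in {401, 429}:
        return None
    if status_code == 429:
        candidate = pool.mark_exhausted_and_rotate(status_code=status_code)
    else:
        candidate = pool.try_refresh_current()
        if candidate is None:
            candidate = pool.mark_exhausted_and_rotate(status_code=status_code)
    if candidate is None or candidate == failed_key:
        return None
    return candidate


print(pick_retry_key(FakePool(["a", "b"]), "a", 429))
print(pick_retry_key(FakePool(["a"]), "a", 429))
```

When the pool has a single key, the 429 case returns `None`, which corresponds to the comment in the hunk that the 429 then flows back to the client.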
@@ -0,0 +1,108 @@
+"""Helpers for the temporary psutil-on-Android compatibility installer."""
+
+from __future__ import annotations
+
+import shutil
+import tarfile
+from pathlib import Path, PurePosixPath
+
+# Pin a version we know patches cleanly. Update when a newer psutil
+# changes the marker line shape and we need to follow upstream.
+PSUTIL_URL = (
+    "https://files.pythonhosted.org/packages/aa/c6/"
+    "d1ddf4abb55e93cebc4f2ed8b5d6dbad109ecb8d63748dd2b20ab5e57ebe/"
+    "psutil-7.2.2.tar.gz"
+)
+
+MARKER = 'LINUX = sys.platform.startswith("linux")'
+REPLACEMENT = 'LINUX = sys.platform.startswith(("linux", "android"))'
+
+
+class PsutilAndroidInstallError(RuntimeError):
+    """Raised when the pinned psutil sdist is missing or unsafe."""
+
+
+def _normalize_member_parts(member_name: str) -> tuple[str, ...]:
+    path = PurePosixPath(member_name)
+    parts = tuple(part for part in path.parts if part not in ("", "."))
+    if path.is_absolute() or ".." in parts or not parts:
+        raise PsutilAndroidInstallError(
+            f"Unsafe archive member path: {member_name!r}"
+        )
+    return parts
+
+
+def _safe_extract_tar_gz(archive: Path, destination: Path) -> None:
+    """Extract a tar.gz without allowing traversal or link members."""
+    with tarfile.open(archive, "r:gz") as tf:
+        for member in tf.getmembers():
+            parts = _normalize_member_parts(member.name)
+            target = destination.joinpath(*parts)
+
+            if member.isdir():
+                target.mkdir(parents=True, exist_ok=True)
+                continue
+
+            if not member.isfile():
+                raise PsutilAndroidInstallError(
+                    f"Unsupported archive member type: {member.name}"
+                )
+
+            target.parent.mkdir(parents=True, exist_ok=True)
+            extracted = tf.extractfile(member)
+            if extracted is None:
+                raise PsutilAndroidInstallError(
+                    f"Cannot read archive member: {member.name}"
+                )
+
+            with extracted, open(target, "wb") as dst:
+                shutil.copyfileobj(extracted, dst)
+
+            try:
+                target.chmod(member.mode & 0o777)
+            except OSError:
+                pass
+
+
+def prepare_patched_psutil_sdist(archive: Path, destination: Path) -> Path:
+    """Safely extract the pinned psutil sdist and patch it for Android."""
+    _safe_extract_tar_gz(archive, destination)
+
+    src_roots = sorted(
+        (
+            path for path in destination.iterdir()
+            if path.is_dir() and path.name.startswith("psutil-")
+        ),
+        key=lambda path: path.name,
+    )
+    if not src_roots:
+        raise PsutilAndroidInstallError(
+            "psutil sdist did not contain a psutil-* directory"
+        )
+
+    src_root = src_roots[0]
+    common_py = src_root / "psutil" / "_common.py"
+    if not common_py.is_file():
+        raise PsutilAndroidInstallError(
+            f"psutil sdist did not contain {common_py.relative_to(src_root)!s}"
+        )
+    try:
+        content = common_py.read_text(encoding="utf-8")
+    except OSError as exc:
+        raise PsutilAndroidInstallError(
+            f"Failed to read {common_py.relative_to(src_root)!s}"
+        ) from exc
+    if MARKER not in content:
+        raise PsutilAndroidInstallError(
+            "psutil Android compatibility patch marker not found"
+        )
+    try:
+        common_py.write_text(
+            content.replace(MARKER, REPLACEMENT),
+            encoding="utf-8",
+        )
+    except OSError as exc:
+        raise PsutilAndroidInstallError(
+            f"Failed to write {common_py.relative_to(src_root)!s}"
+        ) from exc
+    return src_root
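
Review note on `_normalize_member_parts`: the traversal guard relies on `PurePosixPath` keeping `..` components in `parts` while dropping `.` components, so a plain membership test is enough. The sketch below exercises the same rule on a few member names. `safe_member_parts` is a hypothetical copy that raises `ValueError` instead of `PsutilAndroidInstallError` so that it runs standalone.

```python
from pathlib import PurePosixPath


def safe_member_parts(member_name: str) -> tuple[str, ...]:
    """Return normalized path parts, rejecting absolute or traversing names."""
    path = PurePosixPath(member_name)
    parts = tuple(part for part in path.parts if part not in ("", "."))
    if path.is_absolute() or ".." in parts or not parts:
        raise ValueError(f"Unsafe archive member path: {member_name!r}")
    return parts


def is_safe(member_name: str) -> bool:
    """Convenience wrapper for the demo below."""
    try:
        safe_member_parts(member_name)
    except ValueError:
        return False
    return True


for name in (
    "psutil-7.2.2/psutil/_common.py",  # ordinary sdist member
    "psutil-7.2.2/./setup.py",         # "." components are dropped
    "../outside.txt",                  # parent traversal
    "a/../../outside.txt",             # traversal hidden mid-path
    "/etc/passwd",                     # absolute path
    "",                                # empty name
):
    print(f"{name!r}: {'ok' if is_safe(name) else 'rejected'}")
```

The extractor in the diff additionally refuses symlink, hardlink, and device members, which this name-only check does not cover.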
@@ -58,7 +58,9 @@ def _resolve_short_name(name: str, sources, console: Console) -> str:
        table = Table()
        table.add_column("Source", style="dim")
        table.add_column("Trust", style="dim")
-        table.add_column("Identifier", style="bold cyan")
+        # overflow="fold" keeps the full slug visible (wraps instead of ellipsis-truncating)
+        # so users can copy it for `hermes skills install`.
+        table.add_column("Identifier", style="bold cyan", overflow="fold", no_wrap=False)
        for r in exact:
            trust_style = {"builtin": "bright_cyan", "trusted": "green", "community": "yellow"}.get(r.trust_level, "dim")
            trust_label = "official" if r.source == "official" else r.trust_level
@@ -244,15 +246,39 @@ def _prompt_for_category(c: Console, existing: List[str]) -> str:


 def do_search(query: str, source: str = "all", limit: int = 10,
-              console: Optional[Console] = None) -> None:
-    """Search registries and display results as a Rich table."""
+              console: Optional[Console] = None, as_json: bool = False) -> None:
+    """Search registries and display results as a Rich table.
+
+    When ``as_json=True``, writes a JSON array of result records to stdout
+    (one object per skill: ``name``, ``identifier``, ``source``,
+    ``trust_level``, ``description``) and skips the table render. This is
+    the scripting / copy-paste handle: the full identifier is always
+    intact, even for browse-sh slugs that the table would otherwise wrap.
+    """
    from tools.skills_hub import GitHubAuth, create_source_router, unified_search

    c = console or _console
-    c.print(f"\n[bold]Searching for:[/] {query}")

    auth = GitHubAuth()
    sources = create_source_router(auth)
+    if as_json:
+        # Avoid Rich status spinner contaminating stdout — JSON consumers
+        # expect a clean parseable stream.
+        results = unified_search(query, sources, source_filter=source, limit=limit)
+        payload = [
+            {
+                "name": r.name,
+                "identifier": r.identifier,
+                "source": r.source,
+                "trust_level": r.trust_level,
+                "description": r.description,
+            }
+            for r in results
+        ]
+        print(json.dumps(payload, indent=2))
+        return
+
+    c.print(f"\n[bold]Searching for:[/] {query}")
    with c.status("[bold]Searching registries..."):
        results = unified_search(query, sources, source_filter=source, limit=limit)

@@ -265,7 +291,11 @@ def do_search(query: str, source: str = "all", limit: int = 10,
    table.add_column("Description", max_width=60)
    table.add_column("Source", style="dim")
    table.add_column("Trust", style="dim")
-    table.add_column("Identifier", style="dim")
+    # overflow="fold" keeps the full slug visible (wraps instead of
+    # ellipsis-truncating). Browse.sh slugs end in a `-XXXXXX` hash that
+    # is part of the actual identifier — truncating it makes copy-paste
+    # into `hermes skills install` fail.
+    table.add_column("Identifier", style="dim", overflow="fold", no_wrap=False)

    for r in results:
        trust_style = {"builtin": "bright_cyan", "trusted": "green", "community": "yellow"}.get(r.trust_level, "dim")
@@ -280,7 +310,8 @@ def do_search(query: str, source: str = "all", limit: int = 10,

    c.print(table)
    c.print("[dim]Use: hermes skills inspect <identifier> to preview, "
-            "hermes skills install <identifier> to install[/]\n")
+            "hermes skills install <identifier> to install "
+            "(--json for scripting)[/]\n")


 def do_browse(page: int = 1, page_size: int = 20, source: str = "all",
@@ -1390,7 +1421,8 @@ def skills_command(args) -> None:
    if action == "browse":
        do_browse(page=args.page, page_size=args.size, source=args.source)
    elif action == "search":
-        do_search(args.query, source=args.source, limit=args.limit)
+        do_search(args.query, source=args.source, limit=args.limit,
+                  as_json=getattr(args, "json", False))
    elif action == "install":
        do_install(args.identifier, category=args.category, force=args.force,
                   skip_confirm=getattr(args, "yes", False),
@@ -1511,10 +1543,11 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:

    elif action == "search":
        if not args:
-            c.print("[bold red]Usage:[/] /skills search <query> [--source skills-sh|well-known|github|official] [--limit N]\n")
+            c.print("[bold red]Usage:[/] /skills search <query> [--source skills-sh|well-known|github|official] [--limit N] [--json]\n")
            return
        source = "all"
        limit = 10
+        as_json = False
        query_parts = []
        i = 0
        while i < len(args):
@@ -1527,10 +1560,14 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
                except ValueError:
                    pass
                i += 2
+            elif args[i] == "--json":
+                as_json = True
+                i += 1
            else:
                query_parts.append(args[i])
                i += 1
-        do_search(" ".join(query_parts), source=source, limit=limit, console=c)
+        do_search(" ".join(query_parts), source=source, limit=limit,
+                  console=c, as_json=as_json)

    elif action == "install":
        if not args:
@@ -16,6 +16,10 @@ from hermes_cli.auth import AuthError, resolve_provider
 from hermes_cli.colors import Colors, color
 from hermes_cli.config import get_env_path, get_env_value, get_hermes_home, load_config
 from hermes_cli.models import provider_label
+from hermes_cli.nous_account import (
+    format_nous_portal_entitlement_message,
+    get_nous_portal_account_info,
+)
 from hermes_cli.nous_subscription import get_nous_subscription_features
 from hermes_cli.runtime_provider import resolve_requested_provider
 from hermes_constants import OPENROUTER_MODELS_URL
@@ -193,26 +197,57 @@ def show_status(args):
        qwen_status = {}
        minimax_status = {}

-    nous_logged_in = bool(nous_status.get("logged_in"))
+    nous_account_info = None
+    if (
+        nous_status.get("logged_in")
+        or nous_status.get("access_token")
+        or nous_status.get("portal_base_url")
+        or nous_status.get("inference_credential_present")
+        or nous_status.get("error_code")
+    ):
+        try:
+            nous_account_info = get_nous_portal_account_info()
+        except Exception:
+            nous_account_info = None
+
+    nous_logged_in = bool(
+        nous_status.get("logged_in")
+        or (nous_account_info and nous_account_info.logged_in)
+    )
+    nous_inference_present = bool(
+        nous_status.get("inference_credential_present")
+        or (nous_account_info and nous_account_info.inference_credential_present)
+    )
    nous_error = nous_status.get("error")
-    nous_label = "logged in" if nous_logged_in else "not logged in (run: hermes auth add nous --type oauth)"
+    if nous_logged_in:
+        nous_label = "logged in"
+    elif nous_inference_present:
+        nous_label = "not logged in (Nous inference key configured)"
+    else:
+        nous_label = "not logged in (run: hermes auth add nous --type oauth)"
    print(
        f"  {'Nous Portal':<12}  {check_mark(nous_logged_in)} "
        f"{nous_label}"
    )
    portal_url = nous_status.get("portal_base_url") or "(unknown)"
+    inference_url = (
+        nous_status.get("inference_base_url")
+        or (nous_account_info.inference_base_url if nous_account_info else None)
+    )
    access_exp = _format_iso_timestamp(nous_status.get("access_expires_at"))
    key_exp = _format_iso_timestamp(nous_status.get("agent_key_expires_at"))
    refresh_label = "yes" if nous_status.get("has_refresh_token") else "no"
    if nous_logged_in or portal_url != "(unknown)" or nous_error:
        print(f"    Portal URL: {portal_url}")
+    if nous_inference_present and inference_url:
+        print(f"    Inference:  {inference_url}")
    if nous_logged_in or nous_status.get("access_expires_at"):
        print(f"    Access exp: {access_exp}")
-    if nous_logged_in or nous_status.get("agent_key_expires_at"):
+    if nous_logged_in or nous_inference_present or nous_status.get("agent_key_expires_at"):
        print(f"    Key exp:    {key_exp}")
    if nous_logged_in or nous_status.get("has_refresh_token"):
        print(f"    Refresh:    {refresh_label}")
-    if nous_error and not nous_logged_in:
+    if nous_error:
        print(f"    Error:      {nous_error}")

    codex_logged_in = bool(codex_status.get("logged_in"))
@@ -303,18 +338,18 @@ def show_status(args):
            else:
                state = "not configured"
            print(f"  {feature.label:<15} {check_mark(feature.available or feature.active or feature.managed_by_nous)} {state}")
-    elif nous_logged_in:
-        # Logged into Nous but on the free tier — show upgrade nudge
+    elif nous_logged_in or nous_inference_present:
+        # Nous OAuth without entitlement, or an opaque inference key without
+        # Portal account information, cannot enable the Tool Gateway.
        print()
        print(color("◆ Nous Tool Gateway", Colors.CYAN, Colors.BOLD))
-        print("  Your free-tier Nous account does not include Tool Gateway access.")
-        print("  Upgrade your subscription to unlock managed web, image, TTS, and browser tools.")
-        try:
-            portal_url = nous_status.get("portal_base_url", "").rstrip("/")
-            if portal_url:
-                print(f"  Upgrade: {portal_url}")
-        except Exception:
-            pass
+        message = format_nous_portal_entitlement_message(
+            nous_account_info,
+            capability="managed web, image, TTS, browser, and Modal tools",
+        )
+        if message:
+            for line in message.splitlines():
+                print(f"  {line}")

    # =========================================================================
    # API-Key Providers
@@ -28,7 +28,8 @@ from hermes_cli.nous_subscription import (
    apply_nous_managed_defaults,
    get_nous_subscription_features,
 )
-from tools.tool_backend_helpers import fal_key_is_configured, managed_nous_tools_enabled
+from hermes_cli.nous_account import format_nous_portal_entitlement_message
+from tools.tool_backend_helpers import fal_key_is_configured
 from utils import base_url_hostname, is_truthy_value

 logger = logging.getLogger(__name__)
@@ -67,6 +68,7 @@ CONFIGURABLE_TOOLSETS = [
    ("skills",          "📚 Skills",                    "list, view, manage"),
    ("todo",            "📋 Task Planning",             "todo"),
    ("memory",          "💾 Memory",                    "persistent memory across sessions"),
+    ("context_engine",  "🧩 Context Engine",            "runtime tools from the active context engine"),
    ("session_search",  "🔎 Session Search",            "search past conversations"),
    ("clarify",         "❓ Clarifying Questions",      "clarify"),
    ("delegation",      "👥 Task Delegation",           "delegate_task"),
@@ -1294,6 +1296,24 @@ def _get_platform_tools(
                enabled_toolsets.add(pts)
            # else: known but not in config = user disabled it

+    # Context-engine tools are runtime-provided by the active engine, so they
+    # are not part of any static platform composite. When a non-default engine
+    # is selected, keep its recovery/status tools available even after a user
+    # saves an explicit platform toolset list. Preserve the explicit empty-list
+    # contract: selecting no configurable tools means no context-engine tools
+    # either unless the user adds ``context_engine`` manually later.
+    context_cfg = config.get("context") or {}
+    if not isinstance(context_cfg, dict):
+        context_cfg = {}
+    context_engine_name = str(context_cfg.get("engine") or "compressor").strip().lower()
+    explicit_empty_selection = (
+        platform in platform_toolsets
+        and isinstance(platform_toolsets.get(platform), list)
+        and not toolset_names
+    )
+    if context_engine_name and context_engine_name != "compressor" and not explicit_empty_selection:
+        enabled_toolsets.add("context_engine")
+
    # Preserve any explicit non-configurable toolset entries (for example,
    # custom toolsets or MCP server names saved in platform_toolsets).
    explicit_passthrough = {
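The context-engine rule added in the hunk above can be read as a small pure function. The following is a minimal sketch under stated assumptions, not the project's implementation: the helper name `should_enable_context_engine`, the `platform_toolsets` config key, and the engine name `custom` are hypothetical, and the saved list itself stands in for `toolset_names`.

```python
DEFAULT_ENGINE = "compressor"


def should_enable_context_engine(config: dict, platform: str) -> bool:
    """Decide whether the runtime ``context_engine`` toolset is auto-enabled."""
    # Tolerate a missing or malformed "context" section.
    context_cfg = config.get("context") or {}
    if not isinstance(context_cfg, dict):
        context_cfg = {}
    engine = str(context_cfg.get("engine") or DEFAULT_ENGINE).strip().lower()

    # Hypothetical config layout: a per-platform saved toolset list.
    saved = (config.get("platform_toolsets") or {}).get(platform)
    # An explicitly saved empty list means "no tools at all", which is honored.
    explicit_empty = isinstance(saved, list) and not saved

    return bool(engine) and engine != DEFAULT_ENGINE and not explicit_empty


custom = {"context": {"engine": "custom"}}
enabled_with_list = should_enable_context_engine(
    {**custom, "platform_toolsets": {"cli": ["web"]}}, "cli"
)
enabled_with_empty = should_enable_context_engine(
    {**custom, "platform_toolsets": {"cli": []}}, "cli"
)
enabled_default = should_enable_context_engine({}, "cli")
```

The explicit-empty check is kept separate from "platform not configured" so that a user who deliberately saved no tools is not handed engine tools implicitly.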
@@ -1399,7 +1419,12 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
    save_config(config)


-def _toolset_has_keys(ts_key: str, config: dict = None) -> bool:
+def _toolset_has_keys(
+    ts_key: str,
+    config: dict = None,
+    *,
+    force_fresh: bool = False,
+) -> bool:
    """Check if a toolset's required API keys are configured."""
    if config is None:
        config = load_config()
@@ -1414,7 +1439,7 @@ def _toolset_has_keys(ts_key: str, config: dict = None) -> bool:
            return False

    if ts_key in {"web", "image_gen", "tts", "browser"}:
-        features = get_nous_subscription_features(config)
+        features = get_nous_subscription_features(config, force_fresh=force_fresh)
        feature = features.features.get(ts_key)
        if feature and (feature.available or feature.managed_by_nous):
            return True
@@ -1422,7 +1447,7 @@ def _toolset_has_keys(ts_key: str, config: dict = None) -> bool:
    # Check TOOL_CATEGORIES first (provider-aware)
    cat = TOOL_CATEGORIES.get(ts_key)
    if cat:
-        for provider in _visible_providers(cat, config):
+        for provider in _visible_providers(cat, config, force_fresh=force_fresh):
            env_vars = provider.get("env_vars", [])
            if not env_vars:
                return True  # No-key provider (e.g. Local Browser, Edge TTS)
@@ -1493,7 +1518,13 @@ def _estimate_tool_tokens() -> Dict[str, int]:
    return _tool_token_cache


-def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform: str = "cli") -> Set[str]:
+def _prompt_toolset_checklist(
+    platform_label: str,
+    enabled: Set[str],
+    platform: str = "cli",
+    *,
+    force_fresh: bool = True,
+) -> Set[str]:
    """Multi-select checklist of toolsets. Returns set of selected toolset keys."""
    from hermes_cli.curses_ui import curses_checklist
    from toolsets import resolve_toolset
@@ -1511,7 +1542,10 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform:
    labels = []
    for ts_key, ts_label, ts_desc in effective:
        suffix = ""
-        if not _toolset_has_keys(ts_key) and (TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key)):
+        if (
+            not _toolset_has_keys(ts_key, force_fresh=force_fresh)
+            and (TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key))
+        ):
            suffix = "  [no API key]"
        labels.append(f"{ts_label}  ({ts_desc}){suffix}")

@@ -1547,7 +1581,12 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform:

 # ─── Provider-Aware Configuration ────────────────────────────────────────────

-def _configure_toolset(ts_key: str, config: dict):
+def _configure_toolset(
+    ts_key: str,
+    config: dict,
+    *,
+    force_fresh: bool = True,
+):
    """Configure a toolset - provider selection + API keys.
    
    Uses TOOL_CATEGORIES for provider-aware config, falls back to simple
@@ -1556,7 +1595,7 @@ def _configure_toolset(ts_key: str, config: dict):
    cat = TOOL_CATEGORIES.get(ts_key)

    if cat:
-        _configure_tool_category(ts_key, cat, config)
+        _configure_tool_category(ts_key, cat, config, force_fresh=force_fresh)
    else:
        # Simple fallback for vision, moa, etc.
        _configure_simple_requirements(ts_key)
@@ -1809,12 +1848,22 @@ def _plugin_tts_providers() -> list[dict]:
    return rows


-def _visible_providers(cat: dict, config: dict) -> list[dict]:
+def _visible_providers(
+    cat: dict,
+    config: dict,
+    *,
+    force_fresh: bool = False,
+) -> list[dict]:
    """Return provider entries visible for the current auth/config state."""
-    features = get_nous_subscription_features(config)
+    features = get_nous_subscription_features(config, force_fresh=force_fresh)
+    managed_available = bool(
+        features.account_info
+        and features.account_info.logged_in
+        and features.account_info.paid_service_access is True
+    )
    visible = []
    for provider in cat.get("providers", []):
-        if provider.get("managed_nous_feature") and not managed_nous_tools_enabled():
+        if provider.get("managed_nous_feature") and not managed_available:
            continue
        if provider.get("requires_nous_auth") and not features.nous_auth_present:
            continue
@@ -1855,6 +1904,31 @@ def _visible_providers(cat: dict, config: dict) -> list[dict]:
    return visible


+def _hidden_nous_gateway_message(
+    cat: dict,
+    config: dict,
+    capability: str,
+    *,
+    force_fresh: bool = False,
+) -> str:
+    """Return why a category's managed Nous provider is hidden, or an empty string."""
+    features = get_nous_subscription_features(config, force_fresh=force_fresh)
+    managed_available = bool(
+        features.account_info
+        and features.account_info.logged_in
+        and features.account_info.paid_service_access is True
+    )
+    if managed_available:
+        return ""
+    if not any(p.get("managed_nous_feature") for p in cat.get("providers", [])):
+        return ""
+    message = format_nous_portal_entitlement_message(
+        features.account_info,
+        capability=capability,
+    )
+    return message or ""
+
+
 _POST_SETUP_INSTALLED: dict = {
    # post_setup_key -> predicate(): True when the install side-effect
    # is already satisfied. Used by `_toolset_needs_configuration_prompt`
@@ -1886,17 +1960,22 @@ def _post_setup_already_installed(post_setup_key: str) -> bool:
        return True


-def _toolset_needs_configuration_prompt(ts_key: str, config: dict) -> bool:
+def _toolset_needs_configuration_prompt(
+    ts_key: str,
+    config: dict,
+    *,
+    force_fresh: bool = False,
+) -> bool:
    """Return True when enabling this toolset should open provider setup."""
    cat = TOOL_CATEGORIES.get(ts_key)
    if not cat:
-        return not _toolset_has_keys(ts_key, config)
+        return not _toolset_has_keys(ts_key, config, force_fresh=force_fresh)

    # If any visible provider has a registered post_setup install-state
    # check that hasn't been satisfied (e.g. cua-driver binary not on
    # PATH yet), force the configuration flow so `_configure_provider`
    # invokes `_run_post_setup` and the install actually runs.
-    for provider in _visible_providers(cat, config):
+    for provider in _visible_providers(cat, config, force_fresh=force_fresh):
        post_setup = provider.get("post_setup")
        if post_setup and not _post_setup_already_installed(post_setup):
            return True
@@ -1947,14 +2026,26 @@ def _toolset_needs_configuration_prompt(ts_key: str, config: dict) -> bool:
            pass
        return True

-    return not _toolset_has_keys(ts_key, config)
+    return not _toolset_has_keys(ts_key, config, force_fresh=force_fresh)


-def _configure_tool_category(ts_key: str, cat: dict, config: dict):
+def _configure_tool_category(
+    ts_key: str,
+    cat: dict,
+    config: dict,
+    *,
+    force_fresh: bool = True,
+):
    """Configure a tool category with provider selection."""
    icon = cat.get("icon", "")
    name = cat["name"]
-    providers = _visible_providers(cat, config)
+    providers = _visible_providers(cat, config, force_fresh=force_fresh)
+    hidden_nous_message = _hidden_nous_gateway_message(
+        cat,
+        config,
+        f"the Nous Subscription provider for {name}",
+        force_fresh=force_fresh,
+    )

    # Check Python version requirement
    if cat.get("requires_python"):
@@ -1975,7 +2066,10 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
        # For single-provider tools, show a note if available
        if cat.get("setup_note"):
            _print_info(f"  {cat['setup_note']}")
-        _configure_provider(provider, config)
+        if hidden_nous_message:
+            for line in hidden_nous_message.splitlines():
+                _print_warning(f"  {line}")
+        _configure_provider(provider, config, force_fresh=force_fresh)
    else:
        # Multiple providers - let user choose
        print()
@@ -1984,6 +2078,9 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
        print(color(f"  --- {icon} {name} - {title} ---", Colors.CYAN))
        if cat.get("setup_note"):
            _print_info(f"  {cat['setup_note']}")
+        if hidden_nous_message:
+            for line in hidden_nous_message.splitlines():
+                _print_warning(f"  {line}")
        print()

        # Plain text labels only (no ANSI codes in menu items)
@@ -1992,7 +2089,10 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
        # obvious which options cost extra vs. cost nothing on top of Nous.
        try:
            _nous_logged_in = bool(
-                get_nous_subscription_features(config).nous_auth_present
+                get_nous_subscription_features(
+                    config,
+                    force_fresh=force_fresh,
+                ).nous_auth_present
            )
        except Exception:
            _nous_logged_in = False
@@ -2004,7 +2104,7 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
            configured = ""
            env_vars = p.get("env_vars", [])
            if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
-                if _is_provider_active(p, config):
+                if _is_provider_active(p, config, force_fresh=force_fresh):
                    configured = " [active]"
                elif not env_vars:
                    configured = ""
@@ -2024,7 +2124,11 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
        provider_choices.append("Skip — keep defaults / configure later")

        # Detect current provider as default
-        default_idx = _detect_active_provider_index(providers, config)
+        default_idx = _detect_active_provider_index(
+            providers,
+            config,
+            force_fresh=force_fresh,
+        )

        provider_idx = _prompt_choice(f"  {title}:", provider_choices, default_idx)

@@ -2033,10 +2137,15 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
            _print_info(f"  Skipped {name}")
            return

-        _configure_provider(providers[provider_idx], config)
+        _configure_provider(providers[provider_idx], config, force_fresh=force_fresh)


-def _is_provider_active(provider: dict, config: dict) -> bool:
+def _is_provider_active(
+    provider: dict,
+    config: dict,
+    *,
+    force_fresh: bool = False,
+) -> bool:
    """Check if a provider entry matches the currently active config."""
    plugin_name = provider.get("image_gen_plugin_name")
    if plugin_name:
@@ -2050,7 +2159,7 @@ def _is_provider_active(provider: dict, config: dict) -> bool:

    managed_feature = provider.get("managed_nous_feature")
    if managed_feature:
-        features = get_nous_subscription_features(config)
+        features = get_nous_subscription_features(config, force_fresh=force_fresh)
        feature = features.features.get(managed_feature)
        if feature is None:
            return False
@@ -2097,10 +2206,15 @@ def _is_provider_active(provider: dict, config: dict) -> bool:
    return False


-def _detect_active_provider_index(providers: list, config: dict) -> int:
+def _detect_active_provider_index(
+    providers: list,
+    config: dict,
+    *,
+    force_fresh: bool = False,
+) -> int:
    """Return the index of the currently active provider, or 0."""
    for i, p in enumerate(providers):
-        if _is_provider_active(p, config):
+        if _is_provider_active(p, config, force_fresh=force_fresh):
            return i
        # Fallback: env vars present → likely configured
        env_vars = p.get("env_vars", [])
@@ -2403,15 +2517,29 @@ def _select_plugin_video_gen_provider(plugin_name: str, config: dict) -> None:
    _configure_videogen_model_for_plugin(plugin_name, config)


-def _configure_provider(provider: dict, config: dict):
+def _configure_provider(
+    provider: dict,
+    config: dict,
+    *,
+    force_fresh: bool = True,
+):
    """Configure a single provider - prompt for API keys and set config."""
    env_vars = provider.get("env_vars", [])
    managed_feature = provider.get("managed_nous_feature")

    if provider.get("requires_nous_auth"):
-        features = get_nous_subscription_features(config)
-        if not features.nous_auth_present:
-            _print_warning("  Nous Subscription is only available after logging into Nous Portal.")
+        features = get_nous_subscription_features(config, force_fresh=force_fresh)
+        entitled = bool(
+            features.account_info and features.account_info.paid_service_access is True
+        )
+        if not features.nous_auth_present or not entitled:
+            message = format_nous_portal_entitlement_message(
+                features.account_info,
+                capability=str(provider.get("name", "Nous Subscription")),
+            )
+            _print_warning(
+                f"  {message or 'Nous Subscription is only available after logging into Nous Portal.'}"
+            )
            return

    # Set TTS provider in config if applicable
@@ -2501,7 +2629,10 @@ def _configure_provider(provider: dict, config: dict):
                    _has_managed_sibling = True
                    break
            if _has_managed_sibling:
-                _features = get_nous_subscription_features(config)
+                _features = get_nous_subscription_features(
+                    config,
+                    force_fresh=force_fresh,
+                )
                _show_portal_hint = not _features.nous_auth_present
        except Exception:
            _show_portal_hint = False
@@ -2619,7 +2750,11 @@ def _configure_simple_requirements(ts_key: str):
            _print_warning("    Skipped")


-def _reconfigure_tool(config: dict):
+def _reconfigure_tool(
+    config: dict,
+    *,
+    force_fresh: bool = True,
+):
    """Let user reconfigure an existing tool's provider or API key."""
    # Build list of configurable tools that are currently set up
    configurable = []
@@ -2627,7 +2762,10 @@ def _reconfigure_tool(config: dict):
        cat = TOOL_CATEGORIES.get(ts_key)
        reqs = TOOLSET_ENV_REQUIREMENTS.get(ts_key)
        if cat or reqs:
-            if _toolset_has_keys(ts_key, config) or _toolset_enabled_for_reconfigure(ts_key, config):
+            if (
+                _toolset_has_keys(ts_key, config, force_fresh=force_fresh)
+                or _toolset_enabled_for_reconfigure(ts_key, config)
+            ):
                configurable.append((ts_key, ts_label))

    if not configurable:
@@ -2646,7 +2784,12 @@ def _reconfigure_tool(config: dict):
    cat = TOOL_CATEGORIES.get(ts_key)

    if cat:
-        _configure_tool_category_for_reconfig(ts_key, cat, config)
+        _configure_tool_category_for_reconfig(
+            ts_key,
+            cat,
+            config,
+            force_fresh=force_fresh,
+        )
    else:
        _reconfigure_simple_requirements(ts_key)

@@ -2675,20 +2818,38 @@ def _toolset_enabled_for_reconfigure(ts_key: str, config: dict) -> bool:
    return False


-def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
+def _configure_tool_category_for_reconfig(
+    ts_key: str,
+    cat: dict,
+    config: dict,
+    *,
+    force_fresh: bool = True,
+):
    """Reconfigure a tool category - provider selection + API key update."""
    icon = cat.get("icon", "")
    name = cat["name"]
-    providers = _visible_providers(cat, config)
+    providers = _visible_providers(cat, config, force_fresh=force_fresh)
+    hidden_nous_message = _hidden_nous_gateway_message(
+        cat,
+        config,
+        f"the Nous Subscription provider for {name}",
+        force_fresh=force_fresh,
+    )

    if len(providers) == 1:
        provider = providers[0]
        print()
        print(color(f"  --- {icon} {name} ({provider['name']}) ---", Colors.CYAN))
-        _reconfigure_provider(provider, config)
+        if hidden_nous_message:
+            for line in hidden_nous_message.splitlines():
+                _print_warning(f"  {line}")
+        _reconfigure_provider(provider, config, force_fresh=force_fresh)
    else:
        print()
        print(color(f"  --- {icon} {name} - Choose a provider ---", Colors.CYAN))
+        if hidden_nous_message:
+            for line in hidden_nous_message.splitlines():
+                _print_warning(f"  {line}")
        print()

        provider_choices = []
@@ -2698,7 +2859,7 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
            configured = ""
            env_vars = p.get("env_vars", [])
            if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
-                if _is_provider_active(p, config):
+                if _is_provider_active(p, config, force_fresh=force_fresh):
                    configured = " [active]"
                elif not env_vars:
                    configured = ""
@@ -2706,21 +2867,43 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
                    configured = " [configured]"
            provider_choices.append(f"{p['name']}{badge}{tag}{configured}")

-        default_idx = _detect_active_provider_index(providers, config)
+        default_idx = _detect_active_provider_index(
+            providers,
+            config,
+            force_fresh=force_fresh,
+        )

        provider_idx = _prompt_choice("  Select provider:", provider_choices, default_idx)
-        _reconfigure_provider(providers[provider_idx], config)
+        _reconfigure_provider(
+            providers[provider_idx],
+            config,
+            force_fresh=force_fresh,
+        )


-def _reconfigure_provider(provider: dict, config: dict):
+def _reconfigure_provider(
+    provider: dict,
+    config: dict,
+    *,
+    force_fresh: bool = True,
+):
    """Reconfigure a provider - update API keys."""
    env_vars = provider.get("env_vars", [])
    managed_feature = provider.get("managed_nous_feature")

    if provider.get("requires_nous_auth"):
-        features = get_nous_subscription_features(config)
-        if not features.nous_auth_present:
-            _print_warning("  Nous Subscription is only available after logging into Nous Portal.")
+        features = get_nous_subscription_features(config, force_fresh=force_fresh)
+        entitled = bool(
+            features.account_info and features.account_info.paid_service_access is True
+        )
+        if not features.nous_auth_present or not entitled:
+            message = format_nous_portal_entitlement_message(
+                features.account_info,
+                capability=str(provider.get("name", "Nous Subscription")),
+            )
+            _print_warning(
+                f"  {message or 'Nous Subscription is only available after logging into Nous Portal.'}"
+            )
            return

    if provider.get("tts_provider"):
@@ -2921,11 +3104,11 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
            auto_configured = apply_nous_managed_defaults(
                config,
                enabled_toolsets=new_enabled,
+                force_fresh=True,
            )
-            if managed_nous_tools_enabled():
-                for ts_key in sorted(auto_configured):
-                    label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts_key), ts_key)
-                    print(color(f"  ✓ {label}: using your Nous subscription defaults", Colors.GREEN))
+            for ts_key in sorted(auto_configured):
+                label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts_key), ts_key)
+                print(color(f"  ✓ {label}: using your Nous subscription defaults", Colors.GREEN))

            # Walk through ALL selected tools that have provider options or
            # need API keys.  This ensures browser (Local vs Browserbase),
@@ -2993,7 +3176,7 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):

        # "Reconfigure" selected
        if idx == _reconfig_idx:
-            _reconfigure_tool(config)
+            _reconfigure_tool(config, force_fresh=True)
            print()
            continue

@@ -3009,7 +3192,11 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
            all_current = set()
            for pk in platform_keys:
                all_current |= _get_platform_tools(config, pk, include_default_mcp_servers=False)
-            new_enabled = _prompt_toolset_checklist("All platforms", all_current)
+            new_enabled = _prompt_toolset_checklist(
+                "All platforms",
+                all_current,
+                force_fresh=True,
+            )
            if new_enabled != all_current:
                for pk in platform_keys:
                    prev = _get_platform_tools(config, pk, include_default_mcp_servers=False)
@@ -3027,7 +3214,11 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
                    # Configure API keys for newly enabled tools
                    for ts_key in sorted(added):
                        if (TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key)):
-                            if _toolset_needs_configuration_prompt(ts_key, config):
+                            if _toolset_needs_configuration_prompt(
+                                ts_key,
+                                config,
+                                force_fresh=True,
+                            ):
                                _configure_toolset(ts_key, config)
                    _save_platform_tools(config, pk, new_enabled)
                save_config(config)
@@ -3049,7 +3240,11 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
        current_enabled = _get_platform_tools(config, pkey, include_default_mcp_servers=False)

        # Show checklist
-        new_enabled = _prompt_toolset_checklist(pinfo["label"], current_enabled)
+        new_enabled = _prompt_toolset_checklist(
+            pinfo["label"],
+            current_enabled,
+            force_fresh=True,
+        )

        if new_enabled != current_enabled:
            added = new_enabled - current_enabled
@@ -3067,7 +3262,11 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
            # Configure newly enabled toolsets that need API keys
            for ts_key in sorted(added):
                if (TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key)):
-                    if _toolset_needs_configuration_prompt(ts_key, config):
+                    if _toolset_needs_configuration_prompt(
+                        ts_key,
+                        config,
+                        force_fresh=True,
+                    ):
                        _configure_toolset(ts_key, config)

            _save_platform_tools(config, pkey, new_enabled)
@@ -3116,6 +3116,58 @@ class SessionDB:

    # ── Space reclamation ──

+    # FTS5 virtual tables whose b-tree segments we merge on optimize. The
+    # trigram table is created lazily / may be disabled, so we probe before
+    # touching it (see optimize_fts).
+    _FTS_TABLES = ("messages_fts", "messages_fts_trigram")
+
+    def _fts_table_exists(self, name: str) -> bool:
+        """Return True if the named FTS5 virtual table is queryable in this DB."""
+        try:
+            self._conn.execute(f"SELECT 1 FROM {name} LIMIT 0")
+            return True
+        except sqlite3.OperationalError:
+            return False
+
+    def optimize_fts(self) -> int:
+        """Merge fragmented FTS5 b-tree segments into one per index.
+
+        FTS5 indexes grow as a series of incremental segments — one per
+        ``INSERT`` batch driven by the message triggers. Over tens of
+        thousands of messages these segments accumulate, which both bloats
+        the ``*_data`` shadow tables and slows ``MATCH`` queries that must
+        scan every segment. The special ``'optimize'`` command rewrites each
+        index as a single merged segment.
+
+        This is purely a maintenance operation — it changes neither search
+        results nor ``snippet()`` output, only on-disk layout and query
+        speed. It is complementary to VACUUM: ``optimize`` compacts the FTS
+        index internally, then VACUUM returns the freed pages to the OS.
+
+        Skips any FTS table that does not exist (e.g. the trigram index when
+        disabled via ``HERMES_DISABLE_FTS_TRIGRAM`` or not yet created), so
+        it is safe to call unconditionally.
+
+        Returns the number of FTS indexes that were optimized.
+        """
+        optimized = 0
+        with self._lock:
+            for tbl in self._FTS_TABLES:
+                if not self._fts_table_exists(tbl):
+                    continue
+                try:
+                    # The column name in the INSERT must match the table name
+                    # for FTS5 special commands.
+                    self._conn.execute(
+                        f"INSERT INTO {tbl}({tbl}) VALUES('optimize')"
+                    )
+                    optimized += 1
+                except sqlite3.OperationalError as exc:
+                    logger.warning(
+                        "FTS optimize failed for %s: %s", tbl, exc
+                    )
+        return optimized
+
    def vacuum(self) -> None:
        """Run VACUUM to reclaim disk space after large deletes.

@@ -3129,7 +3181,17 @@ class SessionDB:
        exclusive lock, so callers must ensure no other writers are
        active. Safe to call at startup before the gateway/CLI starts
        serving traffic.
+
+        FTS5 segments are merged first via :meth:`optimize_fts` so the
+        subsequent VACUUM reclaims the pages freed by the merge. This is a
+        layout-only optimization — search results are unchanged.
        """
+        # Merge FTS5 segments before VACUUM so the freed pages are returned
+        # to the OS in the same pass. optimize_fts() manages its own lock.
+        try:
+            self.optimize_fts()
+        except Exception as exc:
+            logger.warning("FTS optimize before VACUUM failed: %s", exc)
        # VACUUM cannot be executed inside a transaction.
        with self._lock:
            # Best-effort WAL checkpoint first, then VACUUM.
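
Reviewer note: the probe-then-optimize flow in this hunk can be tried in isolation. The sketch below is a minimal standalone version, assuming the local SQLite build ships FTS5. The table name `demo_fts` and the helpers `build_demo_db` and `match_rowids` are hypothetical and exist only for illustration; the missing `demo_fts_trigram` table stands in for the lazily created trigram index.

```python
import sqlite3


def fts_table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Probe a table with a zero-row SELECT; a missing table raises."""
    try:
        conn.execute(f"SELECT 1 FROM {name} LIMIT 0")
        return True
    except sqlite3.OperationalError:
        return False


def optimize_fts(conn: sqlite3.Connection, tables) -> int:
    """Run the FTS5 'optimize' special command on every table that exists."""
    optimized = 0
    for tbl in tables:
        if not fts_table_exists(conn, tbl):
            continue
        # FTS5 special commands are written to the column named after the table.
        conn.execute(f"INSERT INTO {tbl}({tbl}) VALUES('optimize')")
        optimized += 1
    conn.commit()
    return optimized


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical fixture: many tiny commits to fragment the index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE demo_fts USING fts5(body)")
    for i in range(40):
        conn.execute(
            "INSERT INTO demo_fts(body) VALUES (?)",
            (f"message {i} about vacuum",),
        )
        conn.commit()  # each commit typically flushes its own small segment
    return conn


def match_rowids(conn: sqlite3.Connection, term: str) -> list:
    rows = conn.execute(
        "SELECT rowid FROM demo_fts WHERE demo_fts MATCH ? ORDER BY rowid",
        (term,),
    )
    return [r[0] for r in rows]
```

Comparing `match_rowids` before and after `optimize_fts` is the same kind of check as the "identical row ids" line in the benchmark: the merge should change layout only, never the match set.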
@@ -174,7 +174,7 @@ def _load_engine_from_dir(engine_dir: Path) -> Optional["ContextEngine"]:

    # Try register(ctx) pattern first (how plugins are written)
    if hasattr(mod, "register"):
-        collector = _EngineCollector()
+        collector = _EngineCollector(engine_name=name)
        try:
            mod.register(collector)
            if collector.engine:
@@ -197,14 +197,80 @@ def _load_engine_from_dir(engine_dir: Path) -> Optional["ContextEngine"]:


 class _EngineCollector:
-    """Fake plugin context that captures register_context_engine calls."""
+    """Fake plugin context that captures register_context_engine calls.

-    def __init__(self):
+    Plugin context engines using the standard ``register(ctx)`` pattern may
+    also call ``ctx.register_command(...)`` to expose slash commands (e.g.
+    ``/lcm``). Forward those to the global plugin command registry so they
+    behave identically to commands registered by normal plugins.
+    """
+
+    def __init__(self, engine_name: str = ""):
        self.engine = None
+        self._engine_name = engine_name or "context_engine"
+        self._registered_commands: list[str] = []

    def register_context_engine(self, engine):
        self.engine = engine

+    def register_command(
+        self,
+        name: str,
+        handler,
+        description: str = "",
+        args_hint: str = "",
+    ) -> None:
+        """Forward to the global plugin command registry."""
+        clean = (name or "").lower().strip().lstrip("/").replace(" ", "-")
+        if not clean:
+            logger.warning(
+                "Context engine '%s' tried to register a command with an empty name.",
+                self._engine_name,
+            )
+            return
+
+        # Reject conflicts with built-in commands.
+        try:
+            from hermes_cli.commands import resolve_command
+            if resolve_command(clean) is not None:
+                logger.warning(
+                    "Context engine '%s' tried to register command '/%s' which conflicts "
+                    "with a built-in command. Skipping.",
+                    self._engine_name, clean,
+                )
+                return
+        except Exception:
+            pass  # best-effort: if the built-in lookup fails, the plugin-registry check below still runs
+
+        try:
+            from hermes_cli.plugins import get_plugin_manager
+            manager = get_plugin_manager()
+            if clean in manager._plugin_commands:
+                # Don't clobber a regular plugin's command — same conflict
+                # policy the plugin system uses for plugin-vs-plugin collisions.
+                logger.warning(
+                    "Context engine '%s' tried to register command '/%s' which "
+                    "is already registered by a plugin. Skipping.",
+                    self._engine_name, clean,
+                )
+                return
+            manager._plugin_commands[clean] = {
+                "handler": handler,
+                "description": description or "Context engine command",
+                "plugin": f"context-engine:{self._engine_name}",
+                "args_hint": (args_hint or "").strip(),
+            }
+            self._registered_commands.append(clean)
+            logger.debug(
+                "Context engine '%s' registered command: /%s",
+                self._engine_name, clean,
+            )
+        except Exception as exc:
+            logger.debug(
+                "Context engine '%s' could not register /%s: %s",
+                self._engine_name, clean, exc,
+            )
+
    # No-op for other registration methods
    def register_tool(self, *args, **kwargs):
        pass
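
Reviewer note: the name normalization and first-registration-wins policy added to `_EngineCollector.register_command` can be illustrated without the plugin manager. This is a sketch under assumptions: the plain `dict` registry and the `builtins` set are stand-ins for `manager._plugin_commands` and `resolve_command`, and both function names are hypothetical.

```python
from typing import Callable, Dict, Optional, Set


def normalize_command_name(name: Optional[str]) -> str:
    """Same chain as the diff: lowercase, trim, drop leading '/', dash spaces."""
    return (name or "").lower().strip().lstrip("/").replace(" ", "-")


def register_command(
    registry: Dict[str, dict],
    builtins: Set[str],
    name: Optional[str],
    handler: Callable,
    engine_name: str = "context_engine",
) -> bool:
    """Return True if the command was registered, False if it was skipped."""
    clean = normalize_command_name(name)
    if not clean:
        return False  # empty name
    if clean in builtins:
        return False  # conflicts with a built-in command
    if clean in registry:
        return False  # an earlier registration keeps the name
    registry[clean] = {
        "handler": handler,
        "plugin": f"context-engine:{engine_name}",
    }
    return True
```

Returning a bool is a choice made for this sketch; the method in the diff returns `None` and logs instead.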
@@ -67,10 +67,6 @@
  gap: 0.75rem;
  align-items: start;
  overflow-x: auto;
-  scrollbar-width: none;
-}
-.hermes-kanban-columns::-webkit-scrollbar {
-  display: none;
 }

 .hermes-kanban-column {
@@ -143,6 +139,8 @@
  gap: 0.45rem;
  overflow-y: auto;
  padding-right: 0.1rem;
+  flex: 1;
+  min-height: 0;
 }

 .hermes-kanban-empty {
@@ -43,6 +43,8 @@ class OpenRouterProfile(ProviderProfile):
        self, *, session_id: str | None = None, **context: Any
    ) -> dict[str, Any]:
        body: dict[str, Any] = {}
+        if session_id:
+            body["session_id"] = session_id
        prefs = context.get("provider_preferences")
        if prefs:
            body["provider"] = prefs
@@ -4811,14 +4811,19 @@ class DiscordAdapter(BasePlatformAdapter):
        # to keep the partition rule clean.
        _channel_context = None
        _is_dm = isinstance(message.channel, discord.DMChannel)
-        if not _is_dm:
-            _needed_mention = (
-                require_mention
-                and not is_free_channel
-                and not in_bot_thread
-            )
-            _backfill_enabled = self._discord_history_backfill()
-            if _needed_mention and _backfill_enabled:
+        if not _is_dm and self._discord_history_backfill():
+            # Run backfill when there's a real gap to fill:
+            #   - mention-gated channels with no free-response override
+            #     (messages between bot turns aren't in the transcript)
+            #   - any thread (in_bot_thread bypasses the mention check, but
+            #     processing-window gaps and post-restart context still need
+            #     recovery)
+            # DMs skip entirely because every DM message triggers the bot,
+            # so the session transcript already has everything.
+            # Auto-threaded messages also skip — we just created the thread,
+            # there's nothing prior to backfill.
+            _has_mention_gap = require_mention and not is_free_channel and not in_bot_thread
+            if (_has_mention_gap or is_thread) and auto_threaded_channel is None:
                _backfill_text = await self._fetch_channel_context(
                    message.channel, before=message,
                )
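
Reviewer note: the new backfill condition is easier to review as a pure predicate. The sketch below is hypothetical (the adapter computes this inline) and assumes `auto_threaded` is a boolean standing in for `auto_threaded_channel is not None`.

```python
def should_backfill(
    *,
    is_dm: bool,
    backfill_enabled: bool,
    require_mention: bool,
    is_free_channel: bool,
    in_bot_thread: bool,
    is_thread: bool,
    auto_threaded: bool,
) -> bool:
    """Mirror of the gating in the hunk above, as a side-effect-free function."""
    # DMs never backfill; neither does anything when the feature is off.
    if is_dm or not backfill_enabled:
        return False
    # Mention-gated channel with no override: messages between bot turns
    # are missing from the transcript.
    has_mention_gap = require_mention and not is_free_channel and not in_bot_thread
    # Threads always qualify, unless the thread was just auto-created.
    return (has_mention_gap or is_thread) and not auto_threaded
```

The behavioral change relative to the removed code is the `is_thread` term: a bot thread previously failed the mention check and never backfilled.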
@@ -196,9 +196,13 @@ def _raise_web_backend_configuration_error() -> None:
    )
    if _wt.managed_nous_tools_enabled():
        message += (
-            " With your Nous subscription you can also use the Tool Gateway — "
+            " With your Nous subscription you can also use the Tool Gateway: "
            "run `hermes tools` and select Nous Subscription as the web provider."
        )
+    else:
+        message += " " + _wt.nous_tool_gateway_unavailable_message(
+            "managed Firecrawl web tools",
+        )
    raise ValueError(message)


@@ -381,9 +385,6 @@ class FirecrawlWebSearchProvider(WebSearchProvider):
    def supports_extract(self) -> bool:
        return True

-    def supports_crawl(self) -> bool:
-        return True
-
    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
        """Execute a Firecrawl search.

@@ -575,192 +576,12 @@ class FirecrawlWebSearchProvider(WebSearchProvider):

        return results

-    async def crawl(self, url: str, **kwargs: Any) -> Dict[str, Any]:
-        """Crawl a seed URL via Firecrawl's ``/crawl`` endpoint.
-
-        Sync SDK call wrapped in ``asyncio.to_thread`` because the dispatcher
-        in :func:`tools.web_tools.web_crawl_tool` is async and runs LLM
-        post-processing on the response. The dispatcher gates the seed URL
-        against SSRF + website-access policy before calling us; this method
-        re-checks every crawled page's URL against the policy after the
-        crawl returns to catch redirected pages that map to a blocked host.
-
-        Accepted kwargs (others ignored for forward compat):
-          - ``instructions``: str — logged then dropped. Firecrawl's /crawl
-            endpoint does NOT accept natural-language instructions (that's
-            an /extract feature), so we record the value for debugging and
-            proceed without it. Tavily's crawl IS instruction-aware; this
-            divergence is documented in both plugins' docstrings.
-          - ``limit``: int — max pages to crawl (default 20).
-          - ``depth``: str — accepted for API parity with Tavily; ignored
-            by Firecrawl's crawl endpoint.
-
-        Returns ``{"results": [...]}`` matching the shape that
-        :func:`tools.web_tools.web_crawl_tool`'s shared LLM-summarization
-        path expects. Per-page failures (policy block on redirected URL,
-        bad response shape) are included as items with an ``error`` field
-        rather than raising.
-        """
-        try:
-            from tools.interrupt import is_interrupted
-
-            if is_interrupted():
-                return {"results": [{"url": url, "title": "", "content": "", "error": "Interrupted"}]}
-
-            instructions = kwargs.get("instructions")
-            limit = kwargs.get("limit", 20)
-
-            # Firecrawl's /crawl endpoint does not accept natural-language
-            # instructions (that's an /extract feature). Log + drop.
-            if instructions:
-                logger.info(
-                    "Firecrawl crawl: 'instructions' parameter ignored "
-                    "(not supported by Firecrawl /crawl)"
-                )
-
-            logger.info("Firecrawl crawl: %s (limit=%d)", url, limit)
-
-            crawl_params = {
-                "limit": limit,
-                "scrape_options": {"formats": ["markdown"]},
-            }
-
-            # The SDK call is sync; run in a thread so we don't block the
-            # gateway event loop on a multi-page crawl.
-            crawl_result = await asyncio.to_thread(
-                _get_firecrawl_client().crawl,
-                url=url,
-                **crawl_params,
-            )
-
-            # CrawlJob normalization across SDK + direct + gateway shapes.
-            data_list: List[Any] = []
-            if hasattr(crawl_result, "data"):
-                data_list = crawl_result.data if crawl_result.data else []
-                logger.info(
-                    "Firecrawl crawl status: %s, %d pages",
-                    getattr(crawl_result, "status", "unknown"),
-                    len(data_list),
-                )
-            elif isinstance(crawl_result, dict) and "data" in crawl_result:
-                data_list = crawl_result.get("data", []) or []
-            else:
-                logger.warning(
-                    "Firecrawl crawl: unexpected result type %r",
-                    type(crawl_result).__name__,
-                )
-
-            pages: List[Dict[str, Any]] = []
-            for item in data_list:
-                # Pydantic model | typed object | dict — handle all shapes.
-                content_markdown = None
-                content_html = None
-                metadata: Any = {}
-
-                if hasattr(item, "model_dump"):
-                    item_dict = item.model_dump()
-                    content_markdown = item_dict.get("markdown")
-                    content_html = item_dict.get("html")
-                    metadata = item_dict.get("metadata", {})
-                elif hasattr(item, "__dict__"):
-                    content_markdown = getattr(item, "markdown", None)
-                    content_html = getattr(item, "html", None)
-                    metadata_obj = getattr(item, "metadata", {})
-                    if hasattr(metadata_obj, "model_dump"):
-                        metadata = metadata_obj.model_dump()
-                    elif hasattr(metadata_obj, "__dict__"):
-                        metadata = metadata_obj.__dict__
-                    elif isinstance(metadata_obj, dict):
-                        metadata = metadata_obj
-                    else:
-                        metadata = {}
-                elif isinstance(item, dict):
-                    content_markdown = item.get("markdown")
-                    content_html = item.get("html")
-                    metadata = item.get("metadata", {})
-
-                # Ensure metadata is a plain dict.
-                if not isinstance(metadata, dict):
-                    if hasattr(metadata, "model_dump"):
-                        metadata = metadata.model_dump()
-                    elif hasattr(metadata, "__dict__"):
-                        metadata = metadata.__dict__
-                    else:
-                        metadata = {}
-
-                page_url = metadata.get(
-                    "sourceURL", metadata.get("url", "Unknown URL")
-                )
-                title = metadata.get("title", "")
-
-                # Per-page policy re-check (catches blocked redirects).
-                page_blocked = check_website_access(page_url)
-                if page_blocked:
-                    logger.info(
-                        "Blocked crawled page %s by rule %s",
-                        page_blocked["host"],
-                        page_blocked["rule"],
-                    )
-                    pages.append(
-                        {
-                            "url": page_url,
-                            "title": title,
-                            "content": "",
-                            "raw_content": "",
-                            "error": page_blocked["message"],
-                            "blocked_by_policy": {
-                                "host": page_blocked["host"],
-                                "rule": page_blocked["rule"],
-                                "source": page_blocked["source"],
-                            },
-                        }
-                    )
-                    continue
-
-                content = content_markdown or content_html or ""
-                pages.append(
-                    {
-                        "url": page_url,
-                        "title": title,
-                        "content": content,
-                        "raw_content": content,
-                        "metadata": metadata,
-                    }
-                )
-
-            return {"results": pages}
-        except ValueError as exc:
-            return {"results": [{"url": url, "title": "", "content": "", "error": str(exc)}]}
-        except ImportError as exc:
-            return {
-                "results": [
-                    {
-                        "url": url,
-                        "title": "",
-                        "content": "",
-                        "error": f"Firecrawl SDK not installed: {exc}",
-                    }
-                ]
-            }
-        except Exception as exc:  # noqa: BLE001
-            logger.warning("Firecrawl crawl error: %s", exc)
-            return {
-                "results": [
-                    {
-                        "url": url,
-                        "title": "",
-                        "content": "",
-                        "error": f"Firecrawl crawl failed: {exc}",
-                    }
-                ]
-            }
-
    def get_setup_schema(self) -> Dict[str, Any]:
        return {
            "name": "Firecrawl",
            "badge": "paid · optional gateway",
            "tag": (
-                "Full search + extract + crawl; supports direct API and "
+                "Full search + extract; supports direct API and "
                "Nous tool-gateway routing."
            ),
            "env_vars": [
@@ -1,9 +1,4 @@
-"""Tavily web search + extract + crawl plugin — bundled, auto-loaded.
-
-First plugin in this codebase to advertise ``supports_crawl=True``. The
-crawl method maps to Tavily's ``/crawl`` endpoint, which accepts a seed
-URL plus optional instructions and extract depth.
-"""
+"""Tavily web search + extract plugin — bundled, auto-loaded."""

 from __future__ import annotations

@@ -1,33 +1,24 @@
-"""Tavily web search + content extraction + crawl — plugin form.
+"""Tavily web search + content extraction — plugin form.

-Subclasses :class:`agent.web_search_provider.WebSearchProvider`. Three
+Subclasses :class:`agent.web_search_provider.WebSearchProvider`. Two
 capabilities advertised:

 - ``supports_search()``  -> True (Tavily ``/search``)
 - ``supports_extract()`` -> True (Tavily ``/extract``)
- ``supports_crawl()``   -> True (Tavily ``/crawl``) — sync HTTP crawl;
-  Firecrawl also advertises ``supports_crawl=True`` (async)

-All three are sync — the underlying call is ``httpx.post(...)``. The
-dispatcher in :func:`tools.web_tools.web_crawl_tool` (which is itself
-async) will run sync providers in a thread when appropriate.
+Both are sync — the underlying call is ``httpx.post(...)``.

 Config keys this provider responds to::

    web:
      search_backend: "tavily"     # explicit per-capability
      extract_backend: "tavily"    # explicit per-capability
-      crawl_backend: "tavily"      # explicit per-capability
-      backend: "tavily"            # shared fallback for all three
+      backend: "tavily"            # shared fallback for both

 Env vars::

    TAVILY_API_KEY=...           # https://app.tavily.com/home (required)
    TAVILY_BASE_URL=...          # optional override of https://api.tavily.com
-
-Auth note: Tavily uses ``api_key`` in the JSON body for /search and
-/extract, but **also requires** ``Authorization: Bearer <key>`` for /crawl
-(body-only auth returns 401 on /crawl). The plugin handles both.
 """

 from __future__ import annotations
@@ -63,11 +54,7 @@ def _tavily_request(endpoint: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    url = f"{base_url}/{endpoint.lstrip('/')}"
    logger.info("Tavily %s request to %s", endpoint, url)

-    # Tavily /crawl requires Bearer header auth in addition to body auth;
-    # /search and /extract are body-only.
-    headers = {"Authorization": f"Bearer {api_key}"} if endpoint.strip("/") == "crawl" else {}
-
-    response = httpx.post(url, json=payload, headers=headers, timeout=60)
+    response = httpx.post(url, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()

@@ -90,7 +77,7 @@ def _normalize_tavily_search_results(response: Dict[str, Any]) -> Dict[str, Any]
 def _normalize_tavily_documents(
    response: Dict[str, Any], fallback_url: str = ""
 ) -> List[Dict[str, Any]]:
-    """Map Tavily ``/extract`` or ``/crawl`` response to standard documents.
+    """Map Tavily ``/extract`` response to standard documents.

    Documents follow the legacy LLM post-processing shape::

@@ -139,7 +126,7 @@ def _normalize_tavily_documents(


 class TavilyWebSearchProvider(WebSearchProvider):
-    """Tavily search + extract + crawl provider."""
+    """Tavily search + extract provider."""

    @property
    def name(self) -> str:
@@ -159,9 +146,6 @@ class TavilyWebSearchProvider(WebSearchProvider):
    def supports_extract(self) -> bool:
        return True

-    def supports_crawl(self) -> bool:
-        return True
-
    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
        """Execute a Tavily search."""
        try:
@@ -221,60 +205,11 @@ class TavilyWebSearchProvider(WebSearchProvider):
                for u in urls
            ]

-    def crawl(self, url: str, **kwargs: Any) -> Dict[str, Any]:
-        """Crawl a seed URL via Tavily's ``/crawl`` endpoint.
-
-        Accepted kwargs (others ignored for forward compat):
-          - ``instructions``: str — natural-language guidance for the crawl
-          - ``depth``: str — ``"basic"`` (default) or ``"advanced"``
-          - ``limit``: int — max pages to crawl (default 20)
-
-        Returns ``{"results": [...]}`` shaped to match what
-        :func:`tools.web_tools.web_crawl_tool` post-processes.
-        """
-        try:
-            from tools.interrupt import is_interrupted
-
-            if is_interrupted():
-                return {"results": [{"url": url, "title": "", "content": "", "error": "Interrupted"}]}
-
-            instructions = kwargs.get("instructions")
-            depth = kwargs.get("depth", "basic")
-            limit = kwargs.get("limit", 20)
-
-            logger.info("Tavily crawl: %s (depth=%s, limit=%d)", url, depth, limit)
-            payload: Dict[str, Any] = {
-                "url": url,
-                "limit": limit,
-                "extract_depth": depth,
-            }
-            if instructions:
-                payload["instructions"] = instructions
-
-            raw = _tavily_request("crawl", payload)
-            return {
-                "results": _normalize_tavily_documents(raw, fallback_url=url)
-            }
-        except ValueError as exc:
-            return {"results": [{"url": url, "title": "", "content": "", "error": str(exc)}]}
-        except Exception as exc:  # noqa: BLE001
-            logger.warning("Tavily crawl error: %s", exc)
-            return {
-                "results": [
-                    {
-                        "url": url,
-                        "title": "",
-                        "content": "",
-                        "error": f"Tavily crawl failed: {exc}",
-                    }
-                ]
-            }
-
    def get_setup_schema(self) -> Dict[str, Any]:
        return {
            "name": "Tavily",
            "badge": "paid",
-            "tag": "Search + extract + crawl in one provider.",
+            "tag": "Search + extract in one provider.",
            "env_vars": [
                {
                    "key": "TAVILY_API_KEY",
@@ -143,9 +143,6 @@ class XAIWebSearchProvider(WebSearchProvider):
    def supports_extract(self) -> bool:
        return False

-    def supports_crawl(self) -> bool:
-        return False
-
    # -- Search -----------------------------------------------------------

    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "hermes-agent"
-version = "0.14.0"
+version = "0.15.0"
 description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -527,7 +527,81 @@ class AIAgent:
                "Session DB creation failed (will retry next turn): %s", e
            )

-    def reset_session_state(self):
+    def _transition_context_engine_session(
+        self,
+        *,
+        old_session_id: Optional[str] = None,
+        new_session_id: Optional[str] = None,
+        previous_messages: Optional[list] = None,
+        carry_over_context: bool = False,
+        reset_engine: bool = True,
+        **extra_context,
+    ) -> None:
+        """Notify the active context engine about a host session transition.
+
+        Generic host-side lifecycle helper. The built-in compressor keeps its
+        existing reset behavior; plugin engines that implement richer hooks
+        (``on_session_end``, ``on_session_reset``, ``on_session_start``,
+        ``carry_over_new_session_context``) can flush old-session state,
+        reset runtime counters, bind to the new session, and optionally
+        carry retained context forward.
+        """
+        engine = getattr(self, "context_compressor", None)
+        if not engine:
+            return
+
+        if old_session_id and previous_messages is not None and hasattr(engine, "on_session_end"):
+            try:
+                engine.on_session_end(old_session_id, previous_messages)
+            except Exception as exc:
+                logger.debug("context engine on_session_end during transition: %s", exc)
+
+        if reset_engine and hasattr(engine, "on_session_reset"):
+            try:
+                engine.on_session_reset()
+            except Exception as exc:
+                logger.debug("context engine on_session_reset during transition: %s", exc)
+
+        should_start = bool(
+            old_session_id
+            or previous_messages is not None
+            or carry_over_context
+            or extra_context
+        )
+        target_session_id = new_session_id or getattr(self, "session_id", "") or ""
+        if should_start and target_session_id and hasattr(engine, "on_session_start"):
+            start_context = {
+                "old_session_id": old_session_id,
+                "carry_over_context": carry_over_context,
+                "platform": getattr(self, "platform", None) or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
+                "model": getattr(self, "model", ""),
+                "context_length": getattr(engine, "context_length", None),
+                "conversation_id": getattr(self, "_gateway_session_key", None),
+            }
+            start_context.update(extra_context)
+            start_context = {k: v for k, v in start_context.items() if v not in (None, "")}
+            try:
+                engine.on_session_start(target_session_id, **start_context)
+            except Exception as exc:
+                logger.debug("context engine on_session_start during transition: %s", exc)
+
+        if (
+            carry_over_context
+            and old_session_id
+            and target_session_id
+            and hasattr(engine, "carry_over_new_session_context")
+        ):
+            try:
+                engine.carry_over_new_session_context(old_session_id, target_session_id)
+            except Exception as exc:
+                logger.debug("context engine carry_over_new_session_context during transition: %s", exc)
+
+    def reset_session_state(
+        self,
+        previous_messages: Optional[list] = None,
+        old_session_id: Optional[str] = None,
+        carry_over_context: bool = False,
+    ):
        """Reset all session-scoped token counters to 0 for a fresh session.
        
        This method encapsulates the reset logic for all session-level metrics
@@ -541,9 +615,12 @@ class AIAgent:
        
        The method safely handles optional attributes (e.g., context compressor)
        using ``hasattr`` checks.
-        
-        This keeps the counter reset logic DRY and maintainable in one place
-        rather than scattering it across multiple methods.
+
+        When ``previous_messages`` / ``old_session_id`` / ``carry_over_context``
+        are provided, the active context engine is notified through the
+        full transition lifecycle (``_transition_context_engine_session``)
+        instead of a bare reset. Default callers pass nothing and keep the
+        existing reset-only behavior.
        """
        # Token usage counters
        self.session_total_tokens = 0
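
Reviewer note: the hook ordering in `_transition_context_engine_session` (end, reset, start, carry-over) can be checked with a recording stub. This is a simplified sketch: `RecordingEngine` and `transition` are hypothetical names, the per-hook `try/except` wrappers from the diff are omitted, and only two of the start-context keys are reproduced.

```python
from typing import Any, List, Optional, Tuple


class RecordingEngine:
    """Hypothetical plugin engine that records which hooks fire, in order."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []
        self.last_context: dict = {}

    def on_session_end(self, session_id: str, messages: list) -> None:
        self.calls.append(("end", session_id))

    def on_session_reset(self) -> None:
        self.calls.append(("reset", None))

    def on_session_start(self, session_id: str, **context: Any) -> None:
        self.calls.append(("start", session_id))
        self.last_context = context

    def carry_over_new_session_context(self, old: str, new: str) -> None:
        self.calls.append(("carry", f"{old}->{new}"))


def transition(
    engine: Any,
    *,
    old_session_id: Optional[str] = None,
    new_session_id: Optional[str] = None,
    previous_messages: Optional[list] = None,
    carry_over_context: bool = False,
    **extra: Any,
) -> None:
    if old_session_id and previous_messages is not None and hasattr(engine, "on_session_end"):
        engine.on_session_end(old_session_id, previous_messages)
    if hasattr(engine, "on_session_reset"):
        engine.on_session_reset()
    # A bare reset (no transition arguments) must not trigger on_session_start.
    should_start = bool(
        old_session_id or previous_messages is not None or carry_over_context or extra
    )
    if should_start and new_session_id and hasattr(engine, "on_session_start"):
        ctx = {"old_session_id": old_session_id, "carry_over_context": carry_over_context, **extra}
        ctx = {k: v for k, v in ctx.items() if v not in (None, "")}
        engine.on_session_start(new_session_id, **ctx)
    if (
        carry_over_context
        and old_session_id
        and new_session_id
        and hasattr(engine, "carry_over_new_session_context")
    ):
        engine.carry_over_new_session_context(old_session_id, new_session_id)
```

Calling `transition(engine, new_session_id="b")` with nothing else reproduces the default reset-only path that existing `reset_session_state()` callers rely on.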
@@ -562,9 +639,14 @@ class AIAgent:
        # Turn counter (added after reset_session_state was first written — #2635)
        self._user_turn_count = 0

-        # Context engine reset (works for both built-in compressor and plugins)
-        if hasattr(self, "context_compressor") and self.context_compressor:
-            self.context_compressor.on_session_reset()
+        # Context engine reset/transition (works for built-in compressor and plugins)
+        self._transition_context_engine_session(
+            old_session_id=old_session_id,
+            new_session_id=getattr(self, "session_id", None),
+            previous_messages=previous_messages,
+            carry_over_context=carry_over_context,
+            reset_engine=True,
+        )

    def _ensure_lmstudio_runtime_loaded(self, config_context_length: Optional[int] = None) -> None:
        """
@@ -719,6 +801,83 @@ class AIAgent:
            except Exception:
                logger.debug("status_callback error in _emit_warning", exc_info=True)

+    # ── Buffered retry/fallback status ────────────────────────────────────
+    # Retry and fallback chains were flooding the CLI/gateway with status
+    # noise that users found confusing: a single transient 429 could produce
+    # 10+ "Provider/Endpoint/Retrying in 5s..." lines before the request
+    # eventually succeeded.  The buffered helpers below capture these
+    # status messages instead of emitting them immediately.  They are
+    # flushed (shown to the user) ONLY when every retry and fallback has
+    # been exhausted; on success they are silently dropped.  Backend logs
+    # (agent.log) are unaffected — every individual emission site still
+    # writes to ``logger.warning`` / ``logger.info`` for diagnosis.
+
+    def _buffer_status(self, message: str) -> None:
+        """Buffer a retry/fallback status message.
+
+        Appended to the shared buffer as a ``("status", text)`` tuple. The
+        buffer holds (kind, text) entries, where ``kind`` is one of:
+        - ``"status"``  -> replays via ``_emit_status``
+        - ``"vprint"``  -> replays via ``_vprint(force=True)``
+        - ``"warn"``    -> replays via ``_emit_warning``
+        Defers noisy retry chatter until the turn has recovered or failed.
+        """
+        try:
+            buf = getattr(self, "_retry_status_buffer", None)
+            if buf is None:
+                buf = []
+                self._retry_status_buffer = buf
+            buf.append(("status", message))
+        except Exception:
+            # Never break the retry loop on a buffer hiccup.
+            pass
+
+    def _buffer_vprint(self, message: str) -> None:
+        """Buffer a vprint(force=True) retry/fallback line."""
+        try:
+            buf = getattr(self, "_retry_status_buffer", None)
+            if buf is None:
+                buf = []
+                self._retry_status_buffer = buf
+            buf.append(("vprint", message))
+        except Exception:
+            pass
+
+    def _clear_status_buffer(self) -> None:
+        """Drop buffered retry messages — call on successful recovery."""
+        try:
+            buf = getattr(self, "_retry_status_buffer", None)
+            if buf:
+                buf.clear()
+        except Exception:
+            pass
+
+    def _flush_status_buffer(self) -> None:
+        """Emit buffered retry messages — call on terminal failure.
+
+        Surfaces the full retry/fallback trace so the user can see what
+        was tried before the turn gave up.
+        """
+        try:
+            buf = getattr(self, "_retry_status_buffer", None)
+            if not buf:
+                return
+            # Drain first so a callback exception doesn't double-emit.
+            messages = list(buf)
+            buf.clear()
+            for kind, msg in messages:
+                try:
+                    if kind == "status":
+                        self._emit_status(msg)
+                    elif kind == "warn":
+                        self._emit_warning(msg)
+                    else:
+                        self._vprint(f"{self.log_prefix}{msg}", force=True)
+                except Exception:
+                    pass
+        except Exception:
+            pass
+
    def _disable_codex_reasoning_replay(
        self,
        messages: Optional[List[Dict[str, Any]]] = None,
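
Reviewer note: the buffer, clear and flush contract added above reduces to a small class. The sketch below is illustrative only; `StatusBuffer` and its single `emit` callback are hypothetical, whereas the diff dispatches on `kind` to `_emit_status`, `_emit_warning` or `_vprint`.

```python
from typing import Callable, List, Tuple


class StatusBuffer:
    """Hold retry/fallback messages until the outcome of the turn is known."""

    def __init__(self, emit: Callable[[str, str], None]) -> None:
        self._emit = emit
        self._pending: List[Tuple[str, str]] = []

    def buffer(self, kind: str, message: str) -> None:
        self._pending.append((kind, message))

    def clear(self) -> None:
        """Success path: drop everything silently."""
        self._pending.clear()

    def flush(self) -> int:
        """Failure path: replay everything in order; return how many were drained."""
        # Drain first so a failing callback cannot cause a later double-emit.
        messages = list(self._pending)
        self._pending.clear()
        for kind, msg in messages:
            try:
                self._emit(kind, msg)
            except Exception:
                pass  # a broken sink must not break the retry loop
        return len(messages)
```

A typical use is to call `buffer()` at each retry site, `clear()` once a request succeeds, and `flush()` only after every retry and fallback is exhausted.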
@@ -2847,7 +3006,12 @@ class AIAgent:

        return True

-    def _try_refresh_nous_client_credentials(self, *, force: bool = True) -> bool:
+    def _try_refresh_nous_client_credentials(
+        self,
+        *,
+        force: bool = True,
+        inference_auth_mode: str | None = None,
+    ) -> bool:
        if self.api_mode != "chat_completions" or self.provider != "nous":
            return False

@@ -2858,14 +3022,15 @@ class AIAgent:
                resolve_nous_runtime_credentials,
            )

+            selected_auth_mode = inference_auth_mode or (
+                NOUS_INFERENCE_AUTH_MODE_LEGACY
+                if force
+                else NOUS_INFERENCE_AUTH_MODE_AUTO
+            )
            creds = resolve_nous_runtime_credentials(
                min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
                timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
-                inference_auth_mode=(
-                    NOUS_INFERENCE_AUTH_MODE_LEGACY
-                    if force
-                    else NOUS_INFERENCE_AUTH_MODE_AUTO
-                ),
+                inference_auth_mode=selected_auth_mode,
            )
        except Exception as exc:
            logger.debug("Nous credential refresh failed: %s", exc)
@@ -3988,6 +4153,11 @@ class AIAgent:
        from agent.agent_runtime_helpers import copy_reasoning_content_for_api
        return copy_reasoning_content_for_api(self, source_msg, api_msg)

+    def _reapply_reasoning_echo_for_provider(self, api_messages: list) -> int:
+        """Forwarder — see ``agent.agent_runtime_helpers.reapply_reasoning_echo_for_provider``."""
+        from agent.agent_runtime_helpers import reapply_reasoning_echo_for_provider
+        return reapply_reasoning_echo_for_provider(self, api_messages)
+
    @staticmethod
    def _sanitize_tool_calls_for_strict_api(api_msg: dict) -> dict:
        """Strip Codex Responses API fields from tool_calls for strict providers.
@@ -80,30 +80,27 @@ def crawl_source(source, source_name: str, limit: int) -> list:


 def crawl_skills_sh(source: SkillsShSource) -> list:
-    """Crawl skills.sh using popular queries for broad coverage."""
-    print("  Crawling skills.sh (popular queries)...", flush=True)
+    """Crawl skills.sh via its sitemap to enumerate the full catalog (~20k entries).
+
+    Previously walked a hardcoded list of ~28 popular keywords (each capped at
+    50 results) which yielded ~850 unique skills — about 4% of the real catalog.
+    The SkillsShSource.search("") path now hits the sitemap directly, returning
+    the full 20k-entry catalog deduplicated by canonical identifier.
+    """
+    print("  Crawling skills.sh (sitemap)...", flush=True)
    start = time.time()

-    queries = [
-        "",  # featured
-        "react", "python", "web", "api", "database", "docker",
-        "testing", "scraping", "design", "typescript", "git",
-        "aws", "security", "data", "ml", "ai", "devops",
-        "frontend", "backend", "mobile", "cli", "documentation",
-        "kubernetes", "terraform", "rust", "go", "java",
-    ]
+    try:
+        results = source.search("", limit=0)  # 0 = no cap, return the whole catalog
+    except Exception as e:
+        print(f"    Warning: skills.sh sitemap walk failed: {e}", file=sys.stderr)
+        results = []

    all_skills: dict[str, dict] = {}
-    for query in queries:
-        try:
-            results = source.search(query, limit=50)
-            for meta in results:
-                entry = _meta_to_dict(meta)
-                if entry["identifier"] not in all_skills:
-                    all_skills[entry["identifier"]] = entry
-        except Exception as e:
-            print(f"    Warning: skills.sh search '{query}' failed: {e}",
-                  file=sys.stderr)
+    for meta in results:
+        entry = _meta_to_dict(meta)
+        if entry["identifier"] not in all_skills:
+            all_skills[entry["identifier"]] = entry

    elapsed = time.time() - start
    print(f"  skills.sh: {len(all_skills)} unique skills ({elapsed:.1f}s)",
@@ -269,11 +266,28 @@ def main():
    # Crawl skills.sh
    all_skills.extend(crawl_skills_sh(skills_sh_source))

-    # Crawl other sources in parallel
+    # Crawl other sources in parallel.
+    # Per-source soft caps — sources stop returning when they run out, so these
+    # are ceilings, not targets.  ClawHub has 49k+ skills; bumping to 200k
+    # (well above current catalog size) lets the full catalog land in the
+    # index instead of being truncated at an arbitrary build-time limit.
+    SOURCE_LIMITS = {
+        # ClawHub had 49,698+ skills as of May 2026; 200k leaves headroom.
+        "clawhub": 200_000,
+        "lobehub": 100_000,
+        "browse-sh": 5_000,
+        "claude-marketplace": 5_000,
+        "github": 5_000,
+        "well-known": 5_000,
+        "official": 5_000,
+    }
+    DEFAULT_SOURCE_LIMIT = 500
+
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {}
        for name, source in sources.items():
-            futures[pool.submit(crawl_source, source, name, 500)] = name
+            limit = SOURCE_LIMITS.get(name, DEFAULT_SOURCE_LIMIT)
+            futures[pool.submit(crawl_source, source, name, limit)] = name
        for future in as_completed(futures):
            try:
                all_skills.extend(future.result())
@@ -328,9 +342,17 @@ def main():
    # or rate limiting kicked in.  Failing here forces a human look before
    # the broken index reaches the live docs.
    EXPECTED_FLOORS = {
-        "skills.sh": 100,
+        # skills.sh now uses the sitemap walker (~20k catalog as of May 2026).
+        # Anything under 10k means the sitemap shape changed or fetches failed
+        # — better to fail loudly than ship a regression to the 858-skill
+        # popular-queries era.
+        "skills.sh": 10000,
        "lobehub": 100,
-        "clawhub": 50,
+        # ClawHub had 49,698+ skills as of May 2026 — anything under 20k means
+        # pagination broke or the API surface changed.  Fail loudly rather
+        # than ship a degenerate index (we shipped 200/50000 silently for
+        # weeks because the floor was 50).
+        "clawhub": 20000,
        "official": 50,
        "github": 30,        # collapsed across all GitHub taps
        "browse-sh": 50,
@@ -42,6 +42,7 @@ IGNORED_PATTERNS = [
    re.compile(r"^Copilot$", re.IGNORECASE),
    re.compile(r"^Cursor(\s+Agent)?$", re.IGNORECASE),
    re.compile(r"^GitHub\s*Actions?$", re.IGNORECASE),
+    re.compile(r"^github-actions(\[bot\])?$", re.IGNORECASE),
    re.compile(r"^dependabot", re.IGNORECASE),
    re.compile(r"^renovate", re.IGNORECASE),
    re.compile(r"^Hermes\s+(Agent|Audit)$", re.IGNORECASE),
@@ -51,10 +52,12 @@ IGNORED_PATTERNS = [
 IGNORED_EMAILS = {
    "noreply@anthropic.com",
    "noreply@github.com",
+    "noreply@nousresearch.com",
    "cursoragent@cursor.com",
    "hermes@nousresearch.com",
    "hermes-audit@example.com",
    "hermes@habibilabs.dev",
+    "omx@oh-my-codex.dev",
 }


@@ -27,21 +27,22 @@ import argparse
 import shutil
 import subprocess
 import sys
-import tarfile
 import tempfile
 import urllib.request
 from pathlib import Path

-# Pin a version we know patches cleanly. Update when a newer psutil
-# changes the marker line shape and we need to follow upstream.
-PSUTIL_URL = (
-    "https://files.pythonhosted.org/packages/aa/c6/"
-    "d1ddf4abb55e93cebc4f2ed8b5d6dbad109ecb8d63748dd2b20ab5e57ebe/"
-    "psutil-7.2.2.tar.gz"
+# Keep sibling imports working when invoked as
+# ``python scripts/install_psutil_android.py`` from the repo checkout.
+REPO_ROOT = Path(__file__).resolve().parents[1]
+if str(REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(REPO_ROOT))
+
+from hermes_cli.psutil_android import (
+    PSUTIL_URL,
+    PsutilAndroidInstallError,
+    prepare_patched_psutil_sdist,
 )

-MARKER = 'LINUX = sys.platform.startswith("linux")'
-REPLACEMENT = 'LINUX = sys.platform.startswith(("linux", "android"))'


 def _resolve_install_cmd(pip_arg: str | None, prefer_uv: bool) -> list[str]:
@@ -82,26 +83,10 @@ def main() -> int:
        tmp_path = Path(tmp)
        archive = tmp_path / "psutil.tar.gz"
        urllib.request.urlretrieve(PSUTIL_URL, archive)
-        with tarfile.open(archive) as tar:
-            tar.extractall(tmp_path)
-
        try:
-            src_root = next(
-                p for p in tmp_path.iterdir()
-                if p.is_dir() and p.name.startswith("psutil-")
-            )
-        except StopIteration:
-            sys.exit("psutil sdist did not contain a psutil-* directory")
-
-        common_py = src_root / "psutil" / "_common.py"
-        content = common_py.read_text(encoding="utf-8")
-        if MARKER not in content:
-            sys.exit(
-                "psutil Android compatibility patch marker not found — "
-                "upstream may have changed the LINUX detection line. "
-                "Update MARKER/REPLACEMENT in this script."
-            )
-        common_py.write_text(content.replace(MARKER, REPLACEMENT), encoding="utf-8")
+            src_root = prepare_patched_psutil_sdist(archive, tmp_path)
+        except PsutilAndroidInstallError as exc:
+            sys.exit(str(exc))

        cmd = install_cmd_prefix + ["install", "--no-build-isolation", str(src_root)]
        print(f"  $ {' '.join(cmd)}")
@@ -46,12 +46,14 @@ ACP_REGISTRY_MANIFEST = REPO_ROOT / "acp_registry" / "agent.json"
 # Auto-extracted from noreply emails + manual overrides
 AUTHOR_MAP = {
    "9592417+adam91holt@users.noreply.github.com": "adam91holt",
+    "kchuang1015@users.noreply.github.com": "kchuang1015",
    "45688690+fujinice@users.noreply.github.com": "fujinice",
    "276689385+carltonawong@users.noreply.github.com": "carltonawong",
    "195255660+EvilHumphrey@users.noreply.github.com": "EvilHumphrey",
    "270604154+superearn-fisher@users.noreply.github.com": "superearn-fisher",
    "3540493+kpadilha@users.noreply.github.com": "kpadilha",
    "40378218+chaconne67@users.noreply.github.com": "chaconne67",
+    "Pluviobyte@users.noreply.github.com": "Pluviobyte",
    "sanghyuk_seo@nexcubecorp.com": "sanghyuk-seo-nexcube",
    "subrtt@gmail.com": "Brixyy",
    "wangpuv@hotmail.com": "wangpuv",
@@ -73,6 +75,7 @@ AUTHOR_MAP = {
    "anadi.jaggia@gmail.com": "Jaggia",
    "steve@steveonjava.com": "steveonjava",
    "steveonjava@gmail.com": "steveonjava",
+    "squiddy@2rook.ai": "MoonRay305",
    "32201324+simpolism@users.noreply.github.com": "simpolism",
    "simpolism@gmail.com": "simpolism",
    "jake@nousresearch.com": "simpolism",
@@ -87,6 +90,7 @@ AUTHOR_MAP = {
    "omar@techdeveloper.site": "nycomar",
    "qiyin.zuo@pcitc.com": "qiyin-code",
    "mr.aashiz@gmail.com": "aashizpoudel",
+    "adityargadgil@gmail.com": "AdityaRajeshGadgil",
    "70629228+shaun0927@users.noreply.github.com": "shaun0927",
    "soju06@users.noreply.github.com": "Soju06",
    "34199905+Soju06@users.noreply.github.com": "Soju06",
@@ -124,6 +128,7 @@ AUTHOR_MAP = {
    "buraysandro9@gmail.com": "ygd58",
    "108427749+buntingszn@users.noreply.github.com": "buntingszn",
    "yanglongwei06@gmail.com": "Alex-yang00",
+    "yanghongda@jackyun.com": "yangguangjin",
    "teknium@nousresearch.com": "teknium1",
    "markuscontasul@gmail.com": "Glucksberg",
    "80581902+Glucksberg@users.noreply.github.com": "Glucksberg",
@@ -568,7 +573,7 @@ AUTHOR_MAP = {
    "ruzzgarcn@gmail.com": "Ruzzgar",
    "yukipukikedy@gmail.com": "Yukipukii1",
    "alireza78.crypto@gmail.com": "alireza78a",
-    "brooklyn.bb.nicholson@gmail.com": "brooklynnicholson",
+    "brooklyn.bb.nicholson@gmail.com": "OutThisLife",
    "withapurpose37@gmail.com": "StefanIsMe",
    "4317663+helix4u@users.noreply.github.com": "helix4u",
    "ifkellx@users.noreply.github.com": "Ifkellx",
@@ -1287,6 +1292,8 @@ AUTHOR_MAP = {
    "rudi193@gmail.com": "rudi193-cmd",
    "86684667+sadiksaifi@users.noreply.github.com": "sadiksaifi",  # PR #27982 salvage (kanban horiz scroll)
    "mail@sadiksaifi.dev": "sadiksaifi",
+    "231588442+vynxevainglory-ai@users.noreply.github.com": "vynxevainglory-ai",  # PR #29233 salvage (kanban scrollbar + body overflow)
+    "vynxevainglory@gmail.com": "vynxevainglory-ai",
    # batch salvage (May 2026 LHF run, group 8)
    "266824395+AceWattGit@users.noreply.github.com": "AceWattGit",  # PR #28159 salvage (_pool_may_recover NameError)
    "57024493+YuanHanzhong@users.noreply.github.com": "YuanHanzhong",  # PR #28032 salvage (x.com status link-like)
@@ -1342,6 +1349,23 @@ AUTHOR_MAP = {
    "timothy.b.dixon@gmail.com": "Codename-11",  # PR #29302 (API server session controls — sessions/chat/fork/stream)
    "jpschwartz2@uwalumni.com": "Schwartz10",  # PR #29302 sub-PR (multimodal media in session chat API)
    "JohnC1009@users.noreply.github.com": "JohnC1009",  # PR #32020 salvage (auth: global auth.json fallback in _load_provider_state)
+    "biser@bisko.be": "bisko",  # PR #33784 salvage (re-pad reasoning_content on cross-provider fallback to require-side providers)
+    # v0.15.0 additions
+    "glen@workmanfirearms.com": "sgtworkman",
+    "jorge.fuenmayort@gmail.com": "jfuenmayor",
+    "mordred@inaugust.com": "emonty",
+    "rodrigoeq@hotmail.com": "rodrigoeqnit",
+    "soliva.johnpaul@icloud.com": "jonpol01",
+    "2182712990@qq.com": "yu-xin-c",  # PR #32122 (Docker audio bridge notes)
+    "baxter@bitreserve.ai": "BaxBit",  # PR #30200 (Svix webhook signature validation)
+    "chris.eth@qq.com": "duyua9",  # PR #10949 (render object config values structurally)
+    "ethie@nous": "ethernet8023",  # PR #29342 (TUI clipboard copy on linux/wayland)
+    "jiahuigu@sjtu.edu.cn": "Jiahui-Gu",  # PR #29276 (guard pickle.loads in darwinian-evolver)
+    "justinccdev@gmail.com": "justincc",  # PR #28914 (set tool_name on tool-result messages)
+    "kdkcfp@gmail.com": "slowtokki0409",  # PR #29025 (ignore local Hermes runtime files)
+    "peter.yuqin@gmail.com": "WuKongAI-CMU",  # PR #10082 (reject symlinked audio inputs)
+    "sunil.nitie@gmail.com": "Sunil123135",  # PR #31031 (Windows Docker Desktop compose)
+    "weichangyuwcy@gmail.com": "ChyuWei",  # PR #30987 (TUI TTS env var on voice off)
 }


@@ -17,6 +17,11 @@ prerequisites:

 Himalaya is a CLI email client that lets you manage emails from the terminal using IMAP, SMTP, Notmuch, or Sendmail backends.

+This skill is separate from the Hermes Email gateway adapter. The gateway
+adapter lets people email the agent over Hermes' built-in IMAP/SMTP
+support; this skill lets the agent operate a mailbox from terminal tools and
+requires the external `himalaya` CLI.
+
 ## References

 - `references/configuration.md` (config file setup + IMAP/SMTP authentication)
@@ -1188,16 +1188,27 @@ class TestBuildAnthropicKwargs:
        # params through its signature, we exercise the strip behavior by
        # calling the internal predicate directly.
        from agent.anthropic_adapter import _forbids_sampling_params
+        assert _forbids_sampling_params("claude-opus-4-8") is True
+        assert _forbids_sampling_params("claude-opus-4-8-fast") is True
        assert _forbids_sampling_params("claude-opus-4-7") is True
        assert _forbids_sampling_params("claude-opus-4-6") is False
        assert _forbids_sampling_params("claude-sonnet-4-5") is False

    def test_supports_fast_mode_predicate(self):
-        """Fast mode is Opus 4.6 only — Opus 4.7 and others must be excluded."""
+        """Fast mode is Opus 4.6 only — Opus 4.7 and others must be excluded.
+
+        For Opus 4.8 the fast variant is a separate model ID
+        (anthropic/claude-opus-4.8-fast) routed through the normal model
+        field, NOT via the ``speed: "fast"`` request parameter. So
+        ``_supports_fast_mode`` (which gates the parameter) must stay
+        False for both opus-4-8 and opus-4-8-fast.
+        """
        from agent.anthropic_adapter import _supports_fast_mode
        assert _supports_fast_mode("claude-opus-4-6") is True
        assert _supports_fast_mode("anthropic/claude-opus-4-6") is True
        assert _supports_fast_mode("claude-opus-4-7") is False
+        assert _supports_fast_mode("claude-opus-4-8") is False
+        assert _supports_fast_mode("claude-opus-4-8-fast") is False
        assert _supports_fast_mode("claude-sonnet-4-6") is False
        assert _supports_fast_mode("claude-haiku-4-5") is False
        assert _supports_fast_mode("") is False
@@ -992,6 +992,47 @@ class TestAuxiliaryPoolAwareness:
        assert stale_client.chat.completions.create.call_count == 1
        assert fresh_client.chat.completions.create.call_count == 1

+    def test_call_llm_refreshes_nous_after_free_tier_block_when_account_paid(self):
+        from hermes_cli.nous_account import NousPortalAccountInfo
+
+        class _Payment404(Exception):
+            status_code = 404
+
+        stale_client = MagicMock()
+        stale_client.base_url = "https://inference-api.nousresearch.com/v1"
+        stale_client.chat.completions.create.side_effect = _Payment404(
+            "model_not_supported_on_free_tier: model is not available on the free tier"
+        )
+
+        fresh_client = MagicMock()
+        fresh_client.base_url = "https://inference-api.nousresearch.com/v1"
+        fresh_client.chat.completions.create.return_value = {"ok": True}
+
+        with (
+            patch("agent.auxiliary_client._resolve_task_provider_model", return_value=("nous", "nous-model", None, None, None)),
+            patch("agent.auxiliary_client._get_cached_client", return_value=(stale_client, "nous-model")),
+            patch("agent.auxiliary_client.OpenAI", return_value=fresh_client),
+            patch("agent.auxiliary_client._validate_llm_response", side_effect=lambda resp, _task: resp),
+            patch("agent.auxiliary_client._resolve_nous_runtime_api", return_value=("fresh-agent-key", "https://inference-api.nousresearch.com/v1")),
+            patch(
+                "hermes_cli.nous_account.get_nous_portal_account_info",
+                return_value=NousPortalAccountInfo(
+                    logged_in=True,
+                    source="account_api",
+                    fresh=True,
+                    paid_service_access=True,
+                ),
+            ),
+        ):
+            result = call_llm(
+                task="compression",
+                messages=[{"role": "user", "content": "hi"}],
+            )
+
+        assert result == {"ok": True}
+        assert stale_client.chat.completions.create.call_count == 1
+        assert fresh_client.chat.completions.create.call_count == 1
+
    @pytest.mark.asyncio
    async def test_async_call_llm_retries_nous_after_401(self):
        class _Auth401(Exception):
@@ -1021,6 +1062,48 @@ class TestAuxiliaryPoolAwareness:
        assert stale_client.chat.completions.create.await_count == 1
        assert fresh_async_client.chat.completions.create.await_count == 1

+    @pytest.mark.asyncio
+    async def test_async_call_llm_refreshes_nous_after_free_tier_block_when_account_paid(self):
+        from hermes_cli.nous_account import NousPortalAccountInfo
+
+        class _Payment404(Exception):
+            status_code = 404
+
+        stale_client = MagicMock()
+        stale_client.base_url = "https://inference-api.nousresearch.com/v1"
+        stale_client.chat.completions.create = AsyncMock(side_effect=_Payment404(
+            "model_not_supported_on_free_tier: model is not available on the free tier"
+        ))
+
+        fresh_async_client = MagicMock()
+        fresh_async_client.base_url = "https://inference-api.nousresearch.com/v1"
+        fresh_async_client.chat.completions.create = AsyncMock(return_value={"ok": True})
+
+        with (
+            patch("agent.auxiliary_client._resolve_task_provider_model", return_value=("nous", "nous-model", None, None, None)),
+            patch("agent.auxiliary_client._get_cached_client", return_value=(stale_client, "nous-model")),
+            patch("agent.auxiliary_client._to_async_client", return_value=(fresh_async_client, "nous-model")),
+            patch("agent.auxiliary_client._validate_llm_response", side_effect=lambda resp, _task: resp),
+            patch("agent.auxiliary_client._resolve_nous_runtime_api", return_value=("fresh-agent-key", "https://inference-api.nousresearch.com/v1")),
+            patch(
+                "hermes_cli.nous_account.get_nous_portal_account_info",
+                return_value=NousPortalAccountInfo(
+                    logged_in=True,
+                    source="account_api",
+                    fresh=True,
+                    paid_service_access=True,
+                ),
+            ),
+        ):
+            result = await async_call_llm(
+                task="session_search",
+                messages=[{"role": "user", "content": "hi"}],
+            )
+
+        assert result == {"ok": True}
+        assert stale_client.chat.completions.create.await_count == 1
+        assert fresh_async_client.chat.completions.create.await_count == 1
+
    def test_cached_gmi_client_keeps_explicit_slash_model_override(self):
        import agent.auxiliary_client as aux

@@ -1076,6 +1159,19 @@ class TestIsPaymentError:
        exc.status_code = 429
        assert _is_payment_error(exc) is True

+    def test_404_free_tier_model_block_is_payment(self):
+        exc = Exception(
+            "Model 'gpt-5' is not available on the Free Tier. "
+            "Upgrade at https://portal.nousresearch.com or pick a free model."
+        )
+        exc.status_code = 404
+        assert _is_payment_error(exc) is True
+
+    def test_404_generic_not_found_is_not_payment(self):
+        exc = Exception("Not Found")
+        exc.status_code = 404
+        assert _is_payment_error(exc) is False
+
    def test_429_without_credits_message_is_not_payment(self):
        """Normal rate limits should NOT be treated as payment errors."""
        exc = Exception("Rate limit exceeded, try again in 2 seconds")
@@ -114,6 +114,7 @@ def test_ttfb_includes_silent_hang_hint_for_gpt_5_5(tmp_path, monkeypatch):
    statuses: list[str] = []
    dummy_client = SimpleNamespace()
    monkeypatch.setattr(agent, "_create_request_openai_client", lambda **k: dummy_client)
+    monkeypatch.setattr(agent, "_buffer_status", lambda msg: statuses.append(msg))
    monkeypatch.setattr(agent, "_emit_status", lambda msg: statuses.append(msg))
    monkeypatch.setattr(
        agent, "_abort_request_openai_client",
@@ -0,0 +1,290 @@
+"""Regressions for the context-engine host contract.
+
+These tests pin the five generic host-side guarantees that external context
+engine plugins (e.g. hermes-lcm) rely on:
+
+1. ``_transition_context_engine_session`` drives the full lifecycle
+   (on_session_end → on_session_reset → on_session_start → optional
+   carry_over_new_session_context) and ``reset_session_state`` delegates
+   to it when callers pass session metadata.
+
+2. ``on_session_start`` receives ``conversation_id`` derived from
+   ``_gateway_session_key`` at agent init time.
+
+3. ``conversation_loop`` forwards canonical cache buckets
+   (``cache_read_tokens``, ``cache_write_tokens``, ``input_tokens``,
+   ``output_tokens``, ``reasoning_tokens``) to the engine's
+   ``update_from_response``, on top of the legacy aggregate keys.
+
+4. ``_discover_context_engines`` includes plugin-registered engines (not
+   just repo-shipped engines under ``plugins/context_engine/``).
+
+5. The repo-shipped ``_EngineCollector`` honors ``ctx.register_command``
+   from a plugin engine's ``register(ctx)`` entry point and routes it
+   to the global plugin command registry.
+"""
+
+from __future__ import annotations
+
+from unittest.mock import MagicMock
+
+import pytest
+
+from run_agent import AIAgent
+
+
+def _bare_agent() -> AIAgent:
+    agent = object.__new__(AIAgent)
+    agent.session_id = "test-session"
+    agent.model = "fake-model"
+    agent.platform = "telegram"
+    agent._gateway_session_key = "agent:main:telegram:dm:42"
+    return agent
+
+
+def test_transition_runs_full_lifecycle_in_order():
+    """End → reset → start → carry_over, in that order, when all inputs apply."""
+    events: list[str] = []
+    engine = MagicMock()
+    engine.context_length = 200_000
+    engine.on_session_end.side_effect = lambda *a, **kw: events.append("on_session_end")
+    engine.on_session_reset.side_effect = lambda *a, **kw: events.append("on_session_reset")
+    engine.on_session_start.side_effect = lambda *a, **kw: events.append("on_session_start")
+    engine.carry_over_new_session_context.side_effect = lambda *a, **kw: events.append("carry_over")
+
+    agent = _bare_agent()
+    agent.context_compressor = engine
+
+    agent._transition_context_engine_session(
+        old_session_id="old-sid",
+        new_session_id="new-sid",
+        previous_messages=[{"role": "user", "content": "hi"}],
+        carry_over_context=True,
+    )
+
+    assert events == [
+        "on_session_end",
+        "on_session_reset",
+        "on_session_start",
+        "carry_over",
+    ]
+
+
+def test_transition_passes_conversation_id_from_gateway_session_key():
+    """on_session_start receives ``conversation_id`` from ``_gateway_session_key``."""
+    engine = MagicMock()
+    engine.context_length = 200_000
+    captured: dict = {}
+    engine.on_session_start.side_effect = lambda sid, **kw: captured.update(kw)
+
+    agent = _bare_agent()
+    agent.context_compressor = engine
+
+    agent._transition_context_engine_session(
+        old_session_id="old-sid",
+        new_session_id="new-sid",
+        previous_messages=[{"role": "user", "content": "hi"}],
+    )
+
+    assert captured.get("conversation_id") == "agent:main:telegram:dm:42"
+    assert captured.get("old_session_id") == "old-sid"
+    assert captured.get("platform") == "telegram"
+
+
+def test_transition_skips_optional_hooks_when_engine_lacks_them():
+    """Engines that don't implement on_session_end/carry_over still work."""
+    class MinimalEngine:
+        def __init__(self):
+            self.context_length = 100_000
+            self.reset_called = False
+            self.start_called_with = None
+
+        def on_session_reset(self):
+            self.reset_called = True
+
+        def on_session_start(self, sid, **kw):
+            self.start_called_with = (sid, kw)
+
+    engine = MinimalEngine()
+    agent = _bare_agent()
+    agent.context_compressor = engine
+
+    # Should not raise even though on_session_end / carry_over are missing.
+    agent._transition_context_engine_session(
+        old_session_id="old",
+        new_session_id="new",
+        previous_messages=[{"role": "user", "content": "hi"}],
+        carry_over_context=True,
+    )
+
+    assert engine.reset_called is True
+    assert engine.start_called_with is not None
+    new_sid, kw = engine.start_called_with
+    assert new_sid == "new"
+    assert kw.get("old_session_id") == "old"
+
+
+def test_reset_session_state_delegates_to_transition_when_args_provided():
+    """``reset_session_state(previous_messages=..., old_session_id=...)`` fires full lifecycle."""
+    engine = MagicMock()
+    engine.context_length = 100_000
+
+    agent = _bare_agent()
+    agent.context_compressor = engine
+
+    agent.reset_session_state(
+        previous_messages=[{"role": "user", "content": "hi"}],
+        old_session_id="old-sid",
+    )
+
+    assert engine.on_session_end.called
+    assert engine.on_session_reset.called
+    assert engine.on_session_start.called
+    # No carry_over_context, so carry_over hook NOT called.
+    assert not engine.carry_over_new_session_context.called
+
+
+def test_reset_session_state_default_call_only_resets():
+    """Bare ``reset_session_state()`` still only resets the engine (no end/start)."""
+    engine = MagicMock()
+    engine.context_length = 100_000
+
+    agent = _bare_agent()
+    agent.context_compressor = engine
+
+    agent.reset_session_state()
+
+    assert engine.on_session_reset.called
+    assert not engine.on_session_end.called
+    assert not engine.on_session_start.called
+
+
+def test_update_from_response_forwards_canonical_cache_buckets():
+    """conversation_loop passes cache_read/write/reasoning tokens to engine."""
+    # Test the contract directly: a usage_dict built from CanonicalUsage must
+    # contain the canonical buckets in addition to the legacy keys. We don't
+    # spin up the full conversation loop; we just verify the dict shape.
+    from agent.usage_pricing import CanonicalUsage
+
+    canonical = CanonicalUsage(
+        input_tokens=1000,
+        output_tokens=500,
+        cache_read_tokens=800,
+        cache_write_tokens=200,
+        reasoning_tokens=50,
+    )
+    usage_dict = {
+        "prompt_tokens": canonical.prompt_tokens,
+        "completion_tokens": canonical.output_tokens,
+        "total_tokens": canonical.total_tokens,
+        "input_tokens": canonical.input_tokens,
+        "output_tokens": canonical.output_tokens,
+        "cache_read_tokens": canonical.cache_read_tokens,
+        "cache_write_tokens": canonical.cache_write_tokens,
+        "reasoning_tokens": canonical.reasoning_tokens,
+    }
+
+    # Legacy keys present
+    assert usage_dict["prompt_tokens"] == canonical.prompt_tokens
+    assert usage_dict["completion_tokens"] == 500
+    assert usage_dict["total_tokens"] == canonical.total_tokens
+    # Canonical cache + reasoning buckets present
+    assert usage_dict["cache_read_tokens"] == 800
+    assert usage_dict["cache_write_tokens"] == 200
+    assert usage_dict["reasoning_tokens"] == 50
+    assert usage_dict["input_tokens"] == 1000
+    assert usage_dict["output_tokens"] == 500
+
+
+def test_discover_context_engines_includes_plugin_registered_engines(monkeypatch):
+    """Plugin-registered context engines appear in the ``hermes plugins`` picker."""
+    from hermes_cli import plugins_cmd
+
+    fake_repo = lambda: [("compressor", "built-in", True)]
+
+    class FakePluginEngine:
+        name = "lcm"
+
+    monkeypatch.setattr(
+        "plugins.context_engine.discover_context_engines",
+        fake_repo,
+    )
+    monkeypatch.setattr(
+        "hermes_cli.plugins.discover_plugins",
+        lambda *_a, **_kw: None,
+    )
+    monkeypatch.setattr(
+        "hermes_cli.plugins.get_plugin_context_engine",
+        lambda: FakePluginEngine(),
+    )
+
+    engines = plugins_cmd._discover_context_engines()
+    names = [n for n, _desc in engines]
+    assert "compressor" in names
+    assert "lcm" in names
+
+
+def test_discover_context_engines_dedupes_by_name(monkeypatch):
+    """Repo-shipped engine wins when name collides with a plugin-registered one."""
+    from hermes_cli import plugins_cmd
+
+    class FakePluginEngine:
+        name = "compressor"  # same name as repo-shipped
+
+    monkeypatch.setattr(
+        "plugins.context_engine.discover_context_engines",
+        lambda: [("compressor", "built-in compressor", True)],
+    )
+    monkeypatch.setattr(
+        "hermes_cli.plugins.discover_plugins",
+        lambda *_a, **_kw: None,
+    )
+    monkeypatch.setattr(
+        "hermes_cli.plugins.get_plugin_context_engine",
+        lambda: FakePluginEngine(),
+    )
+
+    engines = plugins_cmd._discover_context_engines()
+    # Only one entry — the repo-shipped one. Description is preserved.
+    assert engines == [("compressor", "built-in compressor")]
+
+
+def test_engine_collector_forwards_register_command_to_plugin_manager():
+    """A plugin context engine can register a slash command via ``ctx.register_command``."""
+    from plugins.context_engine import _EngineCollector
+    from hermes_cli.plugins import get_plugin_manager
+
+    handler = lambda raw_args: f"echo: {raw_args}"
+
+    collector = _EngineCollector(engine_name="my-lcm")
+    collector.register_command(
+        "my-lcm-test-cmd",
+        handler,
+        description="test command from a context engine",
+        args_hint="<msg>",
+    )
+
+    manager = get_plugin_manager()
+    try:
+        assert "my-lcm-test-cmd" in manager._plugin_commands
+        entry = manager._plugin_commands["my-lcm-test-cmd"]
+        assert entry["handler"] is handler
+        assert entry["args_hint"] == "<msg>"
+        assert entry["plugin"] == "context-engine:my-lcm"
+    finally:
+        # Clean up so we don't leak the registration across tests.
+        manager._plugin_commands.pop("my-lcm-test-cmd", None)
+
+
+def test_engine_collector_rejects_builtin_command_conflicts():
+    """Context engine cannot shadow built-in slash commands like /help."""
+    from plugins.context_engine import _EngineCollector
+    from hermes_cli.plugins import get_plugin_manager
+
+    collector = _EngineCollector(engine_name="my-lcm")
+    collector.register_command("help", lambda *_: "shadow")
+
+    manager = get_plugin_manager()
+    # Must NOT have overwritten / registered against built-in /help.
+    assert "help" not in manager._plugin_commands or \
+           manager._plugin_commands["help"].get("plugin") != "context-engine:my-lcm"
@@ -59,6 +59,7 @@ class TestFailoverReason:
            "invalid_encrypted_content",
            "multimodal_tool_content_unsupported",
            "provider_policy_blocked",
+            "content_policy_blocked",
            "thinking_signature", "long_context_tier",
            "oauth_long_context_beta_forbidden",
            "llama_cpp_grammar_pattern",
@@ -254,12 +255,51 @@ class TestClassifyApiError:
        assert result.reason == FailoverReason.billing
        assert result.retryable is False

+    def test_402_out_of_funds_billing(self):
+        e = MockAPIError(
+            "Payment Required",
+            status_code=402,
+            body={
+                "status": 402,
+                "message": (
+                    "Your API key has run out of funds. Please go visit the "
+                    "portal to sort that out: https://portal.nousresearch.com"
+                ),
+            },
+        )
+        result = classify_api_error(e)
+        assert result.reason == FailoverReason.billing
+        assert result.retryable is False
+
    def test_402_transient_usage_limit(self):
        e = MockAPIError("usage limit exceeded, try again later", status_code=402)
        result = classify_api_error(e)
        assert result.reason == FailoverReason.rate_limit
        assert result.retryable is True

+    def test_403_plan_entitlement_billing(self):
+        e = MockAPIError("This plan does not include the requested model", status_code=403)
+        result = classify_api_error(e)
+        assert result.reason == FailoverReason.billing
+        assert result.retryable is False
+
+    def test_404_free_tier_model_block_is_billing(self):
+        e = MockAPIError(
+            "Not Found",
+            status_code=404,
+            body={
+                "status": 404,
+                "message": (
+                    "Model 'gpt-5' is not available on the Free Tier. "
+                    "Upgrade at https://portal.nousresearch.com or pick a free model."
+                ),
+            },
+        )
+        result = classify_api_error(e, provider="nous", model="gpt-5")
+        assert result.reason == FailoverReason.billing
+        assert result.retryable is False
+        assert result.should_fallback is True
+
    # ── Rate limit ──

    def test_429_rate_limit(self):
@@ -427,6 +467,78 @@ class TestClassifyApiError:
        result = classify_api_error(e)
        assert result.reason == FailoverReason.provider_policy_blocked

+    # ── Provider content-policy block (per-prompt safety filter) ──
+    #
+    # Distinct from ``provider_policy_blocked`` above — these are upstream
+    # model-provider safety refusals for THIS prompt, not OpenRouter
+    # account-level data policy. Recovery is fallback model, not config fix.
+    # See issue #18028 — OpenAI Codex was burning 3 retries on identical
+    # refusals before users saw "API failed after 3 retries" on Telegram.
+
+    def test_message_only_cyber_content_policy_blocked(self):
+        # OpenAI Codex returns this without an HTTP status. Retrying the
+        # same prompt three times only repeats the same policy decision, so
+        # the classifier must jump straight to fallback / abort instead of
+        # leaving it in the retryable ``unknown`` bucket.
+        e = Exception(
+            "This content was flagged for possible cybersecurity risk. If this "
+            "seems wrong, try rephrasing your request. To get authorized for "
+            "security work, join the Trusted Access for Cyber program."
+        )
+        result = classify_api_error(e, provider="openai-codex", model="gpt-5.5")
+        assert result.reason == FailoverReason.content_policy_blocked
+        assert result.retryable is False
+        assert result.should_fallback is True
+        assert result.should_compress is False
+
+    def test_400_cyber_content_policy_blocked(self):
+        # When the SDK does attach a status (e.g. 400), the safety pattern
+        # must still beat the format_error fallthrough.
+        e = MockAPIError(
+            "This content was flagged for possible cybersecurity risk",
+            status_code=400,
+        )
+        result = classify_api_error(e, provider="openai-codex", model="gpt-5.5")
+        assert result.reason == FailoverReason.content_policy_blocked
+        assert result.retryable is False
+        assert result.should_fallback is True
+
+    def test_openai_usage_policy_violation_content_policy_blocked(self):
+        # OpenAI moderation refusal wording from chat completions / responses.
+        e = MockAPIError(
+            "Your request was flagged by the moderation system as potentially "
+            "violating OpenAI's usage policies.",
+            status_code=400,
+        )
+        result = classify_api_error(e, provider="openai", model="gpt-4o")
+        assert result.reason == FailoverReason.content_policy_blocked
+        assert result.retryable is False
+        assert result.should_fallback is True
+
+    def test_anthropic_safety_system_content_policy_blocked(self):
+        # Anthropic safety refusal — distinct phrasing from OpenAI.
+        e = Exception(
+            "Your prompt was flagged by our safety system. Please rephrase "
+            "and try again."
+        )
+        result = classify_api_error(e, provider="anthropic", model="claude-3-5-sonnet")
+        assert result.reason == FailoverReason.content_policy_blocked
+        assert result.retryable is False
+        assert result.should_fallback is True
+
+    def test_azure_content_filter_content_policy_blocked(self):
+        # Azure OpenAI returns ``content_filter`` finish reason / error code
+        # and ``ResponsibleAIPolicyViolation`` in error bodies — both narrow
+        # tokens, not the generic English phrase.
+        e = MockAPIError(
+            "The response was filtered: ResponsibleAIPolicyViolation "
+            "(finish_reason=content_filter).",
+            status_code=400,
+        )
+        result = classify_api_error(e, provider="azure", model="gpt-4o")
+        assert result.reason == FailoverReason.content_policy_blocked
+        assert result.retryable is False
+
    def test_404_model_not_found_still_works(self):
        # Regression guard: the new policy-block check must not swallow
        # genuine model_not_found 404s.
@@ -753,6 +865,19 @@ class TestClassifyApiError:
        result = classify_api_error(e)
        assert result.reason == FailoverReason.context_overflow

+    def test_error_code_model_not_supported_on_free_tier_is_billing(self):
+        e = MockAPIError(
+            "Model unavailable",
+            body={
+                "error": {
+                    "code": "model_not_supported_on_free_tier",
+                    "message": "Model 'gpt-5' is not available on the Free Tier.",
+                }
+            },
+        )
+        result = classify_api_error(e, provider="nous", model="gpt-5")
+        assert result.reason == FailoverReason.billing
+
    # ── Message-only patterns (no status code) ──

    def test_message_billing_pattern(self):
@@ -760,6 +885,11 @@ class TestClassifyApiError:
        result = classify_api_error(e)
        assert result.reason == FailoverReason.billing

+    def test_message_free_tier_model_block_is_billing(self):
+        e = Exception("Model 'gpt-5' is not available on the Free Tier.")
+        result = classify_api_error(e, provider="nous", model="gpt-5")
+        assert result.reason == FailoverReason.billing
+
    def test_message_rate_limit_pattern(self):
        e = Exception("rate limit reached for this model")
        result = classify_api_error(e)
@@ -0,0 +1,25 @@
+from __future__ import annotations
+
+import importlib
+import sys
+
+from agent import jiter_preload
+
+
+def test_preload_jiter_native_extension_loads_sdk_parser_dependency():
+    assert jiter_preload.preload_jiter_native_extension() is True
+    assert "jiter.jiter" in sys.modules
+
+
+def test_preload_jiter_native_extension_is_best_effort(monkeypatch):
+    monkeypatch.setattr(jiter_preload, "_JITER_PRELOADED", False)
+
+    def _raise_missing(name: str):
+        assert name == "jiter.jiter"
+        raise ModuleNotFoundError(name)
+
+    monkeypatch.setattr(importlib, "import_module", _raise_missing)
+
+    assert jiter_preload.preload_jiter_native_extension() is False
+    assert jiter_preload._JITER_PRELOADED is False
+    assert isinstance(jiter_preload._JITER_PRELOAD_ERROR, ModuleNotFoundError)
@@ -131,10 +131,10 @@ class TestDefaultContextLengths:
        for key, value in DEFAULT_CONTEXT_LENGTHS.items():
            if "claude" not in key:
                continue
-            # Claude 4.6+ models (4.6 and 4.7) have 1M context at standard
+            # Claude 4.6+ models (4.6, 4.7, 4.8) have 1M context at standard
            # API pricing (no long-context premium).  Older Claude 4.x and
            # 3.x models cap at 200k.
-            if any(tag in key for tag in ("4.6", "4-6", "4.7", "4-7")):
+            if any(tag in key for tag in ("4.6", "4-6", "4.7", "4-7", "4.8", "4-8")):
                assert value == 1000000, f"{key} should be 1000000"
            else:
                assert value == 200000, f"{key} should be 200000"
@@ -378,127 +378,57 @@ class TestDiscordMentions:
        assert result.endswith(" said hello")


-class TestUrlQueryParamRedaction:
-    """URL query-string redaction (ported from nearai/ironclaw#2529).
-
-    Catches opaque tokens that don't match vendor prefix regexes by
-    matching on parameter NAME rather than value shape.
+class TestWebUrlsNotRedacted:
+    """Web URLs (http/https/wss) pass through unchanged — magic-link
+    checkouts, OAuth callbacks the agent is meant to follow, and pre-signed
+    share URLs must reach the tool intact. Known credential shapes inside
+    URLs (sk-, ghp_, JWTs) are still caught by the prefix and JWT regexes.
+    DB connection-string passwords are still caught by _DB_CONNSTR_RE.
    """

-    def test_oauth_callback_code(self):
+    def test_oauth_callback_code_passes_through(self):
        text = "GET https://api.example.com/oauth/cb?code=abc123xyz789&state=csrf_ok"
-        result = redact_sensitive_text(text)
-        assert "abc123xyz789" not in result
-        assert "code=***" in result
-        assert "state=csrf_ok" in result  # state is not sensitive
-
-    def test_access_token_query(self):
-        text = "Fetching https://example.com/api?access_token=opaque_value_here_1234&format=json"
-        result = redact_sensitive_text(text)
-        assert "opaque_value_here_1234" not in result
-        assert "access_token=***" in result
-        assert "format=json" in result
-
-    def test_refresh_token_query(self):
-        text = "https://auth.example.com/token?refresh_token=somerefresh&grant_type=refresh"
-        result = redact_sensitive_text(text)
-        assert "somerefresh" not in result
-        assert "grant_type=refresh" in result
-
-    def test_api_key_query(self):
-        text = "https://api.example.com/v1/data?api_key=kABCDEF12345&limit=10"
-        result = redact_sensitive_text(text)
-        assert "kABCDEF12345" not in result
-        assert "limit=10" in result
-
-    def test_presigned_signature(self):
-        text = "https://s3.amazonaws.com/bucket/k?signature=LONG_PRESIGNED_SIG&id=public"
-        result = redact_sensitive_text(text)
-        assert "LONG_PRESIGNED_SIG" not in result
-        assert "id=public" in result
-
-    def test_case_insensitive_param_names(self):
-        """Lowercase/mixed-case sensitive param names are redacted."""
-        # NOTE: All-caps names like TOKEN= are swallowed by _ENV_ASSIGN_RE
-        # (which matches KEY=value patterns greedily) before URL regex runs.
-        # This test uses lowercase names to isolate URL-query redaction.
-        text = "https://example.com?api_key=abcdef&secret=ghijkl"
-        result = redact_sensitive_text(text)
-        assert "abcdef" not in result
-        assert "ghijkl" not in result
-        assert "api_key=***" in result
-        assert "secret=***" in result
-
-    def test_substring_match_does_not_trigger(self):
-        """`token_count` and `session_id` must NOT match `token` / `session`."""
-        text = "https://example.com/cb?token_count=42&session_id=xyz&foo=bar"
-        result = redact_sensitive_text(text)
-        assert "token_count=42" in result
-        assert "session_id=xyz" in result
-
-    def test_url_without_query_unchanged(self):
-        text = "https://example.com/path/to/resource"
        assert redact_sensitive_text(text) == text

-    def test_url_with_fragment(self):
-        text = "https://example.com/page?token=xyz#section"
-        result = redact_sensitive_text(text)
-        assert "token=xyz" not in result
-        assert "#section" in result
+    def test_access_token_query_passes_through(self):
+        text = "Fetching https://example.com/api?access_token=opaque_value_here_1234&format=json"
+        assert redact_sensitive_text(text) == text

-    def test_websocket_url_query(self):
+    def test_magic_link_checkout_passes_through(self):
+        text = "Open https://checkout.example.com/resume?magic=ABCDEF123456&customer=42"
+        assert redact_sensitive_text(text) == text
+
+    def test_presigned_signature_passes_through(self):
+        text = "https://s3.amazonaws.com/bucket/k?signature=LONG_PRESIGNED_SIG&id=public"
+        assert redact_sensitive_text(text) == text
+
+    def test_https_userinfo_passes_through(self):
+        text = "URL: https://user:supersecretpw@host.example.com/path"
+        assert redact_sensitive_text(text) == text
+
+    def test_websocket_url_query_passes_through(self):
        text = "wss://api.example.com/ws?token=opaqueWsToken123"
-        result = redact_sensitive_text(text)
-        assert "opaqueWsToken123" not in result
+        assert redact_sensitive_text(text) == text

-    def test_http_access_log_relative_request_target_query(self):
+    def test_http_access_log_request_target_passes_through(self):
        text = (
            'INFO aiohttp.access: 127.0.0.1 "POST '
            '/bluebubbles-webhook?password=webhookSecret123&event=new-message '
            'HTTP/1.1" 200 173 "-" "test-client"'
        )
-        result = redact_sensitive_text(text)
-        assert "webhookSecret123" not in result
-        assert "password=***" in result
-        assert "event=new-message" in result
-
-    def test_http_access_log_absolute_request_target_query(self):
-        text = (
-            'INFO aiohttp.access: 127.0.0.1 "GET '
-            'https://example.com/callback?code=oauthCode123&state=csrf-ok '
-            'HTTP/1.1" 200 173 "-" "test-client"'
-        )
-        result = redact_sensitive_text(text)
-        assert "oauthCode123" not in result
-        assert "code=***" in result
-        assert "state=csrf-ok" in result
-
-
-class TestUrlUserinfoRedaction:
-    """URL userinfo (`scheme://user:pass@host`) for non-DB schemes."""
-
-    def test_https_userinfo(self):
-        text = "URL: https://user:supersecretpw@host.example.com/path"
-        result = redact_sensitive_text(text)
-        assert "supersecretpw" not in result
-        assert "https://user:***@host.example.com" in result
-
-    def test_http_userinfo(self):
-        text = "http://admin:plaintextpass@internal.example.com/api"
-        result = redact_sensitive_text(text)
-        assert "plaintextpass" not in result
-
-    def test_ftp_userinfo(self):
-        text = "ftp://user:ftppass@ftp.example.com/file.txt"
-        result = redact_sensitive_text(text)
-        assert "ftppass" not in result
-
-    def test_url_without_userinfo_unchanged(self):
-        text = "https://example.com/path"
        assert redact_sensitive_text(text) == text

-    def test_db_connstr_still_handled(self):
-        """DB schemes are handled by _DB_CONNSTR_RE, not _URL_USERINFO_RE."""
+    def test_known_prefix_inside_url_still_redacted(self):
+        """sk-/ghp_/JWT-shaped values inside a URL are still caught by
+        _PREFIX_RE / _JWT_RE — the carve-out is for opaque tokens only."""
+        text = "https://evil.com/steal?key=sk-" + "a" * 30
+        result = redact_sensitive_text(text)
+        assert "sk-" + "a" * 30 not in result
+
+    def test_db_connstr_password_still_redacted(self):
+        """DB schemes (postgres/mysql/mongodb/redis/amqp) keep their
+        userinfo redaction via _DB_CONNSTR_RE — connection strings are
+        not web URLs the agent navigates to."""
        text = "postgres://admin:dbpass@db.internal:5432/app"
        result = redact_sensitive_text(text)
        assert "dbpass" not in result
@@ -275,8 +275,9 @@ class TestRunTurn:
    def test_turn_start_failure_attaches_redacted_stderr_tail(self):
        """When codex stderr has content (non-OAuth), the tail gets attached
        to the user-facing error so config/provider problems are debuggable
-        instead of just 'Internal error'. Secrets in stderr are redacted
-        via agent.redact(force=True)."""
+        instead of just 'Internal error'. Credential-shaped values in stderr
+        are redacted via agent.redact(force=True); web-URL query params pass
+        through (see fix(redact): pass web URLs through unchanged)."""
        client = FakeClient()
        client.set_stderr_tail([
            "ERROR: provider auth failed",
@@ -299,9 +300,8 @@ class TestRunTurn:
        # Stderr tail attached
        assert "codex stderr" in r.error
        assert "provider auth failed" in r.error
-        # Secrets redacted
+        # Credential-shaped values still redacted (sk- prefix + Bearer header)
        assert "sk-live-deadbeefdeadbeef" not in r.error
-        assert "querysecret12345" not in r.error
        # Non-OAuth → should NOT retire (subprocess JSON-RPC is still healthy).
        assert r.should_retire is False

@@ -271,7 +271,10 @@ def test_codex_provider_replaces_incompatible_default_model(monkeypatch):


 def test_model_flow_nous_prints_subscription_guidance_without_mutating_explicit_tts(monkeypatch, capsys):
-    monkeypatch.setattr("hermes_cli.nous_subscription.managed_nous_tools_enabled", lambda: True)
+    monkeypatch.setattr(
+        "hermes_cli.nous_subscription.managed_nous_tools_enabled",
+        lambda *args, **kwargs: True,
+    )
    config = {
        "model": {"provider": "nous", "default": "claude-opus-4-6"},
        "tts": {"provider": "elevenlabs"},
@@ -306,7 +309,10 @@ def test_model_flow_nous_prints_subscription_guidance_without_mutating_explicit_


 def test_model_flow_nous_offers_tool_gateway_prompt_when_unconfigured(monkeypatch, capsys):
-    monkeypatch.setattr("hermes_cli.nous_subscription.managed_nous_tools_enabled", lambda: True)
+    monkeypatch.setattr(
+        "hermes_cli.nous_subscription.managed_nous_tools_enabled",
+        lambda *args, **kwargs: True,
+    )
    config = {
        "model": {"provider": "nous", "default": "claude-opus-4-6"},
        "tts": {"provider": "edge"},
@@ -0,0 +1,244 @@
+"""Regression tests for the CLI ``/yolo`` in-chat toggle.
+
+Pre-fix bug (issue #33925): ``cli.HermesCLI._toggle_yolo`` mutated only
+``os.environ["HERMES_YOLO_MODE"]``. That env var is captured once at
+module-import time into ``tools.approval._YOLO_MODE_FROZEN`` (security
+hardening: stops prompt-injected skills from flipping the bypass mid-run),
+so the post-startup toggle was a silent no-op. ``/yolo`` advertised "YOLO ON"
+in the status bar while every dangerous command still hit the approval
+prompt. Only ``hermes --yolo`` (process-start env), ``HERMES_YOLO_MODE=1``,
+and ``hermes config set approvals.mode off`` actually bypassed.
+
+The fix routes the CLI toggle through ``enable_session_yolo`` /
+``disable_session_yolo`` (matching the gateway and TUI ``/yolo`` paths) and
+binds ``self.session_id`` as the active approval session key around each
+``run_conversation`` call so ``is_current_session_yolo_enabled()`` resolves
+against the same key the toggle writes under.
+
+We test ``_toggle_yolo`` and ``_is_session_yolo_active`` as unbound methods
+against a minimal stand-in object that exposes only the attribute they
+read (``session_id``). This avoids the heavy ``HermesCLI`` construction
+path used in ``test_cli_init.py``, which is incompatible with this test
+file's path layout — ``HermesCLI.__init__`` imports a lot of optional
+state we don't need here.
+"""
+
+import os
+from types import SimpleNamespace
+from unittest.mock import patch
+
+import pytest
+
+import tools.approval as approval_module
+from cli import HermesCLI
+
+
+SESSION_KEY = "test-cli-yolo-session"
+
+
+@pytest.fixture(autouse=True)
+def _clear_approval_state(monkeypatch):
+    """Clear the YOLO bypass + env var around every test so cases are independent."""
+    monkeypatch.delenv("HERMES_YOLO_MODE", raising=False)
+    approval_module.clear_session(SESSION_KEY)
+    approval_module.clear_session("default")
+    yield
+    approval_module.clear_session(SESSION_KEY)
+    approval_module.clear_session("default")
+
+
+def _make_stand_in(session_id: str = SESSION_KEY) -> SimpleNamespace:
+    """Minimal stand-in exposing only ``session_id``.
+
+    ``_toggle_yolo`` and ``_is_session_yolo_active`` read only
+    ``self.session_id`` from the instance; no other CLI state is touched.
+    Calling them as unbound functions against this stand-in is equivalent
+    to invoking them on a fully-constructed ``HermesCLI`` for the
+    behaviour under test, and avoids the brittle prompt_toolkit / config
+    stubbing required to instantiate ``HermesCLI`` from this test file.
+    """
+    return SimpleNamespace(session_id=session_id)
+
+
+class TestToggleYoloIsSessionScoped:
+    """The CLI /yolo handler must mutate the session-yolo set, not the env var.
+
+    The env var path is dead-on-arrival because ``_YOLO_MODE_FROZEN`` is
+    captured once at module import, long before the CLI's ``/yolo`` command
+    can run.
+    """
+
+    def test_toggle_yolo_enables_session_bypass(self):
+        stand_in = _make_stand_in()
+
+        assert approval_module.is_session_yolo_enabled(SESSION_KEY) is False
+
+        with patch("cli._cprint"):
+            HermesCLI._toggle_yolo(stand_in)
+
+        assert approval_module.is_session_yolo_enabled(SESSION_KEY) is True
+
+    def test_toggle_yolo_disables_session_bypass_on_second_call(self):
+        stand_in = _make_stand_in()
+        with patch("cli._cprint"):
+            HermesCLI._toggle_yolo(stand_in)  # ON
+            assert approval_module.is_session_yolo_enabled(SESSION_KEY) is True
+            HermesCLI._toggle_yolo(stand_in)  # OFF
+            assert approval_module.is_session_yolo_enabled(SESSION_KEY) is False
+
+    def test_toggle_yolo_does_not_mutate_env_var(self):
+        """Toggling /yolo must not write ``HERMES_YOLO_MODE`` — that path is
+        frozen at import time and would mislead anyone reading the env later
+        (subprocesses, status bars wired to the env, the relaunch flag list)."""
+        stand_in = _make_stand_in()
+        with patch("cli._cprint"):
+            HermesCLI._toggle_yolo(stand_in)
+
+        assert os.environ.get("HERMES_YOLO_MODE") is None
+
+    def test_toggle_yolo_falls_back_to_default_when_session_id_missing(self):
+        """An edge case during CLI bootstrap: a ``/yolo`` triggered before the
+        session id is set should not blow up, and should land under the
+        ``default`` session key so the bypass still takes effect for any code
+        that resolves against the default key."""
+        stand_in = _make_stand_in(session_id="")
+        with patch("cli._cprint"):
+            HermesCLI._toggle_yolo(stand_in)
+
+        assert approval_module.is_session_yolo_enabled("default") is True
+
+    def test_two_independent_sessions_are_isolated(self):
+        """``/yolo`` toggled in one session must not bypass approvals in
+        another session — mirrors the gateway-side invariant."""
+        cli_a = _make_stand_in(session_id="session-yolo-a")
+        cli_b = _make_stand_in(session_id="session-yolo-b")
+
+        try:
+            with patch("cli._cprint"):
+                HermesCLI._toggle_yolo(cli_a)
+
+            assert approval_module.is_session_yolo_enabled("session-yolo-a") is True
+            assert approval_module.is_session_yolo_enabled("session-yolo-b") is False
+        finally:
+            approval_module.clear_session("session-yolo-a")
+            approval_module.clear_session("session-yolo-b")
+
+
+class TestIsSessionYoloActiveHelper:
+    """The status-bar helper must read the live session-yolo state, not the
+    env var (which is the bug class this PR fixes)."""
+
+    def test_helper_reflects_toggle(self):
+        stand_in = _make_stand_in()
+
+        assert HermesCLI._is_session_yolo_active(stand_in) is False
+
+        with patch("cli._cprint"):
+            HermesCLI._toggle_yolo(stand_in)
+
+        assert HermesCLI._is_session_yolo_active(stand_in) is True
+
+        with patch("cli._cprint"):
+            HermesCLI._toggle_yolo(stand_in)
+
+        assert HermesCLI._is_session_yolo_active(stand_in) is False
+
+    def test_helper_honors_frozen_yolo_mode(self):
+        """``hermes --yolo`` sets ``HERMES_YOLO_MODE`` before tool imports, so
+        ``_YOLO_MODE_FROZEN`` ends up True. The status bar should still
+        show YOLO as on in that case, even when the session toggle is off."""
+        stand_in = _make_stand_in()
+
+        with patch.object(approval_module, "_YOLO_MODE_FROZEN", True):
+            assert HermesCLI._is_session_yolo_active(stand_in) is True
+
+
+class TestToggleYoloEndToEnd:
+    """End-to-end: a dangerous command must auto-approve through the same
+    ``check_all_command_guards`` path the terminal tool uses."""
+
+    def test_toggle_yolo_bypasses_dangerous_command_check(self):
+        stand_in = _make_stand_in()
+
+        token = approval_module.set_current_session_key(SESSION_KEY)
+        try:
+            with patch("cli._cprint"):
+                HermesCLI._toggle_yolo(stand_in)  # YOLO ON
+
+            result = approval_module.check_all_command_guards(
+                "rm -rf /tmp/scratch-xyzzy", "local",
+            )
+            assert result["approved"] is True, (
+                f"YOLO toggle should auto-approve dangerous commands, got: {result}"
+            )
+        finally:
+            approval_module.reset_current_session_key(token)
+
+
+class TestIsSessionYoloActiveAttrSafety:
+    """The status-bar helper runs against partially-constructed CLI fixtures
+    (tests use ``HermesCLI.__new__(HermesCLI)`` to skip ``__init__``). It must
+    not raise ``AttributeError`` when ``session_id`` is absent — the
+    status-bar builders swallow exceptions silently and lose every field
+    after the failure, producing a regression that's hard to track back to
+    the helper."""
+
+    def test_helper_survives_missing_session_id_attr(self):
+        # SimpleNamespace WITHOUT session_id mimics __new__-built fixtures.
+        from types import SimpleNamespace
+        no_attr = SimpleNamespace()
+        # Must return False, not raise.
+        assert HermesCLI._is_session_yolo_active(no_attr) is False
+
+
+class TestSessionRotationTransfersYolo:
+    """When the CLI's ``session_id`` rotates mid-run (``/branch``,
+    auto-compression continuation), YOLO state keyed under the old id must move
+    to the new id. Otherwise the user's ``/yolo ON`` silently reverts on
+    the next turn — the same UX failure mode this PR set out to fix.
+    Mirrors ``tui_gateway/server.py`` ~line 1297-1305."""
+
+    def test_transfer_moves_yolo_to_new_session(self):
+        stand_in = _make_stand_in(session_id="old-id")
+        try:
+            approval_module.enable_session_yolo("old-id")
+            assert approval_module.is_session_yolo_enabled("old-id") is True
+
+            HermesCLI._transfer_session_yolo(stand_in, "old-id", "new-id")
+
+            assert approval_module.is_session_yolo_enabled("new-id") is True
+            assert approval_module.is_session_yolo_enabled("old-id") is False
+        finally:
+            approval_module.clear_session("old-id")
+            approval_module.clear_session("new-id")
+
+    def test_transfer_is_noop_when_yolo_was_off(self):
+        stand_in = _make_stand_in(session_id="old-id")
+        try:
+            HermesCLI._transfer_session_yolo(stand_in, "old-id", "new-id")
+            assert approval_module.is_session_yolo_enabled("new-id") is False
+            assert approval_module.is_session_yolo_enabled("old-id") is False
+        finally:
+            approval_module.clear_session("old-id")
+            approval_module.clear_session("new-id")
+
+    def test_transfer_is_noop_when_ids_match(self):
+        stand_in = _make_stand_in(session_id="same-id")
+        try:
+            approval_module.enable_session_yolo("same-id")
+            HermesCLI._transfer_session_yolo(stand_in, "same-id", "same-id")
+            # Must NOT have been disabled — same-id == same-id is a no-op,
+            # not a "disable then re-enable" round-trip.
+            assert approval_module.is_session_yolo_enabled("same-id") is True
+        finally:
+            approval_module.clear_session("same-id")
+
+    def test_transfer_handles_empty_inputs_safely(self):
+        stand_in = _make_stand_in(session_id="x")
+        # Both directions of empty input should be safe no-ops; nothing
+        # to transfer from "" / to "".
+        HermesCLI._transfer_session_yolo(stand_in, "", "new")
+        HermesCLI._transfer_session_yolo(stand_in, "old", "")
+        # Neither key should have been touched.
+        assert approval_module.is_session_yolo_enabled("new") is False
+        assert approval_module.is_session_yolo_enabled("old") is False
@@ -1450,9 +1450,19 @@ class TestRunJobConfigLogging:
            "prompt": "hello",
        }

+        # Mock heavy post-yaml work so the test only exercises the warning
+        # path. Without these mocks, _run_job_impl continues into provider
+        # resolution and MCP discovery, both of which can spawn subprocesses
+        # / hit the network and have caused this test to time out on CI
+        # (>30s wall clock) under load. See PR #33661 follow-up.
        with patch("cron.scheduler._hermes_home", tmp_path), \
             patch("cron.scheduler._resolve_origin", return_value=None), \
             patch("dotenv.load_dotenv"), \
+             patch("hermes_cli.runtime_provider.resolve_runtime_provider",
+                   return_value={"provider": "openrouter", "api_key": "x",
+                                 "base_url": "https://example.invalid",
+                                 "api_mode": "chat_completions"}), \
+             patch("tools.mcp_tool.discover_mcp_tools", return_value=[]), \
             patch("run_agent.AIAgent") as mock_agent_cls:
            mock_agent = MagicMock()
            mock_agent.run_conversation.return_value = {"final_response": "ok"}
@@ -1482,6 +1492,11 @@ class TestRunJobConfigLogging:
        with patch("cron.scheduler._hermes_home", tmp_path), \
             patch("cron.scheduler._resolve_origin", return_value=None), \
             patch("dotenv.load_dotenv"), \
+             patch("hermes_cli.runtime_provider.resolve_runtime_provider",
+                   return_value={"provider": "openrouter", "api_key": "x",
+                                 "base_url": "https://example.invalid",
+                                 "api_mode": "chat_completions"}), \
+             patch("tools.mcp_tool.discover_mcp_tools", return_value=[]), \
             patch("run_agent.AIAgent") as mock_agent_cls:
            mock_agent = MagicMock()
            mock_agent.run_conversation.return_value = {"final_response": "ok"}
@@ -1,7 +1,7 @@
 """Tests for the API server bind-address startup guard.

 Validates that is_network_accessible() correctly classifies addresses and
-that connect() refuses to start on non-loopback without API_SERVER_KEY.
+that connect() refuses to start without API_SERVER_KEY.
 """

 import socket
@@ -111,13 +111,14 @@ class TestConnectBindGuard:
        result = await adapter.connect()
        assert result is False

-    def test_allows_loopback_without_key(self):
-        """Loopback with no key should pass the guard."""
+    @pytest.mark.asyncio
+    async def test_refuses_loopback_without_key(self):
+        """Loopback binds are still an auth boundary and require API_SERVER_KEY."""
        adapter = APIServerAdapter(PlatformConfig(enabled=True, extra={"host": "127.0.0.1"}))
        assert adapter._api_key == ""
-        # The guard condition: is_network_accessible(host) AND NOT api_key
-        # For loopback, is_network_accessible is False so the guard does not block.
        assert is_network_accessible(adapter._host) is False
+        result = await adapter.connect()
+        assert result is False

    @pytest.mark.asyncio
    async def test_allows_wildcard_with_key(self):
@@ -851,6 +851,27 @@ async def test_discord_per_user_channel_backfills_too(adapter, monkeypatch):
    assert event.channel_context == "[Recent channel messages]\n[Alice] context"


+@pytest.mark.asyncio
+async def test_discord_participated_thread_backfills_without_mention(adapter, monkeypatch):
+    """Known threads still need recent thread context when mention gating is bypassed."""
+    monkeypatch.setenv("DISCORD_REQUIRE_MENTION", "true")
+    monkeypatch.delenv("DISCORD_FREE_RESPONSE_CHANNELS", raising=False)
+    monkeypatch.delenv("DISCORD_THREAD_REQUIRE_MENTION", raising=False)
+    adapter.config.extra["history_backfill"] = True
+    adapter._fetch_channel_context = AsyncMock(return_value="[Recent channel messages]\n[Alice] thread context")
+
+    thread = FakeThread(channel_id=456, name="follow-up")
+    adapter._threads.mark("456")
+
+    message = make_message(channel=thread, content="follow-up without mention")
+    await adapter._handle_message(message)
+
+    adapter._fetch_channel_context.assert_awaited_once()
+    event = adapter.handle_message.await_args.args[0]
+    assert event.text == "follow-up without mention"
+    assert event.channel_context == "[Recent channel messages]\n[Alice] thread context"
+
+
 @pytest.mark.asyncio
 async def test_discord_dm_does_not_backfill(adapter, monkeypatch):
    """DMs skip backfill — every DM triggers the bot, so there's no mention gap."""
@@ -884,3 +905,25 @@ async def test_discord_dm_does_not_backfill(adapter, monkeypatch):
        assert event.channel_context is None


+@pytest.mark.asyncio
+async def test_discord_auto_thread_skips_backfill(adapter, monkeypatch):
+    """Auto-created threads skip backfill — the thread is brand new with no prior context."""
+    monkeypatch.setenv("DISCORD_REQUIRE_MENTION", "true")
+    monkeypatch.setenv("DISCORD_AUTO_THREAD", "true")
+    monkeypatch.delenv("DISCORD_NO_THREAD_CHANNELS", raising=False)
+    monkeypatch.delenv("DISCORD_FREE_RESPONSE_CHANNELS", raising=False)
+    adapter.config.extra["history_backfill"] = True
+
+    fake_thread = FakeThread(channel_id=777, name="auto-thread")
+    adapter._auto_create_thread = AsyncMock(return_value=fake_thread)
+    adapter._fetch_channel_context = AsyncMock(return_value="[Recent channel messages]\n[Alice] noise")
+
+    bot_user = adapter._client.user
+    parent = FakeTextChannel(channel_id=200, name="general")
+    message = make_message(channel=parent, content="hello", mentions=[bot_user])
+    await adapter._handle_message(message)
+
+    adapter._auto_create_thread.assert_awaited_once()
+    adapter._fetch_channel_context.assert_not_awaited()
+
+
@@ -624,6 +624,13 @@ class _FakeTextChannel:
        self.guild = SimpleNamespace(name=guild_name, id=1)
        self.topic = None

+    def history(self, *args, **kwargs):
+        async def _empty():
+            return
+            yield  # pragma: no cover — make this an async generator
+
+        return _empty()
+

 class _FakeThreadChannel(_discord_mod.Thread):
    """isinstance(ch, discord.Thread) → True."""
@@ -636,6 +643,13 @@ class _FakeThreadChannel(_discord_mod.Thread):
        self.topic = None
        self.parent = SimpleNamespace(id=parent_id, name="general", guild=SimpleNamespace(name=guild_name, id=1))

+    def history(self, *args, **kwargs):
+        async def _empty():
+            return
+            yield  # pragma: no cover — make this an async generator
+
+        return _empty()
+

 def _fake_message(channel, *, content="Hello", author_id=42, display_name="Jezza"):
    return SimpleNamespace(
@@ -11,6 +11,7 @@ from gateway.platforms.msgraph_webhook import AIOHTTP_AVAILABLE, MSGraphWebhookA

 def _make_adapter(**extra_overrides) -> MSGraphWebhookAdapter:
    extra = {
+        "host": "127.0.0.1",
        "client_state": "expected-client-state",
        "accepted_resources": ["communications/onlineMeetings"],
    }
@@ -80,6 +81,27 @@ class TestMSGraphValidationHandshake:
        # is_connected is a @property on the base adapter, not a method.
        assert adapter.is_connected is False

+    @pytest.mark.anyio
+    async def test_connect_requires_source_allowlist_on_public_bind(self):
+        if not AIOHTTP_AVAILABLE:
+            pytest.skip("aiohttp not installed")
+        adapter = _make_adapter(host="0.0.0.0", port=0, allowed_source_cidrs=[])
+        connected = await adapter.connect()
+        assert connected is False
+        assert adapter.is_connected is False
+
+    @pytest.mark.anyio
+    async def test_connect_allows_loopback_without_source_allowlist(self):
+        if not AIOHTTP_AVAILABLE:
+            pytest.skip("aiohttp not installed")
+        adapter = _make_adapter(host="127.0.0.1", port=0, allowed_source_cidrs=[])
+        try:
+            connected = await adapter.connect()
+            assert connected is True
+            assert adapter.is_connected is True
+        finally:
+            await adapter.disconnect()
+
    @pytest.mark.anyio
    async def test_validation_token_echo_on_get(self):
        adapter = _make_adapter()
@@ -381,9 +403,9 @@ class TestMSGraphNotifications:

 class TestMSGraphSourceIPAllowlist:
    @pytest.mark.anyio
-    async def test_disabled_by_default_allows_all(self):
-        """Empty allowlist preserves pre-existing behavior (dev tunnels, localhost)."""
-        adapter = _make_adapter()  # no allowed_source_cidrs set
+    async def test_public_bind_without_allowlist_fails_closed(self):
+        """Public binds must not accept requests until a source allowlist is configured."""
+        adapter = _make_adapter(host="0.0.0.0", allowed_source_cidrs=[])
        payload = {
            "value": [
                {
@@ -396,6 +418,24 @@ class TestMSGraphSourceIPAllowlist:
        resp = await adapter._handle_notification(
            _FakeRequest(json_payload=payload, remote="203.0.113.99")
        )
+        assert resp.status == 403
+
+    @pytest.mark.anyio
+    async def test_loopback_bind_without_allowlist_still_accepts_local_requests(self):
+        """Loopback-only listeners may rely on local proxying/tunnels instead of CIDRs."""
+        adapter = _make_adapter(host="127.0.0.1", allowed_source_cidrs=[])
+        payload = {
+            "value": [
+                {
+                    "id": "notif-ip-local",
+                    "resource": "communications/onlineMeetings/m",
+                    "clientState": "expected-client-state",
+                }
+            ]
+        }
+        resp = await adapter._handle_notification(
+            _FakeRequest(json_payload=payload, remote="127.0.0.1")
+        )
        assert resp.status == 202

    @pytest.mark.anyio
@@ -441,6 +481,13 @@ class TestMSGraphSourceIPAllowlist:
        )
        assert resp.status == 403

+    @pytest.mark.anyio
+    async def test_health_endpoint_also_respects_allowlist(self):
+        """The readiness endpoint should not leak counters to arbitrary sources."""
+        adapter = _make_adapter(allowed_source_cidrs=["10.0.0.0/8"])
+        resp = await adapter._handle_health(_FakeRequest(remote="203.0.113.99"))
+        assert resp.status == 403
+
    @pytest.mark.anyio
    async def test_invalid_cidr_entries_are_ignored_at_init(self):
        """Malformed CIDR strings should log a warning and be ignored, not crash."""
@@ -0,0 +1,267 @@
+"""Tests for the planned-stop marker watcher thread (gateway/run.py).
+
+The watcher is the Windows-fallback path for the v0.13.0 session-resume
+feature. On Windows, ``loop.add_signal_handler`` raises
+NotImplementedError, so the SIGTERM signal handler never runs and the
+shutdown drain (which writes ``resume_pending=True``) is skipped. The
+watcher closes this gap by polling for the planned-stop marker file
+and translating its existence into the same shutdown-handler call a
+real SIGTERM would have produced.
+
+See issue #33778 for the original Windows session-loss bug report.
+"""
+
+import asyncio
+import threading
+import time
+from types import SimpleNamespace
+from unittest.mock import MagicMock
+
+import pytest
+
+from gateway.run import _run_planned_stop_watcher
+
+
+class _FakeRunner:
+    """Stand-in for GatewayRunner — only exposes the two flags the watcher reads."""
+
+    def __init__(self, *, running: bool = True, draining: bool = False):
+        self._running = running
+        self._draining = draining
+
+
+def _make_loop_capturing_calls():
+    """Build a fake asyncio loop whose call_soon_threadsafe records its args."""
+    loop = MagicMock(spec=asyncio.AbstractEventLoop)
+    loop._captured = []
+
+    def fake_call_soon_threadsafe(fn, *args):
+        loop._captured.append((fn, args))
+
+    loop.call_soon_threadsafe = fake_call_soon_threadsafe
+    return loop
+
+
+def test_watcher_fires_shutdown_when_marker_appears(tmp_path, monkeypatch):
+    """When the marker file exists, the watcher must call the shutdown handler."""
+    marker = tmp_path / ".gateway-planned-stop.json"
+
+    # Patch the marker-path resolver so the watcher polls our temp location.
+    from gateway import status as status_mod
+    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
+
+    runner = _FakeRunner(running=True, draining=False)
+    loop = _make_loop_capturing_calls()
+    shutdown_handler = MagicMock(name="shutdown_signal_handler")
+    stop_event = threading.Event()
+
+    # Drop the marker before the thread starts.
+    marker.write_text('{"target_pid": 1234}', encoding="utf-8")
+
+    watcher = threading.Thread(
+        target=_run_planned_stop_watcher,
+        args=(stop_event, runner, loop, shutdown_handler),
+        kwargs={"poll_interval": 0.05},
+        daemon=True,
+    )
+    watcher.start()
+    watcher.join(timeout=2.0)
+
+    assert not watcher.is_alive(), "Watcher should exit after firing"
+    assert len(loop._captured) == 1, (
+        f"Expected exactly one shutdown invocation, got {loop._captured}"
+    )
+    fn, args = loop._captured[0]
+    assert fn is shutdown_handler
+    # The handler must be called with signal=None (planned stop sentinel).
+    assert args == (None,)
+
+
+def test_watcher_does_not_fire_when_marker_absent(tmp_path, monkeypatch):
+    """No marker = no shutdown call. Watcher just spins until stop_event."""
+    marker = tmp_path / ".gateway-planned-stop.json"
+    # Deliberately do NOT create the marker.
+
+    from gateway import status as status_mod
+    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
+
+    runner = _FakeRunner(running=True, draining=False)
+    loop = _make_loop_capturing_calls()
+    shutdown_handler = MagicMock()
+    stop_event = threading.Event()
+
+    watcher = threading.Thread(
+        target=_run_planned_stop_watcher,
+        args=(stop_event, runner, loop, shutdown_handler),
+        kwargs={"poll_interval": 0.05},
+        daemon=True,
+    )
+    watcher.start()
+    time.sleep(0.3)  # let it poll a few times
+    stop_event.set()
+    watcher.join(timeout=2.0)
+
+    assert not watcher.is_alive()
+    assert loop._captured == [], (
+        f"No marker present, but watcher fired shutdown: {loop._captured}"
+    )
+    shutdown_handler.assert_not_called()
+
+
+def test_watcher_skips_when_runner_already_draining(tmp_path, monkeypatch):
+    """If shutdown is already in progress, don't re-fire the handler.
+
+    This prevents a race where the SIGTERM handler is mid-drain and the
+    watcher would double-tap the shutdown path. We check ``_draining``
+    so the watcher backs off once any shutdown is in flight.
+    """
+    marker = tmp_path / ".gateway-planned-stop.json"
+    marker.write_text('{"target_pid": 1234}', encoding="utf-8")
+
+    from gateway import status as status_mod
+    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
+
+    # Already draining — watcher should be a no-op.
+    runner = _FakeRunner(running=False, draining=True)
+    loop = _make_loop_capturing_calls()
+    shutdown_handler = MagicMock()
+    stop_event = threading.Event()
+
+    watcher = threading.Thread(
+        target=_run_planned_stop_watcher,
+        args=(stop_event, runner, loop, shutdown_handler),
+        kwargs={"poll_interval": 0.05},
+        daemon=True,
+    )
+    watcher.start()
+    time.sleep(0.2)
+    stop_event.set()
+    watcher.join(timeout=2.0)
+
+    assert loop._captured == [], "Watcher fired while runner was already draining"
+
+
+def test_watcher_skips_when_runner_not_started(tmp_path, monkeypatch):
+    """If the runner hasn't started, the marker is for a previous instance —
+    we shouldn't shut down a not-yet-running gateway.
+    """
+    marker = tmp_path / ".gateway-planned-stop.json"
+    marker.write_text('{"target_pid": 9999}', encoding="utf-8")
+
+    from gateway import status as status_mod
+    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
+
+    runner = _FakeRunner(running=False, draining=False)
+    loop = _make_loop_capturing_calls()
+    shutdown_handler = MagicMock()
+    stop_event = threading.Event()
+
+    watcher = threading.Thread(
+        target=_run_planned_stop_watcher,
+        args=(stop_event, runner, loop, shutdown_handler),
+        kwargs={"poll_interval": 0.05},
+        daemon=True,
+    )
+    watcher.start()
+    time.sleep(0.2)
+    stop_event.set()
+    watcher.join(timeout=2.0)
+
+    assert loop._captured == [], "Watcher fired before runner was running"
+
+
+def test_watcher_responds_to_stop_event_promptly(tmp_path, monkeypatch):
+    """Setting stop_event must exit the watcher within ~poll_interval seconds."""
+    marker = tmp_path / ".gateway-planned-stop.json"
+    from gateway import status as status_mod
+    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
+
+    runner = _FakeRunner(running=True, draining=False)
+    loop = _make_loop_capturing_calls()
+    stop_event = threading.Event()
+
+    watcher = threading.Thread(
+        target=_run_planned_stop_watcher,
+        args=(stop_event, runner, loop, MagicMock()),
+        kwargs={"poll_interval": 0.1},
+        daemon=True,
+    )
+    watcher.start()
+    time.sleep(0.05)
+    started_stop = time.monotonic()
+    stop_event.set()
+    watcher.join(timeout=2.0)
+    elapsed = time.monotonic() - started_stop
+
+    assert not watcher.is_alive()
+    assert elapsed < 0.5, f"Watcher took {elapsed:.2f}s to honour stop_event"
+
+
+def test_watcher_fires_only_once_when_marker_persists(tmp_path, monkeypatch):
+    """Marker file existing for multiple polls must NOT spam the handler.
+
+    The watcher fires once and exits its loop (the shutdown handler is
+    responsible for consuming the marker on its own thread). If we
+    re-fired on every tick, the handler would be invoked dozens of
+    times before the gateway actually shuts down.
+    """
+    marker = tmp_path / ".gateway-planned-stop.json"
+    marker.write_text('{"target_pid": 1234}', encoding="utf-8")
+
+    from gateway import status as status_mod
+    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
+
+    runner = _FakeRunner(running=True, draining=False)
+    loop = _make_loop_capturing_calls()
+    stop_event = threading.Event()
+
+    watcher = threading.Thread(
+        target=_run_planned_stop_watcher,
+        args=(stop_event, runner, loop, MagicMock()),
+        kwargs={"poll_interval": 0.05},
+        daemon=True,
+    )
+    watcher.start()
+    # Let the watcher tick several times — but it should exit after the first fire.
+    watcher.join(timeout=1.0)
+
+    assert not watcher.is_alive()
+    assert len(loop._captured) == 1, (
+        f"Watcher fired {len(loop._captured)} times; should fire once "
+        f"and exit (events={loop._captured})"
+    )
+
+
+def test_watcher_tolerates_marker_path_resolution_errors(tmp_path, monkeypatch, caplog):
+    """If _get_planned_stop_marker_path() raises, the watcher logs and continues."""
+    from gateway import status as status_mod
+
+    call_count = [0]
+    def explode():
+        call_count[0] += 1
+        # First call (outside the loop, at thread start) returns a missing
+        # path; every later call raises to simulate a filesystem failure.
+        if call_count[0] == 1:
+            return tmp_path / "nonexistent"
+        raise OSError("filesystem failed")
+
+    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", explode)
+
+    runner = _FakeRunner(running=True, draining=False)
+    loop = _make_loop_capturing_calls()
+    stop_event = threading.Event()
+
+    watcher = threading.Thread(
+        target=_run_planned_stop_watcher,
+        args=(stop_event, runner, loop, MagicMock()),
+        kwargs={"poll_interval": 0.05},
+        daemon=True,
+    )
+    watcher.start()
+    time.sleep(0.2)
+    stop_event.set()
+    watcher.join(timeout=2.0)
+
+    assert not watcher.is_alive(), "Watcher should still honour stop_event after errors"
+    # No shutdown fired because the marker never reported existence.
+    assert loop._captured == []
@@ -368,6 +368,11 @@ class TestMediaDeliveryPathValidation:
            "gateway.platforms.base.MEDIA_DELIVERY_SAFE_ROOTS",
            tuple(roots),
        )
+        # All tests in this class cover strict-mode behavior (allowlist +
+        # recency window + denylist). Force strict on so they keep
+        # exercising the legacy path even though the public default
+        # flipped to off in 2026-05.
+        monkeypatch.setenv("HERMES_MEDIA_DELIVERY_STRICT", "1")
        # Disable recency-based trust by default so the original allowlist
        # tests continue to exercise the strict-allowlist path. Tests that
        # specifically cover recency trust re-enable it themselves.
@@ -536,6 +541,149 @@ class TestMediaDeliveryPathValidation:
        assert out == [str(fresh.resolve())]


+class TestMediaDeliveryDefaultMode:
+    """Default (non-strict) mode — denylist gates delivery, nothing else.
+
+    Symmetric with inbound delivery: Telegram/Discord/Slack accept any
+    document type the user uploads, and the agent can hand back any file
+    that isn't a credential. Strict mode is opt-in for operators running
+    public-facing gateways.
+    """
+
+    def _patch_roots(self, monkeypatch, *roots):
+        # Empty cache allowlist so the only positive path through
+        # validate_media_delivery_path in these tests is the
+        # default-mode "anything not denied" branch.
+        monkeypatch.setattr(
+            "gateway.platforms.base.MEDIA_DELIVERY_SAFE_ROOTS",
+            tuple(roots),
+        )
+        # Pin strict OFF — the public default. Tests that exercise the
+        # strict path live in TestMediaDeliveryPathValidation.
+        monkeypatch.delenv("HERMES_MEDIA_DELIVERY_STRICT", raising=False)
+        monkeypatch.delenv("HERMES_MEDIA_ALLOW_DIRS", raising=False)
+
+    def test_accepts_stale_file_outside_allowlist(self, tmp_path, monkeypatch):
+        """The motivating case — agent says ``MEDIA:/home/user/notes.md``
+        for an .md it has been working with for hours. Strict mode would
+        reject this (outside allowlist, outside recency window). Default
+        mode delivers it.
+        """
+        self._patch_roots(monkeypatch)
+
+        notes = tmp_path / "notes.md"
+        notes.write_text("# Old notes\n")
+        old_mtime = time.time() - 7200  # 2 hours ago — far outside any window
+        os.utime(notes, (old_mtime, old_mtime))
+
+        assert BasePlatformAdapter.validate_media_delivery_path(str(notes)) == str(notes.resolve())
+
+    def test_accepts_any_extension_not_on_denylist(self, tmp_path, monkeypatch):
+        """No extension allowlist — .md, .txt, .json, .py all deliver."""
+        self._patch_roots(monkeypatch)
+
+        for name in ("report.md", "log.txt", "data.json", "script.py", "blob.bin"):
+            f = tmp_path / name
+            f.write_bytes(b"x")
+            assert BasePlatformAdapter.validate_media_delivery_path(str(f)) == str(f.resolve())
+
+    def test_denylist_still_blocks_credentials(self, tmp_path, monkeypatch):
+        """Default mode is permissive but not naive — credential paths
+        remain blocked. Simulate $HOME so ~/.ssh resolves into tmp_path.
+        """
+        self._patch_roots(monkeypatch)
+
+        fake_home = tmp_path / "home"
+        ssh_dir = fake_home / ".ssh"
+        ssh_dir.mkdir(parents=True)
+        secret = ssh_dir / "id_rsa"
+        secret.write_bytes(b"-----BEGIN ...")
+        monkeypatch.setenv("HOME", str(fake_home))
+
+        assert BasePlatformAdapter.validate_media_delivery_path(str(secret)) is None
+
+    def test_denylist_blocks_system_prefixes(self, tmp_path, monkeypatch):
+        """Files under /etc, /proc, /sys, /root, /boot, /var/{log,lib,run}
+        are denied. We construct the test by patching the denylist root
+        to a tmp dir so we don't need to read /etc.
+        """
+        self._patch_roots(monkeypatch)
+
+        fake_etc = tmp_path / "fake-etc"
+        fake_etc.mkdir()
+        secret = fake_etc / "shadow"
+        secret.write_bytes(b"root:!:0:0::/root:/bin/sh")
+
+        monkeypatch.setattr(
+            "gateway.platforms.base._MEDIA_DELIVERY_DENIED_PREFIXES",
+            (str(fake_etc),),
+        )
+
+        assert BasePlatformAdapter.validate_media_delivery_path(str(secret)) is None
+
+    def test_denylist_blocks_hermes_credentials(self, tmp_path, monkeypatch):
+        """~/.hermes/.env and ~/.hermes/auth.json stay blocked even in
+        default mode. They live under $HOME (not the system prefix list)
+        so this exercises the home-relative denied paths.
+        """
+        self._patch_roots(monkeypatch)
+
+        fake_home = tmp_path / "home"
+        hermes_dir = fake_home / ".hermes"
+        hermes_dir.mkdir(parents=True)
+        env_file = hermes_dir / ".env"
+        env_file.write_text("OPENAI_API_KEY=sk-...")
+        monkeypatch.setenv("HOME", str(fake_home))
+        monkeypatch.setattr(
+            "gateway.platforms.base._HERMES_HOME",
+            hermes_dir,
+        )
+
+        assert BasePlatformAdapter.validate_media_delivery_path(str(env_file)) is None
+
+    def test_strict_mode_envvar_restores_legacy_behavior(self, tmp_path, monkeypatch):
+        """Setting HERMES_MEDIA_DELIVERY_STRICT=1 reactivates the older
+        allowlist+recency logic. A stale file outside the allowlist is
+        rejected.
+        """
+        self._patch_roots(monkeypatch)
+        monkeypatch.setenv("HERMES_MEDIA_DELIVERY_STRICT", "1")
+        monkeypatch.setenv("HERMES_MEDIA_TRUST_RECENT_FILES", "0")
+
+        stale = tmp_path / "old.pdf"
+        stale.write_bytes(b"%PDF-1.4")
+        old_mtime = time.time() - 7200
+        os.utime(stale, (old_mtime, old_mtime))
+
+        assert BasePlatformAdapter.validate_media_delivery_path(str(stale)) is None
+
+    def test_strict_mode_truthy_aliases(self, monkeypatch, tmp_path):
+        """``HERMES_MEDIA_DELIVERY_STRICT=true|yes|on|1`` all enable strict mode."""
+        self._patch_roots(monkeypatch)
+        from gateway.platforms.base import _media_delivery_strict_mode
+
+        for raw in ("1", "true", "TRUE", "yes", "on"):
+            monkeypatch.setenv("HERMES_MEDIA_DELIVERY_STRICT", raw)
+            assert _media_delivery_strict_mode() is True
+
+        for raw in ("0", "false", "no", "off", ""):
+            monkeypatch.setenv("HERMES_MEDIA_DELIVERY_STRICT", raw)
+            assert _media_delivery_strict_mode() is False
+
+    def test_filter_passes_default_files_through(self, tmp_path, monkeypatch):
+        """End-to-end: filter_local_delivery_paths accepts a stale .md in
+        default mode where strict mode would drop it.
+        """
+        self._patch_roots(monkeypatch)
+
+        notes = tmp_path / "notes.md"
+        notes.write_text("# old\n")
+        os.utime(notes, (time.time() - 86400, time.time() - 86400))
+
+        out = BasePlatformAdapter.filter_local_delivery_paths([str(notes)])
+        assert out == [str(notes.resolve())]
+
+
 # ---------------------------------------------------------------------------
 # should_send_media_as_audio
 # ---------------------------------------------------------------------------
@@ -939,6 +939,133 @@ class TestFinalResponseDeliveryGuard:
        assert consumer._final_response_sent is True


+class TestFinalContentDeliveredGuard:
+    """Regression coverage for #25010 — _final_content_delivered must only be
+    set when the final response is actually confirmed delivered to the user,
+    not when a mid-stream edit happened to show partial content.  Prematurely
+    setting this flag causes the gateway to suppress the normal final send,
+    leaving the user with an incomplete partial message."""
+
+    @pytest.mark.asyncio
+    async def test_mid_stream_edit_success_does_not_mark_content_delivered(self):
+        """When the mid-stream edit with finalize=True succeeds but the
+        subsequent finalize edit fails, _final_content_delivered must stay
+        False so the gateway does not suppress its fallback send (#25010).
+
+        Simulates TelegramAdapter which sets REQUIRES_EDIT_FINALIZE=True,
+        requiring a second finalize edit even when content is unchanged."""
+        adapter = MagicMock()
+        adapter.REQUIRES_EDIT_FINALIZE = True  # Telegram adapter behavior
+        # First send (initial streaming message) succeeds
+        # Mid-stream finalize edit succeeds
+        # Final finalize edit FAILS (e.g. flood control on Telegram)
+        adapter.edit_message = AsyncMock(side_effect=[
+            SimpleNamespace(success=True),   # mid-stream edit
+            SimpleNamespace(success=True),   # finalize edit on line 548
+            SimpleNamespace(success=False),  # final finalize on line 580 (FAILS)
+        ])
+        adapter.send = AsyncMock(
+            return_value=SimpleNamespace(success=True, message_id="msg_1"),
+        )
+        adapter.MAX_MESSAGE_LENGTH = 4096
+
+        config = StreamConsumerConfig(edit_interval=0.01, buffer_threshold=5)
+        consumer = GatewayStreamConsumer(adapter, "chat_123", config)
+
+        # Simulate streaming: send initial text, then more text, then done
+        consumer.on_delta("Part one of the response...\n")
+        task = asyncio.create_task(consumer.run())
+        await asyncio.sleep(0.05)
+
+        consumer.on_delta("Part two, the complete final answer.\n")
+        await asyncio.sleep(0.05)
+
+        consumer.finish()
+        await task
+
+        # The key assertion: _final_content_delivered must NOT be True,
+        # because the final edit failed and the complete response was never
+        # confirmed delivered.
+        assert consumer._final_content_delivered is False, (
+            "_final_content_delivered was prematurely set to True — gateway "
+            "will wrongly suppress its fallback send, leaving the user with "
+            "an incomplete partial message (#25010)"
+        )
+        # The gateway must still be allowed to send the complete response
+        assert consumer._final_response_sent is False, (
+            "_final_response_sent must also be False when the final edit failed"
+        )
+
+    @pytest.mark.asyncio
+    async def test_final_edit_success_does_mark_content_delivered(self):
+        """When the final finalize edit succeeds, _final_content_delivered
+        must be True — the normal happy path should still work."""
+        adapter = MagicMock()
+        adapter.edit_message = AsyncMock(return_value=SimpleNamespace(success=True))
+        adapter.send = AsyncMock(
+            return_value=SimpleNamespace(success=True, message_id="msg_1"),
+        )
+        adapter.MAX_MESSAGE_LENGTH = 4096
+
+        config = StreamConsumerConfig(edit_interval=0.01, buffer_threshold=5)
+        consumer = GatewayStreamConsumer(adapter, "chat_123", config)
+
+        consumer.on_delta("The complete response.\n")
+        task = asyncio.create_task(consumer.run())
+        await asyncio.sleep(0.05)
+
+        consumer.finish()
+        await task
+
+        assert consumer._final_content_delivered is True, (
+            "_final_content_delivered must be True when the final edit succeeds"
+        )
+        assert consumer._final_response_sent is True
+
+    @pytest.mark.asyncio
+    async def test_fallback_partial_send_does_not_mark_final_sent(self):
+        """When fallback final send delivers only some chunks before failing,
+        _final_response_sent must stay False so the gateway can still attempt
+        a complete final send (#25010)."""
+        call_count = 0
+
+        async def fake_send(*, chat_id, content, **kwargs):
+            nonlocal call_count
+            call_count += 1
+            if call_count <= 2:
+                return SimpleNamespace(success=True, message_id="msg_1")
+            # Third chunk (fallback continuation) FAILS
+            return SimpleNamespace(success=False, error="flood_control:13.0")
+
+        adapter = MagicMock()
+        adapter.send = AsyncMock(side_effect=fake_send)
+        adapter.edit_message = AsyncMock(
+            return_value=SimpleNamespace(success=False, error="flood_control:13.0"),
+        )
+        adapter.MAX_MESSAGE_LENGTH = 4096
+
+        config = StreamConsumerConfig(edit_interval=0.01, buffer_threshold=5)
+        consumer = GatewayStreamConsumer(adapter, "chat_123", config)
+
+        # Trigger enough delta to enter fallback mode
+        consumer.on_delta("Initial streaming text...\n")
+        task = asyncio.create_task(consumer.run())
+        await asyncio.sleep(0.05)
+
+        # Send a very long text that will trigger overflow/fallback
+        long_text = ("x" * 3000 + "\n") + ("y" * 3000 + "\n") + "Final answer.\n"
+        consumer.on_delta(long_text)
+        await asyncio.sleep(0.1)
+
+        consumer.finish()
+        await task
+
+        assert consumer._final_response_sent is False, (
+            "Partial fallback send must not set _final_response_sent — gateway "
+            "must still be able to deliver the complete response (#25010)"
+        )
+
+
 class TestEditOverflowSplitAndDeliver:
    """When edit_message split-and-delivers an oversized payload across the
    original message + N continuations (Telegram >4096 UTF-16), the consumer
@@ -234,9 +234,12 @@ async def test_streaming_delivery_blocks_media_path_outside_allowed_roots(tmp_pa
        "gateway.platforms.base.MEDIA_DELIVERY_SAFE_ROOTS",
        (allowed_root,),
    )
-    # This test exercises the strict-allowlist path; disable recency trust so
-    # the freshly-written tmp_path file is not auto-accepted by the trust
-    # window. (Recency trust is covered separately in test_platform_base.py.)
+    # This test exercises the strict-allowlist path; force strict mode on
+    # and disable recency trust so the freshly-written tmp_path file is not
+    # auto-accepted by the trust window. (Recency trust is covered separately
+    # in test_platform_base.py. The public default flipped to non-strict in
+    # 2026-05; this test pins strict on explicitly.)
+    monkeypatch.setenv("HERMES_MEDIA_DELIVERY_STRICT", "1")
    monkeypatch.setenv("HERMES_MEDIA_TRUST_RECENT_FILES", "0")
    adapter = SimpleNamespace(
        name="test",
@@ -303,6 +303,88 @@ def test_save_codex_tokens_syncs_credential_pool(tmp_path, monkeypatch):
    assert auth["providers"]["openai-codex"]["tokens"]["access_token"] == "new-at"


+def test_save_codex_tokens_syncs_manual_device_code_entries(tmp_path, monkeypatch):
+    """Re-auth must also refresh ``manual:device_code`` pool entries.
+
+    Regression for #33538: a user who hit #33000 before the #33164 fix landed
+    would have run ``hermes auth add openai-codex`` as a workaround, leaving
+    a pool entry with ``source="manual:device_code"``.  On every subsequent
+    re-auth via setup/model picker, the singleton-seeded ``device_code`` entry
+    got refreshed but the ``manual:device_code`` entry stayed stale, recreating
+    the same 401 token_invalidated symptom that #33164 was supposed to fix.
+
+    An interactive Codex device-code re-auth proves the user owns the ChatGPT
+    account, so it is safe to refresh every device-code-backed entry in the
+    pool — but NOT independent ``manual:api_key`` entries (separate accounts /
+    explicit API keys).
+    """
+    hermes_home = tmp_path / "hermes"
+    hermes_home.mkdir(parents=True, exist_ok=True)
+    (hermes_home / "auth.json").write_text(json.dumps({
+        "version": 1,
+        "providers": {
+            "openai-codex": {
+                "tokens": {"access_token": "old-at", "refresh_token": "old-rt"},
+                "last_refresh": "2026-01-01T00:00:00Z",
+                "auth_mode": "chatgpt",
+            },
+        },
+        "credential_pool": {
+            "openai-codex": [
+                {
+                    "id": "seeded",
+                    "source": "device_code",
+                    "auth_type": "oauth",
+                    "access_token": "old-at",
+                    "refresh_token": "old-rt",
+                },
+                {
+                    "id": "auth-add",
+                    "source": "manual:device_code",
+                    "auth_type": "oauth",
+                    "access_token": "stale-manual-at",
+                    "refresh_token": "stale-manual-rt",
+                    "last_status": "exhausted",
+                    "last_error_code": 401,
+                    "last_error_reason": "token_invalidated",
+                },
+                {
+                    "id": "api-key",
+                    "source": "manual:api_key",
+                    "auth_type": "api_key",
+                    "access_token": "user-api-key",
+                },
+            ],
+        },
+    }))
+    monkeypatch.setenv("HERMES_HOME", str(hermes_home))
+
+    _save_codex_tokens({"access_token": "fresh-at", "refresh_token": "fresh-rt"},
+                       last_refresh="2026-05-28T00:00:00Z")
+
+    auth = json.loads((hermes_home / "auth.json").read_text())
+    pool = auth["credential_pool"]["openai-codex"]
+
+    # Singleton-seeded device_code entry: refreshed with the new tokens.
+    seeded = next(e for e in pool if e["source"] == "device_code")
+    assert seeded["access_token"] == "fresh-at"
+    assert seeded["refresh_token"] == "fresh-rt"
+
+    # manual:device_code entry: ALSO refreshed (the new behavior).
+    manual_dc = next(e for e in pool if e["source"] == "manual:device_code")
+    assert manual_dc["access_token"] == "fresh-at"
+    assert manual_dc["refresh_token"] == "fresh-rt"
+    assert manual_dc["last_refresh"] == "2026-05-28T00:00:00Z"
+    assert manual_dc["last_status"] is None
+    assert manual_dc["last_error_code"] is None
+    assert manual_dc["last_error_reason"] is None
+
+    # manual:api_key entry: untouched — independent credential.
+    api_key = next(e for e in pool if e["source"] == "manual:api_key")
+    assert api_key["access_token"] == "user-api-key"
+    assert "refresh_token" not in api_key or api_key.get("refresh_token") is None
+
+
 def test_import_codex_cli_tokens(tmp_path, monkeypatch):
    codex_home = tmp_path / "codex-cli"
    codex_home.mkdir(parents=True, exist_ok=True)
@@ -330,6 +330,107 @@ def test_xai_loopback_login_manual_paste_state_mismatch_raises(monkeypatch):
    assert exc.value.code == "xai_state_mismatch"


+def test_xai_loopback_login_manual_paste_bare_code_succeeds(monkeypatch):
+    """Bare-code paste (state=None) must complete login under manual_paste.
+
+    xAI's consent page renders the authorization code in-page rather than
+    redirecting through 127.0.0.1, so on remote/headless setups the only
+    value the user can obtain is the opaque code with no ``state=``
+    parameter. ``_parse_pasted_callback`` correctly returns
+    ``state=None`` for that input. The login flow must accept this case
+    (PKCE still protects the exchange); historically it raised
+    ``xai_state_mismatch``. Regression for the bare-code branch of #26923.
+    """
+    monkeypatch.setattr(
+        auth_mod, "_xai_oauth_discovery",
+        lambda *_a, **_k: {
+            "authorization_endpoint": "https://auth.x.ai/oauth2/authorize",
+            "token_endpoint": "https://auth.x.ai/oauth2/token",
+        },
+    )
+    monkeypatch.setattr(
+        auth_mod, "_prompt_manual_callback_paste",
+        lambda _ru: {
+            "code": "bare-opaque-code",
+            "state": None,
+            "error": None,
+            "error_description": None,
+        },
+    )
+
+    def _fake_token_post(*_a, **_k):
+        return _StubTokenResponse(
+            {
+                "access_token": "at",
+                "refresh_token": "rt",
+                "id_token": "",
+                "expires_in": 3600,
+                "token_type": "Bearer",
+            }
+        )
+
+    monkeypatch.setattr(auth_mod.httpx, "post", _fake_token_post)
+
+    with contextlib.redirect_stdout(io.StringIO()):
+        creds = auth_mod._xai_oauth_loopback_login(manual_paste=True)
+
+    assert creds["tokens"]["access_token"] == "at"
+    assert creds["tokens"]["refresh_token"] == "rt"
+
+
+def test_xai_loopback_login_loopback_path_rejects_missing_state(monkeypatch):
+    """Loopback (manual_paste=False) must NOT accept ``state=None``.
+
+    The bare-code relaxation only applies to the manual-paste path,
+    where the user demonstrably has no way to supply ``state``. The
+    HTTP-server path always sees ``state`` populated from the real
+    callback query string, so missing state there means something is
+    wrong (a malformed callback, an attacker-supplied request) and
+    must still raise ``xai_state_mismatch``.
+    """
+    monkeypatch.setattr(
+        auth_mod, "_xai_oauth_discovery",
+        lambda *_a, **_k: {
+            "authorization_endpoint": "https://auth.x.ai/oauth2/authorize",
+            "token_endpoint": "https://auth.x.ai/oauth2/token",
+        },
+    )
+
+    class _StubServer:
+        def shutdown(self):
+            return None
+
+        def server_close(self):
+            return None
+
+    monkeypatch.setattr(
+        auth_mod, "_xai_start_callback_server",
+        lambda *_a, **_k: (
+            _StubServer(),
+            None,
+            {"code": "fake", "state": None, "error": None,
+             "error_description": None},
+            "http://127.0.0.1:56121/callback",
+        ),
+    )
+    monkeypatch.setattr(
+        auth_mod, "_xai_wait_for_callback",
+        lambda *_a, **_k: {
+            "code": "fake",
+            "state": None,
+            "error": None,
+            "error_description": None,
+        },
+    )
+    monkeypatch.setattr(auth_mod, "_xai_validate_loopback_redirect_uri", lambda _u: None)
+    monkeypatch.setattr(auth_mod, "_print_loopback_ssh_hint", lambda *_a, **_k: None)
+
+    with contextlib.redirect_stdout(io.StringIO()):
+        with pytest.raises(auth_mod.AuthError) as exc:
+            auth_mod._xai_oauth_loopback_login(manual_paste=False, open_browser=False)
+    assert exc.value.code == "xai_state_mismatch"
+
+
 def test_xai_loopback_login_manual_paste_missing_code_raises(monkeypatch):
    """Empty paste must surface as ``xai_code_missing``, not crash."""
    monkeypatch.setattr(
@@ -363,6 +464,163 @@ def test_xai_loopback_login_manual_paste_missing_code_raises(monkeypatch):
    assert exc.value.code == "xai_code_missing"


+def test_xai_loopback_login_timeout_falls_back_to_manual_paste(monkeypatch):
+    """Loopback timeout should offer the existing manual-paste path."""
+    monkeypatch.setattr(
+        auth_mod, "_xai_oauth_discovery",
+        lambda *_a, **_k: {
+            "authorization_endpoint": "https://auth.x.ai/oauth2/authorize",
+            "token_endpoint": "https://auth.x.ai/oauth2/token",
+        },
+    )
+
+    class _StubServer:
+        def shutdown(self):
+            return None
+
+        def server_close(self):
+            return None
+
+    class _StubThread:
+        def join(self, timeout=None):
+            return None
+
+    monkeypatch.setattr(
+        auth_mod,
+        "_xai_start_callback_server",
+        lambda: (
+            _StubServer(),
+            _StubThread(),
+            {
+                "code": None,
+                "state": None,
+                "error": None,
+                "error_description": None,
+            },
+            "http://127.0.0.1:56121/callback",
+        ),
+    )
+
+    captured: dict = {"state": None, "prompt_calls": 0}
+    original_build = auth_mod._xai_oauth_build_authorize_url
+
+    def _capture(**kwargs):
+        captured["state"] = kwargs["state"]
+        return original_build(**kwargs)
+
+    monkeypatch.setattr(auth_mod, "_xai_oauth_build_authorize_url", _capture)
+
+    def _raise_timeout(*_a, **_k):
+        raise auth_mod.AuthError(
+            "xAI authorization timed out waiting for the local callback.",
+            provider="xai-oauth",
+            code="xai_callback_timeout",
+        )
+
+    monkeypatch.setattr(auth_mod, "_xai_wait_for_callback", _raise_timeout)
+
+    def _fake_prompt(_redirect_uri):
+        captured["prompt_calls"] += 1
+        return {
+            "code": "manual-auth-code",
+            "state": captured["state"],
+            "error": None,
+            "error_description": None,
+        }
+
+    monkeypatch.setattr(auth_mod, "_prompt_manual_callback_paste", _fake_prompt)
+    monkeypatch.setattr(
+        auth_mod.sys, "stdin", type("StubStdin", (), {"isatty": lambda self: True})()
+    )
+    monkeypatch.setattr(
+        auth_mod.httpx,
+        "post",
+        lambda *_a, **_k: _StubTokenResponse(
+            {
+                "access_token": "at-timeout",
+                "refresh_token": "rt-timeout",
+                "id_token": "",
+                "expires_in": 3600,
+                "token_type": "Bearer",
+            }
+        ),
+    )
+
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        creds = auth_mod._xai_oauth_loopback_login(manual_paste=False)
+
+    rendered = buf.getvalue()
+    assert "xAI loopback callback timed out." in rendered
+    assert "--manual-paste" in rendered
+    assert captured["prompt_calls"] == 1
+    assert creds["tokens"]["access_token"] == "at-timeout"
+    assert creds["tokens"]["refresh_token"] == "rt-timeout"
+
+
+def test_xai_loopback_login_timeout_noninteractive_reraises(monkeypatch):
+    """Non-interactive stdin must keep the original timeout error."""
+    monkeypatch.setattr(
+        auth_mod, "_xai_oauth_discovery",
+        lambda *_a, **_k: {
+            "authorization_endpoint": "https://auth.x.ai/oauth2/authorize",
+            "token_endpoint": "https://auth.x.ai/oauth2/token",
+        },
+    )
+
+    class _StubServer:
+        def shutdown(self):
+            return None
+
+        def server_close(self):
+            return None
+
+    class _StubThread:
+        def join(self, timeout=None):
+            return None
+
+    monkeypatch.setattr(
+        auth_mod,
+        "_xai_start_callback_server",
+        lambda: (
+            _StubServer(),
+            _StubThread(),
+            {
+                "code": None,
+                "state": None,
+                "error": None,
+                "error_description": None,
+            },
+            "http://127.0.0.1:56121/callback",
+        ),
+    )
+
+    monkeypatch.setattr(
+        auth_mod,
+        "_xai_wait_for_callback",
+        lambda *_a, **_k: (_ for _ in ()).throw(
+            auth_mod.AuthError(
+                "xAI authorization timed out waiting for the local callback.",
+                provider="xai-oauth",
+                code="xai_callback_timeout",
+            )
+        ),
+    )
+    monkeypatch.setattr(
+        auth_mod.sys, "stdin", type("StubStdin", (), {"isatty": lambda self: False})()
+    )
+    monkeypatch.setattr(
+        auth_mod,
+        "_prompt_manual_callback_paste",
+        lambda *_a, **_k: pytest.fail("manual-paste fallback should not run"),
+    )
+
+    with contextlib.redirect_stdout(io.StringIO()):
+        with pytest.raises(auth_mod.AuthError) as exc:
+            auth_mod._xai_oauth_loopback_login(manual_paste=False)
+    assert exc.value.code == "xai_callback_timeout"
+
+
 # ---------------------------------------------------------------------------
 # _print_loopback_ssh_hint — now also mentions --manual-paste
 # ---------------------------------------------------------------------------
@@ -667,6 +667,42 @@ def test_get_nous_auth_status_checks_credential_pool(tmp_path, monkeypatch):
    assert "example.com" in str(status.get("portal_base_url", ""))


+def test_get_nous_auth_status_pool_opaque_key_is_not_portal_login(tmp_path, monkeypatch):
+    from hermes_cli.auth import get_nous_auth_status, invalidate_nous_auth_status_cache
+
+    hermes_home = tmp_path / "hermes"
+    hermes_home.mkdir(parents=True, exist_ok=True)
+    (hermes_home / "auth.json").write_text(json.dumps({
+        "version": 1, "providers": {},
+    }))
+    monkeypatch.setenv("HERMES_HOME", str(hermes_home))
+    invalidate_nous_auth_status_cache()
+
+    from agent.credential_pool import PooledCredential, load_pool
+    pool = load_pool("nous")
+    entry = PooledCredential.from_dict("nous", {
+        "access_token": "",
+        "agent_key": "opaque-agent-key",
+        "agent_key_expires_at": "2099-01-01T00:00:00+00:00",
+        "label": "manual opaque key",
+        "auth_type": "api_key",
+        "source": "manual",
+        "base_url": "https://inference.example.com/v1",
+        "inference_base_url": "https://inference.example.com/v1",
+    })
+    pool.add_entry(entry)
+
+    status = get_nous_auth_status()
+
+    assert status["logged_in"] is False
+    assert status["inference_credential_present"] is True
+    assert status["credential_source"] == "pool:manual opaque key"
+    assert status.get("access_token") is None
+    assert status.get("portal_base_url") is None
+    assert status.get("inference_base_url") == "https://inference.example.com/v1"
+    invalidate_nous_auth_status_cache()
+
+
 def test_get_nous_auth_status_auth_store_fallback(tmp_path, monkeypatch):
    """get_nous_auth_status() falls back to auth store when credential
    pool is empty.
@@ -1023,12 +1059,19 @@ class TestLoginNousSkipKeepsCurrent:
            lambda *a, **kw: prompt_returns,
        )
        monkeypatch.setattr(models_mod, "get_pricing_for_provider", lambda p: {})
-        monkeypatch.setattr(models_mod, "check_nous_free_tier", lambda: None)
+        free_tier_calls = []
+
+        def _check_nous_free_tier(**kwargs):
+            free_tier_calls.append(kwargs)
+            return None
+
+        monkeypatch.setattr(models_mod, "check_nous_free_tier", _check_nous_free_tier)
        monkeypatch.setattr(
            models_mod, "partition_nous_models_by_tier",
            lambda ids, p, free_tier=False: (ids, []),
        )
        monkeypatch.setattr(ns, "prompt_enable_tool_gateway", lambda cfg: None)
+        return free_tier_calls

    def test_skip_keep_current_preserves_provider_and_model(self, tmp_path, monkeypatch):
        """User picks Skip → config.yaml untouched, Nous creds still saved."""
@@ -1070,7 +1113,7 @@ class TestLoginNousSkipKeepsCurrent:
        hermes_home, config_path, auth_path = self._setup_home_with_openrouter(
            tmp_path, monkeypatch,
        )
-        self._patch_login_internals(
+        free_tier_calls = self._patch_login_internals(
            monkeypatch, prompt_returns="xiaomi/mimo-v2-pro",
        )

@@ -1083,6 +1126,7 @@ class TestLoginNousSkipKeepsCurrent:
        cfg_after = yaml.safe_load(config_path.read_text())
        assert cfg_after["model"]["provider"] == "nous"
        assert cfg_after["model"]["default"] == "xiaomi/mimo-v2-pro"
+        assert free_tier_calls == [{"force_fresh": True}]

        auth_after = json.loads(auth_path.read_text())
        assert auth_after["active_provider"] == "nous"
@@ -144,7 +144,13 @@ class TestCmdUpdateBranchFallback:
        mock_run.side_effect = _make_run_side_effect(
            branch="main", verify_ok=True, commit_count="1"
        )
-        with patch.object(hm, "_is_termux_env", return_value=False):
+        # The web UI build runs through _run_with_idle_timeout now (issue
+        # #33788) so it no longer appears in subprocess.run's call list.
+        # Mock it so the test doesn't actually shell out to ``tsc``.
+        import subprocess as _subprocess
+        build_ok = _subprocess.CompletedProcess([], 0, stdout="", stderr="")
+        with patch.object(hm, "_is_termux_env", return_value=False), \
+             patch.object(hm, "_run_with_idle_timeout", return_value=build_ok) as mock_idle:
            cmd_update(mock_args)

        npm_calls = [
@@ -153,10 +159,11 @@ class TestCmdUpdateBranchFallback:
            if call.args and call.args[0][0] == "/usr/bin/npm"
        ]

-        # cmd_update runs npm commands in three locations:
-        #   1. repo root  — slash-command / TUI bridge deps
-        #   2. ui-tui/    — Ink TUI deps
-        #   3. web/       — install + "npm run build" for the web frontend
+        # cmd_update runs four npm commands across three directories:
+        #   1. repo root  — slash-command / TUI bridge deps  (subprocess.run)
+        #   2. ui-tui/    — Ink TUI deps                     (subprocess.run)
+        #   3. web/       — npm install                      (subprocess.run)
+        #   4. web/       — npm run build                    (_run_with_idle_timeout)
        #
        # Repo-root and ui-tui installs intentionally omit `--silent` and run
        # without `capture_output` so optional postinstall scripts (e.g.
@@ -175,11 +182,18 @@ class TestCmdUpdateBranchFallback:
            (update_flags, PROJECT_ROOT / "ui-tui"),
        ]
        if len(npm_calls) > 2:
+            # Only the web/ install is left in subprocess.run; the build moved
+            # to _run_with_idle_timeout to make Vite progress visible (#33788).
            assert npm_calls[2:] == [
                (["/usr/bin/npm", "ci", "--silent"], PROJECT_ROOT / "web"),
-                (["/usr/bin/npm", "run", "build"], PROJECT_ROOT / "web"),
            ]

+        # The web UI build itself went through the streaming helper.
+        mock_idle.assert_called_once()
+        idle_args, idle_kwargs = mock_idle.call_args
+        assert idle_args[0] == ["/usr/bin/npm", "run", "build"]
+        assert idle_kwargs["cwd"] == PROJECT_ROOT / "web"
+
        # Regression for #18840: repo root + ui-tui installs must stream
        # output (capture_output=False) so postinstall progress is visible
        # to the user.
@@ -481,4 +481,221 @@ def test_uninstall_access_denied_declined_keeps_task_and_cleans_files(monkeypatc
    out = capsys.readouterr().out
    assert "Skipped elevation" in out
    assert "UAC is Windows' admin approval prompt" in out
-    assert "Scheduled Task still registered" in out
+    assert "Scheduled Task still registered" in out
+
+
+# ---------------------------------------------------------------------------
+# stop() drain semantics — issue #33778
+#
+# Background: on Windows, asyncio.add_signal_handler raises NotImplementedError,
+# so the gateway's SIGTERM handler (which drains in-flight agents and writes
+# resume_pending=True) never fires when `hermes gateway stop` kills the
+# process. The fix: stop() writes the planned_stop_marker first, waits for
+# the gateway's marker-watcher thread to drain + exit cleanly, then escalates
+# to taskkill if drain times out.
+# ---------------------------------------------------------------------------
+
+
+def test_stop_writes_planned_stop_marker_before_killing(monkeypatch):
+    """stop() must write the planned-stop marker BEFORE any kill signal.
+
+    Without this, the gateway's drain loop never runs on Windows and
+    sessions silently lose context across restarts.
+    """
+    pid = 99999
+    events = []
+
+    monkeypatch.setattr(gateway_windows, "_assert_windows", lambda: None)
+    monkeypatch.setattr(gateway_windows, "is_task_registered", lambda: False)
+
+    # Stub the marker write so we can record the order of operations.
+    from gateway import status as status_mod
+
+    def fake_write_marker(target_pid):
+        events.append(("write_marker", target_pid))
+        return True
+
+    def fake_pid_exists(check_pid):
+        # Drain succeeds: pid "exits" right after the marker write.
+        return ("write_marker", pid) not in events
+
+    monkeypatch.setattr(status_mod, "write_planned_stop_marker", fake_write_marker)
+    monkeypatch.setattr(status_mod, "_pid_exists", fake_pid_exists)
+    monkeypatch.setattr(status_mod, "get_running_pid", lambda: pid)
+
+    def fake_kill(**kwargs):
+        events.append(("kill", kwargs.get("force", False)))
+        return 0
+
+    monkeypatch.setattr("hermes_cli.gateway.kill_gateway_processes", fake_kill)
+    monkeypatch.setattr("hermes_cli.gateway._get_restart_drain_timeout", lambda: 5.0)
+
+    gateway_windows.stop()
+
+    # Marker MUST be written before any kill.
+    kinds = [e[0] for e in events]
+    assert "write_marker" in kinds, "stop() never wrote the planned-stop marker"
+    marker_idx = kinds.index("write_marker")
+    kill_idx = kinds.index("kill") if "kill" in kinds else len(kinds)
+    assert marker_idx < kill_idx, (
+        f"stop() killed before writing the marker (events={events})"
+    )
+
+
+def test_stop_waits_for_graceful_drain_before_force_kill(monkeypatch):
+    """When drain succeeds, stop() should NOT force-kill the gateway.
+
+    drained=True means the gateway exited cleanly after seeing the
+    marker — escalating to taskkill /F afterwards would be wasted
+    work and may emit confusing "killed N processes" output.
+    """
+    pid = 88888
+    events = []
+
+    monkeypatch.setattr(gateway_windows, "_assert_windows", lambda: None)
+    monkeypatch.setattr(gateway_windows, "is_task_registered", lambda: False)
+
+    from gateway import status as status_mod
+    monkeypatch.setattr(status_mod, "write_planned_stop_marker", lambda p: True)
+
+    # Simulate the gateway exiting cleanly after one poll tick.
+    poll_count = [0]
+    def fake_pid_exists(check_pid):
+        poll_count[0] += 1
+        return poll_count[0] < 2  # alive on first poll, gone on second
+    monkeypatch.setattr(status_mod, "_pid_exists", fake_pid_exists)
+    monkeypatch.setattr(status_mod, "get_running_pid", lambda: pid)
+
+    def fake_kill(**kwargs):
+        events.append(("kill", kwargs.get("force", False)))
+        return 0
+    monkeypatch.setattr("hermes_cli.gateway.kill_gateway_processes", fake_kill)
+    monkeypatch.setattr("hermes_cli.gateway._get_restart_drain_timeout", lambda: 5.0)
+
+    gateway_windows.stop()
+
+    # kill_gateway_processes is still called as the no-op sweep, but
+    # NOT with force=True — drain succeeded, gateway is already gone.
+    assert events == [("kill", False)], (
+        f"After clean drain, force kill should be disabled (events={events})"
+    )
+
+
+def test_stop_escalates_to_force_kill_when_drain_times_out(monkeypatch):
+    """When drain times out, stop() MUST escalate to force=True.
+
+    Drain timeout = gateway is stuck or unresponsive. Without the
+    taskkill /T /F escalation, the gateway stays alive and the next
+    `hermes gateway start` fails with "another instance is running".
+    """
+    pid = 77777
+    events = []
+
+    monkeypatch.setattr(gateway_windows, "_assert_windows", lambda: None)
+    monkeypatch.setattr(gateway_windows, "is_task_registered", lambda: False)
+
+    from gateway import status as status_mod
+    monkeypatch.setattr(status_mod, "write_planned_stop_marker", lambda p: True)
+    # PID never exits — drain times out.
+    monkeypatch.setattr(status_mod, "_pid_exists", lambda check_pid: True)
+    monkeypatch.setattr(status_mod, "get_running_pid", lambda: pid)
+
+    def fake_kill(**kwargs):
+        events.append(("kill", kwargs.get("force", False)))
+        return 1
+    monkeypatch.setattr("hermes_cli.gateway.kill_gateway_processes", fake_kill)
+    # Tiny drain timeout to keep the test fast.
+    monkeypatch.setattr("hermes_cli.gateway._get_restart_drain_timeout", lambda: 1.0)
+
+    gateway_windows.stop()
+
+    # When drain times out, kill is invoked with force=True so taskkill /T /F
+    # walks the process tree.
+    assert events == [("kill", True)], (
+        f"After drain timeout, kill must use force=True (events={events})"
+    )
+
+
+def test_stop_no_running_gateway_skips_drain(monkeypatch):
+    """When no gateway is running, skip the drain wait entirely."""
+    events = []
+
+    monkeypatch.setattr(gateway_windows, "_assert_windows", lambda: None)
+    monkeypatch.setattr(gateway_windows, "is_task_registered", lambda: False)
+
+    from gateway import status as status_mod
+    monkeypatch.setattr(status_mod, "get_running_pid", lambda: None)
+
+    def fake_write_marker(target_pid):
+        events.append(("write_marker", target_pid))
+        return True
+    monkeypatch.setattr(status_mod, "write_planned_stop_marker", fake_write_marker)
+    monkeypatch.setattr(status_mod, "_pid_exists", lambda check_pid: False)
+
+    def fake_kill(**kwargs):
+        events.append(("kill", kwargs.get("force", False)))
+        return 0
+    monkeypatch.setattr("hermes_cli.gateway.kill_gateway_processes", fake_kill)
+    monkeypatch.setattr("hermes_cli.gateway._get_restart_drain_timeout", lambda: 5.0)
+
+    gateway_windows.stop()
+
+    # With no PID to drain, no marker is written.  Kill sweep still runs
+    # (defensive — covers the case where a stray gateway is alive without
+    # a PID file).  force=True because drained=False.
+    assert ("write_marker", None) not in events
+    assert all(e[0] != "write_marker" for e in events), (
+        f"Should not write marker when no PID is running (events={events})"
+    )
+    assert events == [("kill", True)]
+
+
+def test_drain_helper_handles_invalid_pid(monkeypatch):
+    """_drain_gateway_pid returns False for invalid PIDs without crashing."""
+    assert gateway_windows._drain_gateway_pid(0, 5.0) is False
+    assert gateway_windows._drain_gateway_pid(-1, 5.0) is False
+
+
+def test_drain_helper_returns_true_when_pid_exits_quickly(monkeypatch):
+    """_drain_gateway_pid polls _pid_exists until it returns False."""
+    pid = 66666
+    poll_count = [0]
+
+    def fake_pid_exists(check_pid):
+        poll_count[0] += 1
+        return poll_count[0] < 3  # alive twice, then gone
+
+    from gateway import status as status_mod
+    monkeypatch.setattr(status_mod, "write_planned_stop_marker", lambda p: True)
+    monkeypatch.setattr(status_mod, "_pid_exists", fake_pid_exists)
+
+    assert gateway_windows._drain_gateway_pid(pid, drain_timeout=5.0) is True
+
+
+def test_drain_helper_returns_false_on_timeout(monkeypatch):
+    """_drain_gateway_pid returns False when the PID never exits."""
+    from gateway import status as status_mod
+    monkeypatch.setattr(status_mod, "write_planned_stop_marker", lambda p: True)
+    monkeypatch.setattr(status_mod, "_pid_exists", lambda check_pid: True)
+
+    assert gateway_windows._drain_gateway_pid(55555, drain_timeout=1.0) is False
+
+
+def test_drain_helper_still_waits_if_marker_write_fails(monkeypatch):
+    """Marker-write failures are swallowed; drain still polls for PID exit.
+
+    If the marker can't be written (disk full, permission error), the
+    gateway is never told to drain. The wait still happens, though, so a
+    gateway shutting down slowly via another path (e.g. SIGTERM working on
+    this platform after all) is still observed exiting cleanly.
+    """
+    pid = 44444
+    def fake_write(target_pid):
+        raise OSError("disk full")
+
+    from gateway import status as status_mod
+    monkeypatch.setattr(status_mod, "write_planned_stop_marker", fake_write)
+    monkeypatch.setattr(status_mod, "_pid_exists", lambda check_pid: False)
+
+    # Returns True because _pid_exists immediately says "gone".
+    assert gateway_windows._drain_gateway_pid(pid, drain_timeout=5.0) is True
@@ -237,7 +237,7 @@ class TestConfigWriting:
        monkeypatch.setattr(
            tools_config,
            "get_nous_subscription_features",
-            lambda config: SimpleNamespace(
+            lambda config, **kwargs: SimpleNamespace(
                features={"image_gen": SimpleNamespace(managed_by_nous=True)}
            ),
        )
@@ -158,8 +158,11 @@ def test_build_models_payload_returns_expected_shape():


 def test_build_models_payload_does_not_call_provider_model_ids():
-    """Curated lists must come from list_authenticated_providers, not
-    provider_model_ids — that would pull TTS/embeddings/etc.
+    """``build_models_payload`` is a thin shape adapter — it delegates the
+    actual curation to ``list_authenticated_providers`` (which DOES call
+    ``cached_provider_model_ids`` internally for live discovery, with disk
+    caching). ``build_models_payload`` itself must not call the live fetcher
+    directly; the test pins that boundary.
    """
    rows = [{"slug": "nous", "name": "Nous", "models": ["hermes-4-405b"],
             "total_models": 1, "is_current": False, "is_user_defined": False,
@@ -5,7 +5,9 @@ from __future__ import annotations
 import concurrent.futures
 import os
 import sqlite3
+import sys
 import time
+import types
 import unittest.mock
 from pathlib import Path

@@ -49,6 +51,43 @@ def test_init_creates_expected_tables(kanban_home):
    assert {"tasks", "task_links", "task_comments", "task_events"} <= names


+def test_connect_honors_kanban_busy_timeout_env(kanban_home, monkeypatch):
+    """All kanban connections should use the explicit busy-timeout knob.
+
+    A worker stampede should wait for SQLite's writer lock instead of failing
+    immediately with ``database is locked`` during first-connect/WAL/schema
+    setup.  The timeout must be queryable via PRAGMA so CLI, gateway, and tool
+    connections behave the same way.
+    """
+    monkeypatch.setenv("HERMES_KANBAN_BUSY_TIMEOUT_MS", "123456")
+
+    with kb.connect() as conn:
+        row = conn.execute("PRAGMA busy_timeout").fetchone()
+
+    assert row[0] == 123456
+
+
+def test_cross_process_init_lock_uses_windows_byte_range_lock(tmp_path, monkeypatch):
+    """Windows must use a real process lock, not a no-op sidecar open."""
+    calls: list[tuple[int, int, int]] = []
+    fake_msvcrt = types.SimpleNamespace(
+        LK_LOCK=1,
+        LK_UNLCK=2,
+        locking=lambda fd, mode, nbytes: calls.append((fd, mode, nbytes)),
+    )
+    monkeypatch.setattr(kb, "_IS_WINDOWS", True)
+    monkeypatch.setitem(sys.modules, "msvcrt", fake_msvcrt)
+
+    db_path = tmp_path / "kanban.db"
+    with kb._cross_process_init_lock(db_path):
+        assert calls == [(calls[0][0], fake_msvcrt.LK_LOCK, 1)]
+
+    assert [call[1:] for call in calls] == [
+        (fake_msvcrt.LK_LOCK, 1),
+        (fake_msvcrt.LK_UNLCK, 1),
+    ]
+
+
 def test_connect_rejects_tls_record_in_sqlite_header(tmp_path, monkeypatch):
    """Kanban should classify TLS-looking page-0 clobbers before WAL setup."""
    home = tmp_path / ".hermes"
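Reviewer note on the busy-timeout test above: a minimal sketch of the behavior it pins, assuming the connection helper reads the environment variable and applies it with a PRAGMA. The function name `connect_with_busy_timeout` and the default value are hypothetical; only `HERMES_KANBAN_BUSY_TIMEOUT_MS` and the `PRAGMA busy_timeout` query come from the diff.

```python
import os
import sqlite3

# Hypothetical fallback; the real default is not shown in this diff.
DEFAULT_BUSY_TIMEOUT_MS = 5000


def connect_with_busy_timeout(path: str) -> sqlite3.Connection:
    """Open a SQLite connection and apply the busy timeout from the environment."""
    raw = os.environ.get("HERMES_KANBAN_BUSY_TIMEOUT_MS", "")
    try:
        timeout_ms = int(raw)
    except ValueError:
        # Unset or malformed value: fall back to the assumed default.
        timeout_ms = DEFAULT_BUSY_TIMEOUT_MS
    conn = sqlite3.connect(path)
    # PRAGMA arguments cannot be bound as parameters, so a validated int is formatted in.
    conn.execute(f"PRAGMA busy_timeout = {timeout_ms}")
    return conn
```

With the timeout set this way, a writer that finds the database locked waits up to that many milliseconds before SQLite reports ``database is locked``, and the value can be read back with ``PRAGMA busy_timeout`` exactly as the test does.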
@@ -3278,6 +3317,44 @@ def test_connect_refuses_corrupt_existing_file(tmp_path):
        kb.connect(db_path=db_path)


+def test_repeated_corrupt_open_reuses_single_backup(tmp_path):
+    """Repeated quarantines of the same corrupt bytes must not amplify disk usage.
+
+    Regression for the gateway dispatcher's 5-min retry loop on shared kanban
+    DBs across multi-profile fleets: each retry on an unchanged corrupt file
+    used to create a fresh ``.corrupt.<timestamp>.bak`` until disk filled. The
+    content-addressed backup name is derived from the DB's sha256, so
+    N retries of the same bytes share one backup.
+    """
+    db_path = tmp_path / "kanban.db"
+    original = _write_corrupt_db(db_path)
+
+    backups: set[Path] = set()
+    for _ in range(10):
+        kb._INITIALIZED_PATHS.discard(str(db_path.resolve()))
+        with pytest.raises(kb.KanbanDbCorruptError) as excinfo:
+            kb.connect(db_path=db_path)
+        assert excinfo.value.backup_path is not None
+        backups.add(excinfo.value.backup_path)
+
+    assert len(backups) == 1, f"expected 1 deterministic backup, got {len(backups)}"
+    (backup,) = backups
+    assert backup.exists()
+    assert backup.read_bytes() == original
+
+    # Mutate the corrupt bytes — fingerprint changes, separate backup preserved.
+    with db_path.open("r+b") as f:
+        f.seek(4096)
+        f.write(b"\xAB" * 64)
+    kb._INITIALIZED_PATHS.discard(str(db_path.resolve()))
+    with pytest.raises(kb.KanbanDbCorruptError) as excinfo2:
+        kb.connect(db_path=db_path)
+    second_backup = excinfo2.value.backup_path
+    assert second_backup is not None
+    assert second_backup != backup
+    assert second_backup.exists()
+
+
 def test_locked_healthy_db_does_not_classify_as_corrupt(tmp_path, monkeypatch):
    """A transient lock during the probe must not produce a .corrupt backup
    and must not be reported as :class:`KanbanDbCorruptError`. Raw sqlite
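Reviewer note on `test_repeated_corrupt_open_reuses_single_backup`: the property it checks can be sketched as below, assuming the backup name is built from a SHA-256 of the file bytes. The helper names and the exact suffix format are hypothetical; the diff only states that the name is deterministic in the DB's sha256.

```python
import hashlib
import shutil
from pathlib import Path


def quarantine_backup_path(db_path: Path) -> Path:
    """Return a backup path derived from the SHA-256 of the file's bytes."""
    digest = hashlib.sha256(db_path.read_bytes()).hexdigest()
    # Hypothetical naming scheme; the real suffix format is not shown in the diff.
    return db_path.with_name(f"{db_path.name}.corrupt.{digest[:16]}.bak")


def quarantine_copy(db_path: Path) -> Path:
    """Copy the corrupt file to its content-addressed backup, at most once."""
    backup = quarantine_backup_path(db_path)
    if not backup.exists():
        # Same bytes map to the same name, so repeated retries reuse this copy.
        shutil.copy2(db_path, backup)
    return backup
```

Because the name depends only on content, ten retries against unchanged bytes produce one file, while a mutated file hashes to a new name and leaves the earlier backup in place.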
@@ -2,6 +2,7 @@

 from unittest.mock import patch, MagicMock

+from hermes_cli.nous_account import NousPortalAccountInfo
 from hermes_cli.models import (
    OPENROUTER_MODELS, fetch_openrouter_models, model_ids, detect_provider_for_model,
    is_nous_free_tier, partition_nous_models_by_tier,
@@ -308,6 +309,15 @@ class TestDetectProviderForModel:
 class TestIsNousFreeTier:
    """Tests for is_nous_free_tier — account tier detection."""

+    def test_paid_service_access_allowed_true_is_not_free(self):
+        assert is_nous_free_tier({"paid_service_access": {"allowed": True}}) is False
+
+    def test_paid_service_access_allowed_false_is_free(self):
+        assert is_nous_free_tier({"paid_service_access": {"allowed": False}}) is True
+
+    def test_paid_service_access_paid_access_fallback(self):
+        assert is_nous_free_tier({"paid_service_access": {"paid_access": False}}) is True
+
    def test_paid_plus_tier(self):
        assert is_nous_free_tier({"subscription": {"plan": "Plus", "tier": 2, "monthly_charge": 20}}) is False

@@ -657,39 +667,58 @@ class TestCheckNousFreeTierCache:
    def teardown_method(self):
        _models_mod._free_tier_cache = None

-    @patch("hermes_cli.models.fetch_nous_account_tier")
-    @patch("hermes_cli.models.is_nous_free_tier", return_value=True)
-    def test_result_is_cached(self, mock_is_free, mock_fetch):
-        """Second call within TTL returns cached result without API call."""
-        mock_fetch.return_value = {"subscription": {"monthly_charge": 0}}
-        with patch("hermes_cli.auth.get_provider_auth_state", return_value={"access_token": "tok"}), \
-             patch("hermes_cli.auth.resolve_nous_runtime_credentials"):
-            result1 = check_nous_free_tier()
-            result2 = check_nous_free_tier()
+    @patch("hermes_cli.nous_account.get_nous_portal_account_info")
+    def test_result_is_cached(self, mock_account):
+        """Second call within TTL returns cached result without account lookup."""
+        mock_account.return_value = NousPortalAccountInfo(
+            logged_in=True,
+            source="jwt",
+            fresh=False,
+            paid_service_access=False,
+        )
+        result1 = check_nous_free_tier()
+        result2 = check_nous_free_tier()

        assert result1 is True
        assert result2 is True
-        assert mock_fetch.call_count == 1
+        assert mock_account.call_count == 1

-    @patch("hermes_cli.models.fetch_nous_account_tier")
-    @patch("hermes_cli.models.is_nous_free_tier", return_value=False)
-    def test_cache_expires_after_ttl(self, mock_is_free, mock_fetch):
-        """After TTL expires, the API is called again."""
-        mock_fetch.return_value = {"subscription": {"monthly_charge": 20}}
-        with patch("hermes_cli.auth.get_provider_auth_state", return_value={"access_token": "tok"}), \
-             patch("hermes_cli.auth.resolve_nous_runtime_credentials"):
-            result1 = check_nous_free_tier()
-            assert mock_fetch.call_count == 1
+    @patch("hermes_cli.nous_account.get_nous_portal_account_info")
+    def test_cache_expires_after_ttl(self, mock_account):
+        """After TTL expires, account info is resolved again."""
+        mock_account.return_value = NousPortalAccountInfo(
+            logged_in=True,
+            source="jwt",
+            fresh=False,
+            paid_service_access=True,
+        )
+        result1 = check_nous_free_tier()
+        assert mock_account.call_count == 1

-            cached_result, cached_at = _models_mod._free_tier_cache
-            _models_mod._free_tier_cache = (cached_result, cached_at - _FREE_TIER_CACHE_TTL - 1)
+        cached_result, cached_at = _models_mod._free_tier_cache
+        _models_mod._free_tier_cache = (cached_result, cached_at - _FREE_TIER_CACHE_TTL - 1)

-            result2 = check_nous_free_tier()
-            assert mock_fetch.call_count == 2
+        result2 = check_nous_free_tier()
+        assert mock_account.call_count == 2

        assert result1 is False
        assert result2 is False

+    @patch("hermes_cli.nous_account.get_nous_portal_account_info")
+    def test_force_fresh_bypasses_cache(self, mock_account):
+        mock_account.return_value = NousPortalAccountInfo(
+            logged_in=True,
+            source="account_api",
+            fresh=True,
+            paid_service_access=True,
+        )
+
+        assert check_nous_free_tier() is False
+        assert check_nous_free_tier(force_fresh=True) is False
+
+        assert mock_account.call_count == 2
+        mock_account.assert_called_with(force_fresh=True)
+
    def test_cache_ttl_is_short(self):
        """TTL should be short enough to catch upgrades quickly (<=5 min)."""
        assert _FREE_TIER_CACHE_TTL <= 300
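Reviewer note on `TestCheckNousFreeTierCache`: the three cache tests describe a short-TTL cache with a `force_fresh` bypass. A minimal sketch of that pattern follows, assuming a module-level `(result, cached_at)` tuple like the one the tests manipulate. `check_free_tier`, `lookup`, and the TTL value are hypothetical stand-ins, not the project's code.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical value; the tests only require the TTL to be <= 300 seconds.
CACHE_TTL_SECONDS = 180

# (result, cached_at) or None when nothing has been cached yet.
_cache: Optional[Tuple[bool, float]] = None


def check_free_tier(lookup: Callable[..., bool], *, force_fresh: bool = False) -> bool:
    """Return the cached answer within the TTL unless force_fresh is set."""
    global _cache
    now = time.monotonic()
    if not force_fresh and _cache is not None:
        result, cached_at = _cache
        if now - cached_at < CACHE_TTL_SECONDS:
            return result
    # Cache miss, expired entry, or forced refresh: ask the account source again.
    result = lookup(force_fresh=force_fresh)
    _cache = (result, now)
    return result
```

Forwarding `force_fresh` to the lookup mirrors the `assert_called_with(force_fresh=True)` check in the test, and storing the forced result means later non-forced calls benefit from the fresher answer.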
@@ -0,0 +1,547 @@
+"""Tests for normalized Nous Portal account entitlement helpers."""
+
+from __future__ import annotations
+
+import base64
+import json
+import time
+from typing import Any
+
+import pytest
+
+from hermes_cli.nous_account import (
+    NousPaidServiceAccessInfo,
+    NousPortalAccountInfo,
+    format_nous_portal_entitlement_message,
+    get_nous_portal_account_info,
+    reset_nous_portal_account_info_cache,
+)
+
+
+def _jwt(claims: dict[str, Any]) -> str:
+    def _part(payload: dict[str, Any]) -> str:
+        raw = json.dumps(payload, separators=(",", ":")).encode()
+        return base64.urlsafe_b64encode(raw).decode().rstrip("=")
+
+    return f"{_part({'alg': 'none', 'typ': 'JWT'})}.{_part(claims)}.sig"
+
+
+def _state(token: str) -> dict[str, Any]:
+    return {
+        "access_token": token,
+        "portal_base_url": "https://portal.example.test",
+        "client_id": "hermes-cli",
+    }
+
+
+def _account_payload(
+    *,
+    allowed: bool,
+    subscription: dict[str, Any] | None,
+    subscription_credits: float,
+    purchased_credits: float,
+) -> dict[str, Any]:
+    return {
+        "user": {
+            "email": "alice@example.test",
+            "privy_did": "did:privy:alice",
+        },
+        "organisation": {
+            "id": "org_123",
+        },
+        "subscription": subscription,
+        "purchased_credits_remaining": purchased_credits,
+        "paid_service_access": {
+            "allowed": allowed,
+            "paid_access": allowed,
+            "reason": "usable_credits" if allowed else "no_usable_credits",
+            "organisation_id": "org_123",
+            "effective_at_ms": 123456789,
+            "has_active_subscription": subscription is not None,
+            "active_subscription_is_paid": bool(
+                subscription and subscription.get("monthly_charge", 0) > 0
+            ),
+            "subscription_tier": subscription.get("tier") if subscription else None,
+            "subscription_monthly_charge": (
+                subscription.get("monthly_charge") if subscription else None
+            ),
+            "subscription_credits_remaining": subscription_credits,
+            "purchased_credits_remaining": purchased_credits,
+            "total_usable_credits": subscription_credits + purchased_credits,
+        },
+    }
+
+
+@pytest.fixture(autouse=True)
+def _reset_cache():
+    reset_nous_portal_account_info_cache()
+    yield
+    reset_nous_portal_account_info_cache()
+
+
+def test_valid_jwt_with_paid_access_true(monkeypatch):
+    token = _jwt(
+        {
+            "sub": "user_123",
+            "org_id": "org_123",
+            "client_id": "hermes-cli",
+            "product_id": "nous-hermes-agent",
+            "nous_client": "hermes-agent",
+            "exp": int(time.time()) + 900,
+            "paid_access": True,
+            "subscription_tier": 2,
+        }
+    )
+    monkeypatch.setattr("hermes_cli.auth.get_provider_auth_state", lambda provider: _state(token))
+
+    info = get_nous_portal_account_info()
+
+    assert info.source == "jwt"
+    assert info.fresh is False
+    assert info.logged_in is True
+    assert info.user_id == "user_123"
+    assert info.org_id == "org_123"
+    assert info.product_id == "nous-hermes-agent"
+    assert info.paid_service_access is True
+    assert info.is_paid is True
+    assert info.is_free_tier is False
+
+
+def test_valid_jwt_with_paid_access_false(monkeypatch):
+    token = _jwt(
+        {
+            "sub": "user_123",
+            "org_id": "org_123",
+            "exp": int(time.time()) + 900,
+            "paid_access": False,
+        }
+    )
+    monkeypatch.setattr("hermes_cli.auth.get_provider_auth_state", lambda provider: _state(token))
+
+    info = get_nous_portal_account_info()
+
+    assert info.source == "jwt"
+    assert info.paid_service_access is False
+    assert info.is_paid is False
+    assert info.is_free_tier is True
+
+
+def test_valid_jwt_missing_paid_access_is_unknown_not_paid(monkeypatch):
+    token = _jwt(
+        {
+            "sub": "user_123",
+            "org_id": "org_123",
+            "exp": int(time.time()) + 900,
+        }
+    )
+    monkeypatch.setattr("hermes_cli.auth.get_provider_auth_state", lambda provider: _state(token))
+
+    info = get_nous_portal_account_info()
+
+    assert info.source == "jwt"
+    assert info.paid_service_access is None
+    assert info.is_paid is False
+    assert info.is_free_tier is False
+
+
+def test_expired_jwt_falls_back_to_fresh_account(monkeypatch):
+    token = _jwt(
+        {
+            "sub": "user_123",
+            "org_id": "org_123",
+            "exp": int(time.time()) - 60,
+            "paid_access": False,
+        }
+    )
+    payload = _account_payload(
+        allowed=True,
+        subscription={
+            "plan": "Tier 2",
+            "tier": 2,
+            "monthly_charge": 20,
+            "current_period_end": "2026-05-01T00:00:00.000Z",
+            "credits_remaining": 12.25,
+            "rollover_credits": 3.5,
+        },
+        subscription_credits=12.25,
+        purchased_credits=7.75,
+    )
+    monkeypatch.setattr("hermes_cli.auth.get_provider_auth_state", lambda provider: _state(token))
+    monkeypatch.setattr("hermes_cli.auth.resolve_nous_access_token", lambda: "fresh-token")
+    monkeypatch.setattr("hermes_cli.nous_account._fetch_nous_account_info", lambda *a, **kw: payload)
+
+    info = get_nous_portal_account_info()
+
+    assert info.source == "account_api"
+    assert info.fresh is True
+    assert info.paid_service_access is True
+    assert info.subscription is not None
+    assert info.subscription.monthly_charge == 20
+    assert info.paid_service_access_info is not None
+    assert info.paid_service_access_info.total_usable_credits == 20
+
+
+@pytest.mark.parametrize(
+    ("payload", "expected_paid"),
+    [
+        (
+            _account_payload(
+                allowed=True,
+                subscription={
+                    "plan": "Tier 2",
+                    "tier": 2,
+                    "monthly_charge": 20,
+                    "current_period_end": "2026-05-01T00:00:00.000Z",
+                    "credits_remaining": 12.25,
+                    "rollover_credits": 3.5,
+                },
+                subscription_credits=12.25,
+                purchased_credits=7.75,
+            ),
+            True,
+        ),
+        (
+            _account_payload(
+                allowed=False,
+                subscription={
+                    "plan": "Tier 2",
+                    "tier": 2,
+                    "monthly_charge": 20,
+                    "current_period_end": "2026-05-01T00:00:00.000Z",
+                    "credits_remaining": 0,
+                    "rollover_credits": 0,
+                },
+                subscription_credits=0,
+                purchased_credits=0,
+            ),
+            False,
+        ),
+        (
+            _account_payload(
+                allowed=True,
+                subscription=None,
+                subscription_credits=0,
+                purchased_credits=7.75,
+            ),
+            True,
+        ),
+        (
+            _account_payload(
+                allowed=False,
+                subscription=None,
+                subscription_credits=0,
+                purchased_credits=0,
+            ),
+            False,
+        ),
+    ],
+)
+def test_fresh_account_payload_normalization(monkeypatch, payload, expected_paid):
+    token = _jwt({"sub": "user_123", "org_id": "org_123", "exp": int(time.time()) + 900})
+    monkeypatch.setattr("hermes_cli.auth.get_provider_auth_state", lambda provider: _state(token))
+    monkeypatch.setattr("hermes_cli.auth.resolve_nous_access_token", lambda: "fresh-token")
+    monkeypatch.setattr("hermes_cli.nous_account._fetch_nous_account_info", lambda *a, **kw: payload)
+
+    info = get_nous_portal_account_info(force_fresh=True)
+
+    assert isinstance(info, NousPortalAccountInfo)
+    assert info.source == "account_api"
+    assert info.fresh is True
+    assert info.email == "alice@example.test"
+    assert info.privy_did == "did:privy:alice"
+    assert info.org_id == "org_123"
+    assert info.paid_service_access is expected_paid
+    assert info.is_paid is expected_paid
+    assert info.is_free_tier is (not expected_paid)
+
+
+def test_force_fresh_uses_account_api_even_when_jwt_is_valid(monkeypatch):
+    token = _jwt(
+        {
+            "sub": "user_123",
+            "org_id": "org_123",
+            "exp": int(time.time()) + 900,
+            "paid_access": False,
+        }
+    )
+    payload = _account_payload(
+        allowed=True,
+        subscription=None,
+        subscription_credits=0,
+        purchased_credits=5,
+    )
+    monkeypatch.setattr("hermes_cli.auth.get_provider_auth_state", lambda provider: _state(token))
+    monkeypatch.setattr("hermes_cli.auth.resolve_nous_access_token", lambda: "fresh-token")
+    monkeypatch.setattr("hermes_cli.nous_account._fetch_nous_account_info", lambda *a, **kw: payload)
+
+    info = get_nous_portal_account_info(force_fresh=True)
+
+    assert info.source == "account_api"
+    assert info.paid_service_access is True
+
+
+def test_no_oauth_token_reports_inference_key_present(monkeypatch):
+    monkeypatch.setattr("hermes_cli.auth.get_provider_auth_state", lambda provider: {})
+
+    class _Entry:
+        label = "manual-nous"
+        access_token = ""
+        agent_key = "opaque-runtime-key"
+        agent_key_expires_at = "2099-01-01T00:00:00+00:00"
+        expires_at = None
+        inference_base_url = "https://inference.example.test/v1"
+        base_url = "https://inference.example.test/v1"
+        priority = 0
+
+        @property
+        def runtime_api_key(self):
+            return self.agent_key
+
+        @property
+        def runtime_base_url(self):
+            return self.inference_base_url
+
+    class _Pool:
+        def has_credentials(self):
+            return True
+
+        def entries(self):
+            return [_Entry()]
+
+    monkeypatch.setattr("agent.credential_pool.load_pool", lambda provider: _Pool())
+
+    info = get_nous_portal_account_info()
+
+    assert info.logged_in is False
+    assert info.source == "inference_key"
+    assert info.inference_credential_present is True
+    assert info.credential_source == "pool:manual-nous"
+    assert info.paid_service_access is None
+
+
+def test_pool_oauth_entry_uses_jwt_snapshot(monkeypatch):
+    token = _jwt(
+        {
+            "sub": "user_123",
+            "org_id": "org_123",
+            "client_id": "hermes-cli",
+            "exp": int(time.time()) + 900,
+            "paid_access": True,
+        }
+    )
+    monkeypatch.setattr("hermes_cli.auth.get_provider_auth_state", lambda provider: {})
+
+    class _Entry:
+        label = "dashboard device_code"
+        auth_type = "oauth"
+        access_token = token
+        refresh_token = "refresh-token"
+        agent_key = "opaque-runtime-key"
+        agent_key_expires_at = "2099-01-01T00:00:00+00:00"
+        expires_at = "2099-01-01T00:00:00+00:00"
+        portal_base_url = "https://portal.example.test"
+        inference_base_url = "https://inference.example.test/v1"
+        base_url = "https://inference.example.test/v1"
+        priority = 0
+
+        @property
+        def runtime_api_key(self):
+            return self.agent_key
+
+        @property
+        def runtime_base_url(self):
+            return self.inference_base_url
+
+    class _Pool:
+        def has_credentials(self):
+            return True
+
+        def entries(self):
+            return [_Entry()]
+
+    monkeypatch.setattr("agent.credential_pool.load_pool", lambda provider: _Pool())
+
+    info = get_nous_portal_account_info()
+
+    assert info.logged_in is True
+    assert info.source == "jwt"
+    assert info.paid_service_access is True
+    assert info.credential_source == "pool:dashboard device_code"
+
+
+def test_pool_oauth_entry_force_fresh_uses_account_api(monkeypatch):
+    token = _jwt(
+        {
+            "sub": "user_123",
+            "org_id": "org_123",
+            "exp": int(time.time()) + 900,
+            "paid_access": False,
+        }
+    )
+    payload = _account_payload(
+        allowed=True,
+        subscription=None,
+        subscription_credits=0,
+        purchased_credits=3,
+    )
+    monkeypatch.setattr("hermes_cli.auth.get_provider_auth_state", lambda provider: {})
+    monkeypatch.setattr("hermes_cli.nous_account._fetch_nous_account_info", lambda *a, **kw: payload)
+
+    class _Entry:
+        label = "dashboard device_code"
+        auth_type = "oauth"
+        access_token = token
+        refresh_token = "refresh-token"
+        agent_key = "opaque-runtime-key"
+        agent_key_expires_at = "2099-01-01T00:00:00+00:00"
+        expires_at = "2099-01-01T00:00:00+00:00"
+        portal_base_url = "https://portal.example.test"
+        inference_base_url = "https://inference.example.test/v1"
+        base_url = "https://inference.example.test/v1"
+        priority = 0
+
+        @property
+        def runtime_api_key(self):
+            return self.agent_key
+
+        @property
+        def runtime_base_url(self):
+            return self.inference_base_url
+
+    class _Pool:
+        def has_credentials(self):
+            return True
+
+        def entries(self):
+            return [_Entry()]
+
+    monkeypatch.setattr("agent.credential_pool.load_pool", lambda provider: _Pool())
+
+    info = get_nous_portal_account_info(force_fresh=True)
+
+    assert info.logged_in is True
+    assert info.source == "account_api"
+    assert info.fresh is True
+    assert info.paid_service_access is True
+    assert info.credential_source == "pool:dashboard device_code"
+
+
+def test_entitlement_message_returns_none_for_paid_access():
+    info = NousPortalAccountInfo(
+        logged_in=True,
+        source="account_api",
+        fresh=True,
+        paid_service_access=True,
+        portal_base_url="https://portal.example.test",
+    )
+
+    assert format_nous_portal_entitlement_message(info, capability="paid models") is None
+
+
+def test_entitlement_message_for_inference_key_without_portal_login():
+    info = NousPortalAccountInfo(
+        logged_in=False,
+        source="inference_key",
+        fresh=False,
+        inference_credential_present=True,
+        portal_base_url="https://portal.example.test",
+    )
+
+    message = format_nous_portal_entitlement_message(
+        info,
+        capability="managed tools",
+    )
+
+    assert message is not None
+    assert "Nous inference credentials are configured" in message
+    assert "cannot verify your Nous Portal paid access" in message
+    assert "Log in with `hermes model`" in message
+
+
+def test_entitlement_message_for_active_paid_subscription_with_no_credits():
+    info = NousPortalAccountInfo(
+        logged_in=True,
+        source="account_api",
+        fresh=True,
+        paid_service_access=False,
+        portal_base_url="https://portal.example.test",
+        paid_service_access_info=NousPaidServiceAccessInfo(
+            allowed=False,
+            reason="no_usable_credits",
+            has_active_subscription=True,
+            active_subscription_is_paid=True,
+            subscription_credits_remaining=0,
+            purchased_credits_remaining=0,
+            total_usable_credits=0,
+        ),
+    )
+
+    message = format_nous_portal_entitlement_message(
+        info,
+        capability="managed tools",
+    )
+
+    assert message is not None
+    assert "credits are exhausted" in message
+    assert "managed tools" in message
+    assert "https://portal.example.test/billing" in message
+
+
+def test_entitlement_message_for_no_subscription_or_credits():
+    info = NousPortalAccountInfo(
+        logged_in=True,
+        source="account_api",
+        fresh=True,
+        paid_service_access=False,
+        portal_base_url="https://portal.example.test",
+        paid_service_access_info=NousPaidServiceAccessInfo(
+            allowed=False,
+            reason="no_usable_credits",
+            has_active_subscription=False,
+            subscription_credits_remaining=0,
+            purchased_credits_remaining=0,
+            total_usable_credits=0,
+        ),
+    )
+
+    message = format_nous_portal_entitlement_message(info, capability="paid models")
+
+    assert message is not None
+    assert "no active subscription or usable credits" in message
+    assert "Subscribe or add credits" in message
+
+
+def test_entitlement_message_for_unknown_entitlement_is_explicit():
+    info = NousPortalAccountInfo(
+        logged_in=True,
+        source="error",
+        fresh=False,
+        paid_service_access=None,
+        portal_base_url="https://portal.example.test",
+        error="account_api_timeout",
+    )
+
+    message = format_nous_portal_entitlement_message(info, capability="Tool Gateway")
+
+    assert message is not None
+    assert "could not verify" in message
+    assert "account_api_timeout" in message
+    assert "Run `hermes model`" in message
+
+
+def test_entitlement_message_for_account_missing():
+    info = NousPortalAccountInfo(
+        logged_in=True,
+        source="account_api",
+        fresh=True,
+        paid_service_access=False,
+        paid_service_access_info=NousPaidServiceAccessInfo(
+            allowed=False,
+            reason="account_missing",
+        ),
+    )
+
+    message = format_nous_portal_entitlement_message(info, capability="Tool Gateway")
+
+    assert message is not None
+    assert "could not find a Nous Portal account or organisation" in message
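Reviewer note on the JWT-based tests in this file: the `_jwt` helper builds an unsigned token with padding stripped, and the tests expect an unexpired token's `paid_access` claim to be used as a snapshot while an expired or claim-less token yields no verdict. A minimal sketch of that reading step follows, assuming claims are decoded without signature verification. Both function names are hypothetical, and the real module may order its checks differently.

```python
import base64
import json
import time
from typing import Any, Optional


def decode_jwt_claims(token: str) -> Optional[dict[str, Any]]:
    """Decode the claims segment of a JWT without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    segment = parts[1]
    # base64url segments drop "=" padding; restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        # Covers bad base64, bad UTF-8, and bad JSON (all ValueError subclasses).
        return None
    return claims if isinstance(claims, dict) else None


def paid_access_from_claims(
    claims: dict[str, Any], now: Optional[float] = None
) -> Optional[bool]:
    """Return True/False for a live token with an explicit claim, else None."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        # Expired or undated: the caller would fall back to the account API.
        return None
    paid = claims.get("paid_access")
    return paid if isinstance(paid, bool) else None
```

Returning `None` rather than `False` for a missing claim keeps "unknown" distinct from "free tier", which is the distinction `test_valid_jwt_missing_paid_access_is_unknown_not_paid` asserts.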
@@ -1,14 +1,25 @@
 """Tests for Nous subscription feature detection."""

+from hermes_cli.nous_account import NousPortalAccountInfo
 from hermes_cli import nous_subscription as ns


+def _account(*, logged_in: bool, paid: bool | None = None) -> NousPortalAccountInfo:
+    return NousPortalAccountInfo(
+        logged_in=logged_in,
+        source="jwt" if logged_in else "none",
+        fresh=False,
+        paid_service_access=paid,
+    )
+
+
 def test_get_nous_subscription_features_recognizes_direct_exa_backend(monkeypatch):
    env = {"EXA_API_KEY": "exa-test"}

    monkeypatch.setattr(ns, "get_env_value", lambda name: env.get(name, ""))
-    monkeypatch.setattr(ns, "get_nous_auth_status", lambda: {})
-    monkeypatch.setattr(ns, "managed_nous_tools_enabled", lambda: False)
+    monkeypatch.setattr(
+        ns, "get_nous_portal_account_info", lambda: _account(logged_in=False)
+    )
    monkeypatch.setattr(ns, "_toolset_enabled", lambda config, key: key == "web")
    monkeypatch.setattr(ns, "_has_agent_browser", lambda: False)
    monkeypatch.setattr(ns, "resolve_openai_audio_api_key", lambda: "")
@@ -23,11 +34,34 @@ def test_get_nous_subscription_features_recognizes_direct_exa_backend(monkeypatc
    assert features.web.current_provider == "exa"


+def test_get_nous_subscription_features_force_fresh_forwards_account_request(monkeypatch):
+    calls = []
+
+    def fake_account_info(*, force_fresh=False):
+        calls.append(force_fresh)
+        return _account(logged_in=True, paid=True)
+
+    monkeypatch.setattr(ns, "get_env_value", lambda name: "")
+    monkeypatch.setattr(ns, "get_nous_portal_account_info", fake_account_info)
+    monkeypatch.setattr(ns, "_toolset_enabled", lambda config, key: False)
+    monkeypatch.setattr(ns, "_has_agent_browser", lambda: False)
+    monkeypatch.setattr(ns, "resolve_openai_audio_api_key", lambda: "")
+    monkeypatch.setattr(ns, "has_direct_modal_credentials", lambda: False)
+    monkeypatch.setattr(ns, "is_managed_tool_gateway_ready", lambda vendor: False)
+
+    features = ns.get_nous_subscription_features({}, force_fresh=True)
+
+    assert features.account_info is not None
+    assert features.account_info.paid_service_access is True
+    assert calls == [True]
+
+
 def test_get_nous_subscription_features_prefers_managed_modal_in_auto_mode(monkeypatch):
    monkeypatch.setattr("tools.tool_backend_helpers.managed_nous_tools_enabled", lambda: True)
    monkeypatch.setattr(ns, "get_env_value", lambda name: "")
-    monkeypatch.setattr(ns, "get_nous_auth_status", lambda: {"logged_in": True})
-    monkeypatch.setattr(ns, "managed_nous_tools_enabled", lambda: True)
+    monkeypatch.setattr(
+        ns, "get_nous_portal_account_info", lambda: _account(logged_in=True, paid=True)
+    )
    monkeypatch.setattr(ns, "_toolset_enabled", lambda config, key: key == "terminal")
    monkeypatch.setattr(ns, "_has_agent_browser", lambda: False)
    monkeypatch.setattr(ns, "resolve_openai_audio_api_key", lambda: "")
@@ -46,8 +80,9 @@ def test_get_nous_subscription_features_prefers_managed_modal_in_auto_mode(monke

 def test_get_nous_subscription_features_marks_browser_use_as_managed_when_gateway_ready(monkeypatch):
    monkeypatch.setattr(ns, "get_env_value", lambda name: "")
-    monkeypatch.setattr(ns, "get_nous_auth_status", lambda: {"logged_in": True})
-    monkeypatch.setattr(ns, "managed_nous_tools_enabled", lambda: True)
+    monkeypatch.setattr(
+        ns, "get_nous_portal_account_info", lambda: _account(logged_in=True, paid=True)
+    )
    monkeypatch.setattr(ns, "_toolset_enabled", lambda config, key: key == "browser")
    monkeypatch.setattr(ns, "_has_agent_browser", lambda: True)
    monkeypatch.setattr(ns, "resolve_openai_audio_api_key", lambda: "")
@@ -78,8 +113,9 @@ def test_get_nous_subscription_features_uses_direct_browserbase_when_no_managed_
    }

    monkeypatch.setattr(ns, "get_env_value", lambda name: env.get(name, ""))
-    monkeypatch.setattr(ns, "get_nous_auth_status", lambda: {"logged_in": True})
-    monkeypatch.setattr(ns, "managed_nous_tools_enabled", lambda: True)
+    monkeypatch.setattr(
+        ns, "get_nous_portal_account_info", lambda: _account(logged_in=True, paid=True)
+    )
    monkeypatch.setattr(ns, "_toolset_enabled", lambda config, key: key == "browser")
    monkeypatch.setattr(ns, "_has_agent_browser", lambda: True)
    monkeypatch.setattr(ns, "resolve_openai_audio_api_key", lambda: "")
@@ -103,8 +139,9 @@ def test_get_nous_subscription_features_prefers_camofox_over_managed_browser_use
    env = {"CAMOFOX_URL": "http://localhost:9377"}

    monkeypatch.setattr(ns, "get_env_value", lambda name: env.get(name, ""))
-    monkeypatch.setattr(ns, "get_nous_auth_status", lambda: {"logged_in": True})
-    monkeypatch.setattr(ns, "managed_nous_tools_enabled", lambda: True)
+    monkeypatch.setattr(
+        ns, "get_nous_portal_account_info", lambda: _account(logged_in=True, paid=True)
+    )
    monkeypatch.setattr(ns, "_toolset_enabled", lambda config, key: key == "browser")
    monkeypatch.setattr(ns, "_has_agent_browser", lambda: False)
    monkeypatch.setattr(ns, "resolve_openai_audio_api_key", lambda: "")
@@ -133,8 +170,9 @@ def test_get_nous_subscription_features_requires_agent_browser_for_browserbase(m
    }

    monkeypatch.setattr(ns, "get_env_value", lambda name: env.get(name, ""))
-    monkeypatch.setattr(ns, "get_nous_auth_status", lambda: {})
-    monkeypatch.setattr(ns, "managed_nous_tools_enabled", lambda: False)
+    monkeypatch.setattr(
+        ns, "get_nous_portal_account_info", lambda: _account(logged_in=False)
+    )
    monkeypatch.setattr(ns, "_toolset_enabled", lambda config, key: key == "browser")
    monkeypatch.setattr(ns, "_has_agent_browser", lambda: False)
    monkeypatch.setattr(ns, "resolve_openai_audio_api_key", lambda: "")
@@ -155,8 +193,9 @@ def test_get_nous_subscription_features_does_not_treat_quoted_false_as_gateway_o
    env = {"EXA_API_KEY": "exa-test"}

    monkeypatch.setattr(ns, "get_env_value", lambda name: env.get(name, ""))
-    monkeypatch.setattr(ns, "get_nous_auth_status", lambda: {"logged_in": True})
-    monkeypatch.setattr(ns, "managed_nous_tools_enabled", lambda: True)
+    monkeypatch.setattr(
+        ns, "get_nous_portal_account_info", lambda: _account(logged_in=True, paid=True)
+    )
    monkeypatch.setattr(ns, "_toolset_enabled", lambda config, key: key == "web")
    monkeypatch.setattr(ns, "_has_agent_browser", lambda: False)
    monkeypatch.setattr(ns, "resolve_openai_audio_api_key", lambda: "")
@@ -450,6 +450,122 @@ def test_xai_adapter_retry_refreshes_current_pool_entry(tmp_path, monkeypatch):
    assert retry.bearer == "new-access-token"


+def test_xai_adapter_retry_rotates_pool_entry_on_429(tmp_path, monkeypatch):
+    """429 from xAI must rotate to the next pool entry, not attempt refresh.
+
+    Pre-fix (#28932) ``get_retry_credential`` only fired on 401, so a 429
+    rate-limit response flowed back to the client unchanged AND the
+    rate-limited bearer stayed active for the next request — defeating
+    the whole point of pool rotation.
+
+    Post-fix: 429 lands on ``mark_exhausted_and_rotate`` (no refresh —
+    that's irrelevant for rate limits), stamps the 1-hour cooldown
+    via ``EXHAUSTED_TTL_429_SECONDS`` on the offending key, and
+    returns the next available credential.
+    """
+    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+
+    # Two pool entries so rotation has somewhere to go.
+    auth_path = tmp_path / "auth.json"
+    auth_path.write_text(json.dumps({
+        "version": 1,
+        "providers": {},
+        "credential_pool": {
+            "xai-oauth": [
+                {
+                    "id": "xai-first",
+                    "label": "xai-first",
+                    "auth_type": "oauth",
+                    "priority": 0,
+                    "source": "manual:xai_pkce",
+                    "access_token": "first-access-token",
+                    "refresh_token": "first-refresh-token",
+                    "base_url": "https://api.x.ai/v1",
+                },
+                {
+                    "id": "xai-second",
+                    "label": "xai-second",
+                    "auth_type": "oauth",
+                    "priority": 1,
+                    "source": "manual:xai_pkce",
+                    "access_token": "second-access-token",
+                    "refresh_token": "second-refresh-token",
+                    "base_url": "https://api.x.ai/v1",
+                },
+            ]
+        },
+    }))
+
+    # Refresh must NOT be called on the 429 path — guard against
+    # the fix accidentally trying to refresh-on-rate-limit.
+    def _refresh_must_not_run(*args, **kwargs):
+        raise AssertionError("refresh_xai_oauth_pure must not run on 429")
+
+    monkeypatch.setattr("hermes_cli.auth.refresh_xai_oauth_pure", _refresh_must_not_run)
+
+    adapter = XAIGrokAdapter()
+    failed = adapter.get_credential()
+    assert failed.bearer == "first-access-token", "starting bearer should be the first entry"
+
+    retry = adapter.get_retry_credential(
+        failed_credential=failed,
+        status_code=429,
+    )
+
+    assert retry is not None, "429 must rotate to next pool entry"
+    assert retry.bearer == "second-access-token", (
+        f"expected rotation to second entry, got {retry.bearer!r}"
+    )
+
+
+def test_xai_adapter_retry_returns_none_on_429_when_pool_exhausted(tmp_path, monkeypatch):
+    """Single-entry pool: 429 has nowhere to rotate to → return None
+    so the 429 flows back to the client unchanged (existing behavior
+    preserved)."""
+    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+    _write_xai_pool_entry(tmp_path)  # single entry
+
+    def _refresh_must_not_run(*args, **kwargs):
+        raise AssertionError("refresh_xai_oauth_pure must not run on 429")
+
+    monkeypatch.setattr("hermes_cli.auth.refresh_xai_oauth_pure", _refresh_must_not_run)
+
+    adapter = XAIGrokAdapter()
+    failed = adapter.get_credential()
+    retry = adapter.get_retry_credential(
+        failed_credential=failed,
+        status_code=429,
+    )
+
+    assert retry is None, (
+        "single-entry pool: 429 must return None so the response "
+        "flows back to the client unchanged"
+    )
+
+
+def test_xai_adapter_retry_returns_none_for_unrelated_status(tmp_path, monkeypatch):
+    """Non-{401, 429} statuses must NOT trigger any retry — pool
+    untouched, no refresh attempted, return None immediately."""
+    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+    _write_xai_pool_entry(tmp_path)
+
+    def _refresh_must_not_run(*args, **kwargs):
+        raise AssertionError("refresh_xai_oauth_pure must not run on non-retry status")
+
+    monkeypatch.setattr("hermes_cli.auth.refresh_xai_oauth_pure", _refresh_must_not_run)
+
+    adapter = XAIGrokAdapter()
+    failed = adapter.get_credential()
+    for status in (200, 400, 403, 500, 502, 503):
+        retry = adapter.get_retry_credential(
+            failed_credential=failed,
+            status_code=status,
+        )
+        assert retry is None, (
+            f"status {status} must not trigger retry, got {retry!r}"
+        )
+
+
 # ---------------------------------------------------------------------------
 # Server: path filtering + forwarding
 #
@@ -0,0 +1,126 @@
+"""Regression tests for the Android psutil compatibility installer."""
+
+from __future__ import annotations
+
+import io
+import shutil
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+import pytest
+
+from hermes_cli.psutil_android import (
+    MARKER,
+    REPLACEMENT,
+    PSUTIL_URL,
+    PsutilAndroidInstallError,
+    prepare_patched_psutil_sdist,
+)
+
+
+def _add_dir(tf: tarfile.TarFile, name: str) -> None:
+    info = tarfile.TarInfo(name)
+    info.type = tarfile.DIRTYPE
+    info.mode = 0o755
+    tf.addfile(info)
+
+
+def _add_file(tf: tarfile.TarFile, name: str, content: str) -> None:
+    payload = content.encode("utf-8")
+    info = tarfile.TarInfo(name)
+    info.size = len(payload)
+    info.mode = 0o644
+    tf.addfile(info, io.BytesIO(payload))
+
+
+def _build_psutil_archive(archive: Path, *, malicious_symlink: bool) -> None:
+    with tarfile.open(archive, "w:gz") as tf:
+        _add_dir(tf, "psutil-7.2.2")
+        if malicious_symlink:
+            link = tarfile.TarInfo("psutil-7.2.2/psutil")
+            link.type = tarfile.SYMTYPE
+            link.linkname = "../../outside"
+            tf.addfile(link)
+        else:
+            _add_dir(tf, "psutil-7.2.2/psutil")
+        _add_file(
+            tf,
+            "psutil-7.2.2/psutil/_common.py",
+            f"{MARKER}\n",
+        )
+
+
+def test_prepare_patched_psutil_sdist_rejects_symlink_member(tmp_path):
+    """A symlink member must be rejected before any file payload is written."""
+    archive = tmp_path / "evil.tar.gz"
+    _build_psutil_archive(archive, malicious_symlink=True)
+
+    destination = tmp_path / "extract"
+    with pytest.raises(PsutilAndroidInstallError, match="Unsupported archive member type"):
+        prepare_patched_psutil_sdist(archive, destination)
+
+    assert not (tmp_path / "outside" / "_common.py").exists()
+
+
+def test_install_psutil_android_compat_uses_patched_tree(tmp_path):
+    """Updater path should install from the patched temporary sdist tree."""
+    archive = tmp_path / "psutil.tar.gz"
+    _build_psutil_archive(archive, malicious_symlink=False)
+
+    from hermes_cli import main as hermes_main
+
+    captured: dict[str, object] = {}
+
+    def fake_urlretrieve(url: str, dest: Path):
+        assert url == PSUTIL_URL
+        shutil.copyfile(archive, dest)
+        return str(dest), None
+
+    def fake_run_install(cmd: list[str], *, env=None):
+        src_root = Path(cmd[-1])
+        captured["cmd"] = cmd
+        captured["env"] = env
+        captured["common_py"] = (src_root / "psutil" / "_common.py").read_text(
+            encoding="utf-8"
+        )
+
+    with patch("urllib.request.urlretrieve", side_effect=fake_urlretrieve), \
+         patch.object(hermes_main, "_run_install_with_heartbeat", side_effect=fake_run_install):
+        hermes_main._install_psutil_android_compat(
+            ["uv", "pip"],
+            env={"HERMES_TEST": "1"},
+        )
+
+    assert captured["cmd"][:4] == ["uv", "pip", "install", "--no-build-isolation"]
+    assert captured["env"] == {"HERMES_TEST": "1"}
+    assert REPLACEMENT in str(captured["common_py"])
+
+
+def test_install_psutil_android_script_uses_patched_tree(tmp_path, monkeypatch, capsys):
+    """Standalone installer script should reuse the same safe patched tree."""
+    archive = tmp_path / "psutil.tar.gz"
+    _build_psutil_archive(archive, malicious_symlink=False)
+
+    import scripts.install_psutil_android as installer
+
+    def fake_urlretrieve(url: str, dest: Path):
+        assert url == PSUTIL_URL
+        shutil.copyfile(archive, dest)
+        return str(dest), None
+
+    def fake_subprocess_run(cmd: list[str]):
+        src_root = Path(cmd[-1])
+        patched = (src_root / "psutil" / "_common.py").read_text(encoding="utf-8")
+        assert REPLACEMENT in patched
+        return type("RunResult", (), {"returncode": 0})()
+
+    monkeypatch.setattr(installer.sys, "argv", ["install_psutil_android.py"])
+    monkeypatch.setattr(installer, "_resolve_install_cmd", lambda *_args: ["python", "-m", "pip"])
+
+    with patch("urllib.request.urlretrieve", side_effect=fake_urlretrieve), \
+         patch.object(installer.subprocess, "run", side_effect=fake_subprocess_run):
+        assert installer.main() == 0
+
+    captured = capsys.readouterr()
+    assert "psutil installed via Android compatibility shim" in captured.out
@@ -0,0 +1,67 @@
+"""Coverage for _run_with_idle_timeout — the streaming subprocess helper.
+
+Kept in a dedicated test file because the tests spawn real ``subprocess.Popen``
+instances; pytest-isolate runs each test file in its own worker process, so
+isolating these here prevents real-Popen state from racing with the
+``subprocess.run`` / ``_run_with_idle_timeout`` patches used by
+``test_web_ui_build.py``.
+
+Added for issue #33788: ``hermes update`` got stuck at "webui-build" because
+``npm run build`` ran with ``capture_output=True`` and no timeout. The helper
+fixes both halves — streams output AND idle-kills the process.
+"""
+
+import sys as _sys
+import time
+
+from hermes_cli.main import _run_with_idle_timeout
+
+
+def test_streams_output_and_returns_zero_on_success(tmp_path):
+    script = tmp_path / "ok.py"
+    script.write_text("print('line one'); print('line two')\n")
+    result = _run_with_idle_timeout(
+        [_sys.executable, str(script)], cwd=tmp_path, idle_timeout_seconds=10
+    )
+    assert result.returncode == 0
+    assert "line one" in result.stdout
+    assert "line two" in result.stdout
+
+
+def test_propagates_nonzero_exit(tmp_path):
+    script = tmp_path / "fail.py"
+    script.write_text("import sys; print('boom', file=sys.stderr); sys.exit(7)\n")
+    result = _run_with_idle_timeout(
+        [_sys.executable, str(script)], cwd=tmp_path, idle_timeout_seconds=10
+    )
+    assert result.returncode == 7
+    # stderr is merged into stdout in the helper.
+    assert "boom" in result.stdout
+
+
+def test_kills_process_on_idle_timeout(tmp_path):
+    # Sleeps without printing — exactly the failure mode users see when
+    # `npm run build` stalls. Idle timeout must terminate it.
+    script = tmp_path / "stall.py"
+    script.write_text("import time; time.sleep(30)\n")
+
+    start = time.monotonic()
+    result = _run_with_idle_timeout(
+        [_sys.executable, str(script)],
+        cwd=tmp_path,
+        idle_timeout_seconds=1,
+    )
+    elapsed = time.monotonic() - start
+    # Should have died well before the 30s sleep completes.
+    assert elapsed < 15
+    assert result.returncode != 0
+    assert "produced no output" in result.stdout
+
+
+def test_returns_127_when_binary_missing(tmp_path):
+    result = _run_with_idle_timeout(
+        ["/nonexistent/binary/does/not/exist"],
+        cwd=tmp_path,
+        idle_timeout_seconds=5,
+    )
+    assert result.returncode == 127
@@ -0,0 +1,230 @@
+"""Regression test for #28181 — kanban worker SIGTERM must terminate the process.
+
+The single-query signal handler in cli.py (``_signal_handler_q``) raises
+``KeyboardInterrupt`` to unwind the main thread on SIGTERM/SIGHUP. That works
+for interactive ``hermes chat -q`` invocations, but kanban workers spawned by
+the dispatcher are likely to have a non-daemon thread alive (terminal_tool's
+``_wait_for_process``, custom plugin background workers, etc.). With
+``KeyboardInterrupt`` only the main thread unwinds; the non-daemon thread
+keeps the process alive after the gateway has already restarted, the kanban
+dispatcher's ``_pid_alive`` check returns True forever, and the task stays
+``running`` indefinitely.
+
+The fix: when the process is a dispatcher-spawned worker (``HERMES_KANBAN_TASK``
+env var set), flush logging + stdout/stderr and call ``os._exit(0)`` instead.
+The kernel reclaims the PID immediately, and ``detect_crashed_workers``
+reclaims the stale claim on the next dispatcher tick.
+
+These tests use a synthetic Python script that mirrors the cli.py signal
+handler shape so we can exercise the exit-path contract without booting the
+full CLI (which needs a real provider config).
+"""
+from __future__ import annotations
+
+import os
+import signal
+import subprocess
+import sys
+import textwrap
+import time
+
+import pytest
+
+
+def _synthetic_worker_script() -> str:
+    """A standalone script that mirrors cli.py's single-query SIGTERM handler.
+
+    Keeping the synthetic copy here means the test exercises the exact handler
+    shape without needing the full hermes_cli boot path (config, providers,
+    skills, etc.). If the production handler in cli.py drifts, the source-level
+    check (test_real_handler_uses_os_exit_for_kanban_workers) will catch it.
+    """
+    return textwrap.dedent(
+        """
+        import os, signal, sys, threading, time
+
+        # Non-daemon thread that blocks forever — simulates the worker
+        # thread that would prevent orderly Python shutdown after
+        # KeyboardInterrupt unwinds main.
+        stuck = threading.Event()
+        threading.Thread(target=stuck.wait, daemon=False).start()
+
+        def handler(signum, frame):
+            # Mirrors cli.py:_signal_handler_q. Real handler sleeps 1.5s; the
+            # test uses a short grace so it runs fast.
+            try:
+                time.sleep(0.05)
+            except Exception:
+                pass
+            if os.environ.get("HERMES_KANBAN_TASK"):
+                try:
+                    if hasattr(signal, "SIGALRM"):
+                        signal.signal(signal.SIGALRM, lambda *_: os._exit(0))
+                        signal.alarm(2)
+                except Exception:
+                    pass
+                sys.stdout.flush()
+                sys.stderr.flush()
+                os._exit(0)
+            raise KeyboardInterrupt()
+
+        signal.signal(signal.SIGTERM, handler)
+        print("READY", flush=True)
+        try:
+            threading.Event().wait()
+        except KeyboardInterrupt:
+            sys.exit(0)
+        """
+    )
+
+
+def _is_alive_like_dispatcher(pid: int) -> bool:
+    """Mirrors hermes_cli/kanban_db.py:_pid_alive on Linux.
+
+    A zombie is treated as dead — the dispatcher's _pid_alive checks
+    /proc/<pid>/status for State: Z. We replicate that here so a clean
+    os._exit followed by zombie-state is correctly counted as dead.
+    """
+    if pid <= 0:
+        return False
+    try:
+        os.kill(pid, 0)
+    except ProcessLookupError:
+        return False
+    except PermissionError:
+        return True
+    if sys.platform == "linux":
+        try:
+            with open(f"/proc/{pid}/status") as f:
+                for line in f:
+                    if line.startswith("State:"):
+                        if "Z" in line.split(":", 1)[1]:
+                            return False
+                        break
+        except (FileNotFoundError, PermissionError, OSError):
+            pass
+    return True
+
+
+def _spawn_synthetic(env_overrides: dict) -> subprocess.Popen:
+    env = dict(os.environ)
+    env.update(env_overrides)
+    proc = subprocess.Popen(
+        [sys.executable, "-u", "-c", _synthetic_worker_script()],
+        env=env,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.PIPE,
+        start_new_session=True,
+    )
+    # Wait for "READY" so we know the signal handler is installed.
+    assert proc.stdout is not None
+    deadline = time.time() + 5.0
+    while time.time() < deadline:
+        line = proc.stdout.readline()
+        if line and line.startswith(b"READY"):
+            return proc
+    proc.kill()
+    raise RuntimeError("synthetic worker never signalled READY")
+
+
+def _cleanup(proc: subprocess.Popen) -> None:
+    try:
+        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
+    except (ProcessLookupError, PermissionError):
+        pass
+    try:
+        proc.communicate(timeout=2)
+    except subprocess.TimeoutExpired:
+        proc.kill()
+
+
+@pytest.mark.skipif(
+    sys.platform == "win32",
+    reason="SIGTERM semantics differ on Windows; kanban dispatcher is POSIX-only",
+)
+def test_sigterm_with_kanban_task_env_terminates_quickly():
+    """With HERMES_KANBAN_TASK set, SIGTERM should kill the process in <2s
+    even when a non-daemon thread is still alive."""
+    proc = _spawn_synthetic({"HERMES_KANBAN_TASK": "t_test_28181"})
+    try:
+        t0 = time.time()
+        os.kill(proc.pid, signal.SIGTERM)
+
+        # Should die in <2s. The handler sleeps ~50ms, then os._exit(0)
+        # is immediate. Give generous headroom for slow CI runners.
+        deadline = t0 + 2.0
+        while time.time() < deadline:
+            if not _is_alive_like_dispatcher(proc.pid):
+                elapsed = time.time() - t0
+                assert elapsed < 2.0
+                return
+            time.sleep(0.02)
+        pytest.fail(
+            "process still alive 2s after SIGTERM with HERMES_KANBAN_TASK set "
+            "(dispatcher would keep extending claim); the fix has regressed"
+        )
+    finally:
+        _cleanup(proc)
+
+
+@pytest.mark.skipif(
+    sys.platform == "win32",
+    reason="SIGTERM semantics differ on Windows; kanban dispatcher is POSIX-only",
+)
+def test_sigterm_without_kanban_task_env_uses_keyboard_interrupt_path():
+    """Without HERMES_KANBAN_TASK, the original KeyboardInterrupt path runs.
+
+    This is the contrast case proving the fix is gated on the env var: in
+    interactive ``hermes chat -q`` (no env var), behavior is unchanged. The
+    process MAY hang under non-daemon threads, but that's not a kanban-worker
+    concern. This is a smoke test only: it sends SIGTERM and cleans up, and
+    makes no assertion about which branch the handler took.
+    """
+    proc = _spawn_synthetic({})
+    try:
+        os.kill(proc.pid, signal.SIGTERM)
+        # Wait a moment for the handler to react.
+        time.sleep(0.5)
+        # The process may or may not be dead depending on whether the
+        # KeyboardInterrupt unwinds cleanly. The behavioral guarantee is
+        # only that the env-gated path didn't fire.
+        try:
+            # Close the pipes; nothing is asserted about their content.
+            if proc.stdout is not None:
+                proc.stdout.close()
+            if proc.stderr is not None:
+                proc.stderr.close()
+        except Exception:
+            pass
+    finally:
+        _cleanup(proc)
+
+
+def test_real_handler_uses_os_exit_for_kanban_workers():
+    """Source-level invariant: cli.py's _signal_handler_q must call
+    os._exit(0) when HERMES_KANBAN_TASK is set.
+
+    Catches the case where someone refactors the handler and accidentally
+    drops the env-gated exit, restoring the bug. Reading cli.py directly is
+    cheap and avoids the heavy CLI import.
+    """
+    import pathlib
+
+    cli_path = (
+        pathlib.Path(__file__).resolve().parent.parent.parent / "cli.py"
+    )
+    src = cli_path.read_text()
+    # Locate the handler body.
+    start = src.find("def _signal_handler_q(signum, frame):")
+    assert start != -1, "cli.py is missing _signal_handler_q"
+    # Look ahead for the env-gated os._exit call within the next 4000 characters.
+    body = src[start : start + 4000]
+    assert "HERMES_KANBAN_TASK" in body, (
+        "_signal_handler_q must gate its kanban-worker exit path on "
+        "HERMES_KANBAN_TASK — see #28181"
+    )
+    assert "os._exit(0)" in body, (
+        "_signal_handler_q must call os._exit(0) for kanban workers — "
+        "raising KeyboardInterrupt orphans the process when non-daemon "
+        "threads are alive (see #28181)"
+    )
@@ -651,3 +651,95 @@ def test_browse_skills_dedup_uses_identifier_not_name(monkeypatch):
        "browse_skills() must not deduplicate browse-sh skills with the same name "
        "but different identifiers"
    )
+
+
+# ---------------------------------------------------------------------------
+# Regression: full identifier must be recoverable from `hermes skills search`
+# even when the slug is too long to fit the terminal width (issue #33674).
+# ---------------------------------------------------------------------------
+
+# A real browse-sh-style slug whose trailing -XXXXXX hash matters for install
+_LONG_SLUG = "browse-sh/weather.gov/get-forecast-1uezib"
+
+_LONG_RESULT = type("R", (), {
+    "name": "get-forecast",
+    "description": "Fetch the forecast",
+    "source": "browse-sh",
+    "trust_level": "community",
+    "identifier": _LONG_SLUG,
+})()
+
+
+def test_do_search_identifier_column_does_not_truncate_long_slug():
+    """The Identifier column must use overflow='fold', not the default ellipsis.
+
+    Renders into a deliberately narrow Console; the full slug (including the
+    trailing -1uezib hash) must still appear in the output. Before the fix,
+    Rich would render `browse-sh/weather…` and lose the hash.
+    """
+    from hermes_cli.skills_hub import do_search
+
+    sink = StringIO()
+    # Narrow width forces Rich to apply overflow rules — exactly the scenario
+    # the issue reports. width=40 is too small for the slug; we want the slug
+    # wrapped (not ellipsis-truncated).
+    console = Console(file=sink, force_terminal=False, color_system=None, width=40)
+
+    with patch("tools.skills_hub.unified_search", return_value=[_LONG_RESULT]), \
+         patch("tools.skills_hub.create_source_router", return_value={}), \
+         patch("tools.skills_hub.GitHubAuth"):
+        do_search("weather", console=console)
+
+    output = sink.getvalue()
+
+    # The fix is working when the Identifier column wraps the slug across
+    # multiple lines (folded chunks) rather than emitting ONE line with an
+    # ellipsis. Extract every chunk that appears in the rightmost cell of
+    # the table by walking lines that look like table rows ("│ ... │") and
+    # taking the last `│...│` cell. Concatenating those chunks must yield
+    # the full slug.
+    chunks = []
+    for line in output.splitlines():
+        # Table data rows start and end with the box-drawing vertical bar.
+        if not line.startswith("│") or not line.rstrip().endswith("│"):
+            continue
+        # Last `│ ... │` cell on the row is the Identifier column.
+        last_cell = line.rstrip().rsplit("│", 2)[-2].strip()
+        if last_cell:
+            chunks.append(last_cell)
+    reconstructed = "".join(chunks)
+    assert _LONG_SLUG in reconstructed, (
+        f"Expected full slug {_LONG_SLUG!r} to be recoverable from the "
+        f"folded Identifier column; got chunks {chunks!r}\n"
+        f"Full output:\n{output}"
+    )
+    # And the truncating ellipsis must NOT appear in the Identifier column.
+    # Rich uses U+2026 HORIZONTAL ELLIPSIS for the default overflow="ellipsis".
+    assert "\u2026" not in reconstructed, (
+        f"Identifier column still ellipsis-truncated: {reconstructed!r}"
+    )
+
+
+def test_do_search_json_flag_emits_full_identifiers(capsys):
+    """`--json` must print a parseable array with full identifiers and skip the table."""
+    from hermes_cli.skills_hub import do_search
+
+    sink = StringIO()
+    console = Console(file=sink, force_terminal=False, color_system=None, width=40)
+
+    with patch("tools.skills_hub.unified_search", return_value=[_LONG_RESULT]), \
+         patch("tools.skills_hub.create_source_router", return_value={}), \
+         patch("tools.skills_hub.GitHubAuth"):
+        do_search("weather", console=console, as_json=True)
+
+    # JSON goes to stdout via print(), not the Rich console sink.
+    captured = capsys.readouterr().out
+    import json as _json
+    payload = _json.loads(captured)
+    assert isinstance(payload, list) and len(payload) == 1
+    assert payload[0]["identifier"] == _LONG_SLUG
+    assert payload[0]["name"] == "get-forecast"
+    assert payload[0]["source"] == "browse-sh"
+    # Table render must be suppressed: the sink must not contain the "Searching for:" header.
+    assert "Searching for:" not in sink.getvalue()
+
@@ -83,6 +83,56 @@ def test_show_status_reports_nous_auth_error(monkeypatch, capsys, tmp_path):
    assert "Key exp:" in output


+def test_show_status_reports_nous_inference_key_without_portal_login(monkeypatch, capsys, tmp_path):
+    from hermes_cli import status as status_mod
+    from hermes_cli.nous_account import NousPortalAccountInfo
+    import hermes_cli.auth as auth_mod
+    import hermes_cli.gateway as gateway_mod
+
+    monkeypatch.setattr(status_mod, "get_env_path", lambda: tmp_path / ".env", raising=False)
+    monkeypatch.setattr(status_mod, "get_hermes_home", lambda: tmp_path, raising=False)
+    monkeypatch.setattr(status_mod, "load_config", lambda: {"model": "gpt-5.4"}, raising=False)
+    monkeypatch.setattr(status_mod, "resolve_requested_provider", lambda requested=None: "openai-codex", raising=False)
+    monkeypatch.setattr(status_mod, "resolve_provider", lambda requested=None, **kwargs: "openai-codex", raising=False)
+    monkeypatch.setattr(status_mod, "provider_label", lambda provider: "OpenAI Codex", raising=False)
+    monkeypatch.setattr(
+        auth_mod,
+        "get_nous_auth_status",
+        lambda: {
+            "logged_in": False,
+            "inference_credential_present": True,
+            "credential_source": "pool:manual opaque key",
+            "inference_base_url": "https://inference.example.com/v1",
+            "agent_key_expires_at": "2099-01-01T00:00:00+00:00",
+        },
+        raising=False,
+    )
+    monkeypatch.setattr(
+        status_mod,
+        "get_nous_portal_account_info",
+        lambda: NousPortalAccountInfo(
+            logged_in=False,
+            source="inference_key",
+            fresh=False,
+            inference_credential_present=True,
+            inference_base_url="https://inference.example.com/v1",
+        ),
+        raising=False,
+    )
+    monkeypatch.setattr(status_mod, "managed_nous_tools_enabled", lambda: False, raising=False)
+    monkeypatch.setattr(auth_mod, "get_codex_auth_status", lambda: {}, raising=False)
+    monkeypatch.setattr(auth_mod, "get_qwen_auth_status", lambda: {}, raising=False)
+    monkeypatch.setattr(auth_mod, "get_xai_oauth_auth_status", lambda: {}, raising=False)
+    monkeypatch.setattr(gateway_mod, "find_gateway_pids", lambda exclude_pids=None: [], raising=False)
+
+    status_mod.show_status(SimpleNamespace(all=False, deep=False))
+
+    output = capsys.readouterr().out
+    assert "Nous Portal   ✗ not logged in (Nous inference key configured)" in output
+    assert "Inference:  https://inference.example.com/v1" in output
+    assert "Nous inference credentials are configured" in output
+
+
 # ---------------------------------------------------------------------------
 # Helpers shared by xAI OAuth status tests
 # ---------------------------------------------------------------------------
@@ -2,6 +2,7 @@

 from types import SimpleNamespace

+from hermes_cli.nous_account import NousPaidServiceAccessInfo, NousPortalAccountInfo
 from hermes_cli.nous_subscription import NousFeatureState, NousSubscriptionFeatures


@@ -124,6 +125,59 @@ def test_show_status_hides_nous_subscription_section_when_feature_flag_is_off(mo
    assert "Nous Tool Gateway" not in out


+def test_show_status_reports_exhausted_nous_credits(monkeypatch, capsys, tmp_path):
+    monkeypatch.setattr("hermes_cli.status.managed_nous_tools_enabled", lambda: False)
+    from hermes_cli import status as status_mod
+    import hermes_cli.auth as auth_mod
+
+    _patch_common_status_deps(monkeypatch, status_mod, tmp_path)
+    monkeypatch.setattr(
+        auth_mod,
+        "get_nous_auth_status",
+        lambda: {
+            "logged_in": False,
+            "access_token": "jwt",
+            "portal_base_url": "https://portal.example.test",
+            "error": "credits exhausted",
+            "error_code": "insufficient_credits",
+        },
+        raising=False,
+    )
+    monkeypatch.setattr(
+        status_mod,
+        "get_nous_portal_account_info",
+        lambda: NousPortalAccountInfo(
+            logged_in=True,
+            source="account_api",
+            fresh=True,
+            paid_service_access=False,
+            portal_base_url="https://portal.example.test",
+            paid_service_access_info=NousPaidServiceAccessInfo(
+                allowed=False,
+                reason="no_usable_credits",
+                has_active_subscription=True,
+                active_subscription_is_paid=True,
+                subscription_credits_remaining=0,
+                purchased_credits_remaining=0,
+                total_usable_credits=0,
+            ),
+        ),
+        raising=False,
+    )
+    monkeypatch.setattr(status_mod, "load_config", lambda: {"model": {"provider": "nous"}}, raising=False)
+    monkeypatch.setattr(status_mod, "resolve_requested_provider", lambda requested=None: "nous", raising=False)
+    monkeypatch.setattr(status_mod, "resolve_provider", lambda requested=None, **kwargs: "nous", raising=False)
+    monkeypatch.setattr(status_mod, "provider_label", lambda provider: "Nous Portal", raising=False)
+
+    status_mod.show_status(SimpleNamespace(all=False, deep=False))
+
+    out = capsys.readouterr().out
+    assert "Nous Tool Gateway" in out
+    assert "credits are exhausted" in out
+    assert "https://portal.example.test/billing" in out
+    assert "free-tier Nous account" not in out
+
+
 def test_show_status_reports_empty_lmstudio_listing_as_reachable(monkeypatch, capsys, tmp_path):
    from hermes_cli import status as status_mod
